Pablo de Castro
Verified Expert in Engineering
Machine Learning Developer
Ciudad Real, Spain
Toptal member since November 7, 2019
Pablo has extensive experience designing and building state-of-the-art machine learning systems for computer vision and NLP and constructing modern data science workflows to analyze large quantities of data. A seasoned data-centric Python and C++ developer, Pablo holds a PhD in physics; during his doctoral research, he developed new machine learning methods and analysis pipelines to increase the discovery potential of scientific experiments at CERN.
Experience
- Python - 11 years
- NumPy - 11 years
- Machine Learning - 9 years
- TensorFlow - 9 years
- C++ - 7 years
- Docker - 7 years
- PyTorch - 5 years
- Scikit-learn - 5 years
Preferred Environment
Git, Visual Studio Code (VS Code), Linux, macOS, Python, TypeScript, Cloud, Machine Learning, Data Engineering, Kubernetes
The most amazing...
...thing I've worked on is a highly cited, innovative machine learning method to optimize the end-to-end discovery potential of scientific experiments at CERN.
Work Experience
Software Architect and Product Manager
Reforestum
- Led and coordinated product management and development efforts through the startup's pivot from a voluntary carbon credit marketplace to an enterprise data platform.
- Planned and deployed frequent product iterations from start to finish, including integrations with customers and external data sources.
- Contributed to product and engineering efforts at many levels, including customer interviews and research, product design, software engineering, and cloud infrastructure.
- Advised and supported the founding team through the strategic shift in business direction.
Head of Machine Learning Engineering
Treelogic
- Provided technical leadership for a cross-disciplinary chapter focused on designing, building, and improving solutions, products, and processes with data science, machine learning, and computer vision for both commercial clients and EU R&D projects.
- Led the technical design and coordination of projects and proposals, co-defined the technology stack and methodologies, and ensured technical excellence, technology transfer, and technical mentorship.
- Analyzed, designed, and planned solutions for improving productivity and existing processes in several industry sectors.
- Integrated and analyzed automatic transcriptions from different video-conferencing providers using cloud services to enable search.
- Integrated natural language processing models for code summarization and variable misuse detection as services within a web application.
- Continuously improved data, machine learning, and software development methodologies and infrastructure to make technical teams more autonomous and productive.
Technical Advisor and Contractor
Reforestum
- Co-designed the data strategy and approach for a forest monitoring, verification, and reporting system for reforestation and conservation projects. This initiative aligned with one of the startup's core value propositions: transparency.
- Created a prototype of a monitoring system for large forest conservation projects based on machine learning and deep learning applied to satellite imagery.
- Co-designed, executed, and validated a plan for integrating the project monitoring system into the production web application of a carbon credit marketplace.
- Supported and advised the startup in various other technical domains, such as software, data architecture, and cloud infrastructure.
Senior Data Scientist
Treelogic
- Created a production-level holistic system for video monitoring using deep learning and computer vision techniques.
- Created a software library and architecture to seamlessly integrate custom computer vision solutions for video processing.
- Deployed multiple deep learning models in production using a mix of intermediate representation technologies, such as ONNX and OpenVINO, and cloud services such as AWS Lambda and AWS Batch.
- Designed and built a feasibility demonstrator of an end-to-end platform for the automatic detection of anomalies in electrical lines and towers using deep learning.
- Evaluated the viability of a machine learning system for predicting defaults of public companies.
- Designed, built, and integrated heterogeneous business data sources into a flexible custom solution that provides insights and visualizations to stakeholders.
- Designed and provided guidance on core data, machine learning, and deep learning competencies within the projects, mainly in the context of the visual perception module.
Marie Curie Early Stage Researcher
Istituto Nazionale di Fisica Nucleare
- Carried out terabyte-scale data analyses for the CMS experiment at CERN.
- Developed a new approach to apply deep learning techniques in the context of statistical inference in scientific experiments.
- Created several software libraries for data analyses at CERN in C++ and Python, with their corresponding tests and documentation.
- Wrote and published several scientific articles in peer-reviewed journals and presented at international conferences.
- Integrated a library for evaluating TensorFlow models within the existing C++ software infrastructure of the CMS experiment at CERN.
Project Associate
Instituto de Física de Cantabria
- Worked on several terabyte-scale analyses for the CMS Collaboration at CERN, run in a distributed manner on both the LHC Computing Grid and computer clusters.
- Compared the performance of various supervised machine learning methods on a classification task based on simulated data from the CMS experiment.
- Developed a statistical method to calibrate the performance of a pattern recognition algorithm at the CMS experiment.
Summer Student
CERN
- Created a software library and graphical interface for the simulation of electrical currents inside a silicon detector.
- Carried out advanced data analyses of electrical signals within silicon detectors.
- Built an interactive histogram exploration tool based on the IPython Notebook (now Jupyter) server-client infrastructure.
Research Intern
Instituto de Física de Cantabria
- Investigated the potential usefulness of semantic web technologies to represent and preserve scientific data analyses.
- Constructed an ontology for data analysis and preservation at the CMS experiment at CERN.
- Created prototypes for various use cases of RDF Schema, SPARQL, and other related technologies.
Experience
INFERNO: Inference-aware Neural Optimization
https://github.com/pablodecm/paper-inferno
Cartographer: Mapper Algorithm Python Library
https://github.com/pablodecm/cartographer
Part of the design of this library was reused in the now very popular Kepler-Mapper library.
PhD Thesis: Statistical Learning and Inference in Particle Collider Experiments
https://github.com/pablodecm/phd_thesis
Practical Data Science for IoT (Lecture Series)
https://github.com/pablodecm/datalab_ml_iot
Several practical use cases and exercises are included, such as a full end-to-end example in which a smartphone is used as a magic wand: accelerometer and gyroscope data are collected for labeled magic spells and then used to train a spell recognition model on our own sensor data.
Education
PhD Degree in Physics
University of Padua - Padua, Italy
Master's Degree in Physics
University of Cantabria - Santander, Spain
Bachelor's Degree in Physics
University of Cantabria - Santander, Spain
Skills
Libraries/APIs
NumPy, Scikit-learn, TensorFlow, PyTorch, Matplotlib, Pandas, OpenCV, FEniCS, Node.js, React, Accelerometers
Tools
LaTeX, Git, Atom, Node-RED, MQTT, GitHub, Terraform, Azure Kubernetes Service (AKS)
Languages
C++, Python, Markdown, JavaScript, HTML, RDF, Java, SQL, CSS, TypeScript, PHP
Platforms
Linux, Docker, Visual Studio Code (VS Code), Jupyter Notebook, Android, macOS, Amazon EC2, Kubernetes, Azure
Paradigms
Parallel Programming, Test-driven Development (TDD), Continuous Integration (CI), Agile, Business Intelligence (BI), Agile Project Management
Frameworks
Flask, Qt, Express.js
Storage
MongoDB, NoSQL, Amazon S3 (AWS S3), Google Cloud Storage, Google Cloud
Other
Machine Learning, Statistics, Computer Vision, Artificial Intelligence (AI), Deep Learning, Data Analysis, Bayesian Statistics, Natural Language Processing (NLP), Large Language Models (LLMs), Full-stack, Web Development, Mathematics, Bayesian Inference & Modeling, Finite Element Method (FEM), WebSockets, Semantic Web, Ontologies, Object Detection, Generative Pre-trained Transformers (GPT), Advanced Physics, Cloud, Data Engineering, Programming, Data Science, Technical Writing, Scientific Computing, Topological Data Analysis, Dash, Business Proposals, Geospatial Data, Geospatial Analytics, GeoPandas, Raster Images, Satellite Images, Product Management, Lean Startups, Enterprise SaaS, Cloud Infrastructure, Data Analytics, Business Analytics, Machine Learning Operations (MLOps), FastAPI, Use Cases