Pablo de Castro, Developer in Ciudad Real, Spain

Pablo de Castro

Verified Expert in Engineering

Machine Learning Developer

Location
Ciudad Real, Spain
Toptal Member Since
November 7, 2019

Pablo has extensive experience designing and building state-of-the-art machine learning systems for computer vision and NLP, as well as constructing modern data science workflows for analyzing large quantities of data. Along with being a seasoned data-centric Python and C++ developer, Pablo holds a PhD, during which he developed new machine learning methods and analysis pipelines to increase the discovery potential of scientific experiments at CERN.

Portfolio

Reforestum
Product Management, Agile Project Management, JavaScript, PHP, React, Python...
Treelogic
Python, PyTorch, Kubernetes, Dash, Business Proposals, Computer Vision...
Reforestum
Geospatial Data, Geospatial Analytics, Machine Learning, Pandas, GeoPandas...

Experience

Availability

Part-time

Preferred Environment

Git, Visual Studio Code (VS Code), Linux, macOS, Python, TypeScript, Cloud, Machine Learning, Data Engineering, Kubernetes

The most amazing...

...thing I've worked on is a highly cited, innovative machine learning method for optimizing the end-to-end discovery potential of scientific experiments at CERN.

Work Experience

Software Architect and Product Manager

2022 - PRESENT
Reforestum
  • Led and coordinated product management and development through the startup's pivot from a voluntary carbon market marketplace to an enterprise data platform.
  • Planned and deployed frequent product iterations from start to finish, including integrations with customers and external data sources.
  • Contributed to product and engineering work at many levels, including customer interviews and research, product design, software engineering, and cloud infrastructure.
  • Advised and supported the founding team through the strategic shift in business direction.
Technologies: Product Management, Agile Project Management, JavaScript, PHP, React, Python, Lean Startups, Enterprise SaaS, Cloud Infrastructure, Azure, Terraform, Kubernetes, Continuous Integration (CI), FastAPI, Node.js, Express.js, Full-stack, Web Development, CSS, HTML

Head of Machine Learning Engineering

2021 - 2022
Treelogic
  • Provided technical leadership for a cross-disciplinary chapter focused on designing, building, and improving solutions, products, and processes with data science, machine learning, and computer vision for both commercial clients and EU R&D projects.
  • Led the technical design and coordination of projects and proposals, co-defined the technology stack and methodologies, and ensured technical excellence, technology transfer, and mentorship.
  • Analyzed, designed, and planned solutions for improving productivity and existing processes in several industry sectors.
  • Integrated cloud-based automatic transcription services for several video-conferencing providers and analyzed the resulting transcripts for search purposes.
  • Integrated natural language processing models for code summarization and variable-misuse detection as services within a web application.
  • Continuously improved data, machine learning, and software development methodologies and infrastructure to make technical teams more autonomous and productive.
Technologies: Python, PyTorch, Kubernetes, Dash, Business Proposals, Computer Vision, Business Intelligence (BI), Agile, Artificial Intelligence (AI), Cloud Infrastructure, Azure, Azure Kubernetes Service (AKS), Terraform, Natural Language Processing (NLP), Continuous Integration (CI), Machine Learning Operations (MLOps), Data Analytics, Data Analysis, FastAPI, Large Language Models (LLMs), Full-stack

Technical Advisor and Contractor

2019 - 2021
Reforestum
  • Co-designed the data strategy and approach for a forest monitoring, verification, and reporting system for reforestation and conservation projects. This initiative aligned with one of the startup's core value propositions: transparency.
  • Created a prototype monitoring system for large forest conservation projects that applied machine learning and deep learning to satellite imagery.
  • Co-designed, executed, and validated a plan for integrating the monitoring system into the production web application of a carbon credit marketplace.
  • Supported and advised the startup in various other technical domains, such as software, data architecture, and cloud infrastructure.
Technologies: Geospatial Data, Geospatial Analytics, Machine Learning, Pandas, GeoPandas, Raster Images, Satellite Images, Continuous Integration (CI), Full-stack, Web Development, JavaScript, CSS, HTML, Artificial Intelligence (AI)

Senior Data Scientist

2019 - 2021
Treelogic
  • Created a production-grade, holistic video monitoring system using deep learning and computer vision techniques.
  • Created a software library and architecture to seamlessly integrate custom computer vision solutions for video processing.
  • Deployed multiple deep learning models in production using a mix of intermediate representation technologies, such as ONNX and OpenVINO, together with AWS Lambda and AWS Batch.
  • Designed and built a feasibility demonstrator of an end-to-end platform for the automatic detection of anomalies in electrical lines and towers using deep learning.
  • Evaluated the viability of a machine learning system for predicting defaults of public companies.
  • Designed, built, and integrated heterogeneous business data sources and delivered a flexible custom solution that provides insights and visualizations to stakeholders.
  • Designed and provided guidance on the core data, machine learning, and deep learning work within projects, mainly in the context of a visual perception module.
Technologies: Matplotlib, NumPy, Pandas, OpenCV, PyTorch, TensorFlow, C++, Python, Docker, Computer Vision, Machine Learning, Kubernetes, Object Detection, Continuous Integration (CI), Data Analysis, Data Analytics, Business Analytics, Natural Language Processing (NLP), Large Language Models (LLMs), Artificial Intelligence (AI)

Marie Curie Early Stage Researcher

2015 - 2019
Istituto Nazionale di Fisica Nucleare
  • Carried out terabyte-scale data analyses for the CMS experiment at CERN.
  • Developed a new approach for applying deep learning techniques to statistical inference in scientific experiments.
  • Created several software libraries for data analyses at CERN in C++ and Python, with their corresponding tests and documentation.
  • Wrote and published several scientific articles in peer-reviewed journals and presented at international conferences.
  • Integrated a library for evaluating TensorFlow models within the existing C++ software infrastructure of the CMS experiment at CERN.
Technologies: NumPy, PyTorch, TensorFlow, Scikit-learn, Deep Learning, Bayesian Inference & Modeling, C++, Python, Statistics, Machine Learning, Artificial Intelligence (AI)

Project Associate

2014 - 2015
Instituto de Física de Cantabria
  • Worked on several terabyte-scale analyses for the CMS Collaboration at CERN, run in a distributed manner on both the LHC Computing Grid and computer clusters.
  • Compared the performance of various supervised machine learning methods on a classification task based on simulated data from the CMS experiment.
  • Developed a statistical method to calibrate the performance of a pattern recognition algorithm at the CMS experiment.
Technologies: NumPy, C++, Python, Statistics, Machine Learning, Advanced Physics

Summer Student

2014
CERN
  • Created a software library and graphical interface for the simulation of electrical currents inside a silicon detector.
  • Carried out advanced data analyses of electrical signals within silicon detectors.
  • Built an interactive histogram exploration tool based on the IPython Notebook (now Jupyter) server-client infrastructure.
Technologies: Finite Element Method (FEM), FEniCS, Qt, Python, C++, NumPy, Matplotlib, Data Analysis, Scientific Computing

Research Intern

2013 - 2014
Instituto de Física de Cantabria
  • Investigated the potential usefulness of semantic web technologies to represent and preserve scientific data analyses.
  • Constructed an ontology for data analysis and preservation at the CMS experiment at CERN.
  • Created prototypes of various use cases for RDF Schema, SPARQL, and other related technologies.
Technologies: Ontologies, Semantic Web, RDF

INFERNO: Inference-aware Neural Optimization

https://github.com/pablodecm/paper-inferno
This project is a research publication and accompanying code that present a new machine learning technique for carrying out statistical analyses in scientific experiments by leveraging existing deep learning technologies.

Cartographer: Mapper Algorithm Python Library

https://github.com/pablodecm/cartographer
A flexible, extensible, and fully scikit-learn-compatible implementation of the Mapper algorithm for topological data analysis.

Part of the design of this library was reused in the now very popular Kepler-Mapper library.

PhD Thesis: Statistical Learning and Inference in Particle Collider Experiments

https://github.com/pablodecm/phd_thesis
A self-contained, publication-quality document that examines in detail, from a principled perspective, how machine learning techniques are useful in scientific experiments such as those at the LHC, highlighting their limitations and proposing innovative alternatives.

Practical Data Science for IoT (Lecture Series)

https://github.com/pablodecm/datalab_ml_iot
Since 2019, I have been teaching an annual three-day applied data analysis laboratory focused on data science for the Internet of Things (IoT) as part of the Master of Data Science program jointly organized by UC-UIMP-CSIC.

Several practical use cases and exercises are included, such as a full end-to-end project in which a smartphone is used as a magic wand: we collect accelerometer and gyroscope data for labeled magic spells and then train a spell recognition model on our own sensor data.

Education

2015 - 2019

PhD Degree in Physics

University of Padua - Padua, Italy

2014 - 2015

Master's Degree in Physics

University of Cantabria - Santander, Spain

2010 - 2014

Bachelor's Degree in Physics

University of Cantabria - Santander, Spain

Libraries/APIs

NumPy, Scikit-learn, TensorFlow, PyTorch, Matplotlib, Pandas, OpenCV, FEniCS, Node.js, React, Accelerometers

Tools

LaTeX, Git, Atom, Node-RED, MQTT, GitHub, Terraform, Azure Kubernetes Service (AKS)

Languages

C++, Python, Markdown, JavaScript, HTML, RDF, Java, SQL, CSS, TypeScript, PHP

Platforms

Linux, Docker, Visual Studio Code (VS Code), Jupyter Notebook, Android, macOS, Amazon EC2, Kubernetes, Azure

Storage

MongoDB, NoSQL, Amazon S3 (AWS S3), Google Cloud Storage, Google Cloud

Paradigms

Parallel Programming, Test-driven Development (TDD), Continuous Integration (CI), Agile, Data Science, Business Intelligence (BI), Agile Project Management

Frameworks

Flask, Qt, Express.js

Other

Machine Learning, Statistics, Computer Vision, Artificial Intelligence (AI), Deep Learning, Data Analysis, Bayesian Statistics, Natural Language Processing (NLP), Large Language Models (LLMs), Full-stack, Web Development, Mathematics, Bayesian Inference & Modeling, Finite Element Method (FEM), WebSockets, Semantic Web, Ontologies, Object Detection, Generative Pre-trained Transformers (GPT), Advanced Physics, Cloud, Data Engineering, Programming, Technical Writing, Scientific Computing, Topological Data Analysis, Dash, Business Proposals, Geospatial Data, Geospatial Analytics, GeoPandas, Raster Images, Satellite Images, Product Management, Lean Startups, Enterprise SaaS, Cloud Infrastructure, Data Analytics, Business Analytics, Machine Learning Operations (MLOps), FastAPI, Use Cases
