Leonardo dos Santos Pinheiro
Verified Expert in Engineering
Machine Learning Engineer and Software Developer
Brisbane, Queensland, Australia
Toptal member since July 4, 2019
Leonardo is a machine learning engineer with 10 years of industry experience across government, energy markets, finance, healthcare, and consulting. Leonardo is well versed in analytics, data engineering, and machine learning, specializing in the development and deployment of AI systems for computer vision, NLP, and recommender systems.
Experience
- Data Science - 10 years
- Statistics - 7 years
- R - 6 years
- SQL - 6 years
- Python - 5 years
- Generative Pre-trained Transformers (GPT) - 4 years
- Natural Language Processing (NLP) - 4 years
- Computer Vision - 3 years
Preferred Environment
Visual Studio Code (VS Code), Jupyter, Linux, PyCharm, Windows Subsystem for Linux (WSL)
The most amazing...
...thing I've done was to develop end-to-end machine learning pipelines for cancer detection—from data selection for labeling to deployment on Kubernetes.
Work Experience
Senior ML Engineer
Microsoft
- Created video editing functionality, including intent detection, commanding, Q&A, and summarization, using LangChain and ChatGPT with video transcripts.
- Integrated audio analysis models for noise canceling, audio classification, and voice activity detection in a video editor. Deployed models in the client application using ONNX Runtime and on the Azure back end.
- Integrated the research img2vid model based on Stable Diffusion and ControlNet into the video editor. Worked on model profiling and resource optimization.
AI-first Software Engineer
Trilogy
- Developed chat and summarization functionality for the Jive CoPilot chatbot using the OpenAI API.
- Tracked and fixed copilot-related bugs on both Jive Cloud and Jive Hop.
- Worked on the specification of AI pipelines to automate software engineering tasks such as test generation, documentation generation, and PR description.
NLP Expert via Toptal
Kwan Ting Chang
- Worked with the client to create a dataset for TTS. Helped with script generation, voice actor hiring, data distribution analysis, speech-and-text emotion analysis, and data preprocessing.
- Fine-tuned a StyleTTS 2 model for text-to-speech, with emphasis on capturing the diversity of emotional tones.
- Delivered a full TTS system for integration with a larger web application based on Python, PyTorch, FastAPI, and Docker.
Senior ML Engineer
HARRISON-AI
- Handled the end-to-end pipeline for cancer detection, including labeling with V7, model building for semantic segmentation using Hugging Face, and deployment on Kubernetes using FastAPI.
- Contributed to an evaluation system for the CT brain classification model. Built an internal Python library for multiple-hypothesis statistical testing.
- Led a series of transformer-based experiments for embryo selection to improve the performance of a production system based on Inception3D. The ViViT experiments led to a better model, which was later moved to production.
- Built a chat application based on semantic search using an LLM to aid clinicians in finding medical reports containing specific cases of interest. The application was used to perform case selection for image labeling.
- Contributed to a chatbot for retrieving medical cases based on semantic search using OpenAI ChatGPT and LangChain.
Senior ML Engineer
Jungle Scout
- Developed an MLOps system for automatic model evaluation and promotion for an eCommerce weekly model training pipeline using SageMaker, MLflow, Kedro, and Airflow.
- Productized deep learning time series models for eCommerce, based on PyTorch Lightning and SageMaker.
- Created a new model serving pipeline based on the MLflow model registry and SageMaker endpoints, with Lambda and an API gateway for scaling and Datadog monitoring.
- Created a model performance dashboard based on Plotly Dash and Redis. Deployed on Fargate.
Senior Software Engineer (Data and AI)
BCGX
- Used stereo vision and image segmentation on satellite imagery to aid an infrastructure company in vegetation management. The system was used to map the risk of vegetation encroachment on assets.
- Developed a Twitter analysis dashboard to measure tweet sentiment, map influencer networks, and visualize trends by tag and time, aiding strategic designers in their research.
- Developed a gradient-boosting model for activity classification using sensor data for a supply chain startup. The system was used to track illegal activity at different points in the supply chain.
- Built image classification models for crop recognition and crop pest/disease recognition for a farming startup. The system supported advisory for smallholder farmers in Southeast Asia.
- Built a recommender system for a cashback program startup, enabling personalization of content to drive engagement in the platform.
- Created a performance dashboard for a farming startup using Data Studio and BigQuery.
Senior Machine Learning Consultant
Servian
- Developed and deployed a churn model using gradient boosting for an insurance company.
- Developed and deployed a convolutional network for customer spending forecasting using TensorFlow, Ansible, Docker, ECS, DynamoDB, and PostgreSQL.
- Developed and deployed a text classification system based on a convolutional model, using TensorFlow and Spark.
- Designed a data science strategy for a major financial institution. Mentored junior data scientists.
- Explored a large corpus of insurance claims data using association rule mining, topic modeling, semantic similarity, and other text mining techniques.
- Created an open-domain chatbot based on machine comprehension (Facebook's DrQA) using PyTorch, Flask, React, and DialogFlow.
- Assisted in the development of a person tracking system using YOLOv2 and Kalman filters for a major Australian retail company.
- Assisted with a markdown system based on demand forecasting using Facebook's Prophet and revenue optimization using mixed-integer linear programming.
Data Scientist
Mojo Power
- Developed and deployed a serverless linear model for load forecasting using Python, NumPy, and AWS Lambda.
- Created a proof-of-concept Hidden Markov Model for load disaggregation.
- Developed a model for credit scoring of energy customers.
- Developed and deployed an LSTM model for load forecasting using PyTorch.
- Developed dashboards for analytics reporting on energy usage using Tableau.
- Used topic modeling for exploratory data analysis of customer reviews.
- Worked on a PoC for solar panel detection on satellite images using Facebook's Detectron.
Quantitative Developer
Macquarie Bank
- Parsed unstructured order execution logs into SQL Server and analyzed the resulting data.
- Back-tested optimal execution strategies.
- Developed a Plotly dashboard to visualize market data.
- Tested and investigated new trading strategies.
- Tested machine learning algorithms for commodities trading.
Quantitative Researcher
Comissão de Valores Mobiliários
- Developed regulatory research studies using statistical modeling (estimation and hypothesis testing).
- Created market risk reports and visualizations with time series analysis and forecasting using R.
- Developed a risk monitoring system based on Monte Carlo simulation and statistical estimation, implemented in Java.
- Developed a data warehouse to aggregate market risk data and built BI reports using BusinessObjects.
- Led a data governance group to discover and catalog data sources across the whole organization.
Business Analyst
Brazilian Institute of Metrology
- Worked with operational teams to develop scripts to collect metrics from operational processes and automate reporting. Scripts were based on Python and PowerShell.
- Built Cognos dashboards to monitor business KPIs related to operational processes. Also made custom Jupyter Notebooks for additional data analysis.
- Created a simplified database/warehouse using SQLite to aggregate data from multiple spreadsheets and support live business dashboards.
Experience
Investment Funds Program
Education
Master's Degree in Applied Math
Getulio Vargas Foundation - Rio de Janeiro, Brazil
Bachelor's Degree in Management Science
Getulio Vargas Foundation - Rio de Janeiro, Brazil
Certifications
AWS Certified Developer
AWS
Skills
Libraries/APIs
Keras, Scikit-learn, XGBoost, Pandas, NumPy, SpaCy, OpenCV, TensorFlow, SciPy, Natural Language Toolkit (NLTK), Luigi, PyTorch, Flask-RESTful, Node.js, Tortoise TTS
Tools
Jupyter, Tableau, Plotly, H2O AutoML, GitLab CI/CD, Git, Amazon Elastic Container Service (ECS), Apache Airflow, IntelliJ IDEA, SPSS, Vagrant, AWS CloudFormation, Talend ETL, PyCharm, MATLAB, ChatGPT
Languages
Python, SQL, R, Scala, Bash, Julia, JavaScript, TypeScript, Java
Frameworks
Spark, Hadoop, Windows PowerShell, Django, Flask, AWS Serverless Application Model (SAM)
Paradigms
Scrum, Kanban
Platforms
Docker, AWS Lambda, Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), Apache Kafka, Visual Studio Code (VS Code), Amazon EC2, Azure
Storage
Amazon S3 (AWS S3), Amazon DynamoDB, InfluxDB, PostgreSQL, Microsoft SQL Server, MongoDB, Kdb+, Neo4j
Other
Data Science, Artificial Intelligence (AI), Dashboards, Agile Data Science, Machine Learning, Natural Language Processing (NLP), Computer Vision, Deep Learning, Generative Pre-trained Transformers (GPT), A/B Testing, Visualization, Statistics, APIs, Scraping, Analytics, Dashboard Design, Chatbots, SAP BusinessObjects (BO), Microsoft 365, Recommendation Systems, Windows Subsystem for Linux (WSL), Amazon RDS, ECS, Time Series, Optimization, Graphs, Speech to Text, StyleTTS, vosk-tts, Tacotron 2, OpenAI