
Bento Collares Goncalves
Verified Expert in Engineering
Data Science Developer
Florianópolis - State of Santa Catarina, Brazil
Toptal member since February 17, 2021
Bento is a senior ML engineer and data scientist with expertise in the CPG, retail, aerospace, and healthcare industries. With a PhD in developing deep learning algorithms for satellite imagery analysis, he delivers high-impact AI solutions for Fortune 100 companies and innovative startups alike. Bento consistently translates complex data into measurable business value through production-ready ML systems by leveraging tools such as Bayesian modeling and explainability frameworks.
Experience
- Python - 11 years
- Data Science - 10 years
- Machine Learning - 8 years
- SQL - 5 years
- Artificial Intelligence (AI) - 5 years
- Deep Learning - 5 years
- PyTorch - 4 years
- Computer Vision - 4 years
Preferred Environment
GIS, Bayesian Statistics, Deep Learning, Machine Learning, SQL, Jupyter, Pandas, Scikit-learn, PyTorch, Python 3
The most amazing...
...project I've developed was a Bayesian model to find optimal bid prices for auction-based online marketplaces, which increased net revenue by 20% in an A/B test.
Work Experience
Pricing Data Scientist
Tropicana Brands - Main
- Designed and implemented sophisticated ML promotion optimization systems combining regression models with contextual bandits to improve promo ROI.
- Engineered custom explainability models that created attribution frameworks for year-over-year sales volume changes across top consumer brands.
- Developed comprehensive post-event analysis methodologies that quantified promotional effectiveness and drove data-informed strategy adjustments.
- Built robust ETL pipelines and compliance reporting systems to identify non-compliant retail locations and optimize test/control group selection.
Full-stack Mobile Developer
Rameez Mahmood
- Developed a lightweight computer vision algorithm that uses light-to-dark and dark-to-light transitions, strategic pauses, and comparisons with reference images to capture key pose transitions during Muslim prayer and keep track of prayer cycles.
- Designed a Bayesian hyperparameter tuning experiment that used 21 full-length example videos to tune pauses and thresholds to maximize the algorithm's accuracy in capturing the correct number of prayer cycles.
- Created a customized threshold function using a combination of a 3rd-degree polynomial and a sigmoid transform to efficiently compute matches between reference prostration images and upcoming prostration images on an iPhone.
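
The threshold function in the last bullet can be pictured with a minimal sketch. It assumes the function maps an image similarity value in [0, 1] through a cubic polynomial and then a sigmoid; the coefficients, the cutoff, and the names `match_score` and `is_match` are hypothetical placeholders, not the values or code used in the app.

```python
import math

# Hypothetical cubic coefficients (c0 + c1*x + c2*x^2 + c3*x^3).
# The tuned values from the project are not given in this profile.
COEFFS = (-6.0, 4.0, 3.0, 5.0)


def match_score(similarity, coeffs=COEFFS):
    """Map a raw image similarity in [0, 1] to a soft match score in (0, 1)."""
    c0, c1, c2, c3 = coeffs
    # Horner's rule evaluates the cubic with three multiplications.
    poly = c0 + similarity * (c1 + similarity * (c2 + similarity * c3))
    # The sigmoid squashes the polynomial output into (0, 1).
    return 1.0 / (1.0 + math.exp(-poly))


def is_match(similarity, cutoff=0.5):
    """Declare a match when the soft score reaches the (assumed) cutoff."""
    return match_score(similarity) >= cutoff
```

A closed-form function like this needs only a handful of arithmetic operations per frame, which is one plausible reason for preferring it on a phone.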
Remote Sensing and Computer Vision Expert
WHALE SEEKER
- Developed whale detection algorithms for high-resolution satellite imagery that leverage downsampled aerial imagery to supplement a limited training/test set.
- Designed Bayesian search experiments with custom validation metrics to replace random search hyperparameter tuning, dramatically speeding up convergence during model search routines.
- Extended a large codebase, initially suited for aerial imagery, to support new types of input imagery, including panchromatic and multi-spectral high-resolution satellite imagery.
Machine Learning Engineer
PepsiCo Global - Main
- Developed a model to optimize bids on auction-based online marketplaces. The model combined a CatBoost tree ensemble and a Bayesian model to predict sales from marketing spending. Improved net revenue on Kroger by 20% in a 6-week A/B test.
- Designed a Bayesian diff-in-diff test for A/B testing based on an in-house Python package for Bayesian tests. Conducted A/B tests, from finding testing pairs that matched criteria stated by the business to monitoring status and summarizing results.
- Collaborated with the ML team to create the bid suggestion model, writing a clean software package with > 85% test coverage, concise configuration files, and containerization for CI/CD. The production-ready version is now running as Kubeflow DAGs.
PhD Researcher
Lynch Lab
- Designed neural network architectures for object detection and semantic segmentation in the context of seal detection in high-resolution satellite imagery.
- Created an ensemble approach for seal detection using a CatBoost tree-based model to combine outputs from multiple CNNs into consensus predictions, outperforming human observers at seal detection.
- Applied similar techniques to several use cases in computer vision including penguin colony size estimation and sea ice segmentation in satellite imagery and whale detection in aerial imagery.
- Received the Stony Brook Institute of Advanced Computational Science Junior Researcher Fellowship twice.
- Ran an array of custom-designed object detection convolutional neural networks on NSF HPC machines to detect seals across a 500TB archive of high-resolution satellite imagery.
- Published results in several high-impact journals and conferences, including Remote Sensing of Environment, CVPR, and Remote Sensing.
AI Implementation Engineer
Offerfit
- Designed and developed an anomaly detection pipeline using a combination of isolation forests and population statistics from historical averages, comparing and contrasting the most unusual data points with the most typical data points.
- Implemented a feature drift validation pipeline that flags anomalous features using the Kullback-Leibler divergence from a past baseline as a criterion within Great Expectations.
- Calculated probabilities for reinforcement learning model recommendations for different RL agent types and exploration strategies to test new approaches on past data using importance re-sampling.
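
The feature drift check in the second bullet can be sketched as follows. This is a plain-Python illustration that assumes the check bins a feature, compares the current distribution with the baseline using the Kullback-Leibler divergence, and flags the feature above a threshold. It does not use the Great Expectations API, and the bin edges, smoothing, threshold, and function names are all hypothetical.

```python
import math


def histogram(values, edges):
    """Return add-one-smoothed bin probabilities over fixed bin edges."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        # Walk to the bin containing v; out-of-range values are clamped
        # into the first or last bin.
        idx = 0
        while idx < n_bins - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    # Smoothing keeps every probability positive so the log is defined.
    total = len(values) + n_bins
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def has_drifted(baseline, current, edges, threshold=0.1):
    """Flag a feature whose current distribution diverges from the baseline."""
    p = histogram(current, edges)
    q = histogram(baseline, edges)
    return kl_divergence(p, q) > threshold
```

For example, with `edges = [0.0, 0.25, 0.5, 0.75, 1.0]`, a feature whose values all collapse into the top bin is flagged, while an unchanged feature is not.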
Machine Learning Engineer (Computer Vision)
WHALE SEEKER
- Developed computer vision pipelines to detect whales in the Arctic using a combination of regression and semantic segmentation CNNs.
- Contributed to the development of the project code repository, including refactoring and simplifying tasks within the pipeline and making sure the codebase grew in a modular way as we added new functionality.
- Built an improved validation pipeline that calculates performance metrics after mosaicking the output, turning pixel-level metrics into instance-level metrics, which ultimately tied model selection more closely to business needs.
- Researched state-of-the-art semantic segmentation and instance segmentation approaches to create a product view for the future.
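
One way to turn pixel-level output into instance-level metrics, as in the validation pipeline above, is sketched below. It assumes the masks are already mosaicked into single binary grids, treats each 4-connected region as one instance, and counts an instance as matched when it overlaps any instance in the other mask. The matching rule and function names are illustrative assumptions, not the project's actual criteria.

```python
from collections import deque


def label_components(mask):
    """Return the 4-connected foreground regions as sets of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen.add((r, c))
            region = set()
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(region)
    return components


def instance_metrics(pred_mask, true_mask):
    """Instance-level (precision, recall) using any-overlap matching."""
    pred = label_components(pred_mask)
    true = label_components(true_mask)
    matched_pred = sum(1 for p in pred if any(p & t for t in true))
    matched_true = sum(1 for t in true if any(t & p for p in pred))
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_true / len(true) if true else 0.0
    return precision, recall
```

Counting instances rather than pixels means a missed small animal costs as much as a missed large one, which is closer to how a count-based product would be judged.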
Statistician
Laboratório Unimed Centro
- Won first prize among more than 120 teams in the annual company hackathon. The pitch was an ML-based solution to automate medical bill auditing.
- Worked on feature engineering to detect patients with chronic diseases from a diverse portfolio of over 600,000 lives.
- Developed ML solutions to provide personalized healthcare plans to patients based on their profile and healthcare usage.
- Designed an autonomous medical bill auditing system based on the insurance usage backlog and the final outcome of each bill.
- Mapped financial opportunities for savings on procedures and payments.
Data Science Fellow
Insight Data Science NY
- Created Birds of a Feather, a birding partner recommender system backed by public bird sightings records from eBird and a Siamese neural network encoder.
- Gathered all eBird observation records for the last 15 years in North America (> 300GB) and condensed the relevant data for each of more than 100,000 active users into 25 hand-engineered features that capture the user's birding style.
- Designed a web app front end for the project in Python with Streamlit, hosted on AWS.
- Pitched a project demo at more than 10 Insight partner companies in NYC, including AB InBev, Bloomberg, and VIA.
Experience
Raka – Prayer Counter
https://apps.apple.com/us/app/raka-prayer-counter/id6449230994
Using a small dataset with 21 complete Rakat cycles, annotated with the corresponding cycle count for each video, I developed a lightweight computer vision approach that accurately captures key transitions within Rakat cycles to keep track of completed Rakats.
The approach combines intensity thresholds, pauses, and comparison with reference images to detect transitions into prostration during prayer. To calibrate model parameters, I employed a Bayesian hyperparameter search that converged on a solution that performed well across all labeled examples.
This model, now available on the App Store, has the potential to be a valuable tool for assisting individuals, especially those with disabilities, in accurately performing and completing their prayers.
SealNet 2.0: Seal Detection with CNN Model Ensembles
SealNet 2.0 is an automated system that can detect seals. It uses one model to find potential seal habitats by identifying sea ice and several more models to find the seals themselves.
The system achieves a precision of 0.806 at 0.64 recall on a robust, undisclosed test set, outperforming two human experts and the older version of SealNet. It achieves this improvement by focusing on images of sea ice only, fine-tuning its settings with the help of high-performance computing, and refining predictions based on statistical analysis.
Even a simplified version of this system can improve the accuracy of seal detection by human experts. It could also help train new experts. However, like humans, the system struggles with rugged terrain. So, we must use statistical methods to adjust the seal population estimates it produces.
Penguin Colony Segmentation from Space with CNNs
https://arxiv.org/abs/1905.03313
To teach our model how to identify penguin colonies, we used the Penguin Colony Dataset, which includes over 2,000 images from 193 colonies. Because these images lack detailed labels, we developed a method to learn effectively from less precise labels.
We used a system that could sort out data unsuitable for this learning process, and we trained the model with a calculation designed to learn effectively from less precise labels. Our tests showed that including these less precise labels in training significantly improved the model's accuracy in identifying penguin colonies, raising IoU from 42.3% to 60.0% on a held-out test set.
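
For readers unfamiliar with the metric, IoU (intersection over union) for a segmentation mask can be computed as in the minimal sketch below. It assumes binary masks stored as nested lists, and the function name is illustrative; it is not the evaluation code from the paper.

```python
def iou(pred_mask, true_mask):
    """Intersection over union of two equally sized binary masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            # A pixel counts toward the intersection when both masks mark it,
            # and toward the union when either mask marks it.
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly, so treat that case as IoU = 1.
    return intersection / union if union else 1.0
```

A prediction that covers two pixels, one of which lies inside a two-pixel colony, shares one pixel out of three marked in total, giving an IoU of 1/3.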
Education
Ph.D. in Ecology and Evolution
Stony Brook University - Stony Brook, NY, USA
Bachelor's Degree in Biology
Federal University of Rio Grande do Sul - Porto Alegre, RS, Brazil
Brazil Science Without Borders Fellow in Ecology and Evolutionary Biology
The University of Kansas - Lawrence, KS, United States
Skills
Libraries/APIs
Pandas, PyTorch, Scikit-learn, OpenCV
Tools
Jupyter, GIS, Google AI Platform, Apache Airflow
Languages
Python 3, Python, SQL, R, Swift
Platforms
Jupyter Notebook, Oracle, Amazon Web Services (AWS), Google Cloud Platform (GCP)
Frameworks
LightGBM, React Native
Paradigms
Siamese Neural Networks, ETL
Other
Machine Learning, Deep Learning, Statistics, Research, Computer Vision, Data Science, Geospatial Data, Bayesian Statistics, Experimental Design, Data Visualization, Predictive Analytics, Artificial Intelligence (AI), Bayesian Inference & Modeling, Software Engineering, Google BigQuery, Algorithms, Professionalism, Front-end, Supervised Machine Learning, Reinforcement Learning, Contextual Bandits, Recommendation Systems, Convolutional Neural Networks (CNNs), Software, Image Recognition, Data-driven Marketing, Mobile UX, Slurm Workload Manager, Ensemble Methods, Videos, Geotechnical Engineering, Pricing Models, Visualization, Data Engineering