Bento Collares Goncalves, Developer in Florianópolis - State of Santa Catarina, Brazil

Bento Collares Goncalves

Verified Expert in Engineering

Data Science Developer

Florianópolis - State of Santa Catarina, Brazil

Toptal member since February 17, 2021

Bio

Bento is a senior ML engineer and data scientist with expertise in the CPG, retail, aerospace, and healthcare industries. With a PhD in developing deep learning algorithms for satellite imagery analysis, he delivers high-impact AI solutions for Fortune 100 companies and innovative startups alike. Bento consistently translates complex data into measurable business value through production-ready ML systems by leveraging tools such as Bayesian modeling and explainability frameworks.

Portfolio

Tropicana Brands - Main
Data Science, Pricing Models, Data-driven Marketing, Python, Statistics...
Rameez Mahmood
React Native, Machine Learning, Image Recognition, Python, Algorithms...
WHALE SEEKER
Python, Deep Learning, Software, GIS, Predictive Analytics

Experience

  • Python - 11 years
  • Data Science - 10 years
  • Machine Learning - 8 years
  • SQL - 5 years
  • Artificial Intelligence (AI) - 5 years
  • Deep Learning - 5 years
  • PyTorch - 4 years
  • Computer Vision - 4 years

Availability

Part-time

Preferred Environment

GIS, Bayesian Statistics, Deep Learning, Machine Learning, SQL, Jupyter, Pandas, Scikit-learn, PyTorch, Python 3

The most amazing...

...project I've developed was a Bayesian model to find optimal bid prices for auction-based online marketplaces, which increased net revenue by 20% on an A/B test.

Work Experience

Pricing Data Scientist

2023 - PRESENT
Tropicana Brands - Main
  • Designed and implemented sophisticated ML promotion optimization systems combining regression models with contextual bandits to improve promo ROI.
  • Engineered custom explainability models that created attribution frameworks for year-over-year sales volume changes across top consumer brands.
  • Developed comprehensive post-event analysis methodologies that quantified promotional effectiveness and drove data-informed strategy adjustments.
  • Built robust ETL pipelines and compliance reporting systems to identify non-compliant retail locations and optimize test/control group selection.
Technologies: Data Science, Pricing Models, Data-driven Marketing, Python, Statistics, Visualization, Contextual Bandits, Machine Learning, Artificial Intelligence (AI), ETL, Data Engineering, Predictive Analytics

Full-stack Mobile Developer

2023 - 2023
Rameez Mahmood
  • Developed a lightweight computer vision algorithm that uses transitions from light to dark and vice-versa, strategic pauses, and comparisons with reference images to capture key pose transitions during Muslim prayer and keep track of prayer cycles.
  • Designed a Bayesian hyperparameter tuning experiment that used 21 full-length example videos to tune pauses and thresholds to maximize the algorithm's accuracy in capturing the correct number of prayer cycles.
  • Created a customized threshold function using a combination of a 3rd-degree polynomial and a sigmoid transform to efficiently compute matches between reference prostration images and upcoming prostration images on an iPhone.
Technologies: React Native, Machine Learning, Image Recognition, Python, Algorithms, Bayesian Statistics, Videos

Remote Sensing and Computer Vision Expert

2023 - 2023
WHALE SEEKER
  • Developed whale detection algorithms for high-resolution satellite imagery that leverage downsampled aerial imagery to supplement a limited training/test set.
  • Designed Bayesian search experiments with custom validation metrics to replace random search hyperparameter tuning, dramatically speeding up convergence during model search routines.
  • Extended a large codebase, initially suited for aerial imagery, to support new types of input imagery, including panchromatic and multi-spectral high-resolution satellite imagery.
Technologies: Python, Deep Learning, Software, GIS, Predictive Analytics

Machine Learning Engineer

2021 - 2022
PepsiCo Global - Main
  • Developed a model to optimize bids on auction-based online marketplaces. The model combined a CatBoost tree ensemble and a Bayesian model to predict sales from marketing spending. Improved net revenue on Kroger by 20% on a 6-week-long A/B test.
  • Designed a Bayesian diff-in-diff test for A/B testing based on an in-house Python package for Bayesian tests. Conducted A/B tests, from finding testing pairs that matched criteria stated by the business to monitoring status and summarizing results.
  • Collaborated with the ML team to create the bid suggestion model, writing a clean software package with > 85% test coverage, concise configuration files, and containerization for CI/CD. The production-ready version is now running as Kubeflow DAGs.
Technologies: Python, Machine Learning, SQL, Software Engineering, Bayesian Inference & Modeling, Artificial Intelligence (AI), Recommendation Systems, Data-driven Marketing, Data Engineering, Predictive Analytics

PhD Researcher

2017 - 2022
Lynch Lab
  • Designed neural network architectures for object detection and semantic segmentation in the context of seal detection in high-resolution satellite imagery.
  • Created an ensemble approach for seal detection using a CatBoost tree-based model to combine outputs from multiple CNNs into consensus predictions, outperforming human observers at seal detection.
  • Applied similar techniques to several use cases in computer vision including penguin colony size estimation and sea ice segmentation in satellite imagery and whale detection in aerial imagery.
  • Awarded twice through the Stony Brook Institute of Advanced Computational Science Junior Researcher Fellowship.
  • Employed an array of custom-designed object detection convolutional neural networks running on NSF HPC machines to process a 500TB archive of high-resolution satellite imagery and detect seals.
  • Published results in several high-impact journals and conferences, including Remote Sensing of Environment, CVPR, and Remote Sensing.
Technologies: Python, PyTorch, R, Data Science, Machine Learning, Computer Vision, Deep Learning, Convolutional Neural Networks (CNNs), Research

AI Implementation Engineer

2021 - 2021
Offerfit
  • Designed and developed an anomaly detection pipeline using a combination of isolation forests and population statistics from historical averages, comparing and contrasting the most unusual data points with the most typical data points.
  • Implemented a feature drift validation pipeline that flags anomalous features using the Kullback-Leibler divergence from a past baseline as a criterion within Great Expectations.
  • Calculated probabilities for reinforcement learning model recommendations for different RL agent types and exploration strategies to test new approaches on past data using importance re-sampling.
Technologies: Python, Google Cloud Platform (GCP), Apache Airflow, Data Science, Reinforcement Learning, Contextual Bandits, Recommendation Systems, Artificial Intelligence (AI), Google BigQuery, Data-driven Marketing
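
As a rough illustration of the drift criterion mentioned above (not the production pipeline, and without the Great Expectations integration), the Kullback-Leibler divergence between a baseline histogram and a current histogram of one feature can be computed as follows. The function names, the toy histograms, and the 0.1 threshold are all assumptions made for this sketch.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions on the same bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    # Bins where p is zero contribute nothing; eps guards against division by zero in q.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def feature_drifted(baseline, current, threshold=0.1):
    """Flag a feature when the current histogram diverges from the baseline.

    The threshold is an illustrative value, not one taken from the project.
    """
    return kl_divergence(current, baseline) > threshold


# Toy normalized histograms (hypothetical data).
baseline_hist = [0.25, 0.75]
current_hist = [0.5, 0.5]
print(feature_drifted(baseline_hist, current_hist))
```

In practice the threshold would be calibrated per feature, since the divergence scale depends on how the histograms are binned.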

Machine Learning Engineer (Computer Vision)

2021 - 2021
WHALE SEEKER
  • Developed computer vision pipelines to detect whales in the Arctic using a combination of regression and semantic segmentation CNNs.
  • Contributed to the development of the project code repository, including refactoring and simplifying tasks within the pipeline and making sure the codebase grew in a modular way as we added new functionality.
  • Built an improved validation pipeline to calculate performance metrics after mosaicing output, turning pixel-level metrics into instance-level metrics, which ultimately made model selection more connected with business needs.
  • Researched state-of-the-art semantic segmentation and instance segmentation approaches to create a product view for the future.
Technologies: Python, PyTorch, Computer Vision, GIS

Statistician

2020 - 2021
Laboratório Unimed Centro
  • Won first prize in the annual company hackathon among more than 120 teams. The pitch was an ML-based solution to automate medical bill auditing.
  • Worked on feature engineering to detect patients with chronic diseases from a diverse portfolio of over 600,000 lives.
  • Developed ML solutions to provide personalized healthcare plans to patients based on their profile and healthcare usage.
  • Designed an autonomous medical bill auditing system based on the insurance usage backlog and the final outcome of each bill.
  • Mapped financial opportunities for savings on procedures and payments.
Technologies: Oracle, SQL, LightGBM, Google AI Platform, Jupyter Notebook, Scikit-learn, Pandas, Python, Predictive Analytics

Data Science Fellow

2020 - 2020
Insight Data Science NY
  • Created Birds of a Feather, a birding partner recommender system backed by public bird sightings records from eBird and a Siamese neural network encoder.
  • Gathered all eBird observation records for the last 15 years in North America (> 300GB), compiling relevant data for each active user within 25 hand-engineered features that capture the user's birding style (>100,000 active users).
  • Designed a web app front end for the project in Python with Streamlit, which was hosted on AWS.
  • Pitched a project demo to more than 10 Insight partner companies in NYC, including AB InBev, Bloomberg, and VIA.
Technologies: Statistics, Machine Learning, Data Science, Jupyter, Professionalism, Algorithms, SQL, Python

Experience

Raka – Prayer Counter

https://apps.apple.com/us/app/raka-prayer-counter/id6449230994
Muslim prayer, or Salah, involves repeating specific cycles of movements and prayers, known as Rakats. Tracking these can be crucial for people with memory impairments or distractions. To help with this, we've developed a computer vision model that can accurately count these prayer cycles.

Using a small dataset with 21 complete Rakat cycles and the corresponding cycle count for each video as annotation, I developed a lightweight computer vision approach that can accurately capture key transitions within Rakat cycles to keep track of completed Rakats.

The approach combines intensity thresholds, pauses, and comparison with reference images to detect transitions into prostration during prayer. To calibrate model parameters, I employed a Bayesian hyperparameter search that converged on a solution that performed well across all labeled examples.
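
A minimal sketch of the thresholding and pause ideas is shown below. It is not the app's code: it assumes each video frame has already been reduced to a mean brightness in [0, 1], it leaves out the reference-image comparison entirely, and the function name and parameter values are hypothetical.

```python
def count_dark_transitions(brightness, threshold=0.4, min_gap=3):
    """Count light-to-dark transitions in a per-frame brightness series.

    brightness: mean frame intensities scaled to [0, 1] (hypothetical input).
    threshold:  frames below this value count as "dark" (illustrative value).
    min_gap:    minimum number of frames between counted transitions,
                standing in for the pause between detections.
    """
    count = 0
    last_hit = None   # frame index of the last counted transition
    was_dark = False  # state of the previous frame
    for i, value in enumerate(brightness):
        is_dark = value < threshold
        if is_dark and not was_dark:
            # Only count the transition if enough frames have passed.
            if last_hit is None or i - last_hit >= min_gap:
                count += 1
                last_hit = i
        was_dark = is_dark
    return count


# Toy brightness trace (made-up values): the third dip arrives too soon to count.
frames = [0.8, 0.7, 0.2, 0.1, 0.9, 0.8, 0.3, 0.85, 0.2, 0.9]
print(count_dark_transitions(frames))
```

Parameters such as `threshold` and `min_gap` are the kind of values a hyperparameter search would tune against the annotated cycle counts.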

This model, now available on the App Store, has the potential to be a valuable tool for assisting individuals, especially those with disabilities, in accurately performing and completing their prayers.

SealNet 2.0: Seal Detection with CNN Model Ensembles

Pack-ice seals in the Southern Ocean are good indicators of environmental conditions. Their size and widespread distribution make them perfect for monitoring through high-resolution satellite images. However, manually identifying seals across such a massive volume of images is challenging.

SealNet 2.0 is an automated system that can detect seals. It uses one model to find potential seal habitats by identifying sea ice and several more models to find the seals themselves.

The system achieves a precision of 0.806 at 0.64 recall in a robust, undisclosed test set, outperforming two human experts and the older version of SealNet. It achieves this improvement by focusing on images of sea ice only, fine-tuning its settings with the help of high-performance computing, and refining predictions based on statistical analysis.
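
For readers less familiar with these metrics, the sketch below shows how precision and recall follow from detection counts. The function name and the counts are made up for illustration and are not taken from the SealNet evaluation.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Compute precision and recall from detection counts.

    precision: share of predicted seals that are real seals.
    recall:    share of real seals that the detector found.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed seals.
precision, recall = precision_recall(8, 2, 4)
print(f"precision={precision:.3f} recall={recall:.3f}")
```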

Even a simplified version of this system can improve the accuracy of seal detection by human experts. It could also help train new experts. However, like humans, the system struggles with rugged terrain. So, we must use statistical methods to adjust the seal population estimates it produces.

Penguin Colony Segmentation from Space with CNNs

https://arxiv.org/abs/1905.03313
Antarctic penguins are vital for understanding the health of our environment, particularly in the face of climate change. We've developed a deep learning model to identify Adélie penguin colonies from high-resolution satellite images.

To teach our model how to identify penguin colonies, we used the Penguin Colony Dataset, which includes over 2,000 images from 193 colonies. Due to a lack of detailed labeling of these images, we've developed a method to learn effectively from less precise labels.

We used a system that could sort out data unsuitable for this learning process, and we trained the model using a calculation designed to learn effectively from less precise labels. Including these less precise labels in training significantly improved the model's performance, raising IoU from 42.3% to 60.0% on a held-out test set.
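
IoU (intersection over union) measures the overlap between a predicted mask and a ground-truth mask. The sketch below computes it for small binary masks given as nested lists. The function name and the toy masks are illustrative only, and treating two empty masks as a perfect match is a convention assumed here.

```python
def iou(pred_mask, true_mask):
    """Intersection over union between two binary masks of equal shape."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for pred_px, true_px in zip(pred_row, true_row):
            if pred_px and true_px:
                intersection += 1
            if pred_px or true_px:
                union += 1
    if union == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return intersection / union


# Toy 2x2 masks (made-up data): one overlapping pixel out of two covered pixels.
predicted = [[1, 1], [0, 0]]
ground_truth = [[1, 0], [0, 0]]
print(iou(predicted, ground_truth))
```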

Education

2015 - 2022

Ph.D. in Ecology and Evolution

Stony Brook University - Stony Brook, NY, USA

2009 - 2014

Bachelor's Degree in Biology

Federal University of Rio Grande do Sul - Porto Alegre, RS, Brazil

2012 - 2013

Brazil Science Without Borders Fellow in Ecology and Evolutionary Biology

The University of Kansas - Lawrence, KS, United States

Skills

Libraries/APIs

Pandas, PyTorch, Scikit-learn, OpenCV

Tools

Jupyter, GIS, Google AI Platform, Apache Airflow

Languages

Python 3, Python, SQL, R, Swift

Platforms

Jupyter Notebook, Oracle, Amazon Web Services (AWS), Google Cloud Platform (GCP)

Frameworks

LightGBM, React Native

Paradigms

Siamese Neural Networks, ETL

Other

Machine Learning, Deep Learning, Statistics, Research, Computer Vision, Data Science, Geospatial Data, Bayesian Statistics, Experimental Design, Data Visualization, Predictive Analytics, Artificial Intelligence (AI), Bayesian Inference & Modeling, Software Engineering, Google BigQuery, Algorithms, Professionalism, Front-end, Supervised Machine Learning, Reinforcement Learning, Contextual Bandits, Recommendation Systems, Convolutional Neural Networks (CNNs), Software, Image Recognition, Data-driven Marketing, Mobile UX, Slurm Workload Manager, Ensemble Methods, Videos, Geotechnical Engineering, Pricing Models, Visualization, Data Engineering
