
Daniel Beasley

Verified Expert in Engineering

Machine Learning Developer

Location

Amsterdam, Netherlands

Toptal Member Since

August 14, 2022

Daniel is passionate about data analytics and confident in solving problems with machine learning. In the past, he's worked on various machine learning problems, including computer vision, price recommendation, and spectral classification. His best quality in this area is developing practical solutions to business problems. If an 80% solution can be delivered in a short amount of time, it may be worthwhile to implement it and move on to a new problem.

Calculus, Statistics, Data Analytics, Data Analysis, Data Wrangling, Machine Learning, Artificial Intelligence (AI), Data Modeling, Data Mining, Data Visualization, Code Review, Data Collection, Mathematical Modeling, Python, Pandas, Principal Component Analysis, Computational Biology

Portfolio

Vinted

Data Science, Looker, Marketing Mix, Data Analysis, Impala

Nostics

Python, Google Cloud Platform (GCP), Jupyter, Machine Learning, Data Analysis...

Trivago

Management, Data Engineering, Data Science, Data Modeling, Data Mining...

Experience

Statistics - 12 years, Pandas - 6 years, Python - 6 years, Data Analysis - 6 years, Jupyter - 5 years, Machine Learning - 5 years, SQL - 5 years, Scikit-learn - 5 years

Availability

Part-time

Preferred Environment

Jupyter, Python, PyCharm

The most amazing...

...model I've developed is a classifier to identify pathogens using spectroscopy. The project was end to end, and involved novel methods of analysis and ML.

Work Experience

Senior Marketing Data Scientist

2023 - PRESENT

Vinted

Updated the payback calculation model for effective ROI calculation.
Developed Bayesian marketing mix models (MMM) for understanding marketing efficiency.
Performed monthly market reporting and communication with marketing managers.

Technologies: Data Science, Looker, Marketing Mix, Data Analysis, Impala

Data Scientist

2020 - 2022

Nostics

Implemented data science models for identifying and classifying pathogens like bacteria and viruses using surface-enhanced Raman spectroscopy.
Developed a 95% sensitive and 95% specific multiplex bacterial classification algorithm using a combination of principal component analysis (PCA), DBSCAN, and partial least squares regression and deployed it to the AI Platform in Google Cloud.
Created a custom dashboard using Dash and hosted it on Google App Engine, allowing our researchers to interact quickly with data.
Researched and experimented with techniques for analyzing high-dimensional spectral data, such as preprocessing, similarity measures, and signal extraction.

Technologies: Python, Google Cloud Platform (GCP), Jupyter, Machine Learning, Data Analysis, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Task Analysis, Google Cloud, ETL, Neural Networks, Biology, Large Data Sets, Data Manipulation, Data Extraction, Computational Biology, Data Collection, Pandas, Data Wrangling, PostgreSQL

Data Science Team Lead

2019 - 2020

Trivago

Led a cross-functional team of six data scientists and engineers developing data science solutions for features relating to price competitiveness.
Oversaw the engineering development of the weekend search functionality. This was a challenging feature as it bypassed the original Trivago search and let users search for trips in a variety of places and times based on their value and appeal.
Developed and implemented the Trivago Price Index, a user-facing scale to assess a given deal's value for money.

Technologies: Management, Data Engineering, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Technical Hiring, Code Review, Interviewing, Task Analysis, Team Management, Amazon Web Services (AWS), Google Cloud, ETL, Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection

Data Scientist

2018 - 2020

Trivago

Developed an autoencoder and keypoint-based solution to de-duplicate image galleries and optimized the solution to evaluate 300 million pairs of images.
Trained and implemented a deep learning-based image quality score using TensorFlow and Amazon SageMaker.
Developed custom KPI dashboards using Impala and Hive.
Trained and deployed over 90% precise hotel-specific image tagging models using TensorFlow and AWS.

Technologies: Python, SQL, Apache Hive, Impala, Hadoop, Google Cloud Platform (GCP), Machine Learning, Data Analysis, Computer Vision, Convolutional Neural Networks (CNN), TensorFlow, Pandas, Scikit-learn, Amazon SageMaker, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Amazon Web Services (AWS), Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection, Jupyter, Data Wrangling, PostgreSQL

Experience

Bacteria Classifier

For this project, I developed a 95% sensitive and 95% specific multiplex bacterial classification algorithm. Because of the high dimensionality of the data, it was necessary to combine several tools to classify the data effectively.

Principal component analysis (PCA) was used to identify outliers in the data. From a fitted PCA model, one can calculate the Q-residual and Hotelling's T-squared statistic for each sample. Along with the Mahalanobis distance, these statistics make for effective high-dimensional outlier detection. DBSCAN was used to segment the high-dimensional space. This was necessary because some bacteria had two distinct signatures, which would confuse a classifier that assumes all samples of a class are similarly distributed. Partial least squares regression was then applied to each DBSCAN cluster to further subdivide the high-dimensional space. Altogether, this led to a highly specific and sensitive classifier. I packaged the trained classifiers in Python and deployed them to the AI Platform in Google Cloud.
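
To make the outlier-detection step more concrete, here is a minimal sketch of how Hotelling's T-squared and the Q-residual can be computed from a PCA model. It is not the project's actual code: the function name `pca_outlier_stats`, the synthetic data, the number of components, and the 99th-percentile cutoffs are all assumptions chosen for illustration, and the Mahalanobis distance is left out for brevity.

```python
import numpy as np


def pca_outlier_stats(X, n_components):
    """Return (Hotelling's T-squared, Q-residual) for each row of X.

    Hypothetical helper for illustration; PCA is done via SVD of the
    mean-centered data.
    """
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # loadings: (features, k)
    scores = Xc @ P                               # projection onto retained PCs
    variances = S[:n_components] ** 2 / (X.shape[0] - 1)
    # T-squared: variance-scaled distance from the center inside the PCA model.
    t2 = np.sum(scores ** 2 / variances, axis=1)
    # Q-residual: squared reconstruction error outside the PCA model.
    residual = Xc - scores @ P.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q


# Synthetic stand-in for spectra (assumption): 200 samples, 50 channels,
# generated from 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 50))
X[0] += 2.0 * rng.normal(size=50)  # push one sample off the latent subspace

t2, q = pca_outlier_stats(X, n_components=3)

# Empirical percentile cutoffs are an illustrative choice, not the source's.
flagged = np.where((q > np.percentile(q, 99)) | (t2 > np.percentile(t2, 99)))[0]
print("flagged sample indices:", flagged)
```

The two statistics are complementary: T-squared flags samples that are extreme along the retained components, while the Q-residual flags samples that the retained components cannot reconstruct, such as the corrupted sample at index 0 here.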

Skills

Languages

Python, SQL, C++, R

Libraries/APIs

Pandas, Scikit-learn, NumPy, TensorFlow

Tools

Jupyter, PyCharm, Impala, Amazon SageMaker, Looker

Paradigms

Data Science, ETL, Linear Programming, Management

Other

Data Analysis, Calculus, Statistics, Probability Theory, Machine Learning, Artificial Intelligence (AI), Data Modeling, Data Mining, Data Analytics, Data Visualization, Technical Hiring, Code Review, Source Code Review, Task Analysis, Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection, Jupyter, Data Wrangling, Mathematical Modeling, Physics, Optimization, Statistical Modeling, Data Reporting, Interviewing, Team Management, Computational Biology, Linear Optimization, Bayesian Statistics, Time Series, Quantum Computing, Stochastic Modeling, Data Engineering, Computer Vision, Convolutional Neural Networks (CNN), Clustering, Classification, Regression, Principal Component Analysis (PCA), Biology, Marketing Mix

Platforms

Amazon Web Services (AWS), Google Cloud Platform (GCP)

Storage

Google Cloud, PostgreSQL, Apache Hive

Frameworks

Hadoop

Education

2020 - 2023

Master's Degree in Mathematics (Probability and Statistics)

Vrije Universiteit Amsterdam - Amsterdam, Netherlands

2009 - 2014

Bachelor's Degree in Physics

University of Waterloo - Waterloo, Canada

Certifications

APRIL 2017 - PRESENT

Machine Learning Engineer

Udacity
