Daniel Beasley, Machine Learning Developer in Amsterdam, Netherlands

Member since August 14, 2022
Daniel is passionate about data analytics and confident in solving problems with machine learning. In the past, he's worked on various machine learning problems, including computer vision, price recommendation, and spectral classification. His best quality in this area is developing practical solutions to business problems. If an 80% solution can be delivered in a short amount of time, it may be worthwhile to implement it and tackle a new problem.

Portfolio

  • Nostics
    Python, Google Cloud Platform (GCP), Jupyter, Machine Learning, Data Analysis...
  • Trivago
    Management, Data Engineering, Data Science, Data Modeling, Data Mining...
  • Trivago
    Python, SQL, Apache Hive, Impala, Hadoop, Google Cloud Platform (GCP)...

Location

Amsterdam, Netherlands

Availability

Part-time

Preferred Environment

Jupyter, Python, PyCharm

The most amazing...

...model I've developed is a classifier to identify pathogens using spectroscopy. The project was end-to-end and involved novel methods of analysis and ML.

Employment

  • Data Scientist

    2020 - 2022
    Nostics
    • Implemented data science models for identifying and classifying pathogens like bacteria and viruses using surface-enhanced Raman spectroscopy.
    • Developed a multiplex bacterial classification algorithm with 95% sensitivity and 95% specificity using a combination of principal component analysis (PCA), DBSCAN, and partial least squares regression, and deployed it to the AI Platform in Google Cloud.
    • Created a custom dashboard using Dash and hosted it on Google App Engine, allowing our researchers to interact quickly with data.
    • Researched and experimented with techniques for analyzing high-dimensional spectral data, such as preprocessing, similarity measures, and signal extraction.
    Technologies: Python, Google Cloud Platform (GCP), Jupyter, Machine Learning, Data Analysis, Spectroscopy, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Task Analysis, Google Cloud, ETL, Neural Networks, Biology, Large Data Sets, Data Manipulation, Data Extraction, Computational Biology, Data Collection, Pandas, Data Wrangling, PostgreSQL
  • Data Science Team Lead

    2019 - 2020
    Trivago
    • Led a cross-functional team of six data scientists and engineers developing data science solutions for features relating to price competitiveness.
    • Oversaw the engineering development of the weekend search functionality. This was a challenging feature as it bypassed the original Trivago search and let users search for trips in a variety of places and times based on their value and appeal.
    • Developed and implemented the Trivago Price Index, a user-facing scale to assess a given deal's value for money.
    Technologies: Management, Data Engineering, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Technical Hiring, Code Review, Interviewing, Task Analysis, Team Management, Amazon Web Services (AWS), Google Cloud, ETL, Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection
  • Data Scientist

    2018 - 2020
    Trivago
    • Developed an autoencoder and keypoint-based solution to de-duplicate image galleries and optimized the solution to evaluate 300 million pairs of images.
    • Trained and implemented a deep learning-based image quality score using TensorFlow and Amazon SageMaker.
    • Developed custom KPI dashboards using Impala and Hive.
    • Trained and deployed hotel-specific image tagging models with over 90% precision using TensorFlow and AWS.
    Technologies: Python, SQL, Apache Hive, Impala, Hadoop, Google Cloud Platform (GCP), Machine Learning, Data Analysis, Computer Vision, Convolutional Neural Networks, TensorFlow, Pandas, Scikit-learn, Amazon SageMaker, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Amazon Web Services (AWS), Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection, Jupyter, Data Wrangling, PostgreSQL

Experience

  • Bacteria Classifier

    For this project, I developed a multiplex bacterial classification algorithm with 95% sensitivity and 95% specificity. Because the data was high-dimensional, a variety of tools were needed to classify it effectively.

    I used principal component analysis (PCA) to identify outliers in the data. From a PCA model, one can calculate the Q-residual and Hotelling's T-squared statistics. Along with the Mahalanobis distance, these statistics make for effective outlier detection in high dimensions. I used DBSCAN to segment the high-dimensional space. This was necessary because some bacteria had two distinct signatures, which would confuse a classifier that assumes all samples of a class are similarly distributed. Partial least squares regression was then applied within each DBSCAN cluster to further subdivide the high-dimensional space. Altogether, this led to a highly specific and sensitive classifier. I packaged the trained classifiers in Python and deployed them to the AI Platform in Google Cloud.
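
    As an illustration of the PCA-based outlier statistics mentioned above, the following is a minimal sketch in Python with NumPy, not the project's actual code. The function names (fit_pca, outlier_stats), the synthetic data, and the 99th-percentile thresholds are all assumptions made for this example. The DBSCAN and partial least squares steps are not shown.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit a PCA model via SVD; return its mean, loadings, and score variances."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = Vt[:n_components].T  # shape: (features, n_components)
    score_var = singular_values[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, loadings, score_var


def outlier_stats(model, X):
    """Return (T-squared, Q-residual) for each row of X under a fitted model."""
    mean, loadings, score_var = model
    centered = np.asarray(X, dtype=float) - mean
    scores = centered @ loadings  # projection onto the PCA subspace
    # Hotelling's T-squared: variance-scaled distance inside the PCA subspace
    # (the squared Mahalanobis distance of the scores).
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    # Q-residual: squared reconstruction error, the part the model cannot explain.
    residual = centered - scores @ loadings.T
    q = np.sum(residual ** 2, axis=1)
    return t2, q


# Synthetic stand-in for spectra (hypothetical data): 3 latent factors
# observed through 50 channels, plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
train = latent @ mixing + 0.05 * rng.normal(size=(200, 50))

model = fit_pca(train, n_components=3)
t2_train, q_train = outlier_stats(model, train)

# Assumed thresholds: the 99th percentile of each statistic on the training data.
t2_limit = np.percentile(t2_train, 99)
q_limit = np.percentile(q_train, 99)

# Two artificial outliers: a spike in one channel (off-model, large Q) and an
# extreme but on-model sample (large T-squared).
spike = train[0].copy()
spike[10] += 5.0
extreme = np.array([6.0, 6.0, 6.0]) @ mixing
t2_new, q_new = outlier_stats(model, np.vstack([spike, extreme]))

print("spike flagged by Q:", q_new[0] > q_limit)
print("extreme flagged by T2:", t2_new[1] > t2_limit)
```

    The two statistics are complementary: T-squared catches samples that are unusual within the model's subspace, while the Q-residual catches samples the model cannot reconstruct. In practice, the limits would more likely be chosen from validation data or a statistical control limit than from a fixed percentile.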

Skills

  • Languages

    Python, SQL, C++, R
  • Libraries/APIs

    Pandas, Scikit-learn, NumPy, TensorFlow
  • Tools

    Jupyter, PyCharm, Impala, Amazon SageMaker
  • Paradigms

    Data Science, ETL, Linear Programming, Management
  • Other

    Data Analysis, Calculus, Statistics, Probability Theory, Machine Learning, Artificial Intelligence (AI), Data Modeling, Data Mining, Data Analytics, Data Visualization, Technical Hiring, Code Review, Source Code Review, Task Analysis, Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection, Jupyter, Data Wrangling, Mathematical Modeling, Physics, Optimization, Statistical Modeling, Data Reporting, Interviewing, Team Management, Computational Biology, Linear Optimization, Bayesian Statistics, Time Series, Quantum Computing, Statistics for Networks, Stochastic Modeling, Data Engineering, Computer Vision, Convolutional Neural Networks, Clustering, Classification, Regression, Principal Component Analysis (PCA), Biology
  • Platforms

    Amazon Web Services (AWS), Google Cloud Platform (GCP)
  • Storage

    Google Cloud, PostgreSQL, Apache Hive
  • Frameworks

    Hadoop

Education

  • Master's Degree in Mathematics (Probability and Statistics)
    2020 - 2022
    Vrije Universiteit Amsterdam - Amsterdam, Netherlands
  • Bachelor's Degree in Physics
    2009 - 2014
    University of Waterloo - Waterloo, Canada

Certifications

  • Machine Learning Engineer
    APRIL 2017 - PRESENT
    Udacity
