Data Scientist
2020 - 2022
Nostics
- Implemented data science models for identifying and classifying pathogens like bacteria and viruses using surface-enhanced Raman spectroscopy.
- Developed a multiplex bacterial classification algorithm with 95% sensitivity and 95% specificity, combining principal component analysis (PCA), DBSCAN, and partial least squares regression, and deployed it to AI Platform on Google Cloud.
- Created a custom dashboard using Dash and hosted it on Google App Engine, allowing our researchers to interact quickly with data.
- Researched and experimented with techniques for analyzing high-dimensional spectral data, such as preprocessing, similarity measures, and signal extraction.
Technologies: Python, Google Cloud Platform (GCP), Jupyter, Machine Learning, Data Analysis, Spectroscopy, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Task Analysis, Google Cloud, ETL, Neural Networks, Biology, Large Data Sets, Data Manipulation, Data Extraction, Computational Biology, Data Collection, Pandas, Data Wrangling, PostgreSQL

Data Science Team Lead
2019 - 2020
Trivago
- Led a cross-functional team of six data scientists and engineers developing data science solutions for features relating to price competitiveness.
- Oversaw the engineering development of the weekend search functionality. The feature was challenging because it bypassed the original Trivago search and let users search for trips across a variety of places and times based on value and appeal.
- Developed and implemented the Trivago Price Index, a user-facing scale to assess a given deal's value for money.
Technologies: Management, Data Engineering, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Technical Hiring, Code Review, Interviewing, Task Analysis, Team Management, Amazon Web Services (AWS), Google Cloud, ETL, Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection

Data Scientist
2018 - 2020
Trivago
- Developed an autoencoder and keypoint-based solution to de-duplicate image galleries and optimized the solution to evaluate 300 million pairs of images.
- Trained and implemented a deep learning-based image quality score using TensorFlow and Amazon SageMaker.
- Developed custom KPI dashboards using Impala and Hive.
- Trained and deployed hotel-specific image tagging models with over 90% precision using TensorFlow and AWS.
Technologies: Python, SQL, Apache Hive, Impala, Hadoop, Google Cloud Platform (GCP), Machine Learning, Data Analysis, Computer Vision, Convolutional Neural Networks, TensorFlow, Pandas, Scikit-learn, Amazon SageMaker, Data Science, Data Modeling, Data Mining, Data Reporting, Data Analytics, Data Visualization, Artificial Intelligence (AI), NumPy, Code Review, Source Code Review, Amazon Web Services (AWS), Neural Networks, Large Data Sets, Data Manipulation, Data Extraction, Data Collection, Jupyter, Data Wrangling, PostgreSQL