Data Consultant
2019 - PRESENT
5S Technology
- Developed algorithms to identify contract violations for airline unions from historical scheduling data. Translated violation decision trees to SQL queries and prototyped a reroute identification model using keyword search (see the sketch after this list).
- Deployed and maintained the Argo workflow engine on EKS. Developed a database schema for an analytics warehouse using dbt and deployed it in Snowflake.
- Designed a CI/CD system for GitLab using Dockerized CLIs of pipeline tools and coached team members on its usage.
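A minimal sketch of the keyword-search reroute prototype mentioned above, assuming free-text remarks on scheduling records; the keyword list, field names, and sample records are illustrative, not the production values.

```python
# Illustrative sketch: flag pairings whose remarks suggest a reroute.
# Keywords and record fields are assumptions for illustration only.
REROUTE_KEYWORDS = {"rerouted", "reroute", "diverted", "substitution", "swap"}

def looks_like_reroute(remark: str) -> bool:
    """Return True if any reroute keyword appears in the free-text remark."""
    tokens = remark.lower().split()
    return any(keyword in tokens for keyword in REROUTE_KEYWORDS)

pairings = [
    {"id": "P-101", "remark": "Crew rerouted via ORD after cancellation"},
    {"id": "P-102", "remark": "Normal operation, no changes"},
]
candidates = [p["id"] for p in pairings if looks_like_reroute(p["remark"])]
print(candidates)  # ['P-101']
```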
Technologies: Python, SQL, Data Build Tool (dbt), GitLab CI/CD, Kubernetes, Docker, Amazon EKS, Snowflake, Ubuntu, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2, ETL, AWS RDS, APIs, Data Pipelines, Analytics, Data Engineering, Kimball Methodology, Data Warehouse Design, Bash, Dimensional Modeling, Amazon Web Services (AWS)

Machine Learning Engineer
2020 - 2021
Twosense
- Adapted an open-source tracking library to run and collect metrics on user-level and overall model performance via a simplified API. Deployed a tracking server and web application using Docker on AWS.
- Refined model deployment scripts in Python. Consolidated file loading into a separate module to improve code readability.
- Developed a system in Python to re-evaluate production models upon retraining, enabling comparison of model scores on the same test data set (sketched below). Conducted simulations to demonstrate the project's ROI in terms of improved model scores.
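A minimal sketch of the retrain-and-compare flow described above: the incumbent and the retrained model are both scored on the same frozen test set. The model type, metric, and synthetic data are illustrative assumptions, not the production setup.

```python
# Score an "incumbent" and a "retrained" model on one held-out test set so
# their metrics are directly comparable. All specifics here are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The incumbent is trained on less data to stand in for an older model.
incumbent = LogisticRegression(max_iter=1000).fit(X_train[:500], y_train[:500])
retrained = LogisticRegression(max_iter=1000).fit(X_train, y_train)

for name, model in [("incumbent", incumbent), ("retrained", retrained)]:
    score = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC={score:.3f}")
```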
Technologies: Python, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2, ETL, AWS RDS, Machine Learning, Data Pipelines, Analytics, Deep Learning, Data Engineering, Bash, Amazon Web Services (AWS)

Machine Learning Engineer
2019
Simon Data
- Built a prototype for a client to automatically generate email segments based on product inventory, replacing a manual process that took multiple people hours each week (see the sketch after this list). Implemented the solution in the Django platform.
- Served as team lead for four data scientists. Coached team on best practices around Python testing and deployment.
- Led an effort to simplify a client's manual reporting process, including making SQL queries more performant and automating report delivery.
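A minimal sketch of the inventory-driven segment generation referenced in the first bullet: group in-stock products and emit one email segment per category. The field names and sample records are assumptions for illustration only.

```python
# Build email segments from product inventory: one segment per category,
# containing only in-stock SKUs. Data shape is illustrative.
from collections import defaultdict

inventory = [
    {"sku": "A1", "category": "shoes", "in_stock": True},
    {"sku": "B2", "category": "shoes", "in_stock": False},
    {"sku": "C3", "category": "hats", "in_stock": True},
]

segments = defaultdict(list)
for item in inventory:
    if item["in_stock"]:
        segments[item["category"]].append(item["sku"])

for category, skus in segments.items():
    print(f"segment 'in-stock {category}': {skus}")
```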
Technologies: Python, SQL, Data Science, Data Analysis, Amazon Athena, Amazon S3 (AWS S3), Amazon EC2, ETL, AWS RDS, Machine Learning, Data Pipelines, Analytics, Data Engineering, Bash, Amazon Web Services (AWS)

Data Scientist
2017 - 2019
Optoro
- Embedded in the tech product team and built models to support the core dispositioning system, aiming to achieve the highest recovery for returned and excess inventory. Deployed XGBoost models via Python APIs.
- Developed a system to monitor and retrain models using Python, SQL, and Airflow (see the DAG sketch after this list).
- Led optimization of Airflow pipelines and educated the data science team on best practices.
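A minimal Airflow DAG sketch of the monitor-and-retrain loop: a daily check of the production model's score gates a retraining task. The threshold, schedule, and stubbed task bodies are assumptions, not the production logic.

```python
# Daily DAG: check model score, then retrain. Bodies are stubs for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

SCORE_THRESHOLD = 0.80  # assumed alert level, not the real value

def check_model_score():
    # In production this queried recent scoring tables via SQL.
    print("comparing recent model score against", SCORE_THRESHOLD)

def retrain_model():
    # In production this kicked off the training job.
    print("retraining model on fresh data")

with DAG(
    dag_id="model_monitoring",
    start_date=datetime(2018, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    monitor = PythonOperator(task_id="monitor", python_callable=check_model_score)
    retrain = PythonOperator(task_id="retrain", python_callable=retrain_model)
    monitor >> retrain
```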
Technologies: Python, SQL, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2, ETL, Machine Learning, APIs, Data Pipelines, Analytics, Apache Airflow, Data Engineering, Bash, Amazon Web Services (AWS)

Senior Data Analyst
2016 - 2017
Capital One Financial
- Developed automated pipelines using shell scripting and Python’s Luigi library to generate Excel reports (see the sketch after this list), including working with end users to redesign reports so they could perform their tasks more efficiently.
- Created a scraper to download hundreds of files weekly from a legacy web application, enabling my team to complete and pass an audit that would otherwise have failed for lack of data.
- Served as lead analyst for the AML operations team. Researched and developed queries to identify at-risk assets and worked with stakeholders to design dashboards to track progress. Mapped legacy data to new data sources such as Salesforce.
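A minimal Luigi sketch of the report pipeline from the first bullet: one task extracts rows and a downstream task renders the report. File paths and data are illustrative, and the real pipeline produced Excel output for end users.

```python
# Two-stage Luigi pipeline: extract rows, then build a report from them.
import csv

import luigi

class ExtractRows(luigi.Task):
    def output(self):
        return luigi.LocalTarget("rows.csv")  # illustrative path

    def run(self):
        with self.output().open("w") as f:
            writer = csv.writer(f)
            writer.writerows([["account", "balance"], ["A-1", "100"]])

class BuildReport(luigi.Task):
    def requires(self):
        return ExtractRows()

    def output(self):
        return luigi.LocalTarget("report.csv")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            # The real task reshaped rows into an Excel report for end users.
            dst.write(src.read())

if __name__ == "__main__":
    luigi.build([BuildReport()], local_scheduler=True)
```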
Technologies: Python, SQL, Bash, Teradata, Dimensional Modeling, Luigi, Kimball Methodology, Hadoop, Data Warehouse Design, Amazon Web Services (AWS)