Elaine Ayo

Data Scientist and Developer in New York City, NY, United States

Member since February 9, 2022
Elaine has seven years of experience across the entire data product lifecycle. She can perform statistical analysis and design experiments; set up the infrastructure data teams need to run efficiently, such as Kubernetes clusters and pipeline frameworks; and develop APIs that deliver model predictions. Her current focus is providing new data teams with the infrastructure and processes they need to scale effectively.
Elaine is now available for hire

Portfolio

  • 5S Technology
    Python, AWS, SQL, Data Build Tool (dbt), GitLab CI/CD, Kubernetes, Docker...
  • Twosense
    Python, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3)...
  • Simon Data
    Python, SQL, Data Science, Data Analysis, Amazon Athena, Amazon S3 (AWS S3)...

Experience

Location

New York City, NY, United States

Availability

Part-time

Preferred Environment

Ubuntu Linux

The most amazing...

...thing I've built is a whole data science infrastructure from scratch, including workflow orchestration, Python CI/CD, and data warehouse generation.

Employment

  • Data Consultant

    2019 - PRESENT
    5S Technology
    • Developed algorithms that use historical scheduling data to identify contract violations for airline unions. Translated violation decision trees into SQL queries and prototyped a reroute identification model using keyword search.
    • Deployed and maintained the Argo workflow engine on Amazon EKS. Developed a database schema for an analytics warehouse using dbt and deployed it in Snowflake.
    • Designed a CI/CD system in GitLab using Dockerized CLIs of pipeline tools and coached team members on its usage.
    Technologies: Python, AWS, SQL, Data Build Tool (dbt), GitLab CI/CD, Kubernetes, Docker, Amazon EKS, Snowflake, Ubuntu, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2 (Amazon Elastic Compute Cloud), ETL, AWS RDS, APIs, Data Pipelines, Analytics, Data Engineering, Kimball Methodology, Data Warehouse Design, Bash, Dimensional Modeling, Amazon Web Services (AWS)
  • Machine Learning Engineer

    2020 - 2021
    Twosense
    • Adapted an open-source tracking library to collect user-level and overall model performance metrics through a simplified API. Deployed a tracking server and web application on AWS using Docker.
    • Refined model deployment scripts in Python. Consolidated file loading into a separate module to improve code readability.
    • Developed a Python system that re-evaluates production models upon retraining, enabling model scores to be compared on the same test data set. Conducted simulations to demonstrate the project's ROI in terms of improved model scores.
    Technologies: Python, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2 (Amazon Elastic Compute Cloud), ETL, AWS RDS, Machine Learning, Data Pipelines, Analytics, Deep Learning, Data Engineering, Bash, Amazon Web Services (AWS)
  • Machine Learning Engineer

    2019
    Simon Data
    • Built a prototype that automatically generates email segments for a client based on product inventory, replacing a manual process that took several people hours each week. Implemented the solution in the Django platform.
    • Served as team lead for four data scientists. Coached the team on best practices for Python testing and deployment.
    • Drove an effort to simplify a client's manual reporting process, including making SQL queries more performant and automating report delivery.
    Technologies: Python, SQL, Data Science, Data Analysis, Amazon Athena, Amazon S3 (AWS S3), Amazon EC2 (Amazon Elastic Compute Cloud), ETL, AWS RDS, Machine Learning, Data Pipelines, Analytics, Data Engineering, Bash, Amazon Web Services (AWS)
  • Data Scientist

    2017 - 2019
    Optoro
    • Worked embedded in the tech product team, building models for the core dispositioning system to achieve the highest recovery on returned and excess inventory. Deployed XGBoost models via Python APIs.
    • Developed a system to monitor and retrain models using Python, SQL, and Airflow.
    • Led the optimization of Airflow pipelines and educated the data science team on best practices.
    Technologies: Python, SQL, Data Science, PostgreSQL, Data Analysis, Amazon S3 (AWS S3), Amazon EC2 (Amazon Elastic Compute Cloud), ETL, Machine Learning, APIs, Data Pipelines, Analytics, Apache Airflow, Data Engineering, Bash, Amazon Web Services (AWS)
  • Senior Data Analyst

    2016 - 2017
    Capital One Financial
    • Developed automated pipelines using shell scripting and Python’s Luigi library to generate Excel reports, and worked with end users to redesign those reports so they could perform their tasks more efficiently.
    • Created a scraper to download hundreds of files weekly from a legacy web application, providing the data my team needed to complete and pass an audit that we would otherwise have failed.
    • Served as lead analyst for the AML operations team. Researched and developed queries to identify at-risk assets and worked with stakeholders to design dashboards to track progress. Mapped legacy data to new data sources such as Salesforce.
    Technologies: Python, SQL, Bash, Teradata, Dimensional Modeling, Luigi, Kimball Methodology, Hadoop, Data Warehouse Design, Amazon Web Services (AWS)

Experience

  • TJI Jail Population Data Warehouse
    https://github.com/texas-justice-initiative/jail-population-reports/tree/main

    The objective of this project is to automatically convert PDF reports from the Texas Commission on Jail Standards into tabular data and load that data into a modern data warehouse.

    TJI will use this data for various projects on our website, but this repo can be used to spin up an independent version of this data processing pipeline.

    The project has two parts: a Python data scraping and OCR processing tool, and specifications for an analytics warehouse based on that data.

  • Data Engineering Infrastructure Setup

    I designed and implemented pipeline infrastructure for processing free-text documents into tabular data. In addition, I set up Argo Workflows on an Amazon EKS cluster using Terraform. I also set up CI/CD pipelines in GitLab to automatically build and deploy Dockerized pipeline steps. Finally, I built extractors, parsers, and loaders in Python.

  • Data Warehouse Design/Setup

    I built an analytics warehouse for the client using dbt, with automatic deploys of data schemas and documentation websites via GitLab CI/CD. I also expanded the use of dbt features, including tests, macros, and centralized documentation, which allowed the data analyst to update and deploy the warehouse via GitLab. In addition, I used Snowflake's advanced loading capabilities to load data from S3 and set up the warehouse permission scheme.

Skills

  • Languages

    Python, SQL, Bash, R, Snowflake
  • Paradigms

    Data Science, ETL, Dimensional Modeling, Kimball Methodology
  • Platforms

    Amazon EC2 (Amazon Elastic Compute Cloud), Amazon Web Services (AWS), Ubuntu Linux, Docker, Kubernetes
  • Storage

    Data Pipelines, PostgreSQL, Amazon S3 (AWS S3)
  • Other

    CI/CD Pipelines, Data Analysis, Data Engineering, Data Build Tool (dbt), AWS, Machine Learning, APIs, Analytics, Statistics, AWS RDS, Applied Mathematics, Machine Language, Probability Theory, Deep Learning, Argo CD, Data Warehouse Design, Teradata
  • Libraries/APIs

    Luigi, Box API
  • Tools

    Apache Airflow, Terraform, GitLab CI/CD, Amazon EKS, Amazon Athena
  • Frameworks

    Hadoop

Education

  • Master's Degree in Statistics
    2013 - 2015
    Georgetown University - Washington DC, USA
