Kyle Chakos, Developer in Madrid, Spain

Kyle Chakos

Verified Expert in Engineering

Data Engineer and Developer

Madrid, Spain

Toptal member since February 28, 2023

Bio

Kyle has 10+ years of experience in data and machine learning engineering. He has worked at companies of various sizes, primarily startups, partnering with teams that had little or no data infrastructure and helping them build or modernize their architecture into something scalable. With a background in mathematics and engineering, Kyle is also uniquely positioned to help data scientists take their projects to production in a scalable way while ensuring accuracy and effectiveness.

Portfolio

RatedPower
Snowflake, MySQL, Databricks, Microsoft Power BI, Prefect, APIs, Spark, Jira...
Appex Group, Inc.
Snowflake, Business Intelligence (BI), Data Warehousing, Query Optimization...
Sweetgreen
Amazon Web Services (AWS), Python, Apache Airflow, Docker, CircleCI, EMR, Spark...

Experience

  • SQL - 10 years
  • Data Engineering - 10 years
  • Python - 10 years
  • Amazon Web Services (AWS) - 10 years
  • Data Analysis - 10 years
  • Data Pipelines - 10 years
  • Apache Airflow - 5 years
  • Snowflake - 4 years

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Python

The most amazing...

...thing I've accomplished was a 1,000x runtime improvement in a machine learning model, achieved by replacing a built-in Pandas method with fast Fourier transforms.
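The profile doesn't name the specific Pandas method that was replaced, so the sketch below is only an illustration of the general idea: a windowed computation that scans every element of every window costs O(n·w), while the same result can often be obtained as a single convolution in O(n log n) via the FFT. Here a rolling mean stands in for the real computation, and the names `series`, `window`, and the data are all made up.

```python
import numpy as np
import pandas as pd
from scipy.signal import fftconvolve

# Hypothetical stand-in data; the original series and window size
# are not given in the profile.
n, window = 10_000, 128
series = pd.Series(np.random.default_rng(0).standard_normal(n))

# Built-in Pandas method: a rolling window mean.
rolling_means = series.rolling(window).mean().to_numpy()[window - 1:]

# FFT-based equivalent: a windowed mean is a convolution with a
# uniform kernel, computed in O(n log n) in the frequency domain.
kernel = np.full(window, 1.0 / window)
fft_means = fftconvolve(series.to_numpy(), kernel, mode="valid")

# Both approaches produce the same windowed means (up to float error).
assert np.allclose(rolling_means, fft_means)
```

For a cheap aggregate like the mean, Pandas is already fast; the large speedups come when the per-window computation is expensive and can be recast as a convolution.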

Work Experience

Lead Data Engineer

2024 - PRESENT
RatedPower
  • Wrote a comprehensive data roadmap for the team, including upgrading the current system to use more modern tools, designing more useful dashboards for the teams, and redesigning the event tracking system.
  • Created a proper analytics database for the company to use.
  • Managed an extension of an existing data product, expanding its reach beyond the United States and into Europe.
  • Implemented data protection practices in line with GDPR.
  • Processed large GIS datasets to calculate wind and solar power potential.
  • Evaluated the existing data infrastructure and provided feedback about how to improve the architecture and follow best practices.
Technologies: Snowflake, MySQL, Databricks, Microsoft Power BI, Prefect, APIs, Spark, Jira, GDPR, AWS Lambda, Amazon RDS

Snowflake Data Engineer

2023 - 2024
Appex Group, Inc.
  • Rewrote code to handle errors more robustly while simultaneously reducing the complexity of the codebase.
  • Upgraded the Airflow instances to integrate with AWS more seamlessly.
  • Wrote new ingestion pipelines and worked with analysts to ensure the data suited their needs.
  • Worked with their marketing team to ingest new data sources and help process new data into existing metrics.
Technologies: Snowflake, Business Intelligence (BI), Data Warehousing, Query Optimization, Python, Redshift, Data Warehouse Design, Amazon RDS, Data Build Tool (dbt)

Senior Data Engineer

2020 - 2023
Sweetgreen
  • Automated data ingestion from various sources with Airflow, Amazon EMR, AWS Kinesis, AWS Lambda, Python, and Snowflake.
  • Rearchitected historical data pipelines to utilize more modern methods and provide proper alerting, moving from Java, Scala, and Redshift to Python, Airflow, and Snowflake.
  • Managed a team of consultants to complete the automation and redesign of our CCPA data pipeline.
  • Monitored, maintained, and designed data infrastructure in AWS S3, EC2, EMR, and ECR.
  • Assisted in data discovery and implementation of machine learning algorithms.
Technologies: Amazon Web Services (AWS), Python, Apache Airflow, Docker, CircleCI, EMR, Spark, AWS Lambda, Redshift, PostgreSQL, Snowflake, Amazon S3 (AWS S3), Scala, Java, California Consumer Privacy Act (CCPA), ETL, Statistical Analysis, MySQL, CI/CD Pipelines, Agile, Amazon SageMaker, Data Pipelines, SQL, Data Engineering, Amazon DynamoDB, Terraform, Amazon Elastic MapReduce (EMR), Database Migration, Amazon RDS, Amazon Aurora, Amazon Redshift

Senior Software Engineer, Database

2019 - 2020
Ticketmaster
  • Automated the quality testing of newly trained models using Scala and Python.
  • Created a framework to launch models into a production environment with Java, Kafka, and AWS.
  • Architected and implemented feedback loops to relieve third-party dependencies with Python.
  • Designed and programmed tooling to give visibility into the model output using Python, AWS, and Slack.
Technologies: Apache Kafka, Kafka Streams, Java, Python, Amazon Web Services (AWS), Scala, Amazon SageMaker, ETL, Statistical Modeling, Machine Learning, Snowflake, PostgreSQL, CI/CD Pipelines, Agile, Docker, Amazon S3 (AWS S3), California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering, Terraform, Amazon Elastic MapReduce (EMR), Amazon RDS

Data Engineer

2018 - 2019
Creative Artists Agency
  • Designed and implemented ETL processes using Python, Azure Data Factory, MongoDB, and MySQL.
  • Converted bulk processing systems to a streaming model using Python.
  • Created various views in MySQL for data scientists and business analysts.
  • Launched data science models into production and assisted in identifying and debugging errors in R.
Technologies: Azure, Python, MongoDB, MySQL, R, Statistical Modeling, ETL, JavaScript, PostgreSQL, CI/CD Pipelines, Agile, Docker, California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering, Amazon DynamoDB

Data Engineer

2017 - 2018
Glo
  • Developed personalized recommendations with machine learning using Flask and Python.
  • Collaborated with business analysts in researching KPIs for user retention using Redshift and Python.
  • Managed and monitored releases to production with Rancher, New Relic, Scalr, and AWS.
  • Architected and managed tables and ETL processes in Redshift, PostgreSQL, MySQL, and Airflow.
Technologies: Python, Redshift, Amazon Web Services (AWS), PostgreSQL, MySQL, Apache Airflow, Statistical Analysis, Statistical Modeling, Machine Learning, ETL, CI/CD Pipelines, Agile, Docker, Amazon S3 (AWS S3), California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering, Database Migration, Amazon RDS, Amazon Aurora, Amazon Redshift

Data Engineer

2013 - 2014
UberMedia
  • Analyzed data sets for relevant trends and potential to increase profit.
  • Sorted users into audiences based on application usage.
  • Extrapolated application and audience association based on data collected from social media.
  • Improved a machine learning bidding system, reducing its runtime and increasing its click accuracy.
Technologies: Data Analysis, Python, Amazon Web Services (AWS), EMR, Statistical Analysis, Statistical Modeling, Machine Learning, ETL, PostgreSQL, Agile, Docker, Redshift, Amazon S3 (AWS S3), Data Pipelines, SQL, Data Engineering

Projects

Senior Capstone Project

Worked with Shell International Exploration & Production and fellow seniors to improve their drilling techniques. We devised a machine learning model that analyzed data in real time and compared it with previous drilling data; we identified the type of rock being drilled through and provided recommendations on how to drill through the rock faster and more safely.

We accomplished this by using a Gaussian mixture model to identify the rock and statistically comparing the identified cluster to other similar clusters to provide better recommendations. All of the code for this project was written in Python.
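The project's data and features aren't described in detail, so as a hedged sketch, the Gaussian-mixture step might look like the following, with two synthetic "rock types" standing in for real drilling telemetry (all names and values here are invented for illustration):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for drilling telemetry: two synthetic
# "rock types" with distinct sensor signatures.
rng = np.random.default_rng(42)
soft_rock = rng.normal(loc=[10.0, 2.0], scale=0.5, size=(200, 2))
hard_rock = rng.normal(loc=[25.0, 8.0], scale=0.5, size=(200, 2))
features = np.vstack([soft_rock, hard_rock])

# Fit a two-component Gaussian mixture to model the rock types.
gmm = GaussianMixture(n_components=2, random_state=0).fit(features)

# Incoming readings are assigned to the most likely component,
# which is how the model would "identify the rock" in real time.
labels = gmm.predict([[10.2, 1.9], [24.8, 8.1]])
assert labels[0] != labels[1]  # the two readings land in different clusters
```

A mixture model fits here because each rock type plausibly produces sensor readings clustered around its own mean, and `predict` gives a fast per-reading classification suitable for a real-time loop.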

Fraud Detection

This machine learning model detected fraudulent credit card usage within 0.5 seconds of a purchase request. In addition, we added A/B testing to the model deployment, enabling its continuous improvement.

I was in charge of architecting and setting up the back end of this service, which primarily used Kafka and Java to meet the latency requirement. Our machine learning models were deployed using Amazon SageMaker.
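The A/B mechanism isn't specified, so as a sketch under assumptions (the production system was Kafka and Java; this is Python, and the function name, hash choice, and 10% split are all invented), one common approach is deterministic hash-based routing, so a given card always hits the same model variant:

```python
import hashlib

# Hypothetical sketch of A/B model routing for continuous
# improvement: requests are split deterministically by card ID,
# so each card consistently sees the same model variant.
def pick_variant(card_id: str, b_fraction: float = 0.1) -> str:
    digest = hashlib.sha256(card_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "model_b" if bucket < b_fraction else "model_a"

# Routing is sticky: the same card always gets the same variant.
assert pick_variant("card-123") == pick_variant("card-123")

# Roughly b_fraction of traffic goes to the candidate model.
counts = {"model_a": 0, "model_b": 0}
for i in range(10_000):
    counts[pick_variant(f"card-{i}")] += 1
assert 800 < counts["model_b"] < 1200  # ~10% of 10,000
```

Sticky assignment matters for fraud scoring: it keeps each card's decision history internally consistent while still letting the candidate model accumulate comparable traffic.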

Automation of CCPA Deletion and Access Pipeline

A process to fulfill all CCPA deletion and access requests within 30 days of receipt. The process was previously tracked in a Google Sheet and required three people to complete. We used OneTrust to collect and track requests and integrated them with the pipeline through the API. This enabled us to delete all of a user's data, and retrieve it when requested, without human intervention.
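The internals of the pipeline aren't described, so as a minimal sketch of just the 30-day SLA bookkeeping (the `triage` function, field names, and dates are all hypothetical, and the real intake came from OneTrust rather than hard-coded records):

```python
from datetime import date, timedelta

# CCPA gives 30 days (extendable) to act on a deletion or access
# request; this sketch flags anything past that window.
CCPA_DEADLINE = timedelta(days=30)

def is_overdue(received: date, today: date) -> bool:
    return today - received > CCPA_DEADLINE

def triage(requests, today):
    """Split open requests into overdue and on-track buckets."""
    overdue = [r for r in requests if is_overdue(r["received"], today)]
    on_track = [r for r in requests if not is_overdue(r["received"], today)]
    return overdue, on_track

# Hypothetical intake records (in production these came from OneTrust).
requests = [
    {"id": "req-1", "received": date(2024, 1, 1)},
    {"id": "req-2", "received": date(2024, 1, 25)},
]
overdue, on_track = triage(requests, today=date(2024, 2, 5))
assert [r["id"] for r in overdue] == ["req-1"]    # 35 days old
assert [r["id"] for r in on_track] == ["req-2"]   # 11 days old
```

In an automated pipeline, an orchestrator would run a check like this on a schedule and alert or escalate on the overdue bucket, replacing the manual spreadsheet review.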

Education

2009 - 2013

Bachelor's Degree in Mathematics

Harvey Mudd College - Claremont, CA, USA

Skills

Tools

Apache Airflow, CircleCI, Kafka Streams, Amazon SageMaker, Terraform, Amazon Elastic MapReduce (EMR), Microsoft Power BI, Prefect, Jira

Languages

Python, SQL, Snowflake, Java, JavaScript, Scala, R

Paradigms

ETL, Agile, Business Intelligence (BI)

Platforms

Amazon Web Services (AWS), Docker, AWS Lambda, Apache Kafka, Azure, Databricks

Storage

PostgreSQL, MySQL, Redshift, Amazon S3 (AWS S3), Data Pipelines, Database Migration, MongoDB, Amazon DynamoDB, Amazon Aurora

Frameworks

Spark

Other

California Consumer Privacy Act (CCPA), Data Analysis, Data Engineering, Amazon RDS, CI/CD Pipelines, Amazon Redshift, Statistical Analysis, Statistical Modeling, Machine Learning, EMR, Data Build Tool (dbt), Data Warehousing, Query Optimization, Data Warehouse Design, APIs, GDPR
