Kyle Chakos, Developer in Madrid, Spain

Kyle Chakos

Verified Expert in Engineering

Data Engineer and Developer

Madrid, Spain
Toptal Member Since
February 28, 2023

Kyle has 10+ years of experience in data and machine learning engineering. He has worked at companies of various sizes, primarily startups, where he joined teams with almost no data infrastructure and helped them expand or update their architecture into something scalable. With a background in mathematics and engineering, Kyle is also well placed to help data scientists get their projects into production at scale while ensuring accuracy and effectiveness.






Preferred Environment

Amazon Web Services (AWS), Python

The most amazing...

...thing I've accomplished was a 1,000x speedup of a machine learning model, achieved by replacing a built-in pandas method with fast Fourier transforms.
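The profile does not say which pandas method was replaced, so the following is only an illustrative sketch of the general technique: computing an autocovariance for many lags with one FFT pass instead of one pass per lag. The function names, the choice of autocovariance, and the synthetic signal are all assumptions made for this example.

```python
import numpy as np


def autocov_direct(x, max_lag):
    """Autocovariance sums computed lag by lag, O(n * max_lag)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(max_lag + 1)])


def autocov_fft(x, max_lag):
    """Same sums in O(n log n) using the Wiener-Khinchin relation."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-pad to a power of two >= 2n - 1 so the circular
    # correlation produced by the FFT does not wrap around.
    size = 1 << (2 * n - 1).bit_length()
    spectrum = np.fft.rfft(x, n=size)
    acov = np.fft.irfft(spectrum * np.conj(spectrum), n=size)
    return acov[: max_lag + 1]


# Hypothetical input: white noise standing in for a real feature series.
rng = np.random.default_rng(0)
signal = rng.standard_normal(2048)

direct = autocov_direct(signal, 50)
fast = autocov_fft(signal, 50)
```

Both functions return the same values up to floating-point error; the FFT version does a fixed amount of work regardless of how many lags are requested, which is where large speedups of this kind typically come from.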

Work Experience

Snowflake Data Engineer

2023 - PRESENT
Appex Group, Inc.
  • Rewrote code to handle errors more robustly while reducing the complexity of the codebase.
  • Upgraded the Airflow instances to integrate with AWS more seamlessly.
  • Wrote new ingestion pipelines and worked with analysts to ensure the data suited their needs.
Technologies: Snowflake, Business Intelligence (BI), Data Warehousing, Query Optimization, Python, Redshift, Data Warehouse Design

Senior Data Engineer

2020 - 2023
  • Automated data ingestion from various sources with Airflow, Amazon EMR, AWS Kinesis, AWS Lambda, Python, and Snowflake.
  • Rearchitected historical data pipelines to utilize more modern methods and provide proper alerting, moving from Java, Scala, and Redshift to Python, Airflow, and Snowflake.
  • Managed a team of consultants to complete the automation and redesign of our CCPA data pipeline.
  • Monitored, maintained, and designed data infrastructure in AWS S3, EC2, EMR, and ECR.
  • Assisted in data discovery and implementation of machine learning algorithms.
Technologies: Amazon Web Services (AWS), Python, Apache Airflow, Docker, CircleCI, EMR, Spark, AWS Lambda, Redshift, PostgreSQL, Snowflake, Amazon S3 (AWS S3), Scala, Java, California Consumer Privacy Act (CCPA), ETL, Statistical Analysis, MySQL, CI/CD Pipelines, Agile, Amazon SageMaker, Data Pipelines, SQL, Data Engineering, Amazon DynamoDB, Terraform, Amazon Elastic MapReduce (EMR)

Senior Software Engineer, Database

2019 - 2020
  • Automated the quality testing of newly trained models using Scala and Python.
  • Created a framework to launch models into a production environment with Java, Kafka, and AWS.
  • Architected and implemented feedback loops to relieve third-party dependencies with Python.
  • Designed and programmed tooling to give visibility into the model output using Python, AWS, and Slack.
Technologies: Apache Kafka, Kafka Streams, Java, Python, Amazon Web Services (AWS), Scala, Amazon SageMaker, ETL, Statistical Modeling, Machine Learning, Snowflake, PostgreSQL, CI/CD Pipelines, Agile, Docker, Amazon S3 (AWS S3), California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering, Terraform, Amazon Elastic MapReduce (EMR)

Data Engineer

2018 - 2019
Creative Artists Agency
  • Designed and implemented ETL processes using Python, Azure Data Factory, MongoDB, and MySQL.
  • Converted bulk processing systems to a streaming model using Python.
  • Created various views in MySQL for data scientists and business analysts.
  • Launched data science models into production and assisted in identifying and debugging errors in R.
Technologies: Azure, Python, MongoDB, MySQL, R, Statistical Modeling, ETL, JavaScript, PostgreSQL, CI/CD Pipelines, Agile, Docker, California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering, Amazon DynamoDB

Data Engineer

2017 - 2018
  • Developed personalized recommendations with machine learning using Flask and Python.
  • Collaborated with business analysts in researching KPIs for user retention using Redshift and Python.
  • Managed and monitored releases to production with Rancher, New Relic, Scalr, and AWS.
  • Architected and managed tables and ETL processes in Redshift, PostgreSQL, MySQL, and Airflow.
Technologies: Python, Redshift, Amazon Web Services (AWS), PostgreSQL, MySQL, Apache Airflow, Statistical Analysis, Statistical Modeling, Machine Learning, ETL, CI/CD Pipelines, Agile, Docker, Amazon S3 (AWS S3), California Consumer Privacy Act (CCPA), Data Pipelines, SQL, Data Engineering

Data Engineer

2013 - 2014
  • Analyzed data sets for relevant trends and potential to increase profit.
  • Sorted users into audiences based on application usage.
  • Extrapolated application and audience association based on data collected from social media.
  • Improved a machine learning bidding system by enhancing runtime and click accuracy.
Technologies: Data Analysis, Python, Amazon Web Services (AWS), EMR, Statistical Analysis, Statistical Modeling, Machine Learning, ETL, PostgreSQL, Agile, Docker, Redshift, Amazon S3 (AWS S3), Data Pipelines, SQL, Data Engineering

Senior Capstone Project

Worked with Shell International Exploration & Production and other seniors to improve Shell's drilling techniques. We devised a machine learning model that analyzed data in real time and compared it with previous drilling data; it identified the type of rock being drilled through and provided recommendations on how to drill through the rock faster and more safely.

We accomplished this by using a Gaussian mixture model to identify the rock and statistically comparing the identified cluster to other similar clusters to provide better recommendations. All of the code for this project was written in Python.
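As a rough illustration of the clustering step only, here is a minimal one-dimensional Gaussian mixture fitted with plain expectation-maximization. It is not the project's code: the single feature (a made-up "rate of penetration"), the two synthetic rock types, and the function names `fit_gmm_1d` and `classify` are all assumptions for the sketch, and the cluster-comparison step is not shown.

```python
import numpy as np


def fit_gmm_1d(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM."""
    x = np.asarray(x, dtype=float)
    # Start the means at evenly spaced quantiles of the data,
    # with a shared variance and equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        diff = x[:, None] - means[None, :]
        log_pdf = -0.5 * (diff**2 / variances + np.log(2 * np.pi * variances))
        log_resp = np.log(weights) + log_pdf
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-9
    return weights, means, variances


def classify(points, weights, means, variances):
    """Return the index of the most likely component for each point."""
    points = np.atleast_1d(np.asarray(points, dtype=float))
    diff = points[:, None] - means[None, :]
    log_pdf = -0.5 * (diff**2 / variances + np.log(2 * np.pi * variances))
    return np.argmax(np.log(weights) + log_pdf, axis=1)


# Synthetic stand-in for historical drilling data: two hypothetical rock types.
rng = np.random.default_rng(1)
history = np.concatenate([rng.normal(5.0, 1.0, 500), rng.normal(20.0, 2.0, 500)])

weights, means, variances = fit_gmm_1d(history, k=2)
labels = classify([4.8, 19.0], weights, means, variances)
```

In a real-time setting, `classify` would be called on each incoming reading, and the matched component's historical statistics could then feed the recommendation step.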

Fraud Detection

This machine learning model detected fraudulent credit card usage within 0.5 seconds of the purchase request. In addition, we added A/B testing to the model deployment, enabling its continuous improvement.

I was in charge of setting up and architecting the back end of this service, which primarily used Kafka and Java to ensure everything ran quickly. Our machine learning models were deployed using Amazon SageMaker.
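One common way to A/B test model deployments is to route each request to a model variant by hashing a stable identifier. The sketch below shows that idea in Python; the production service described above used Kafka and Java, and nothing here is taken from it. The function name `assign_variant`, the `txn-` identifiers, and the 10% treatment share are hypothetical.

```python
import hashlib


def assign_variant(request_id: str, treatment_share: float = 0.1) -> str:
    """Deterministically route a request to 'control' or 'treatment'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"


# Simulate routing for a batch of hypothetical transaction IDs.
assignments = [assign_variant(f"txn-{i}") for i in range(10_000)]
treatment_fraction = assignments.count("treatment") / len(assignments)
```

Because the assignment depends only on the identifier, the same request always reaches the same variant, which keeps the comparison between models consistent without storing any routing state.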

Automation of CCPA Deletion and Access Pipeline

This pipeline completes all CCPA deletion and access requests within 30 days of receipt. The process was previously hosted on a Google Sheet and required three people to complete. We used OneTrust to collect and track requests and integrated them with the API. This enabled us to delete all of a user's data, or retrieve it on request, without human intervention.

2009 - 2013

Bachelor's Degree in Mathematics

Harvey Mudd College - Claremont, CA, USA


Apache Airflow, CircleCI, Kafka Streams, Amazon SageMaker, Terraform, Amazon Elastic MapReduce (EMR)


Python, SQL, Snowflake, Java, JavaScript, Scala, R


ETL, Agile, Business Intelligence (BI)


Amazon Web Services (AWS), Docker, AWS Lambda, Apache Kafka, Azure


PostgreSQL, MySQL, Redshift, Amazon S3 (AWS S3), Data Pipelines, MongoDB, Amazon DynamoDB




California Consumer Privacy Act (CCPA), Data Analysis, Data Engineering, CI/CD Pipelines, Statistical Analysis, Statistical Modeling, Machine Learning, EMR, Data Build Tool (dbt), Data Warehousing, Query Optimization, Data Warehouse Design
