
Jake Thomas

Verified Expert in Engineering

Data Engineer and Developer

Location
01915, United States
Toptal Member Since
February 8, 2022

Jake is a data engineer with experience at public companies, mid-size private companies, and startups. His past accomplishments include migrating data warehouses to Snowflake, building frameworks to ingest data from hundreds of third-party sources, leveraging dbt to tame data modeling, lineage, and documentation, leading data quality and alerting efforts, and teaching online Snowflake courses with Pearson and O'Reilly. Jake is passionate about scaling data systems that empower business decision-making.

Portfolio

6 River Systems
Python, Go, SQL, Google Cloud Platform (GCP), CircleCI, Data Build Tool (dbt)...
CarGurus
Python, SQL, Snowflake, Amazon Web Services (AWS), Data Build Tool (dbt)...
Wanderu
Apache Airflow, Python, Redshift, Docker

Experience

Availability

Part-time

Preferred Environment

macOS

The most amazing...

...feeling I've achieved in my job is helping people grow in their careers and become better engineers.

Work Experience

Lead Data Platform Engineer

2021 - PRESENT
6 River Systems
  • Created PostgreSQL-to-BigQuery pipelines across thousands of Postgres databases (a sketch of this pattern follows this role).
  • Migrated the in-house data modeling toolsets to dbt, drastically improving data model documentation, lineage, and dependency management.
  • Built, deployed, and maintained streaming event pipelines across thousands of fulfillment robots.
  • Developed and maintained a customer-facing data API to serve data to partners.
Technologies: Python, Go, SQL, Google Cloud Platform (GCP), CircleCI, Data Build Tool (dbt), GitOps, Terraform, Atlantis, BigQuery
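
As a rough illustration of the PostgreSQL-to-BigQuery pattern described in this role, below is a minimal sketch of a single extract-and-load step in Python. It is not the actual 6 River Systems code; the connection string and table names are hypothetical placeholders.

import pandas as pd
import psycopg2
from google.cloud import bigquery

def copy_table(pg_dsn: str, source_table: str, bq_table: str) -> None:
    # Pull one Postgres table into memory, then load it into BigQuery,
    # replacing the destination table on each run.
    with psycopg2.connect(pg_dsn) as conn:
        df = pd.read_sql(f"SELECT * FROM {source_table}", conn)

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
    )
    client.load_table_from_dataframe(df, bq_table, job_config=job_config).result()

copy_table(
    pg_dsn="postgresql://user:password@pg-host:5432/app_db",  # hypothetical
    source_table="public.orders",                             # hypothetical
    bq_table="analytics.raw_orders",                          # hypothetical
)

A full-table reload like this only scales so far; large or frequently changing tables are usually loaded incrementally by watermark or change data capture instead.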

Lead Data Engineer

2018 - 2021
CarGurus
  • Migrated a data warehouse from BigQuery to Snowflake.
  • Built a framework to integrate hundreds of third-party sources with Snowflake.
  • Deployed and managed an autoscaling instance of Snowplow Analytics event streaming pipelines. The system processed 12k-15k messages per second continuously.
  • Moved a legacy modeling framework to dbt to make data modeling sustainable and transferable (see the Airflow and dbt sketch after this role).
  • Wrote Terraform code to deploy all pieces of the analytical infrastructure.
  • Deployed Airflow for DAG scheduling and dependency management.
Technologies: Python, SQL, Snowflake, Amazon Web Services (AWS), Data Build Tool (dbt), Data Warehousing, Data Warehouse Design, Apache Kafka, Apache Airflow, Streaming
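
As a rough sketch of the orchestration pattern from this role, in which Airflow schedules third-party ingestion and dbt runs, here is a minimal Airflow DAG. The DAG name, schedule, ingestion script, and dbt project path are hypothetical and not taken from the CarGurus codebase.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="ingest_and_model",      # hypothetical name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_third_party_source",
        bash_command="python ingest.py --source example_source",  # hypothetical script
    )
    model = BashOperator(
        task_id="run_dbt_models",
        bash_command="dbt run --project-dir /opt/dbt",  # hypothetical path
    )
    # dbt models only run after the raw data has landed.
    ingest >> model

Chaining the two tasks keeps the dependency explicit: the models are only rebuilt against data that actually arrived.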

Data Engineer

2016 - 2018
Wanderu
  • Set up and maintained a Redshift-based data warehouse.
  • Created data pipelines from various PostgreSQL and MongoDB databases to Redshift (a sketch of the Redshift load step follows this role).
  • Installed and maintained an auto-scaling BI platform.
  • Developed and maintained Snowplow Analytics to collect and warehouse streaming event data.
  • Assembled and maintained Kafka for log and event centralization.
  • Automated AdWords and a traffic acquisition platform.
  • Created pipelines for customer-facing route metrics.
  • Became a certified EnterpriseDB PostgreSQL administrator.
Technologies: Apache Airflow, Python, Redshift, Docker
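
A common way to land data in Redshift, and the pattern behind pipelines like those above, is to stage files in S3 and issue a COPY. Below is a minimal sketch of that load step; the cluster DSN, bucket, table, and IAM role are hypothetical.

import psycopg2

COPY_SQL = """
    COPY analytics.page_views
    FROM 's3://example-bucket/page_views/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    FORMAT AS JSON 'auto'
    TIMEFORMAT 'auto';
"""

def load_page_views(dsn: str) -> None:
    # Bulk-load staged S3 files into Redshift with a single COPY statement.
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(COPY_SQL)
        conn.commit()

load_page_views("postgresql://user:password@redshift-host:5439/warehouse")  # hypothetical DSN

COPY parallelizes the load across the cluster, which is why staging to S3 first is generally preferred over row-by-row inserts.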

One Billion Events Per Day with Snowplow and Snowflake

https://www.bostata.com/268-billion-events-with-snowplow-snowflake-at-cargurus
At CarGurus, I led the implementation of an auto-scaling event system that processes over a billion events per day using AWS and Snowflake.

The system validates and warehouses many petabytes of event data, with new events available for querying within minutes of collection.
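
As a hedged sketch of how the Snowflake side of such a pipeline can be wired up (not the actual CarGurus implementation), the snippet below creates an external stage over the S3 bucket where enriched events land and a Snowpipe that auto-ingests new files. Account, database, bucket, table, and integration names are all hypothetical.

import snowflake.connector

DDL_STATEMENTS = [
    """
    CREATE STAGE IF NOT EXISTS snowplow.enriched_stage
      URL = 's3://example-bucket/enriched/'
      STORAGE_INTEGRATION = s3_events_integration
    """,
    """
    CREATE PIPE IF NOT EXISTS snowplow.enriched_pipe AUTO_INGEST = TRUE AS
      COPY INTO snowplow.events
      FROM @snowplow.enriched_stage
      FILE_FORMAT = (TYPE = JSON)
    """,
]

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical
    user="loader",               # hypothetical
    password="...",              # use a proper secret store in practice
    warehouse="LOADING_WH",      # hypothetical
    database="ANALYTICS_DB",     # hypothetical
)
cur = conn.cursor()
try:
    for stmt in DDL_STATEMENTS:
        cur.execute(stmt)  # create the stage, then the auto-ingest pipe
finally:
    cur.close()
    conn.close()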

Building a Modern Data Platform with Snowflake

https://www.oreilly.com/live-events/building-a-modern-data-platform-with-snowflake/0636920414971/0636920064273/
Snowflake is a modern data warehouse that is built for cloud-scale workloads.

I planned, created, and delivered numerous data warehousing courses for Pearson on O'Reilly Learning's platform. The introductory course is a three-hour session on getting started with Snowflake from scratch.

https://github.com/silverton-io/building-a-modern-data-platform-with-snowflake
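
For a sense of what "Snowflake from scratch" looks like in practice, here is a minimal, hedged sketch that creates a warehouse, database, and table and runs a first query through the Python connector. Object names are illustrative and are not taken from the course material.

import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",  # hypothetical
    user="student",             # hypothetical
    password="...",
)
cur = conn.cursor()
try:
    # A warehouse supplies compute; databases and schemas hold the objects.
    cur.execute("CREATE WAREHOUSE IF NOT EXISTS demo_wh WAREHOUSE_SIZE = XSMALL")
    cur.execute("CREATE DATABASE IF NOT EXISTS demo_db")
    cur.execute("USE SCHEMA demo_db.public")
    cur.execute("CREATE TABLE IF NOT EXISTS greetings (msg STRING)")
    cur.execute("INSERT INTO greetings VALUES ('hello, snowflake')")
    cur.execute("SELECT msg FROM greetings")
    print(cur.fetchone()[0])
finally:
    cur.close()
    conn.close()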

Three Reasons Why Your Company Should Own Its Data

Periodically, I guest-post on well-known technical blogs. This post is a collaboration between Snowplow Analytics and my side business and discusses the importance of owning your own data pipelines and storage.

Languages

Python, SQL, Snowflake, Go

Tools

Terraform, BigQuery, Apache Airflow, Snowplow Analytics, CircleCI

Platforms

Google Cloud Platform (GCP), Amazon Web Services (AWS), Apache Kafka, Docker, macOS

Other

Data Build Tool (dbt), Atlantis, Data Warehousing, Streaming, Amazon Kinesis, Data Warehouse Design, AWS DevOps, Web Security, Cloud Security, GitOps

Paradigms

DevOps

Storage

Redshift
