Jake Thomas
Verified Expert in Engineering
Data Engineer and Developer
01915, United States
Toptal member since February 8, 2022
Jake is a data engineer with experience at public companies, mid-size private companies, and startups. His past accomplishments include migrating data warehouses to Snowflake, building frameworks to ingest data from hundreds of third-party sources, leveraging DBT to tame data modeling, lineage, and documentation, leading data quality and alerting efforts, and teaching online Snowflake courses with Pearson and O'Reilly. Jake is passionate about scaling data systems that empower business decision-making.
Portfolio
Experience
- Python - 7 years
- Data Warehousing - 7 years
- SQL - 6 years
- Apache Kafka - 6 years
- Apache Airflow - 6 years
- BigQuery - 5 years
- Amazon Web Services (AWS) - 5 years
- Snowflake - 5 years
Availability
Preferred Environment
macOS
The most amazing...
...feeling I've achieved in my job is helping people grow in their careers and become better engineers.
Work Experience
Lead Data Platform Engineer
6 River Systems
- Created PostgreSQL to BigQuery pipelines across thousands of PG databases.
- Migrated the in-house data modeling toolsets to DBT, drastically improving data model documentation, lineage, and dependency management.
- Built, deployed, and maintained streaming event pipelines across thousands of fulfillment robots.
- Developed and maintained a customer-facing data API to serve data to partners.
Lead Data Engineer
CarGurus
- Migrated a data warehouse from BigQuery to Snowflake.
- Built a framework to integrate hundreds of third-party sources with Snowflake.
- Deployed and managed an autoscaling instance of Snowplow Analytics event streaming pipelines. The system processed 12k-15k messages per second continuously.
- Moved a legacy modeling framework to DBT to make data modeling sustainable and transferable.
- Wrote Terraform code to deploy all pieces of the analytical infrastructure.
- Deployed Airflow for DAG scheduling and dependency management.
Data Engineer
Wanderu
- Set up and maintained a Redshift-based data warehouse.
- Created data pipelines from various PostgreSQL and Mongo databases to Redshift.
- Installed and maintained an auto-scaling BI platform.
- Developed and maintained Snowplow Analytics to collect and warehouse streaming event data.
- Assembled and maintained Kafka for log and event centralization.
- Automated AdWords and a traffic acquisition platform.
- Created pipelines for customer-facing route metrics.
- Became a certified EnterpriseDB PostgreSQL administrator.
Experience
One Billion Events Per Day with Snowplow and Snowflake
https://www.bostata.com/268-billion-events-with-snowplow-snowflake-at-cargurus
The system collects and stores many petabytes of validated and warehoused data within minutes.
Building a Modern Data Platform with Snowflake
https://www.oreilly.com/live-events/building-a-modern-data-platform-with-snowflake/0636920414971/0636920064273/
I planned, created, and delivered numerous data warehousing courses for Pearson on O'Reilly Learning's platform. The introductory course is a three-hour lesson on getting started with Snowflake from scratch.
https://github.com/silverton-io/building-a-modern-data-platform-with-snowflake
Three Reasons Why Your Company Should Own Its Data
Skills
Tools
Terraform, BigQuery, Apache Airflow, Snowplow Analytics, CircleCI
Languages
Python, SQL, Snowflake, Go
Platforms
Google Cloud Platform (GCP), Amazon Web Services (AWS), Apache Kafka, Docker, macOS
Paradigms
DevOps
Storage
Redshift
Other
Data Build Tool (dbt), Atlantis, Data Warehousing, Streaming, Amazon Kinesis, Data Warehouse Design, AWS DevOps, Web Security, Cloud Security, GitOps