Data Transformation Engineer
2021 - 2022
PepsiCo Global - Main (via Toptal)
- Taught and mentored several PepsiCo contractors and employees across multiple teams on how to use dbt more efficiently.
- Introduced and implemented CI/CD and Slim CI processes across PepsiCo's entire eCommerce division, which increased deployment productivity by 800% while drastically reducing compute resources used in the data warehouse (a minimal Slim CI sketch follows the highlights below).
- Introduced SQLFluff, a SQL linter, to all of PepsiCo's eCommerce divisions, linting roughly 700 dbt data models against more than 20 linting rules and bringing the entire division onto a single SQL dialect and style.
- Implemented incremental models and incremental data loads in place of full refreshes, cutting run times for the largest data pipelines and increasing developer productivity.
- Introduced dbt best practices for data engineers, analytics engineers, and data analysts.
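For illustration, a minimal sketch of the Slim CI pattern mentioned above, assuming the production dbt artifacts (manifest.json) have been downloaded to a local directory first; the ./prod-artifacts path is a hypothetical placeholder:

```python
# Minimal Slim CI sketch: build only the dbt models that changed since the
# last production run, deferring unchanged upstream models to production.
# The artifacts path below is a hypothetical placeholder.
import subprocess

PROD_STATE = "./prod-artifacts"  # manifest.json from the last production run

def slim_ci_build() -> None:
    # "state:modified+" selects changed models plus their downstream
    # dependents; --defer resolves unselected refs against the production
    # state instead of rebuilding the entire project in CI.
    subprocess.run(
        ["dbt", "build",
         "--select", "state:modified+",
         "--defer",
         "--state", PROD_STATE],
        check=True,
    )

if __name__ == "__main__":
    slim_ci_build()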
Technologies: SQL, Python, Dimensional Modeling, SQL Server Integration Services (SSIS), Data Pipelines, Data Modeling, Data Build Tool (dbt), Snowflake, ETL, Data Engineering, Data Transformation, Data Warehouse Design, Database Architecture

Senior Data Engineer
2020 - 2021
Netlify
- Maintained more than 60 business-critical data pipelines that together handled 2 TB of data per day.
- Migrated the company's data processing from ETL to ELT using dbt and Spark on Databricks, reducing the number of programming languages in use from four to two and making pipelines easier to debug and idempotent.
- Introduced and productionized Apache Airflow for data workflow orchestration, kicking off extractions and loads that had previously been tedious to trigger and difficult to monitor.
- Replaced error-prone extractions for several data sources with Fivetran, then introduced several new datasets and integrated them into the data warehouse for analytical use.
- Authored and maintained big data compute jobs with PySpark and Spark SQL on Databricks, building out a data lakehouse in the process (a minimal sketch follows the highlights below).
- Secured data warehouse clusters by automating the creation of credentials through Ansible and Terraform.
- Built a cost-of-goods data model that conformed source billing data from the major cloud providers into a single model to answer questions about the company's cloud spend.
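As an illustration of the Databricks compute jobs mentioned above, a minimal PySpark/Spark SQL sketch that lands raw JSON events into a Delta table; the bucket paths, view name, and schema are hypothetical placeholders:

```python
# Minimal PySpark lakehouse sketch: read raw JSON events, conform them with
# Spark SQL, and write the result as a partitioned Delta table.
# All paths, names, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("events_to_lakehouse").getOrCreate()

# Land raw semi-structured events from object storage.
raw = spark.read.json("s3://example-bucket/raw/events/")
raw.createOrReplaceTempView("raw_events")

# Conform the raw payload with Spark SQL.
daily = spark.sql("""
    SELECT user_id,
           CAST(event_ts AS DATE) AS event_date,
           COUNT(*)               AS event_count
    FROM raw_events
    GROUP BY user_id, CAST(event_ts AS DATE)
""")

# Write an idempotent Delta table partitioned by day.
(daily.write.format("delta")
      .mode("overwrite")
      .partitionBy("event_date")
      .save("s3://example-bucket/lakehouse/daily_events/"))
```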
Technologies: Data Build Tool (dbt), Spark, Databricks, SQL, Python, Scala, Ansible, Terraform, Big Data, Redshift, BigQuery, Google Cloud Platform (GCP), DigitalOcean, ETL, ELT, Apache Airflow, Fivetran, PySpark, Business Intelligence (BI), Amazon Web Services (AWS), Data Migration, Salesforce API, Salesforce, REST APIs, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, Data Warehouse Design, Database Architecture, MongoDB

Data Engineer Curriculum Writer (Subject Matter Expert)
2020 - 2020
Thinkful
- Designed a six-month curriculum for training data engineers.
- Wrote core modules and led meetings to brainstorm the best topics to cover.
- Created hands-on labs for students to go through to gain practical experience in the role of a data engineer.
Technologies: Apache Airflow, Linux, Redshift, SQL, Bash, Python, Data Modeling, Amazon Web Services (AWS), Data Analytics, Data Analysis, PostgreSQL

Senior Data Engineer Contractor
2020 - 2020
OfferUp
- Migrated dozens of data pipelines from Apache Airflow to Google Cloud Composer on a tight timeline.
- Assisted in a mass data warehouse migration from Snowflake to BigQuery, moving a total of 1 PB (petabyte) of data between AWS and Google Cloud.
- Reconciled data records between the old and new data warehouses to ensure that the migrated pipelines worked as expected.
- Rewrote Snowflake-specific analytical SQL queries in BigQuery syntax, including lateral flattens and other complex joins over nested and semi-structured data (an illustrative before/after follows the highlights below).
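To illustrate the flavor of those rewrites, a before/after pair kept as Python string constants, with Snowflake's LATERAL FLATTEN replaced by BigQuery's CROSS JOIN UNNEST; the table and column names are hypothetical placeholders:

```python
# Illustrative before/after for a Snowflake-to-BigQuery rewrite of a
# lateral flatten over semi-structured data. Table and column names are
# hypothetical placeholders.

# Snowflake: LATERAL FLATTEN explodes a JSON array in the payload column.
SNOWFLAKE_SQL = """
SELECT e.event_id,
       item.value:sku::string AS sku
FROM events e,
     LATERAL FLATTEN(input => e.payload:items) item
"""

# BigQuery: the same explode becomes CROSS JOIN UNNEST over an extracted array.
BIGQUERY_SQL = """
SELECT e.event_id,
       JSON_EXTRACT_SCALAR(item, '$.sku') AS sku
FROM `project.dataset.events` AS e
CROSS JOIN UNNEST(JSON_EXTRACT_ARRAY(e.payload, '$.items')) AS item
"""

if __name__ == "__main__":
    print(BIGQUERY_SQL)
```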
Technologies: Apache Airflow, BigQuery, Snowflake, Amazon S3 (AWS S3), Google Cloud Storage, Data Build Tool (dbt), Python, SQL, Fivetran, Amazon Web Services (AWS), Data Migration, Data Analytics, Data Analysis

Data Engineer
2019 - 2020
Brushfire
- Built out a new data warehouse in AWS for analytical and data mining workloads (the company's first data warehouse).
- Implemented business intelligence and data visualizations for company and client KPIs.
- Built repeatable, scalable ETL data pipelines in the cloud (a minimal sketch follows the highlights below).
- Identified data performance issues in queries, indexes, data modeling, and related areas.
- Migrated from .NET to .NET Core, then transitioned the entire web stack from Azure VMs to Kubernetes (complete with cluster and pod autoscaling and CI/CD with Azure DevOps).
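As a sketch of the repeatable pipeline pattern referenced above (stage an extract to S3, then COPY it into Redshift); the bucket, table, credential, and IAM role values are all hypothetical placeholders:

```python
# Minimal repeatable ETL sketch: stage an extract in S3 and load it into
# Redshift with COPY. Bucket, table, credential, and IAM role values are
# hypothetical placeholders.
import csv
import io

import boto3
import psycopg2

def extract() -> list[dict]:
    # Placeholder extract step; in practice this called the source APIs.
    return [{"order_id": 1, "amount_usd": "19.99"}]

def stage_to_s3(rows: list[dict], bucket: str, key: str) -> None:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8")
    )

def load_to_redshift(bucket: str, key: str) -> None:
    conn = psycopg2.connect(host="example-cluster", dbname="analytics",
                            user="etl", password="example", port=5439)
    with conn, conn.cursor() as cur:
        # The load is repeatable because the staging table is truncated first.
        cur.execute("TRUNCATE staging.orders")
        cur.execute(
            f"COPY staging.orders FROM 's3://{bucket}/{key}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/example-etl' "
            "CSV IGNOREHEADER 1"
        )

if __name__ == "__main__":
    rows = extract()
    stage_to_s3(rows, "example-etl-bucket", "orders/latest.csv")
    load_to_redshift("example-etl-bucket", "orders/latest.csv")
```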
Technologies: Redshift, Azure, Kubernetes, SQL Server DBA, SQL, C#, .NET, .NET Core, Azure DevOps, Data Pipelines, AWS Data Pipeline Service, Stripe, ETL, CI/CD Pipelines, Azure Kubernetes Service (AKS), Tableau, Python, Azure DevOps Services, Amazon Web Services (AWS), Web Scraping, T-SQL, Data Migration, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, PostgreSQL, Data Warehouse Design, Database Architecture, MongoDB

Principal Big Data Consultant
2014 - 2019
zData, Inc.
- Architected big data and cloud solutions in a cloud-based enterprise Linux environment using compute resources such as EC2, S3, AWS Auto Scaling, Lambda functions, DynamoDB, Couchbase, SQS, and Elastic MapReduce (EMR).
- Automated repeatable deployments of big data software using Ansible, AWS CloudFormation, and Terraform (a minimal sketch follows the highlights below).
- Secured distributed clusters via security groups, firewalls, and authentication and authorization policies; Kerberized Hadoop and other distributed clusters for strong authentication.
- Led various software development projects building back-end web APIs and cloud-based web applications in Java, Python, and Elixir.
- Architected distributed, fault-tolerant, and highly available systems using on-premises or cloud-based hardware.
- Built, in Python, a custom Apache Ambari stack containing installation and management capabilities for Pivotal Greenplum, Pivotal HAWQ, and Chorus.
- Developed, in Java, a parallel backup-and-restore solution for large compute clusters in the AWS cloud.
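As one small illustration of that deployment automation, a boto3 sketch that creates a CloudFormation stack and waits for it to come up; the stack name and template path are hypothetical placeholders (stack updates would use change sets instead):

```python
# Minimal repeatable-deployment sketch using boto3 and CloudFormation.
# Stack name and template path are hypothetical placeholders.
import boto3

def deploy_stack(stack_name: str, template_path: str) -> None:
    """Create a CloudFormation stack and block until it is ready."""
    cfn = boto3.client("cloudformation")
    with open(template_path) as f:
        cfn.create_stack(
            StackName=stack_name,
            TemplateBody=f.read(),
            Capabilities=["CAPABILITY_IAM"],  # template may create IAM roles
        )
    # Block until the stack reaches CREATE_COMPLETE (raises on failure).
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

if __name__ == "__main__":
    deploy_stack("example-emr-cluster", "templates/emr.yaml")
```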
Technologies: Hadoop, Spark, Apache ZooKeeper, Apache Hive, HBase, Apache Ambari, Data Lakes, Big Data Architecture, Data Lake Design, Greenplum, Redshift, Apache Kafka, Amazon EC2, Linux, Ansible, Bash, AWS CloudFormation, Terraform, Amazon S3 (AWS S3), Hortonworks Data Platform (HDP), Cloudera, EMR, Autoscaling, Couchbase, Amazon Simple Queue Service (SQS), Amazon DynamoDB, Kerberos, Java, Python, Elixir, MySQL, Amazon Web Services (AWS), Web Scraping, Data Migration, Salesforce API, Data Analytics, Data Analysis, PostgreSQL

DevOps Engineer
2014 - 2014
Melaleuca
- Built an internal tool to dynamically discover WCF endpoints and make asynchronous requests to check whether they were online. The tool warmed up the services to trigger JIT compilation and actively monitored which endpoints were down (a conceptual sketch follows the highlights below).
- Wrote a custom DevOps dashboard that the entire team used to monitor deployment progress and the health of the cloud infrastructure.
- Automated several complicated workflows in SharePoint to reduce the number of repetitive tasks for the team.
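The original tool was built on the .NET/WCF stack; purely to illustrate the warm-up-and-probe approach, here is the same idea sketched in Python with asyncio (the endpoint URLs are hypothetical placeholders):

```python
# Conceptual sketch of the endpoint warm-up and health-check tool.
# The production version was .NET/WCF; this only illustrates the approach.
# Endpoint URLs are hypothetical placeholders.
import asyncio

import aiohttp

ENDPOINTS = [
    "https://services.example.internal/orders.svc",
    "https://services.example.internal/inventory.svc",
]

async def probe(session: aiohttp.ClientSession, url: str) -> tuple[str, bool]:
    # A simple GET both warms the service (triggering JIT compilation on the
    # first hit) and reports whether the endpoint is answering at all.
    try:
        async with session.get(url) as resp:
            return url, resp.status < 500
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return url, False

async def main() -> None:
    timeout = aiohttp.ClientTimeout(total=5)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        results = await asyncio.gather(*(probe(session, u) for u in ENDPOINTS))
    for url, up in results:
        print(f"{'UP  ' if up else 'DOWN'} {url}")

if __name__ == "__main__":
    asyncio.run(main())
```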
Technologies: C#, Angular, JavaScript, HTML, CSS, Windows PowerShell, Bash, Azure, SharePoint, Windows Communication Foundation (WCF), Amazon Web Services (AWS)

Firmware Engineer | Web Developer | Network Operations Manager | Senior Support Technician
2007 - 2013
Linora
- Wrote the back-end and front-end code for router control panels and for "MeshView," a cloud-based management portal for WiFi networks (built before "the cloud" was commonplace).
- Managed a small team of network operators and support technicians handling inbound level I and level II calls; responsible for hiring, firing, training, and scheduling the team.
- Wrote firmware for WiFiRanger and BlueMesh Networks products, working with kernel modules, internal and external radios, USB modem connectivity, control of remote radios over Ethernet, and failover and failback logic.
- Assisted in managing and monitoring over 10,000 IoT devices and routers in the field, including the software used to monitor them.
Technologies: HTML, PHP, CSS, C, Bash, Linux, SSH, USB, Firmware, Linux Kernel Modules, WiFi, JavaScript, jQuery, MySQL, Web Scraping, PostgreSQL