
Brian de la Motte

Verified Expert in Engineering

Data Engineer and Developer

Location
Meridian, ID, United States
Toptal Member Since
July 13, 2021

Brian is a senior data engineer and software engineer with over 14 years of experience writing code, leading projects, and solving some of the toughest problems within popular cloud and big data ecosystems. He specializes in data tools such as Airflow and dbt, as well as cloud automation tools like Terraform and Ansible. His favorite projects involve data and cloud migrations, machine learning, automation, and traditional software development.

Portfolio

PepsiCo Global - Main
SQL, Python, Dimensional Modeling, SQL Server Integration Services (SSIS)...
Netlify
Data Build Tool (dbt), Spark, Databricks, SQL, Python, Scala, Ansible...
Thinkful
Apache Airflow, Linux, Redshift, SQL, Bash, Python, Data Modeling...

Experience

Availability

Part-time

Preferred Environment

Apache Airflow, Data Build Tool (dbt), Amazon Web Services (AWS), Redshift, Snowflake, Databricks, Python, Java, SQL, Spark

The most amazing...

...project I'm proud of was architecting and implementing a modern cloud-based big data platform and data lake for the Social Security Administration.

Work Experience

Data Transformation Engineer

2021 - 2022
PepsiCo Global - Main
  • Taught and mentored several PepsiCo contractors and employees across multiple teams on how to use dbt more efficiently.
  • Introduced and implemented CI/CD and Slim CI processes across PepsiCo's entire eCommerce division, which increased productivity by 800% while drastically reducing the resources consumed in the data warehouse.
  • Introduced SQLFluff, a SQL linter, across PepsiCo's eCommerce divisions, linting around 700 dbt SQL models against more than 20 linting rules and standardizing the SQL dialect and style used by the entire division (a linting sketch follows this role's technology list).
  • Implemented incremental models and loads in place of full data refreshes, cutting the runtime of the larger data pipelines and increasing developer productivity.
  • Introduced dbt best practices for data engineers, analytics engineers, and data analysts.
Technologies: SQL, Python, Dimensional Modeling, SQL Server Integration Services (SSIS), Data Pipelines, Data Modeling, Data Build Tool (dbt), Snowflake, ETL, Data Engineering, Data Transformation, Data Warehouse Design, Database Architecture
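
A minimal illustration of the linting step described above, using SQLFluff's simple Python API. The query, dialect choice, and the idea of calling the API from Python (rather than the SQLFluff CLI in CI) are assumptions for this sketch.

```python
# Sketch: lint a SQL snippet with SQLFluff and print any rule violations.
# The query and dialect are illustrative assumptions.
import sqlfluff

QUERY = "SELECT order_id, SUM(amount) total_amount FROM orders GROUP BY 1"

# Lint against the Snowflake dialect (the warehouse used in this role).
violations = sqlfluff.lint(QUERY, dialect="snowflake")

for violation in violations:
    # Each entry is a dict describing one rule violation (rule code, message, position).
    print(violation)
```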

Senior Data Engineer

2020 - 2021
Netlify
  • Maintained business-critical data pipelines which handled 2TB of data a day with over 60 different data pipelines.
  • Migrated the company's data processing from ETL to ELT using dbt and Spark on Databricks, reducing the number of programming languages in use from four to two and enabling easier debugging and idempotent data pipelines.
  • Introduced and productionized Apache Airflow for data workflow orchestration, triggering extractions and loads that had previously been tedious and difficult to monitor (see the DAG sketch after this role's technology list).
  • Replaced error-prone data extractions for several data sources with Fivetran, then introduced new datasets and integrated them into the data warehouse for analytical use.
  • Authored and maintained big data compute jobs by leveraging PySpark and SparkSQL on Databricks and building out a data lakehouse in the process.
  • Secured data warehouse clusters by automating the creation of credentials through Ansible and Terraform.
  • Built a cost-of-goods data model that consolidated billing data from the major cloud providers into one conformed model to answer questions about the company's cloud spending.
Technologies: Data Build Tool (dbt), Spark, Databricks, SQL, Python, Scala, Ansible, Terraform, Big Data, Redshift, BigQuery, Google Cloud Platform (GCP), DigitalOcean, ETL, ELT, Apache Airflow, Fivetran, PySpark, Business Intelligence (BI), Amazon Web Services (AWS), Data Migration, Salesforce, REST APIs, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, Data Warehouse Design, Database Architecture, MongoDB
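
A minimal sketch of the orchestration pattern described above: an Airflow DAG that runs an extract/load step and then kicks off dbt. The DAG ID, schedule, paths, and task names are hypothetical, not the production pipelines.

```python
# Sketch of a daily ELT DAG: extract/load first, then run dbt transformations.
# All names, paths, and the schedule are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="elt_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Hypothetical extract/load step (in practice this could be Fivetran or a custom loader).
    extract_load = BashOperator(
        task_id="extract_load",
        bash_command="python /opt/pipelines/extract_load.py",
    )

    # Run the dbt models once the raw data has landed.
    run_dbt = BashOperator(
        task_id="run_dbt",
        bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
    )

    extract_load >> run_dbt
```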

Data Engineer Curriculum Writer (Subject Matter Expert)

2020 - 2020
Thinkful
  • Designed a 6-month curriculum plan for teaching data engineers.
  • Wrote core modules and led meetings to brainstorm the best topics to cover.
  • Created hands-on labs for students to go through to gain practical experience in the role of a data engineer.
Technologies: Apache Airflow, Linux, Redshift, SQL, Bash, Python, Data Modeling, Amazon Web Services (AWS), Data Analytics, Data Analysis, PostgreSQL

Senior Data Engineer Contractor

2020 - 2020
Offer Up
  • Migrated dozens of data pipelines from Apache Airflow to Google Cloud Composer in a very short time span.
  • Assisted in a mass data warehouse migration from Snowflake to BigQuery for a total of 1PB (petabyte) of data being migrated between AWS and Google Cloud.
  • Reconciled data records in old and new data warehouses to ensure that the data pipelines worked as expected.
  • Rewrote Snowflake-specific analytical SQL queries in BigQuery syntax, which involved converting lateral flattens and other complex joins over nested and semi-structured data (see the query sketch after this role's technology list).
Technologies: Apache Airflow, BigQuery, Snowflake, Amazon S3 (AWS S3), Google Cloud Storage, Data Build Tool (dbt), Python, SQL, Fivetran, Amazon Web Services (AWS), Data Migration, Data Analytics, Data Analysis
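
To illustrate the kind of rewrite mentioned above: Snowflake's LATERAL FLATTEN over semi-structured columns maps roughly to BigQuery's UNNEST over repeated fields. The sketch below runs such a query with the BigQuery Python client; the project, table, and column names are made up for illustration.

```python
# Sketch: a Snowflake LATERAL FLATTEN pattern re-expressed with BigQuery UNNEST.
# Project, table, and field names are hypothetical; credentials come from the environment.
from google.cloud import bigquery

client = bigquery.Client()

# Snowflake original, for reference:
#   SELECT o.order_id, f.value:item_id::INT AS item_id
#   FROM orders o, LATERAL FLATTEN(input => o.items) f;
sql = """
    SELECT o.order_id, item.item_id
    FROM `my_project.analytics.orders` AS o
    LEFT JOIN UNNEST(o.items) AS item
"""

for row in client.query(sql).result():
    print(row.order_id, row.item_id)
```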

Data Engineer

2019 - 2020
Brushfire
  • Built out a new data warehouse in AWS for analytical and data mining workloads (the company's first data warehouse).
  • Implemented business intelligence and data visualizations for the company and client KPIs.
  • Built repeatable, scalable ETL data pipelines in the cloud.
  • Identified data performance issues in queries, indexes, data modeling, and so on.
  • Migrated from .NET to .NET Core, then transitioned the entire web stack from Azure VMs to Kubernetes (complete with cluster and pod autoscaling and CI/CD with Azure DevOps).
Technologies: Redshift, Azure, Kubernetes, SQL Server DBA, SQL, C#, .NET, .NET Core, Azure DevOps, Data Pipelines, AWS Data Pipeline Service, Stripe, ETL, CI/CD Pipelines, Azure Kubernetes Service (AKS), Tableau, Python, Azure DevOps Services, Amazon Web Services (AWS), Web Scraping, T-SQL (Transact-SQL), Data Migration, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, PostgreSQL, Data Warehouse Design, Database Architecture, MongoDB

Principal Big Data Consultant

2014 - 2019
zData, Inc
  • Architected big data and cloud solutions in a cloud-based enterprise Linux environment utilizing compute resources such as EC2, S3, AWS Auto Scaling, Lambda functions, DynamoDB, Couchbase, SQS, and Elastic MapReduce (EMR).
  • Automated repeatable deployments of big data software using Ansible, AWS CloudFormation, and Terraform (a deployment sketch follows this role's technology list).
  • Secured distributed clusters via security groups, firewalls, and authorization and authentication policies; Kerberized Hadoop and other distributed clusters for strong authentication.
  • Led various software development projects in back-end web APIs and cloud-based web applications in Java, Python, and Elixir.
  • Architected distributed, fault-tolerant, and highly available systems using on-premises or cloud-based hardware.
  • Built a custom Apache Ambari stack containing installation and management capabilities for Pivotal Greenplum, Pivotal HAWQ, and Chorus in Python.
  • Developed a parallel backup and restore solution for large compute clusters in the AWS cloud in Java.
Technologies: Hadoop, Spark, Apache ZooKeeper, Apache Hive, HBase, Apache Ambari, Data Lakes, Big Data Architecture, Data Lake Design, Greenplum, Redshift, Apache Kafka, Amazon EC2, Linux, Ansible, Bash, AWS CloudFormation, Terraform, Amazon S3 (AWS S3), Hortonworks Data Platform (HDP), Cloudera, EMR, Autoscaling, Couchbase, Amazon Simple Queue Service (SQS), Amazon DynamoDB, Kerberos, Java, Python, Elixir, MySQL, Amazon Web Services (AWS), Web Scraping, Data Migration, Data Analytics, Data Analysis, PostgreSQL
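
A rough sketch of the repeatable-deployment idea described in this role. The actual work drove CloudFormation through Ansible and templates directly; the stack name, template URL, and parameters below are hypothetical, shown here through boto3 for consistency with the other examples.

```python
# Sketch: launch a CloudFormation stack for a cluster and wait for it to finish.
# Stack name, template location, and parameters are illustrative assumptions.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

cfn.create_stack(
    StackName="hadoop-dev-cluster",
    TemplateURL="https://s3.amazonaws.com/example-bucket/hadoop-cluster.yaml",
    Parameters=[
        {"ParameterKey": "InstanceType", "ParameterValue": "m5.2xlarge"},
        {"ParameterKey": "NodeCount", "ParameterValue": "8"},
    ],
    Capabilities=["CAPABILITY_NAMED_IAM"],
)

# Block until the stack is created (raises if creation fails or rolls back).
cfn.get_waiter("stack_create_complete").wait(StackName="hadoop-dev-cluster")
```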

DevOps Engineer

2014 - 2014
Melaleuca
  • Built an internal tool to dynamically discover WCF endpoints and make async requests to check whether they were online. The tool warmed up the services to trigger JIT compilation and actively monitored which endpoints were down (a Python sketch of the approach follows this role's technology list).
  • Wrote a custom DevOps dashboard that the entire team used to monitor how deployments were going and the health of the cloud infrastructure.
  • Automated several complicated workflows in SharePoint to reduce the number of repetitive tasks for the team.
Technologies: C#, Angular, JavaScript, HTML, CSS, Windows PowerShell, Bash, Azure, SharePoint, Windows Communication Foundation (WCF), Amazon Web Services (AWS)
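
The endpoint warm-up tool itself was written in C#; purely to illustrate the approach, here is the same idea sketched in Python with asyncio and aiohttp. The endpoint URLs and timeout are hypothetical.

```python
# Sketch of the warm-up/health-check idea: hit every discovered endpoint concurrently
# and report which ones respond. The endpoint list and timeout are assumptions.
import asyncio

import aiohttp

ENDPOINTS = [
    "https://services.example.com/orders.svc",
    "https://services.example.com/customers.svc",
]

async def check(session: aiohttp.ClientSession, url: str) -> tuple[str, bool]:
    try:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
            return url, resp.status < 500
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return url, False

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(*(check(session, url) for url in ENDPOINTS))
    for url, ok in results:
        print("UP  " if ok else "DOWN", url)

asyncio.run(main())
```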

Firmware Engineer | Web Developer | Network Operations Manager | Senior Support Technician

2007 - 2013
Linora
  • Wrote the back-end and front-end code for router control panels and for "MeshView," a cloud-based management portal for WiFi networks built before cloud platforms were commonplace.
  • Managed a small team of network operators and support technicians to take inbound level I and level II calls; was responsible for hiring, firing, teaching, and scheduling the team.
  • Wrote firmware for WiFiRanger and BlueMesh Networks products; worked with kernel modules, internal and external radios, USB modem connectivity, controlling remote radios over Ethernet, and failover and failback logic.
  • Assisted in managing and monitoring over 10,000 IoT devices and routers in the field, including the software used to monitor them.
Technologies: HTML, PHP, CSS, C, Bash, Linux, SSH, USB, Firmware, Linux Kernel Modules, WiFi, JavaScript, jQuery, MySQL, Web Scraping, PostgreSQL

Cost-of-goods Data Model

http://netlify.com
I built a data model that unified Google Cloud Platform, Amazon Web Services, Packet, and DigitalOcean cloud invoices into a star-schema dimensional model.

I designed it from end to end and built all the data pipelines. This project was used internally to forecast the company's future cloud costs and to drill down into the specifics to see how money was being spent.
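
A simplified sketch of the conforming step, shown with pandas for brevity; the real model was built as warehouse tables and pipelines, and the file names and source column names below are illustrative assumptions.

```python
# Sketch: conform cloud invoices from two providers into one fact-table shape.
# File names and source column names are illustrative assumptions.
import pandas as pd

# Each provider exports billing data with its own column names.
aws = pd.read_csv("aws_cur.csv").rename(columns={
    "line_item_usage_account_id": "account_id",
    "product_product_name": "service",
    "line_item_usage_start_date": "usage_date",
    "line_item_unblended_cost": "cost_usd",
})
aws["provider"] = "aws"

gcp = pd.read_csv("gcp_billing_export.csv").rename(columns={
    "project_id": "account_id",
    "service_description": "service",
    "usage_start_time": "usage_date",
    "cost": "cost_usd",
})
gcp["provider"] = "gcp"

# Union into one conformed fact table keyed by provider, account, service, and date.
columns = ["provider", "account_id", "service", "usage_date", "cost_usd"]
fact_cloud_cost = pd.concat([aws[columns], gcp[columns]], ignore_index=True)

print(fact_cloud_cost.groupby(["provider", "service"])["cost_usd"].sum())
```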

Big Data Platform

https://www.ssa.gov/
I architected, designed, and built a big data ecosystem nicknamed the BDP for the United States' Social Security Administration. It was a Hortonworks stack running entirely on AWS, automated with Ansible, CloudFormation, and Cloudbreak. The tech stack included Spark, Hadoop, Hive, and more.

Out of this work came the idea of dynamic cluster templating, so several clusters could be spun up in parallel and used for development or production.

The government had stringent FedRAMP requirements, and before the cluster could be used for official government work, we had to obtain the official Authority to Operate (ATO). This involved turning on every security feature you could think of, including Vormetric disk encryption, SSL on everything, security groups, firewalls, authorization, SSSD, Kerberos, Red Hat IdM integration, and cross-forest Active Directory trusts.

After getting the ATO, we helped several teams within the SSA get onboarded to the BDP to use Hive and Apache Spark for various ML and analytical workloads. Our work was so successful that the NIH wanted to use our cluster for some of its work too.

The ML work involved parameter tuning and installing TensorFlow on Spark, XGBoost, and other popular libraries.
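
To give a flavor of the analytical workloads teams ran on the cluster, a minimal PySpark job might read a Hive table and aggregate it; the table and column names here are hypothetical, since the real datasets are internal to the agency.

```python
# Sketch: a Spark SQL workload reading a Hive-registered table and aggregating it.
# Table and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("claims_analysis")
    .enableHiveSupport()  # read tables registered in the Hive metastore
    .getOrCreate()
)

claims = spark.table("analytics.claims")

monthly = (
    claims.groupBy("claim_month", "claim_type")
          .count()
          .orderBy("claim_month")
)

monthly.show(20)
```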

Hadoop Data Lake

I built a modern data lake on AWS for Time, Inc. This involved two Greenplum clusters and two Hadoop clusters, all automated with CloudFormation and Bash. Several clusters were also tied into the company's identity infrastructure.

Massive Data Warehouse Migration

http://offerup.com
I assisted in the migration of over 1PB of data from AWS to GCP, moving it from S3 and Snowflake to BigQuery and GCS. The effort took ten engineers a total of three months. Dozens of data pipelines had to be rewritten to run on Google Cloud Composer (Google's hosted version of Airflow), tested, and deployed to production.
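
As an illustration of the reconciliation step, a simple check is to compare per-table row counts (or checksums) in both warehouses after each cutover; the connection details and table names below are hypothetical.

```python
# Sketch: reconcile row counts for migrated tables between Snowflake and BigQuery.
# Connection parameters and table names are illustrative assumptions.
import snowflake.connector
from google.cloud import bigquery

TABLES = ["orders", "users", "events"]

sf = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",
    warehouse="ANALYTICS_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
bq = bigquery.Client()

for table in TABLES:
    sf_count = sf.cursor().execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    bq_rows = list(bq.query(f"SELECT COUNT(*) AS n FROM `my_project.analytics.{table}`").result())
    bq_count = bq_rows[0].n
    status = "OK" if sf_count == bq_count else "MISMATCH"
    print(f"{table}: snowflake={sf_count} bigquery={bq_count} {status}")
```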

Virtual Machine to Kubernetes Migration

http://brushfire.com
I migrated an entire web stack hosted on Azure Cloud Services (Classic) over to a new Kubernetes cluster running on Azure AKS. This involved finishing the migration from .NET to .NET Core and then getting the .NET Core applications to build successfully in Linux containers instead of on Windows.

Finally, after getting cluster autoscaling and horizontal pod autoscaling working correctly, I set up CI/CD pipelines for the production, staging, and development branches to automatically build and deploy updated images to the corresponding websites.

Altogether, the company saved about $2,000 per month on hosting costs: the autoscaling was much more resilient, and the new cluster was cheaper than paying for raw VMs.

Empire Unmanned

https://www.empireunmanned.com/
I built a modern web app for Empire Unmanned. The app allowed field drone pilots to upload their drone footage to the cloud for processing into aerial inspections and maps. In addition to being a back-end developer on the project, I was the engagement manager and led a small team of engineers to complete it.

Modern API Rewrite

https://www.recallinfolink.com/
I led a small team of software engineers building an API in Phoenix to replace an aging Perl-based system that was on its last legs. We received a basic API spec from the client and implemented it, ensuring the API met their requirements and was properly tested.

Languages

Python, SQL, Bash, Snowflake, Java, C, Scala, C#, Elixir, HTML, PHP, CSS, JavaScript, R, T-SQL (Transact-SQL)

Frameworks

Spark, .NET, .NET Core, Hadoop, Angular, Windows PowerShell, Phoenix

Libraries/APIs

PySpark, REST APIs, Pandas, Stripe, jQuery, React

Tools

Apache Airflow, Ansible, Terraform, BigQuery, Azure Kubernetes Service (AKS), Tableau, Azure DevOps Services, Apache ZooKeeper, Apache Ambari, AWS CloudFormation, Cloudera, Amazon Simple Queue Service (SQS), Amazon Elastic Container Service (Amazon ECS), Amazon EBS, Amazon Redshift Spectrum, AWS Glue, Amazon Athena

Paradigms

ETL, Agile, Scrum, Business Intelligence (BI), Azure DevOps, Test-driven Development (TDD), Dimensional Modeling

Platforms

Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), DigitalOcean, Azure, Kubernetes, Linux, Apache Kafka, Amazon EC2, Hortonworks Data Platform (HDP), SharePoint, Salesforce

Storage

Data Pipelines, PostgreSQL, Database Architecture, Redshift, MySQL, MongoDB, SQL Server DBA, AWS Data Pipeline Service, Amazon S3 (AWS S3), Google Cloud Storage, Apache Hive, HBase, Data Lakes, Data Lake Design, Greenplum, Couchbase, Amazon DynamoDB, Google Cloud, SQL Server Integration Services (SSIS)

Other

Big Data, ELT, CI/CD Pipelines, Data Architecture, Data Engineering, Data Analytics, Data Analysis, Data Warehouse Design, Data Build Tool (dbt), Data Modeling, Data Migration, Dashboard Design, Dashboards, Software Development Lifecycle (SDLC), Programming, Quantum Computing, Ethics, Algorithms, Fivetran, Big Data Architecture, EMR, Autoscaling, Kerberos, SSH, USB, Firmware, Linux Kernel Modules, WiFi, Windows Communication Foundation (WCF), Active Directory Federation, Vormetric, Web Security, Cloud Architecture, Mainframe, Web Scraping, Amazon Kinesis, Data Transformation

2009 - 2016

Bachelor's Degree in Computer Science

Boise State University - Boise, ID, United States

July 2022 - July 2024

AWS Certified Solutions Architect Associate

AWS
