Brian de la Motte
Verified Expert in Engineering
Data Engineer and Developer
Brian is a senior data engineer and software engineer with over 14 years of experience writing code, leading projects, and solving some of the toughest problems in popular cloud and big data ecosystems. He specializes in data tools such as Airflow and dbt, as well as cloud automation tools like Terraform and Ansible. His favorite projects involve data and cloud migration, machine learning, automation, and traditional software development.
Preferred Environment
Apache Airflow, Data Build Tool (dbt), Amazon Web Services (AWS), Redshift, Snowflake, Databricks, Python, Java, SQL, Spark
The most amazing...
...project I'm proud of was architecting and implementing a modern cloud-based big data platform and data lake for the Social Security Administration.
Work Experience
Data Transformation Engineer
PepsiCo Global - Main
- Taught and mentored several PepsiCo contractors and employees across multiple teams on how to use dbt more efficiently.
- Introduced and implemented CI/CD and Slim CI processes to the entire eCommerce division of Pepsi, which increased productivity by 800% and at the same time drastically reduced resources used in the data warehouse.
- Introduced SQLFluff, a SQL linter, to all of Pepsi's eCommerce divisions: linted around 700 dbt SQL data models against over 20 linting rules and standardized an entire eCommerce division on a single SQL dialect and style.
- Implemented incremental models or data loads in place of full data refreshes, reducing the time it took for bigger data pipelines to run and increasing developer productivity.
- Introduced dbt best practices for data engineers, analytics engineers, and data analysts.
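The incremental-model pattern mentioned above can be sketched in plain Python (this is an illustrative sketch of the general technique, not PepsiCo's actual dbt code): instead of rebuilding a table from scratch on every run, only rows newer than the target table's high-water mark are appended.

```python
# Illustrative sketch of an incremental load: only rows newer than the
# target's high-water mark are processed, replacing a full refresh.

def incremental_load(target, source, watermark_col="updated_at"):
    """Append only the source rows newer than the latest watermark in target."""
    high_water = max((row[watermark_col] for row in target), default=None)
    new_rows = [
        row for row in source
        if high_water is None or row[watermark_col] > high_water
    ]
    target.extend(new_rows)
    return new_rows

# Rows already loaded are skipped; only genuinely new rows are appended.
warehouse = [{"id": 1, "updated_at": "2023-01-01"}]
feed = [
    {"id": 1, "updated_at": "2023-01-01"},  # unchanged row, skipped
    {"id": 2, "updated_at": "2023-02-01"},  # new row, appended
]
loaded = incremental_load(warehouse, feed)
```

In dbt the same idea is expressed declaratively with `materialized='incremental'` and an `is_incremental()` filter, which is why it cuts pipeline runtimes so sharply on large tables.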
Senior Data Engineer
Netlify
- Maintained business-critical data pipelines which handled 2TB of data a day with over 60 different data pipelines.
- Migrated the company's data process from ETL to ELT using dbt and Spark on Databricks. This reduced the number of programming languages used from four to two and allowed for easier debugging and idempotent data pipelines.
- Introduced and productionized the use of Apache Airflow for data workflow orchestration and to kick off extractions and loads that previously were extremely tedious and difficult to monitor.
- Switched from error-prone data extractions for some data sources over to Fivetran. After Fivetran was set up, introduced several new datasets, and integrated them into our data warehouse for analytical uses.
- Authored and maintained big data compute jobs by leveraging PySpark and SparkSQL on Databricks and building out a data lakehouse in the process.
- Secured data warehouse clusters by automating the creation of credentials through Ansible and Terraform.
- Built a cost-of-goods data model that leveraged source billing data from major cloud providers into one conformed model in order to answer questions about the company's cloud spending.
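The conformed cost-of-goods model above boils down to normalizing each provider's billing export into one shared schema and aggregating. A minimal sketch follows; the field names (`product_name`, `unblended_cost`, `service_description`, `cost`) are hypothetical stand-ins for real provider export columns, not the actual model.

```python
# Hedged sketch of a conformed cost-of-goods model: map each provider's
# billing export into one shared schema, then total spend per provider.
# Field names here are hypothetical stand-ins for real export columns.

def conform_aws(record):
    return {"provider": "aws", "service": record["product_name"],
            "usd": float(record["unblended_cost"])}

def conform_gcp(record):
    return {"provider": "gcp", "service": record["service_description"],
            "usd": float(record["cost"])}

def cogs_totals(aws_rows, gcp_rows):
    """Combine conformed rows and total spend per provider."""
    conformed = [conform_aws(r) for r in aws_rows] + [conform_gcp(r) for r in gcp_rows]
    totals = {}
    for row in conformed:
        totals[row["provider"]] = totals.get(row["provider"], 0.0) + row["usd"]
    return totals

totals = cogs_totals(
    [{"product_name": "AmazonEC2", "unblended_cost": "120.50"},
     {"product_name": "AmazonS3", "unblended_cost": "9.50"}],
    [{"service_description": "Compute Engine", "cost": "80.00"}],
)
```

Once every provider's rows share one schema, drill-down questions ("where is the money going?") reduce to ordinary group-by queries over the conformed table.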
Data Engineer Curriculum Writer (Subject Matter Expert)
Thinkful
- Designed a 6-month curriculum plan for teaching data engineers.
- Wrote core modules and led meetings to brainstorm the best topics to cover.
- Created hands-on labs for students to go through to gain practical experience in the role of a data engineer.
Senior Data Engineer Contractor
OfferUp
- Migrated dozens of data pipelines from Apache Airflow to Google Cloud Composer in a very short time span.
- Assisted in a mass data warehouse migration from Snowflake to BigQuery for a total of 1PB (petabyte) of data being migrated between AWS and Google Cloud.
- Reconciled data records in old and new data warehouses to ensure that the data pipelines worked as expected.
- Rewrote Snowflake-specific analytical SQL queries in BigQuery's syntax, which involved lateral flattens and other complicated joins on nested and semi-structured data.
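The lateral flattens mentioned above (Snowflake's `LATERAL FLATTEN`, BigQuery's `UNNEST`) both do the same conceptual job: explode a nested array so each element becomes its own row joined to its parent record. A plain-Python illustration with made-up sample data:

```python
# Conceptual illustration of LATERAL FLATTEN / UNNEST: each element of a
# nested array becomes its own output row joined to the parent record.
# Sample data is hypothetical.

def flatten(rows, array_field):
    out = []
    for row in rows:
        for item in row.get(array_field, []):
            flat = {k: v for k, v in row.items() if k != array_field}
            flat[array_field] = item
            out.append(flat)
    return out

orders = [
    {"order_id": 1, "items": ["hat", "mug"]},
    {"order_id": 2, "items": ["pen"]},
]
flat_orders = flatten(orders, "items")  # three rows, one per item
```

The migration pain comes from the syntax, not the concept: the two warehouses spell this operation differently, so every query touching nested data had to be rewritten by hand.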
Data Engineer
Brushfire
- Built out a new data warehouse in AWS for analytical and data mining workloads (the first data warehouse of the company).
- Implemented business intelligence and data visualizations for the company and client KPIs.
- Built repeatable, scalable ETL data pipelines in the cloud.
- Identified data performance issues in queries, indexes, data modeling, and so on.
- Migrated from .NET to .NET Core, then transitioned the entire web stack from Azure VMs to Kubernetes (complete with cluster and pod autoscaling and CI/CD with Azure DevOps).
Principal Big Data Consultant
zData, Inc
- Architected big data and cloud solutions in a cloud-based enterprise Linux environment utilizing compute resources such as EC2, S3, AWS Auto Scaling, Lambda functions, DynamoDB, Couchbase, SQS, and Elastic MapReduce (EMR).
- Automated repeatable deployments of big data software using Ansible, AWS CloudFormation, and Terraform.
- Secured distributed clusters via security groups, firewalls, and authentication and authorization policies; Kerberized Hadoop and other distributed clusters for strong authentication.
- Led various software development projects in back-end web APIs and cloud-based web applications in Java, Python, and Elixir.
- Architected distributed, fault-tolerant, and highly available systems using on-premise or cloud-based hardware.
- Built a custom Apache Ambari stack containing installation and management capabilities for Pivotal Greenplum, Pivotal HAWQ, and Chorus in Python.
- Developed a parallel backup and restore solution for large compute clusters in the AWS cloud in Java.
DevOps Engineer
Melaleuca
- Built an internal tool to dynamically discover WCF endpoints and make async requests to check whether they were online. The tool warmed up the services to trigger JIT compilation and actively monitored which endpoints were down.
- Wrote a custom DevOps dashboard that the entire team used to monitor how deployments were going and the health of the cloud infrastructure.
- Automated several complicated workflows in SharePoint to reduce the number of repetitive tasks for the team.
Firmware Engineer | Web Developer | Network Operations Manager | Senior Support Technician
Linora
- Wrote the back-end and front-end code for router control panels and for "MeshView," a cloud-based management portal for WiFi networks (built before "the cloud" was commonplace).
- Managed a small team of network operators and support technicians to take inbound level I and level II calls; was responsible for hiring, firing, teaching, and scheduling the team.
- Wrote firmware for WiFiRanger and BlueMesh Networks products; worked with kernel modules, internal and external radios, USB modem connectivity, controlling remote radios over Ethernet, and failover and failback logic.
- Helped manage, monitor, and build monitoring software for over 10,000 IoT devices and routers in the field.
Experience
Cost-of-goods Data Model
http://netlify.com
I designed this project from end to end and built all the data pipelines. It was used internally to forecast the company's future cloud costs and to drill down into the specifics of how money was being spent.
Big Data Platform
https://www.ssa.gov/
Out of this project came the idea of dynamic templating, which let several clusters be spun up in parallel for development or production use.
The government had the most stringent requirements: the platform had to be FedRAMP-approved, and before using the cluster for official government work, we had to obtain the official "Authority to Operate" (ATO). This involved turning on every security bell and whistle you could think of, including Vormetric disk encryption, SSL on everything, security groups, firewalls, authorization, SSSD, Kerberos, Red Hat IdM integration, and cross-forest Active Directory trusts.
After getting the ATO, we helped several teams within the SSA onboard to the BDP to use Hive and Apache Spark for various ML and analytical workloads. Our work was so successful that the NIH wanted to use our cluster as well.
The ML work involved parameter tuning and installing TensorFlow on Spark, XGBoost, and other popular libraries.
Hadoop Data Lake
Massive Data Warehouse Migration
http://offerup.com
Virtual Machine to Kubernetes Migration
http://brushfire.com
Finally, after getting cluster autoscaling and horizontal pod autoscaling working correctly, I set up several CI/CD pipelines for the production, staging, and development branches to automatically build and deploy updated images to the corresponding websites.
All told, the company saved about $2,000 per month on hosting costs: the autoscaling was much more resilient, and the new cluster was cheaper than paying for the raw VMs.
Empire Unmanned
https://www.empireunmanned.com/
Modern API Rewrite
https://www.recallinfolink.com/
Skills
Languages
Python, SQL, Bash, Snowflake, Java, C, Scala, C#, Elixir, HTML, PHP, CSS, JavaScript, R, T-SQL (Transact-SQL)
Frameworks
Spark, .NET, .NET Core, Hadoop, Angular, Windows PowerShell, Phoenix
Libraries/APIs
PySpark, REST APIs, Pandas, Stripe, jQuery, React
Tools
Apache Airflow, Ansible, Terraform, BigQuery, Azure Kubernetes Service (AKS), Tableau, Azure DevOps Services, Apache ZooKeeper, Apache Ambari, AWS CloudFormation, Cloudera, Amazon Simple Queue Service (SQS), Amazon Elastic Container Service (Amazon ECS), Amazon EBS, Amazon Redshift Spectrum, AWS Glue, Amazon Athena
Paradigms
ETL, Agile, Scrum, Business Intelligence (BI), Azure DevOps, Test-driven Development (TDD), Dimensional Modeling
Platforms
Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), DigitalOcean, Azure, Kubernetes, Linux, Apache Kafka, Amazon EC2, Hortonworks Data Platform (HDP), SharePoint, Salesforce
Storage
Data Pipelines, PostgreSQL, Database Architecture, Redshift, MySQL, MongoDB, SQL Server DBA, AWS Data Pipeline Service, Amazon S3 (AWS S3), Google Cloud Storage, Apache Hive, HBase, Data Lakes, Data Lake Design, Greenplum, Couchbase, Amazon DynamoDB, Google Cloud, SQL Server Integration Services (SSIS)
Other
Big Data, ELT, CI/CD Pipelines, Data Architecture, Data Engineering, Data Analytics, Data Analysis, Data Warehouse Design, Data Build Tool (dbt), Data Modeling, Data Migration, Dashboard Design, Dashboards, Software Development Lifecycle (SDLC), Programming, Quantum Computing, Ethics, Algorithms, Fivetran, Big Data Architecture, EMR, Autoscaling, Kerberos, SSH, USB, Firmware, Linux Kernel Modules, WiFi, Windows Communication Foundation (WCF), Active Directory Federation, Vormetric, Web Security, Cloud Architecture, Mainframe, Web Scraping, Amazon Kinesis, Data Transformation
Education
Bachelor's Degree in Computer Science
Boise State University - Boise, ID, United States
Certifications
AWS Certified Solutions Architect Associate
AWS