Brian de la Motte

Data Engineer and Developer in Meridian, ID, United States

Member since July 13, 2021
Brian is a senior data engineer and software engineer with over 14 years of experience writing code, leading projects, and solving some of the toughest problems within popular cloud and big data ecosystems. He specializes in data tools such as Airflow and dbt, as well as cloud automation tools like Terraform and Ansible. His favorite projects involve data and cloud migrations, machine learning, automation, and traditional software development.

Portfolio

  • PepsiCo Global - Main via Toptal
    SQL, Python, Dimensional Modeling, SQL Server Integration Services (SSIS)...
  • Netlify
    Data Build Tool (dbt), Spark, Databricks, SQL, Python, Scala, Ansible...
  • Thinkful
    Apache Airflow, Linux, Redshift, SQL, Bash, Python, Data Modeling...

Experience

Location

Meridian, ID, United States

Availability

Full-time

Preferred Environment

Apache Airflow, Data Build Tool (dbt), Amazon Web Services (AWS), Redshift, Snowflake, Databricks, Python, Java, SQL, Spark

The most amazing...

...project I'm proud of was architecting and implementing a modern cloud-based big data platform and data lake for the Social Security Administration.

Employment

  • Data Transformation Engineer

    2021 - 2022
    PepsiCo Global - Main via Toptal
    • Taught and mentored several PepsiCo contractors and employees across multiple teams on how to use dbt more efficiently.
    • Introduced and implemented CI/CD and Slim CI processes across the entire eCommerce division of PepsiCo, which increased productivity by 800% while drastically reducing the resources used in the data warehouse (a minimal Slim CI sketch follows this entry).
    • Introduced SQLFluff, a SQL linter, across PepsiCo's eCommerce divisions, linting around 700 dbt data models against more than 20 rules and standardizing an entire eCommerce division on the same SQL dialect and style.
    • Implemented incremental models and loads in place of full data refreshes, reducing the runtime of the largest data pipelines and increasing developer productivity.
    • Introduced dbt best practices for data engineers, analytics engineers, and data analysts.
    Technologies: SQL, Python, Dimensional Modeling, SQL Server Integration Services (SSIS), Data Pipelines, Data Modeling, Data Build Tool (dbt), Snowflake, ETL, Data Engineering, Data Transformation, Data Warehouse Design, Database Architecture
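
    Slim CI, in broad strokes, means building only the dbt models that changed relative to production and deferring everything else to the production environment. Below is a minimal, hypothetical sketch of such a CI step; the artifact path and target name are illustrative rather than PepsiCo's actual configuration.

    ```python
    # Hypothetical Slim CI step: build only dbt models that changed relative
    # to the production manifest, deferring unchanged upstream models.
    import subprocess

    PROD_ARTIFACTS = "./prod-run-artifacts"  # downloaded production manifest.json lives here

    def slim_ci_build() -> None:
        # "state:modified+" selects changed models plus everything downstream;
        # "--defer" resolves unchanged refs against the production environment.
        subprocess.run(
            [
                "dbt", "build",
                "--select", "state:modified+",
                "--defer",
                "--state", PROD_ARTIFACTS,
                "--target", "ci",
            ],
            check=True,
        )

    if __name__ == "__main__":
        slim_ci_build()
    ```
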
  • Senior Data Engineer

    2020 - 2021
    Netlify
    • Maintained over 60 business-critical data pipelines that handled 2TB of data a day.
    • Migrated the company's data processing from ETL to ELT using dbt and Spark on Databricks. This reduced the number of programming languages in use from four to two and allowed for easier debugging and idempotent data pipelines.
    • Introduced and productionized Apache Airflow for data workflow orchestration, kicking off extractions and loads that had previously been tedious and difficult to monitor (an illustrative DAG follows this entry).
    • Switched error-prone data extractions for several sources over to Fivetran, then introduced several new datasets and integrated them into the data warehouse for analytical use.
    • Authored and maintained big data compute jobs by leveraging PySpark and SparkSQL on Databricks and building out a data lakehouse in the process.
    • Secured data warehouse clusters by automating the creation of credentials through Ansible and Terraform.
    • Built a cost-of-goods data model that combined source billing data from the major cloud providers into one conformed model to answer questions about the company's cloud spending.
    Technologies: Data Build Tool (dbt), Spark, Databricks, SQL, Python, Scala, Ansible, Terraform, Big Data, Redshift, BigQuery, Google Cloud Platform (GCP), DigitalOcean, ETL, ELT, Apache Airflow, Fivetran, PySpark, Business Intelligence (BI), Amazon Web Services (AWS), Data Migration, Salesforce API, Salesforce, REST APIs, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, Data Warehouse Design, Database Architecture, MongoDB
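
    The DAG below is an illustrative sketch of the orchestration pattern described above: an extract-and-load task followed by a dbt run. The task logic, schedule, and dbt paths are hypothetical, not Netlify's actual pipeline.

    ```python
    # Illustrative Airflow 2.x DAG: extract/load raw data, then run dbt transformations.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator


    def extract_and_load(**context):
        # Placeholder: pull from a source system and load raw data into the warehouse.
        print("extracting and loading raw data for", context["ds"])


    with DAG(
        dag_id="elt_daily",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_raw = PythonOperator(
            task_id="extract_and_load",
            python_callable=extract_and_load,
        )
        transform = BashOperator(
            task_id="dbt_run",
            bash_command="dbt run --project-dir /opt/dbt --profiles-dir /opt/dbt",
        )
        load_raw >> transform
    ```
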
  • Data Engineer Curriculum Writer (Subject Matter Expert)

    2020 - 2020
    Thinkful
    • Designed a 6-month curriculum plan for teaching data engineers.
    • Wrote core modules and led meetings to brainstorm the best topics to cover.
    • Created hands-on labs for students to go through to gain practical experience in the role of a data engineer.
    Technologies: Apache Airflow, Linux, Redshift, SQL, Bash, Python, Data Modeling, Amazon Web Services (AWS), Data Analytics, Data Analysis, PostgreSQL
  • Senior Data Engineer Contractor

    2020 - 2020
    Offer Up
    • Migrated dozens of data pipelines from Apache Airflow to Google Cloud Composer in a very short time span.
    • Assisted in a mass data warehouse migration from Snowflake to BigQuery, moving a total of 1PB (petabyte) of data from AWS to Google Cloud.
    • Reconciled data records in old and new data warehouses to ensure that the data pipelines worked as expected.
    • Rewrote Snowflake-specific analytical SQL queries to use BigQuery's syntax, which involved converting lateral flattens and other complicated joins on nested and semi-structured data (see the sketch after this entry).
    Technologies: Apache Airflow, BigQuery, Snowflake, Amazon S3 (AWS S3), Google Cloud Storage, Data Build Tool (dbt), Python, SQL, Fivetran, Amazon Web Services (AWS), Data Migration, Data Analytics, Data Analysis
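
    As a sketch of the kind of rewrite involved, the snippet below shows a Snowflake LATERAL FLATTEN query translated to BigQuery's UNNEST syntax and run through the google-cloud-bigquery client. The table and column names are made up; only the syntax translation is the point.

    ```python
    # Hypothetical example of translating a Snowflake LATERAL FLATTEN query
    # to BigQuery UNNEST syntax and running it with the BigQuery client.
    #
    # Snowflake original (roughly):
    #   SELECT o.order_id, f.value:sku::string AS sku
    #   FROM orders o, LATERAL FLATTEN(input => o.items) f;
    from google.cloud import bigquery

    BQ_EQUIVALENT = """
    SELECT
      o.order_id,
      item.sku AS sku
    FROM `analytics.orders` AS o,
         UNNEST(o.items) AS item
    """

    def run_query() -> None:
        client = bigquery.Client()
        for row in client.query(BQ_EQUIVALENT).result():
            print(row.order_id, row.sku)

    if __name__ == "__main__":
        run_query()
    ```
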
  • Data Engineer

    2019 - 2020
    Brushfire
    • Built out a new data warehouse in AWS for analytical and data mining workloads (the company's first data warehouse).
    • Implemented business intelligence and data visualizations for the company and client KPIs.
    • Built repeatable, scalable ETL data pipelines in the cloud.
    • Identified data performance issues in queries, indexes, data modeling, and so on.
    • Migrated from .NET to .NET Core, then transitioned the entire web stack from Azure VMs to Kubernetes (complete with cluster and pod autoscaling and CI/CD with Azure DevOps).
    Technologies: Redshift, Azure, Kubernetes, SQL Server DBA, SQL, C#, .NET, .NET Core, Azure DevOps, Data Pipelines, AWS Data Pipeline Service, Stripe, ETL, CI/CD Pipelines, Azure Kubernetes Service (AKS), Tableau, Python, Azure DevOps Services, Amazon Web Services (AWS), Web Scraping, T-SQL, Data Migration, Pandas, Dashboard Design, Dashboards, Data Analytics, Data Analysis, PostgreSQL, Data Warehouse Design, Database Architecture, MongoDB
  • Principal Big Data Consultant

    2014 - 2019
    zData, Inc
    • Architected big data and cloud solutions in a cloud-based enterprise Linux environment utilizing compute resources such as EC2, S3, AWS Auto Scaling, Lambda functions, DynamoDB, Couchbase, SQS, and Elastic MapReduce (EMR).
    • Automated repeatable deployments of big data software using Ansible, AWS CloudFormation, and Terraform (a minimal sketch follows this entry).
    • Secured distributed clusters via security groups, firewalls, and authorization and authentication policies; Kerberized Hadoop and other distributed clusters for strong authentication.
    • Led various software development projects in back-end web APIs and cloud-based web applications in Java, Python, and Elixir.
    • Architected distributed, fault-tolerant, and highly available systems using on-premise or cloud-based hardware.
    • Built a custom Apache Ambari stack containing installation and management capabilities for Pivotal Greenplum, Pivotal HAWQ, and Chorus in Python.
    • Developed a parallel backup and restore solution for large compute clusters in the AWS cloud in Java.
    Technologies: Hadoop, Spark, Apache ZooKeeper, Apache Hive, HBase, Apache Ambari, Data Lakes, Big Data Architecture, Data Lake Design, Greenplum, Redshift, Apache Kafka, Amazon EC2, Linux, Ansible, Bash, AWS CloudFormation, Terraform, Amazon S3 (AWS S3), Hortonworks Data Platform (HDP), Cloudera, EMR, Autoscaling, Couchbase, Amazon Simple Queue Service (SQS), Amazon DynamoDB, Kerberos, Java, Python, Elixir, MySQL, Amazon Web Services (AWS), Web Scraping, Data Migration, Salesforce API, Data Analytics, Data Analysis, PostgreSQL
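
    As a rough sketch of the repeatable-deployment idea (the real automation also used Ansible and Terraform), the snippet below stands up a cluster from a versioned CloudFormation template with boto3 and waits for it to finish. The stack name, template URL, and parameters are illustrative only.

    ```python
    # Hypothetical boto3 sketch: launch a cluster stack from a CloudFormation
    # template and block until creation completes.
    import boto3

    def deploy_cluster_stack(stack_name: str = "hadoop-dev-cluster") -> None:
        cfn = boto3.client("cloudformation")
        cfn.create_stack(
            StackName=stack_name,
            TemplateURL="https://s3.amazonaws.com/example-bucket/cluster.yaml",
            Parameters=[
                {"ParameterKey": "NodeCount", "ParameterValue": "8"},
                {"ParameterKey": "InstanceType", "ParameterValue": "m5.2xlarge"},
            ],
            Capabilities=["CAPABILITY_NAMED_IAM"],
        )
        # Wait for CREATE_COMPLETE (raises if the stack rolls back).
        cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    if __name__ == "__main__":
        deploy_cluster_stack()
    ```
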
  • DevOps Engineer

    2014 - 2014
    Melaleuca
    • Built an internal tool to dynamically discover WCF endpoints and make async requests to check whether they were online. The tool warmed up the services to trigger JIT compilation and actively monitored which endpoints were down (a rough Python sketch of the idea follows this entry).
    • Wrote a custom DevOps dashboard that the entire team used to monitor deployment progress and the health of the cloud infrastructure.
    • Automated several complicated workflows in SharePoint to reduce the number of repetitive tasks for the team.
    Technologies: C#, Angular, JavaScript, HTML, CSS, Windows PowerShell, Bash, Azure, SharePoint, Windows Communication Framework (WCF), Amazon Web Services (AWS)
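
    The original tool was written in C# against WCF; the snippet below is only a minimal Python sketch of the same idea: probe a list of endpoints concurrently, warm them up, and report any that are down. The endpoint URLs are placeholders.

    ```python
    # Rough asyncio sketch of the endpoint warm-up/health-check idea.
    import asyncio

    import aiohttp

    ENDPOINTS = [
        "https://services.example.com/orders.svc",
        "https://services.example.com/customers.svc",
    ]

    async def probe(session: aiohttp.ClientSession, url: str) -> tuple[str, bool]:
        try:
            async with session.get(url, timeout=aiohttp.ClientTimeout(total=5)) as resp:
                return url, resp.status < 500
        except (aiohttp.ClientError, asyncio.TimeoutError):
            return url, False

    async def main() -> None:
        async with aiohttp.ClientSession() as session:
            results = await asyncio.gather(*(probe(session, url) for url in ENDPOINTS))
        for url, healthy in results:
            print(("UP  " if healthy else "DOWN"), url)

    if __name__ == "__main__":
        asyncio.run(main())
    ```
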
  • Firmware Engineer | Web Developer | Network Operations Manager | Senior Support Technician

    2007 - 2013
    Linora
    • Wrote the back-end and front-end code for router control panels and for "MeshView," a cloud-based (before the cloud) management portal for WiFi networks.
    • Managed a small team of network operators and support technicians to take inbound level I and level II calls; was responsible for hiring, firing, teaching, and scheduling the team.
    • Wrote firmware for WiFiRanger and BlueMesh Networks products; worked with kernel modules, internal and external radios, USB modem connectivity, controlling remote radios over Ethernet, and failover and failback logic.
    • Helped manage and monitor over 10,000 IoT devices and routers in the field, and contributed to the software used to monitor them.
    Technologies: HTML, PHP, CSS, C, Bash, Linux, SSH, USB, Firmware, Linux Kernel Modules, WiFi, JavaScript, jQuery, MySQL, Web Scraping, PostgreSQL

Experience

  • Cost-of-goods Data Model
    http://netlify.com

    I built a data model that unified Google Cloud Platform, Amazon Web Services, Packet, and DigitalOcean cloud invoices into a star-schema dimensional model.

    I designed it from end to end and built all the data pipelines. This project was used internally to forecast the company's future cloud costs and to drill down into the specifics to see how money was being spent.
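
    A minimal sketch of the conformed-model idea is below: map each provider's billing export onto a shared schema and union the results into one fact table. The file names and column mappings are illustrative; the production model lived in the warehouse, not in pandas.

    ```python
    # Illustrative pandas sketch: normalize cloud billing exports onto one
    # conformed schema and union them into a single cost fact table.
    import pandas as pd

    # Per-provider mappings from native billing columns onto the conformed schema.
    COLUMN_MAPPINGS = {
        "aws": {
            "lineItem/UsageStartDate": "usage_date",
            "product/ProductName": "service",
            "lineItem/UnblendedCost": "cost_usd",
        },
        "gcp": {
            "usage_start_time": "usage_date",
            "service_description": "service",
            "cost": "cost_usd",
        },
    }

    def load_provider(path: str, provider: str) -> pd.DataFrame:
        df = pd.read_csv(path).rename(columns=COLUMN_MAPPINGS[provider])
        df["provider"] = provider
        return df[["provider", "usage_date", "service", "cost_usd"]]

    def build_cost_fact() -> pd.DataFrame:
        frames = [
            load_provider("aws_cur_export.csv", "aws"),
            load_provider("gcp_billing_export.csv", "gcp"),
        ]
        return pd.concat(frames, ignore_index=True)
    ```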

  • Big Data Platform
    https://www.ssa.gov/

    I architected, designed, and built a big data ecosystem, nicknamed the BDP, for the United States Social Security Administration. It was a Hortonworks stack running entirely on AWS and automated with Ansible, CloudFormation, and Cloudbreak. The tech stack included Spark, Hadoop, Hive, and more.

    Out of this came the idea of dynamic templating, so several clusters could be spun up in parallel and used for development or production.

    The government had the most stringent requirements: the platform had to be FedRAMP approved, and before the cluster could be used for official government work, we had to obtain the official "Authority to Operate" (ATO). This involved turning on every security bell and whistle you could think of, including Vormetric disk encryption, SSL on everything, security groups, firewalls, authorization, SSSD, Kerberos, Red Hat IdM integration, and cross-forest Active Directory trusts.

    After getting the ATO, we helped several teams within the SSA get onboarded to the BDP to use Hive and Apache Spark for various ML and analytical workloads. Our work was so successful that the NIH wanted to use our cluster for some of its work too.

    The ML work involved parameter tuning and installing TensorFlow on Spark, XGBoost, and other popular libraries.
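
    The snippet below is only a stand-in for that work: it shows hyperparameter tuning with cross-validation using Spark ML's built-in gradient-boosted trees rather than the actual TensorFlow/XGBoost setup, and the table and column names are hypothetical.

    ```python
    # Illustrative Spark ML hyperparameter tuning with cross-validation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import GBTClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("bdp-ml-tuning").getOrCreate()
    df = spark.table("analytics.training_set")  # hypothetical Hive table

    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    gbt = GBTClassifier(labelCol="label", featuresCol="features")

    grid = (
        ParamGridBuilder()
        .addGrid(gbt.maxDepth, [3, 5, 7])
        .addGrid(gbt.maxIter, [20, 50])
        .build()
    )

    cv = CrossValidator(
        estimator=Pipeline(stages=[assembler, gbt]),
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(labelCol="label"),
        numFolds=3,
    )
    model = cv.fit(df)
    print(model.avgMetrics)
    ```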

  • Hadoop Data Lake

    I built a modern data lake on AWS for Time, Inc. This involved two Greenplum clusters and two Hadoop clusters, all automated with CloudFormation and Bash. Several of the clusters were also tied into the company's identity infrastructure.

  • Massive Data Warehouse Migration
    http://offerup.com

    I assisted in the migration of over 1PB of data from AWS to GCP. We migrated this data from S3 and Snowflake over to BigQuery and GCS; it took a total of ten engineers and three months. Dozens of data pipelines needed to be rewritten to work on Cloud Composer (Google's hosted version of Airflow), tested, and deployed to production.

  • Virtual Machine to Kubernetes Migration
    http://brushfire.com

    I migrated an entire web stack hosted in Azure on Cloud Services (Classic) over to a new Kubernetes cluster running on Azure AKS. This involved finishing the migration from .NET to .NET Core and then getting .NET Core to build successfully on Linux containers instead of Windows.

    Finally, after getting cluster autoscaling and horizontal pod autoscaling working correctly, I set up several CI/CD pipelines for the production, staging, and development branches to automatically build and deploy the updated images to the production, staging, and development websites (see the sketch after this project).

    In the end, the company saved about $2,000 per month on hosting costs: the autoscaling was much more resilient, and the new cluster was cheaper than paying for the raw VMs.
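
    The sketch below shows the horizontal pod autoscaler piece using the official Kubernetes Python client; the real setup was applied as AKS manifests through Azure DevOps, and the deployment name and thresholds here are illustrative.

    ```python
    # Hypothetical HPA setup via the official Kubernetes Python client.
    from kubernetes import client, config

    def create_web_hpa(namespace: str = "default") -> None:
        config.load_kube_config()  # use load_incluster_config() when running in-cluster
        hpa = client.V1HorizontalPodAutoscaler(
            metadata=client.V1ObjectMeta(name="web-hpa"),
            spec=client.V1HorizontalPodAutoscalerSpec(
                scale_target_ref=client.V1CrossVersionObjectReference(
                    api_version="apps/v1", kind="Deployment", name="web",
                ),
                min_replicas=2,
                max_replicas=10,
                target_cpu_utilization_percentage=70,  # scale out above 70% average CPU
            ),
        )
        client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
            namespace=namespace, body=hpa,
        )

    if __name__ == "__main__":
        create_web_hpa()
    ```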

  • Empire Unmanned
    https://www.empireunmanned.com/

    I built a modern web app for Empire Unmanned. The app allowed field drone pilots to upload their drone videos to the cloud to be processed for aerial inspections and mapping. In addition to being a back-end developer, I was the engagement manager and led a small team of engineers to complete the project.

  • Modern API Rewrite
    https://www.recallinfolink.com/

    I led a small team of software engineers to build a small API written in Phoenix to replace a very old Perl-based system on its last legs. We got a basic API spec from the client and implemented it to meet their needs, ensuring the API was properly tested.

Skills

  • Languages

    Python, SQL, Bash, Snowflake, Java, C, Scala, C#, Elixir, HTML, PHP, CSS, JavaScript, R, T-SQL
  • Frameworks

    Spark, .NET, .NET Core, Hadoop, Angular, Windows PowerShell, Windows Communication Framework (WCF), Phoenix
  • Libraries/APIs

    PySpark, REST APIs, Pandas, Stripe, jQuery, React
  • Tools

    Apache Airflow, Ansible, Terraform, BigQuery, Azure Kubernetes Service (AKS), Tableau, Azure DevOps Services, Apache ZooKeeper, Apache Ambari, AWS CloudFormation, Cloudera, Amazon Simple Queue Service (SQS), Amazon ECS (Amazon Elastic Container Service), Amazon EBS, Redshift Spectrum, AWS Glue, Amazon Athena
  • Paradigms

    ETL, Agile, Scrum, Business Intelligence (BI), Azure DevOps, Test-driven Development (TDD), Dimensional Modeling
  • Platforms

    Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), DigitalOcean, Azure, Kubernetes, Linux, Apache Kafka, Amazon EC2, Hortonworks Data Platform (HDP), SharePoint, Salesforce, AWS Kinesis
  • Storage

    Data Pipelines, PostgreSQL, Database Architecture, Redshift, MySQL, MongoDB, SQL Server DBA, AWS Data Pipeline Service, Amazon S3 (AWS S3), Google Cloud Storage, Apache Hive, HBase, Data Lakes, Data Lake Design, Greenplum, Couchbase, Amazon DynamoDB, Google Cloud, SQL Server Integration Services (SSIS)
  • Other

    Big Data, ELT, CI/CD Pipelines, Data Architecture, Data Engineering, Data Analytics, Data Analysis, Data Warehouse Design, Data Build Tool (dbt), Data Modeling, Data Migration, Dashboard Design, Dashboards, Software Development Lifecycle (SDLC), Programming, Quantum Computing, Ethics, Algorithms, Fivetran, Big Data Architecture, EMR, Autoscaling, Kerberos, SSH, USB, Firmware, Linux Kernel Modules, WiFi, Active Directory Federation, Vormetric, Web Security, Cloud Architecture, Mainframe, Web Scraping, Data Transformation

Education

  • Bachelor's Degree in Computer Science
    2009 - 2016
    Boise State University - Boise, ID, United States

Certifications

  • AWS Certified Solutions Architect Associate
    July 2022 - July 2024
    AWS
