Alex Clark

Verified Expert in Engineering

Data Engineer and Developer

Spokane, WA, United States

Toptal member since November 14, 2022

Expertise

Data Science, Data Engineering, Data Analysis, Business Intelligence Development, Data Warehouse, Big Data Architecture, RDBMS, SQL, ETL, Python, Database, JSON, Hadoop

Bio

Alex is a senior data engineer with 10+ years of experience designing and building scalable data pipelines and analytics platforms. He specializes in data modeling, distributed systems, and cloud technologies such as AWS, delivering reliable, high-quality datasets that enable data-driven product and business decisions.

Portfolio

ProjectPro

Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon RDS, Amazon QuickSight...

Experience

Data Analytics - 10 years
Python - 10 years
SQL - 10 years
Data Pipelines - 10 years
Data Architecture - 10 years
Hadoop - 5 years
Amazon Elastic MapReduce (EMR) - 5 years
PySpark - 4 years

Preferred Environment

Python 3, Amazon Elastic MapReduce (EMR), Apache Hive, Amazon DynamoDB, SQL, Amazon Web Services (AWS), Amazon Redshift, Apache Airflow, MongoDB, Apache Spark

The most amazing...

...system I built enabled platform-wide cost attribution, bringing transparency to infrastructure usage across teams.

Work Experience

Freelance Data Engineer

2022 - PRESENT

ProjectPro

Designed and implemented real-time and batch data pipelines using Python and AWS services, integrating data from APIs, application databases, and third-party sources.
Built analytics-ready datasets and data models to support reporting, dashboards, and metric-driven decision-making.
Developed interactive dashboards and visualization layers to surface actionable insights for business stakeholders.
Designed and maintained ETL workflows and automation to ensure data reliability, consistency, and scalability.

Technologies: Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon RDS, Amazon QuickSight, Microsoft Power BI, Data Pipelines, AWS Lambda, AWS CloudFormation, Apache Hive, Amazon EC2, Amazon S3 (AWS S3), Apache Flink, AWS Step Functions, AWS Glue, Apache Spark, MySQL, PostgreSQL

Senior Data Engineer

2024 - 2025

Meta

Data Engineer

2023 - 2023

D2 Nova

Optimized SQL queries and data storage strategies (partitioning, indexing), improving performance by up to 99% and significantly reducing compute costs.
Designed and implemented a MongoDB database with a lightweight Flask-based front end. Developed a custom, low-latency search solution supporting multilingual partial text matching, enhancing usability and performance.
Delivered robust ETL solutions and improved analytical workflows for clients across multiple domains.

Technologies: SQL, MySQL, NoSQL, Data Architecture, Amazon RDS, Amazon Aurora, MongoDB, Flask, Amazon EC2, Amazon S3 (AWS S3)

Data Engineer

2016 - 2021

Amazon.com

Built and maintained large-scale data pipelines and analytics datasets supporting product insights, customer behavior analysis, and business reporting.
Processed large-scale clickstream data to generate behavioral insights, enabling product and marketing teams to better understand user engagement and conversion.
Developed internal API-based billing systems and analytics datasets/dashboards to track product performance and support financial and operational reporting.
Provided subject matter expertise on platform data and collaborated with cross-functional teams to deliver business-critical solutions.
Partnered with product, analytics, and business teams to define metrics, design data models, and deliver actionable insights.
Built attribution models linking customer behavior to revenue and retention, establishing foundational metrics used for business and product decision-making.
Managed relational and large-scale data systems for analytics and reporting.
Automated log parsing and reporting pipelines to provide timely, reliable business insights.

Technologies: Data Pipelines, Linux, Hadoop, Big Data Architecture, BI Reporting, Analytics, Attribution Modeling, Marketing Attribution, Amazon Elastic MapReduce (EMR), Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, PostgreSQL, Cron, Data Visualization, Scripting, CSV File Processing, Data Modeling, Scala, Spark, Stored Procedure, Redshift, AWS Glue, Amazon RDS, SQL, Business Intelligence (BI), BI Reports, Dashboards, Data, Metrics, Big Data, User-defined Functions (UDF), Oracle, Dashboard Design, ETL, Pipelines, Conda, PIP, PyCharm, RDBMS, Amazon QuickSight, JSON, APIs, Database Administration (DBA), Amazon Web Services (AWS), Database Optimization, Pandas, GitHub, NumPy, AWS Lambda, Boto, AWS Step Functions, MongoDB, Relational Databases, MySQL, EMR, Amazon EC2, Amazon S3 (AWS S3), Data Warehousing, Predictive Modeling, Amazon Redshift

Business Systems Analyst

2013 - 2016

Liberty Mutual Insurance

Created an automated process to build and maintain 24 data sets in a centralized location.
Delivered presentations to educate SAS users about data sets and their analytical potential.
Facilitated biweekly meetings with stakeholders to improve the usability and integrity of data sets.
Leveraged SAS and Teradata to efficiently execute numerous ad hoc requests.
Developed SQL queries and VBA macros to streamline monthly reporting.
Built a Microsoft Access database and VBA scripts to automate the production of a weekly status report.

Technologies: SAS, ETL, Data Architecture, Teradata, Business Requirements, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, Data Visualization, CSV File Processing, Data Modeling, SQL, BI Reports, Dashboards, Data, Metrics, User-defined Functions (UDF), Dashboard Design, Python, RDBMS, JSON, Database Administration (DBA), Relational Databases, Data Warehousing

Data Analyst

2011 - 2013

Efinancial

Presented complex analyses to upper management, driving high-level decision-making.
Collaborated with the analytics team to develop a calling strategy that led to a 50% increase in sales.
Automated the production of weekly scorecards and reports using SQL and VBA.
Wrote SQL queries and performed data analysis to aid in the development of monthly and weekly goals.

Technologies: Microsoft SQL Server, Python, ETL, Data Analytics, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, Data Visualization, CSV File Processing, SQL, Dashboards, Data, Metrics, Dashboard Design, Pipelines, PIP, RDBMS, Database Administration (DBA), Database Optimization, Relational Databases, Data Warehousing

Experience

Page-level Metrics & Multi-touch Attribution

At Amazon, I designed and implemented a multi-touch attribution (MTA) pipeline to assign credit to individual pages for purchases, complementing the existing content-level attribution model. The pipeline processed billions of rows across web hits, orders, products, and page-hierarchy tables, handling complex relationships, including ASIN variants, browse-tree hierarchies, and multi-item orders. I developed logic to ensure relevance, assigning credit only to products linked to the pages visited within a 24-hour window. The resulting analytics datasets included CTR, retention, and page-level MTA metrics, enabling product teams to comprehensively measure page performance. The pipeline was partitioned by marketplace and date, optimized for large-scale data in AWS EMR and Athena, and provided a repeatable, reliable framework for ongoing page-level analysis.
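
As a rough illustration of the 24-hour relevance rule described above, the sketch below credits only pages that a customer visited within 24 hours before an order and that are linked to the purchased product. It is not the production pipeline: the input shapes, the function name `attribute_credit`, and the equal (linear) split of credit across eligible pages are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Lookback window taken from the description above.
WINDOW = timedelta(hours=24)

def attribute_credit(page_hits, orders, page_products):
    """Split each order's credit across eligible pages (hypothetical helper).

    page_hits:     iterable of (customer_id, page_id, hit_time)
    orders:        iterable of (customer_id, product_id, order_time)
    page_products: dict mapping page_id -> set of product_ids shown on it
    Returns a dict mapping page_id -> accumulated credit.
    """
    hits_by_customer = defaultdict(list)
    for customer_id, page_id, hit_time in page_hits:
        hits_by_customer[customer_id].append((page_id, hit_time))

    credit = defaultdict(float)
    for customer_id, product_id, order_time in orders:
        # A page is eligible if it was visited no more than 24 hours before
        # the order and it is linked to the purchased product.
        eligible = {
            page_id
            for page_id, hit_time in hits_by_customer.get(customer_id, [])
            if timedelta(0) <= order_time - hit_time <= WINDOW
            and product_id in page_products.get(page_id, set())
        }
        # Assumption: equal split across eligible pages (linear attribution).
        for page_id in eligible:
            credit[page_id] += 1.0 / len(eligible)
    return dict(credit)

t0 = datetime(2020, 1, 2, 12, 0)
hits = [
    ("c1", "home", t0),
    ("c1", "deals", t0 + timedelta(hours=2)),
    ("c1", "old", t0 - timedelta(hours=30)),       # outside the window
    ("c1", "unrelated", t0 + timedelta(hours=1)),  # not linked to the product
]
orders = [("c1", "B001", t0 + timedelta(hours=3))]
links = {"home": {"B001"}, "deals": {"B001"}, "old": {"B001"}, "unrelated": {"B999"}}
result = attribute_credit(hits, orders, links)
```

At the scale described (billions of rows), the same filter would more plausibly be expressed as a join on customer and product with a time-range predicate in Spark or SQL; the in-memory version only makes the rule concrete.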

Internal Billing System

At Amazon, I led the design and implementation of an internal API-based billing system to track platform usage and allocate infrastructure costs. I collaborated with cross-functional teams to identify cost drivers across multiple services and developed a dynamic pricing model that accounted for compute, storage, and API usage. I built end-to-end data pipelines and analytics dashboards to support transparency, reporting, and operational decision-making, delivering the system on time while ensuring accuracy and scalability.
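
To make the idea of pricing by compute, storage, and API usage concrete, here is a minimal sketch of per-team cost allocation. The rate values, field names, and the `Usage`, `cost_breakdown`, and `bill` names are hypothetical, and the sketch uses fixed rates passed in as a parameter rather than the dynamic pricing logic of the actual system, which is not described in detail here.

```python
from dataclasses import dataclass

# Hypothetical unit rates in USD, for illustration only.
RATES = {"compute_hour": 0.10, "storage_gb_month": 0.02, "api_calls_1k": 0.50}

@dataclass(frozen=True)
class Usage:
    team: str
    compute_hours: float
    storage_gb_months: float
    api_calls: int

def cost_breakdown(usage, rates=RATES):
    """Return the cost of one usage record, split by cost driver."""
    return {
        "compute": round(usage.compute_hours * rates["compute_hour"], 2),
        "storage": round(usage.storage_gb_months * rates["storage_gb_month"], 2),
        "api": round(usage.api_calls / 1000 * rates["api_calls_1k"], 2),
    }

def bill(usages, rates=RATES):
    """Aggregate total cost per team across all usage records."""
    totals = {}
    for usage in usages:
        record_total = sum(cost_breakdown(usage, rates).values())
        totals[usage.team] = round(totals.get(usage.team, 0.0) + record_total, 2)
    return totals

records = [
    Usage("search", compute_hours=100, storage_gb_months=500, api_calls=20000),
    Usage("ads", compute_hours=10, storage_gb_months=0, api_calls=1000),
    Usage("search", compute_hours=50, storage_gb_months=0, api_calls=0),
]
totals = bill(records)
```

Keeping the breakdown per cost driver separate from the per-team total is one way to support both transparency reporting and invoicing from the same records.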

Content Platform ETL & Analytics Pipeline

At Amazon, I designed and implemented a scalable ETL pipeline for the Content Platform, which handles images, videos, widgets, and other retail site content across four regions (NA, EU, CN, FE). I consolidated nine versioned (Type 2) source tables into analytics-optimized, regionally partitioned Parquet datasets with Snappy compression. Each day, the pipeline automates ingestion, pivoting of key-value placement properties, version selection, and joins, producing one denormalized table with the most frequently queried fields. This enabled teams to quickly answer questions about impressions, click-through rates, placement attribution, and content performance, without querying multiple regions or spinning up expensive Redshift resources. The solution processed hundreds of terabytes of JSON data via AWS EMR, Hive, and Athena, and was eventually integrated into the Data Warehouse. This project improved query performance, reduced operational complexity, and made large-scale content analytics accessible to product, analytics, and business teams.
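
The version-selection and key-value pivoting steps mentioned above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the original Hive or EMR job: the column names (`placement_id`, `version`, `key`, `value`), the function name, and the rule that the highest version number is the current one are all hypothetical.

```python
def latest_version_pivot(rows):
    """Keep each placement's latest version and pivot its properties wide.

    rows: iterable of dicts with placement_id, version, key, value
          (hypothetical column names).
    Returns one dict per placement, sorted by placement_id.
    """
    rows = list(rows)

    # Version selection: assume the highest version number is current.
    latest = {}
    for row in rows:
        pid = row["placement_id"]
        if pid not in latest or row["version"] > latest[pid]:
            latest[pid] = row["version"]

    # Pivot: turn the key-value rows of the selected version into columns.
    wide = {}
    for row in rows:
        pid = row["placement_id"]
        if row["version"] != latest[pid]:
            continue
        record = wide.setdefault(pid, {"placement_id": pid, "version": row["version"]})
        record[row["key"]] = row["value"]

    return [wide[pid] for pid in sorted(wide)]

source = [
    {"placement_id": "p1", "version": 1, "key": "slot", "value": "top"},
    {"placement_id": "p1", "version": 2, "key": "slot", "value": "bottom"},
    {"placement_id": "p1", "version": 2, "key": "widget", "value": "carousel"},
    {"placement_id": "p2", "version": 1, "key": "slot", "value": "side"},
]
denormalized = latest_version_pivot(source)
```

A real Type 2 table usually tracks versions with effective-date ranges rather than a simple counter, in which case the selection step would filter on those dates instead.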

Education

2014 - 2016

Master's Degree in Business Analytics & Data Science

Bentley University - Waltham, MA, USA

2007 - 2010

Bachelor's Degree in Accounting

Central Washington University - Ellensburg, WA, USA

Skills

Libraries/APIs

PySpark, Pandas, NumPy

Tools

Amazon Athena, Amazon CloudWatch, Amazon Elastic MapReduce (EMR), AWS Glue, Cron, Boto, AWS Step Functions, Amazon Simple Queue Service (SQS), PyCharm, Amazon QuickSight, GitHub, Microsoft Power BI, AWS CloudFormation, Apache Airflow, AWS Cloud Development Kit (CDK)

Languages

SQL, Python, Stored Procedure, SAS, Scala

Paradigms

ETL, Business Intelligence (BI)

Storage

Apache Hive, Databases, PostgreSQL, Redshift, RDBMS, JSON, Database Administration (DBA), Relational Databases, MySQL, Data Pipelines, Teradata, Amazon S3 (AWS S3), Amazon DynamoDB, Microsoft SQL Server, MongoDB, NoSQL, Amazon Aurora

Frameworks

Hadoop, Apache Spark, Flask

Platforms

AWS Lambda, Linux, Oracle, Amazon Web Services (AWS), Amazon EC2, Apache Flink

Other

Information Systems, Data Architecture, EMR, Data Analytics, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, CSV File Processing, Data Modeling, Data, Metrics, Big Data, Pipelines, Conda, PIP, APIs, Data Warehousing, Big Data Architecture, BI Reporting, Analytics, Business Requirements, Data Visualization, Scripting, Amazon RDS, BI Reports, Dashboards, Database Optimization, Amazon Redshift, Machine Learning, Statistics, Time Series Analysis, Optimization, Attribution Modeling, Marketing Attribution, Web Analytics, IT Project Management, User-defined Functions (UDF), Dashboard Design, Predictive Modeling, Orchestration, Key Performance Indicators (KPIs), Data Quality, Data Orchestration
