Alex Clark

Verified Expert in Engineering

Data Engineer and Developer

Location
Seattle, WA, United States
Toptal Member Since
November 14, 2022

Alex is an innovative and experienced big data engineer, skilled in a wide variety of tools and technologies. He is competent in all aspects of the data science process, including data ingestion, storage, transformation, and statistical modeling. Alex has a proven track record of thoughtfully dissecting business problems, vetting requirements, and then designing the processes to address them.

Portfolio

D2 Nova
SQL, MySQL, NoSQL, Data Architecture, Amazon RDS, Amazon Aurora, MongoDB, Flask...
ProjectPro
Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon RDS, Amazon QuickSight...
Amazon.com
Data Pipelines, Linux, Hadoop, Big Data Architecture, BI Reporting, Analytics...

Experience

Availability

Part-time

Preferred Environment

Python 3, Amazon Elastic MapReduce (EMR), Apache Hive, PySpark, AWS Lambda, Amazon CloudWatch, Amazon DynamoDB, Amazon Simple Queue Service (SQS), SQL, Databases, Amazon Web Services (AWS)

The most amazing...

...solution I've developed is an internal API-based billing system.

Work Experience

Data Engineer

2023 - PRESENT
D2 Nova
  • Evaluated the client's SQL database and provided recommendations for improving design and query performance.
  • Implemented MySQL table partitioning and composite indexes, reducing query times from 2,240 milliseconds to 10 milliseconds.
  • Designed and implemented a MongoDB database with a lightweight Flask front end, including custom search functionality that lets users find partial text matches in several languages while maintaining low latency (see the sketch below).
Technologies: SQL, MySQL, NoSQL, Data Architecture, Amazon RDS, Amazon Aurora, MongoDB, Flask, Amazon EC2, Amazon S3 (AWS S3)
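
A minimal sketch of how a partial-match search endpoint of this kind could look, assuming a PyMongo collection named "documents" with a searchable "title" field; the database, collection, field, and route names are illustrative placeholders rather than details of the D2 Nova system.

    # Hypothetical sketch: Flask endpoint returning partial text matches from MongoDB.
    # Database, collection, and field names are placeholder assumptions.
    import re

    from flask import Flask, jsonify, request
    from pymongo import MongoClient

    app = Flask(__name__)
    collection = MongoClient("mongodb://localhost:27017")["demo"]["documents"]

    @app.route("/search")
    def search():
        term = request.args.get("q", "")
        # Case-insensitive partial match; re.escape keeps user input literal.
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        results = collection.find({"title": pattern}, {"_id": 0}).limit(20)
        return jsonify(list(results))

    if __name__ == "__main__":
        app.run()

How well this performs generally depends on the query shape: prefix-anchored patterns can use a standard index on the field, while unanchored patterns may call for a text index or a dedicated search index at scale.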

Freelance Data Engineer

2022 - PRESENT
ProjectPro
  • Developed a CDK project that deploys data to S3, creates a data pipeline in EMR, and transforms the data using Hive. Connected the EMR cluster to Microsoft Power BI and created data visualizations.
  • Architected a CDK-based data pipeline that extracts data from a SQL database in Amazon RDS, loads incremental data from an API using AWS Lambda, and transforms it with Spark (see the Lambda sketch below).
  • Created detailed written and video documentation for each project.
Technologies: Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon RDS, Amazon QuickSight, Microsoft Power BI, Data Pipelines, AWS Lambda, AWS CloudFormation, Spark, Apache Hive, Amazon EC2, Amazon S3 (AWS S3)
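
As a rough illustration of the incremental-load step described above, a Lambda handler along these lines could read a watermark, pull new records from the API, and land them in S3; the endpoint, bucket, and key names below are invented placeholders, not the actual ProjectPro resources.

    # Hypothetical sketch: Lambda handler that pulls records created since the last
    # run from an API and lands them in S3 as JSON. Endpoint, bucket, and key names
    # are placeholder assumptions.
    import json
    import urllib.request
    from datetime import datetime, timezone

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-raw-zone"           # placeholder bucket
    WATERMARK_KEY = "state/last_run.txt"  # placeholder watermark object

    def handler(event, context):
        # Read the previous watermark (fall back to a fixed start date on first run).
        try:
            last_run = s3.get_object(Bucket=BUCKET, Key=WATERMARK_KEY)["Body"].read().decode()
        except s3.exceptions.NoSuchKey:
            last_run = "1970-01-01T00:00:00Z"

        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        url = f"https://api.example.com/records?updated_since={last_run}"  # placeholder API
        with urllib.request.urlopen(url) as resp:
            records = json.load(resp)

        # Land the batch keyed by run timestamp, then advance the watermark.
        s3.put_object(Bucket=BUCKET, Key=f"raw/records/{now}.json", Body=json.dumps(records))
        s3.put_object(Bucket=BUCKET, Key=WATERMARK_KEY, Body=now)
        return {"loaded": len(records), "since": last_run}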

Data Engineer

2016 - 2021
Amazon.com
  • Gathered data from disparate sources to identify and deprecate low-performing content.
  • Collaborated with stakeholders, software engineers, and managers to design and construct an internal API-based billing system.
  • Developed an API pricing model for the platform's primary services.
  • Built an attribution model to assign customer orders to customer actions.
  • Organized and maintained data storage systems, including relational databases, big data systems, and serverless technologies.
  • Constructed data pipelines and managed ETL processes.
  • Developed metrics to analyze and report on page performance and customer retention.
  • Created custom MapReduce jobs to parse complex, high-volume data structures (a representative sketch follows this role).
Technologies: Data Pipelines, Linux, Hadoop, Big Data Architecture, BI Reporting, Analytics, Attribution Modeling, Marketing Attribution, Amazon Elastic MapReduce (EMR), Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, PostgreSQL, Cron, Data Visualization, Scripting, CSV File Processing, Data Modeling, Scala, Spark, Stored Procedure, Redshift, AWS Glue, Amazon RDS, SQL, Business Intelligence (BI), BI Reports, Dashboards, Data, Metrics, Big Data, User-defined Functions (UDF), Oracle, Dashboard Design, ETL, Pipelines, Conda, PIP, PyCharm, RDBMS, Amazon QuickSight, JSON, APIs, Database Administration (DBA), Amazon Web Services (AWS), Database Optimization, Pandas, GitHub, NumPy, AWS Lambda, Boto, AWS Step Functions, MongoDB, Relational Databases, MySQL, EMR, Amazon EC2, Amazon S3 (AWS S3), Data Warehousing, Predictive Modeling
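
Purely for illustration, a parsing job of the kind mentioned above might resemble the following PySpark sketch that flattens nested event records; the S3 paths and field names are assumptions, and the original jobs may have used Hadoop MapReduce or Scala Spark rather than PySpark.

    # Hypothetical sketch: flattening nested, high-volume JSON event data with PySpark.
    # Paths and field names are illustrative only.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("flatten-events").getOrCreate()

    events = spark.read.json("s3://example-bucket/raw/events/")  # placeholder path

    # Explode the nested items array and keep one row per (session, page, item).
    flat = (
        events
        .withColumn("item", F.explode("detail.items"))
        .select(
            "session_id",
            "page_id",
            F.col("item.sku").alias("sku"),
            F.col("item.price").alias("price"),
            F.to_date("event_time").alias("event_date"),
        )
    )

    flat.write.partitionBy("event_date").parquet("s3://example-bucket/curated/events/")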

Business Systems Analyst

2013 - 2016
Liberty Mutual Insurance
  • Created an automated process to build and maintain 24 data sets in a centralized location.
  • Delivered presentations to educate SAS users about data sets and their analytical potential.
  • Facilitated biweekly meetings with stakeholders to improve the usability and integrity of data sets.
  • Leveraged SAS and Teradata to efficiently execute numerous ad hoc requests.
  • Developed SQL queries and VBA macros to streamline monthly reporting.
  • Built a Microsoft Access database and VBA scripts to automate the production of a weekly status report.
Technologies: SAS, ETL, Data Architecture, Teradata, Business Requirements, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, Data Visualization, CSV File Processing, Data Modeling, SQL, BI Reports, Dashboards, Data, Metrics, User-defined Functions (UDF), Dashboard Design, Python, RDBMS, JSON, Database Administration (DBA), Relational Databases, Data Warehousing

Data Analyst

2011 - 2013
Efinancial
  • Presented complex analyses to upper management, driving high-level decision-making.
  • Collaborated with the analytics team to develop a calling strategy that led to a 50% increase in sales.
  • Automated the production of weekly scorecards and reports using SQL and VBA.
  • Wrote SQL queries and performed data analysis to support the development of weekly and monthly goals.
Technologies: Microsoft SQL Server, Python, ETL, Data Analytics, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, Databases, Data Visualization, CSV File Processing, SQL, Dashboards, Data, Metrics, Dashboard Design, Pipelines, PIP, RDBMS, Database Administration (DBA), Database Optimization, Relational Databases, Data Warehousing

Projects

Page-to-order Attribution Model

Gathered data from disparate sources and collaborated with key stakeholders to develop a page-to-order attribution model. The fundamental data source for this project was traffic data from a large retail website. I used this web data to track customer actions across pages up to a purchase, then attributed portions of the purchase value to the pages visited. The big data processing was performed primarily in Amazon EMR and Amazon Athena and orchestrated with AWS Lambda, Step Functions, SQS, and CloudWatch.
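
A simplified sketch of this style of attribution, splitting each order's value evenly across the pages viewed in the same session; the column names, storage paths, and even-split rule are assumptions for illustration, and the production model was more involved.

    # Hypothetical sketch: linear page-to-order attribution with PySpark.
    # Column names, paths, and the even-split rule are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("page-to-order-attribution").getOrCreate()

    page_views = spark.read.parquet("s3://example-bucket/curated/page_views/")  # session_id, page_id, view_time
    orders = spark.read.parquet("s3://example-bucket/curated/orders/")          # session_id, order_id, order_value, order_time

    # Keep only page views that happened before the order in the same session.
    touches = (
        page_views.join(orders, "session_id")
        .where(F.col("view_time") <= F.col("order_time"))
    )

    # Count touches per order, then give each page an equal share of the order value.
    per_order = Window.partitionBy("order_id")
    attributed = touches.withColumn(
        "attributed_value",
        F.col("order_value") / F.count("*").over(per_order),
    )

    attributed.groupBy("page_id").agg(
        F.sum("attributed_value").alias("attributed_revenue")
    ).show()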

Internal Billing System

Collaborated with stakeholders, software engineers, and managers to design and construct an internal API-based billing system. This project involved determining a pricing model for our platform, identifying data sources, and defining the data pipeline and ETL processes.
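
As a toy illustration of the pricing step, usage records could be joined to a rate card and aggregated per team; the services, rates, and team names below are invented and do not reflect the actual pricing model.

    # Hypothetical sketch: price monthly API usage per team by joining usage counts
    # to a rate card. Services, rates, and teams are invented for illustration.
    import pandas as pd

    rate_card = pd.DataFrame({
        "service": ["search", "recommendations"],
        "price_per_1k_calls": [0.40, 1.25],
    })

    usage = pd.DataFrame({
        "team": ["alpha", "alpha", "beta"],
        "service": ["search", "recommendations", "search"],
        "calls": [120_000, 8_000, 45_000],
    })

    billed = usage.merge(rate_card, on="service")
    billed["charge"] = billed["calls"] / 1_000 * billed["price_per_1k_calls"]

    invoice = billed.groupby("team", as_index=False)["charge"].sum()
    print(invoice)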

Languages

SQL, Python, Stored Procedure, SAS, Scala

Tools

Amazon Athena, Amazon CloudWatch, Amazon Elastic MapReduce (EMR), AWS Glue, Cron, Boto, AWS Step Functions, Amazon Simple Queue Service (SQS), PyCharm, Amazon QuickSight, GitHub, Microsoft Power BI, AWS CloudFormation

Paradigms

ETL, Business Intelligence (BI)

Storage

Apache Hive, Databases, PostgreSQL, Redshift, RDBMS, JSON, Database Administration (DBA), Relational Databases, MySQL, Data Pipelines, Teradata, Amazon S3 (AWS S3), Amazon DynamoDB, Microsoft SQL Server, MongoDB, NoSQL, Amazon Aurora

Other

Information Systems, Data Architecture, EMR, Data Analytics, Datasets, Data Engineering, Data Analysis, Data Cleansing, Data Profiling, CSV File Processing, Data Modeling, Data, Metrics, Big Data, Pipelines, Conda, PIP, APIs, Data Warehousing, Big Data Architecture, BI Reporting, Analytics, Business Requirements, Data Visualization, Scripting, Amazon RDS, BI Reports, Dashboards, Database Optimization, Machine Learning, Statistics, Time Series Analysis, Optimization, Attribution Modeling, Marketing Attribution, Web Analytics, IT Project Management, User-defined Functions (UDF), Dashboard Design, Predictive Modeling

Frameworks

Hadoop, Spark, Flask

Libraries/APIs

PySpark, Pandas, NumPy

Platforms

AWS Lambda, Linux, Oracle, Amazon Web Services (AWS), Amazon EC2

Education

2014 - 2016

Master's Degree in Business Analytics & Data Science

Bentley University - Waltham, MA, USA

2007 - 2010

Bachelor's Degree in Accounting

Central Washington University - Ellensburg, WA, USA
