Festus Asare Yeboah, Developer in Plano, TX, United States

Festus Asare Yeboah

Verified Expert in Engineering

Data Engineer and Developer

Plano, TX, United States

Toptal member since May 14, 2020

Bio

Festus is a data and machine learning engineer with in-depth, hands-on technical expertise in data pipeline architecture. He excels at the design and implementation of big data technologies (Spark, Kafka, data lakes) and has a proven track record in consulting on architecture design and implementation.

Portfolio

Google
Spark, Google Cloud Platform (GCP), Databricks, Google BigQuery
Databricks
Azure, Scala, Python, Amazon Web Services (AWS), SQL, Spark
Copart
Azure, Apache Kafka, Pentaho, Python, SQL

Experience

  • SQL - 6 years
  • Data Warehouse Design - 5 years
  • Azure - 4 years
  • Python 3 - 4 years
  • Data Engineering - 3 years
  • Databricks - 3 years
  • Machine Learning - 3 years
  • Azure Data Factory - 2 years

Availability

Part-time

Preferred Environment

Data Lakes, Data Warehouse Design, Data Warehousing, Machine Learning, Spark

The most amazing...

...thing I've built is a data engineering pipeline that streams data from an IoT device like a bag scanner in an airport to a data lake.

Work Experience

ML/Data Engineer

2020 - PRESENT
Google
  • Helped customers migrate their data pipelines from on-prem to the Google Cloud Platform.
  • Migrated ETL pipelines from AWS and Azure to Google Cloud.
  • Collaborated with data scientists to develop machine learning operations (MLOps) workflows based on trained models.
Technologies: Spark, Google Cloud Platform (GCP), Databricks, Google BigQuery

Data/ML Engineer

2019 - 2020
Databricks
  • Developed an app to store and track changes in the hyperparameters used in training models and the data used to train them. The application saves model metadata and provides access to it via API calls.
  • Built an optical character recognition (OCR) pipeline that converted images into tabular data (see the sketch after this role).
  • Increased querying performance of a 75TB data lake table. The reports pulled from this table had an SLA of 30 seconds. By applying Spark performance tuning techniques, I decreased the query time to less than five seconds.
Technologies: Azure, Scala, Python, Amazon Web Services (AWS), SQL, Spark
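
The OCR bullet above glosses over the image-to-table step. Below is a minimal sketch of that step, assuming pytesseract and pandas; the source does not name the actual OCR library used.

```python
# A minimal sketch of an image-to-table OCR step. The OCR library is not
# named in the source; pytesseract and pandas are assumptions.
from PIL import Image
import pytesseract

def image_to_table(image_path: str):
    """Run OCR on an image and return the recognized words as a DataFrame."""
    img = Image.open(image_path)
    # image_to_data returns word-level text, positions, and confidences.
    df = pytesseract.image_to_data(img, output_type=pytesseract.Output.DATAFRAME)
    # Keep only confidently recognized, non-empty tokens.
    df = df[(df["conf"] > 60) & df["text"].notna()]
    return df[["block_num", "line_num", "word_num", "text", "conf"]]
```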

Senior Data Engineer

2017 - 2018
Copart
  • Developed a real-time data pipeline to move application logs into a more consumable form for reporting (see the sketch after this role).
  • Built a global data warehouse to serve as a single source of truth for company-wide operational metrics.
  • Migrated the company's ETL architecture to the cloud.
Technologies: Azure, Apache Kafka, Pentaho, Python, SQL
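
As a rough illustration of the log pipeline above, here is a minimal consumer sketch using kafka-python; the topic name, broker address, and log-line format are assumptions, since the source does not specify them.

```python
# A minimal sketch of a log-structuring consumer. The "app-logs" topic,
# broker address, and log-line format are illustrative assumptions.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: raw.decode("utf-8"),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Parse a raw line such as "2020-01-01T00:00:00 ERROR payment timeout"
    # into a structured record that reporting queries can consume.
    timestamp, level, text = message.value.split(" ", 2)
    record = {"timestamp": timestamp, "level": level, "message": text}
    print(json.dumps(record))  # in practice, write to the warehouse instead
```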

Software Developer

2015 - 2018
Brocks Solution
  • Developed a real-time data pipeline to stream data from IoT devices (bag tag scanners) at airports to create baggage handling reports for business executives (see the sketch after this role).
  • Led the implementation of analytics in the company's enterprise baggage handling system software.
  • Created dashboards to report data on baggage handling operations.
Technologies: Azure, DataWare, SQL, Python
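
The bag-tag streaming pipeline above maps naturally onto Spark Structured Streaming. A minimal sketch follows; the transport (Kafka here), topic, and storage paths are assumptions, as the source does not specify them.

```python
# A minimal sketch of streaming IoT scan events into a data lake with Spark
# Structured Streaming. The Kafka broker, topic, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bag-scan-stream").getOrCreate()

scans = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "bag-scans")
    .load()
    # Kafka delivers bytes; cast the payload and keep the event timestamp.
    .select(F.col("value").cast("string").alias("event"), "timestamp")
)

query = (
    scans.writeStream.format("parquet")
    .option("path", "/datalake/raw/bag_scans")
    .option("checkpointLocation", "/datalake/_checkpoints/bag_scans")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```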

Projects

Pipeline Medical Records into a Scalable Data Store

A company receives patient medical records in three formats (XML, CSV, and plain text) and develops reports on them. The company initially moved this data into a data warehouse with SSIS-based pipelines, but that approach could not scale: it was slow and so complex that troubleshooting took hours. They needed a new solution that could scale and ran entirely in the cloud.

Using AWS Kinesis, Lambda, Airflow, and Databricks, I rearchitected the pipeline into a simpler, scalable one, cutting its run time from 30 minutes to two minutes.
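
A minimal sketch of the ingestion step: an AWS Lambda handler that pulls medical-record payloads off a Kinesis stream, sniffs the format, and lands each record in the lake. The bucket name and the format-detection rule are illustrative assumptions, not the client's actual logic.

```python
# A minimal sketch of the Kinesis-to-S3 ingestion step. Bucket name and
# format detection are illustrative assumptions.
import base64

import boto3

s3 = boto3.client("s3")

def detect_format(payload: str) -> str:
    """Crude format sniffing: XML starts with '<'; CSV has commas on line one."""
    if payload.lstrip().startswith("<"):
        return "xml"
    if "," in payload.splitlines()[0]:
        return "csv"
    return "text"

def handler(event, context):
    # Kinesis delivers records base64-encoded under event["Records"].
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        fmt = detect_format(payload)
        key = f"landing/{fmt}/{record['kinesis']['sequenceNumber']}.{fmt}"
        s3.put_object(Bucket="medical-records-lake", Key=key, Body=payload)
    return {"records": len(event["Records"])}
```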

Optimize Data Reads from a 75TB Data Lake

A data lake holding up to 65TB of data served as the source for a forecasting model. Queries against it were slow and missed the 15-second business SLA. The project's requirement was to analyze the queries and the data lake architecture for optimization opportunities. By the end of the project, I had brought query time down from over five minutes to under three seconds.
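
The write-up does not detail which optimizations were applied. The sketch below shows two common levers for this kind of SLA miss, with hypothetical paths and column names: partitioning the table on the column queries filter on, so reads prune files, and broadcasting small dimension tables, so joins avoid a shuffle.

```python
# A sketch of typical Spark read-path tuning. Paths, the event_date
# partition column, and the device_id join key are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("lake-tuning").getOrCreate()

# 1. Rewrite the table partitioned by the column queries filter on, so reads
#    prune entire directories instead of scanning the full lake.
(
    spark.read.parquet("/datalake/raw/events")
    .repartition("event_date")
    .write.partitionBy("event_date")
    .mode("overwrite")
    .parquet("/datalake/optimized/events")
)

# 2. Queries filtering on the partition column now touch only matching files.
daily = spark.read.parquet("/datalake/optimized/events").where(
    "event_date = '2020-01-15'"
)

# 3. Broadcast small dimension tables so the join avoids a full shuffle.
dims = spark.read.parquet("/datalake/dims/devices")
result = daily.join(broadcast(dims), "device_id")
```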

Meta Store for ML Model Training

The client I was working with needed to track all the information that went into training a model: the training dataset, the hyperparameters used, and the model's output parameters.

I developed a library that saved all the model metadata to a data store and made it accessible through an API endpoint.
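
A minimal sketch of the shape such a library can take, with hypothetical names; the client's actual schema, storage backend, and API are not described here.

```python
# A minimal sketch of a model-metadata store with an API endpoint.
# All names are hypothetical; the real storage backend is not described.
import uuid

from flask import Flask, jsonify

app = Flask(__name__)
_store = {}  # stand-in for a persistent data store

def log_training_run(dataset_uri, hyperparameters, outputs):
    """Persist everything that went into (and came out of) a training run."""
    run_id = str(uuid.uuid4())
    _store[run_id] = {
        "dataset": dataset_uri,
        "hyperparameters": hyperparameters,
        "outputs": outputs,
    }
    return run_id

@app.route("/runs/<run_id>")
def get_run(run_id):
    # Expose saved model metadata through an API endpoint, as described above.
    run = _store.get(run_id)
    return (jsonify(run), 200) if run else (jsonify(error="not found"), 404)
```

MLflow's tracking API solves a similar problem; the sketch only illustrates the structure of a custom store like the one described.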

Education

2017 - 2019

Master's Degree in Machine Learning

Southern Methodist University - Dallas, TX, USA

2010 - 2010

Bachelor's Degree in Aerospace Engineering

Kwame Nkrumah University of Science and Technology - Kumasi, Ghana

Certifications

FEBRUARY 2020 - PRESENT

Spark Certification

Databricks

Skills

Tools

Amazon Elastic MapReduce (EMR), Apache Airflow, BigQuery

Languages

SQL, Python 3, Python, Scala

Frameworks

Spark

Platforms

Databricks, Apache Kafka, Azure, Azure Event Hubs, Pentaho, Amazon Web Services (AWS), Google Cloud Platform (GCP)

Paradigms

ETL

Storage

Data Lakes, DataWare

Other

Data Warehouse Design, Machine Learning, Data Engineering, Azure Data Factory, Lambda Functions, Data Warehousing, Google BigQuery
