Festus Asare Yeboah, Data Engineer and Developer in Plano, TX, United States

Member since May 14, 2020
Festus is a data and machine learning engineer with in-depth, hands-on technical expertise in data pipeline architecture. He excels at the design and implementation of big data technologies (Spark, Kafka, data lakes) and has a proven track record in consulting on architecture design and implementation.


  • Google
    Spark, Google Cloud Platform (GCP), Databricks, Google BigQuery
  • Databricks
    Azure, Scala, Python, AWS, SQL, Spark
  • Copart
    Azure, Apache Kafka, Pentaho, Python, SQL






Preferred Environment

Data Lakes, Data Warehouse Design, Data Warehousing, Machine Learning, Spark

The most amazing...

...thing I've built is a data engineering pipeline that streams data from IoT devices, such as airport bag scanners, to a data lake.


  • ML/Data Engineer

    2020 - PRESENT
    Google
    • Helped customers migrate their data pipelines from on-prem to the Google Cloud Platform.
    • Migrated ETL pipelines from AWS and Azure to Google Cloud.
    • Collaborated with data scientists to operationalize trained machine learning models (MLOps).
    Technologies: Spark, Google Cloud Platform (GCP), Databricks, Google BigQuery
  • Data/ML Engineer

    2019 - 2020
    Databricks
    • Developed an app to store and track changes to the hyperparameters used in training models and the datasets used to train them. The application saves model metadata and exposes it through API calls.
    • Built an optical character recognition pipeline that converted images to a table.
    • Improved query performance on a 75TB data lake table. Reports pulled from this table had a 30-second SLA; by applying Spark performance tuning techniques, I cut query time to under five seconds.
    Technologies: Azure, Scala, Python, AWS, SQL, Spark
  • Senior Data Engineer

    2017 - 2018
    Copart
    • Developed a real-time data pipeline to move application logs to a more consumable form for reporting.
    • Built a global data warehouse to serve as a single source of truth for company-wide operational metrics.
    • Migrated the company's ETL architecture to the cloud.
    Technologies: Azure, Apache Kafka, Pentaho, Python, SQL
  • Software Developer

    2015 - 2018
    Brocks Solution
    • Developed a real-time data pipeline to stream data from IoT devices (bag tag scanners) at airports to create baggage handling reports for business executives.
    • Led the implementation of analytics in the company's enterprise baggage handling software.
    • Created dashboards to report data on baggage handling operations.
    Technologies: Azure, DataWare, SQL, Python
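The bag-scanner streaming work above can be sketched with Python's standard library standing in for the real streaming stack. The event schema, the in-process queue (in place of a message broker), and the in-memory "lake" are all illustrative assumptions, not the production design:

```python
import json
import queue
from collections import Counter
from datetime import datetime, timezone

def scan_event(bag_tag: str, airport: str) -> str:
    """Serialize one bag-tag scan as a JSON record (schema is illustrative)."""
    return json.dumps({
        "bag_tag": bag_tag,
        "airport": airport,
        "scanned_at": datetime.now(timezone.utc).isoformat(),
    })

def stream_to_lake(events: "queue.Queue[str]", lake: list) -> None:
    """Drain the event queue into the 'lake' (a list standing in for cloud
    object storage); a real pipeline would micro-batch these writes."""
    while not events.empty():
        lake.append(events.get())

def baggage_report(lake: list) -> Counter:
    """Aggregate scans per airport for a baggage handling report."""
    return Counter(json.loads(rec)["airport"] for rec in lake)

# Simulate two scanners publishing events.
events: "queue.Queue[str]" = queue.Queue()
for tag, airport in [("BT001", "DFW"), ("BT002", "DFW"), ("BT003", "JFK")]:
    events.put(scan_event(tag, airport))

lake: list = []
stream_to_lake(events, lake)
report = baggage_report(lake)  # DFW: 2 scans, JFK: 1 scan
```

In the real system, the queue is a durable broker partitioned by scanner or airport so consumers can scale out without reordering events within a partition.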


  • Pipeline Medical Records into a Scalable Data Store

    A company receives patient medical records in three formats (XML, CSV, and plain text) and builds reports on them. It originally moved this data into a data warehouse with SSIS pipelines, but the pipelines could not scale: they were slow and so complex that troubleshooting took hours. The company needed a new solution that could scale and run entirely in the cloud.

    Using AWS Kinesis, Lambda, Airflow, and Databricks, I rearchitected the pipeline into a simpler, scalable design, cutting the run time from 30 minutes to two minutes.
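A pipeline like this needs a normalization step that parses the three input formats into one common schema before the data lands downstream. This is a hedged stdlib sketch of that step only; the field names, the pipe-delimited text layout, and the handler functions are illustrative assumptions, not the actual implementation:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical minimal record schema; real medical records carry many more fields.
def from_xml(payload: str) -> dict:
    root = ET.fromstring(payload)
    return {"patient_id": root.findtext("id"), "diagnosis": root.findtext("dx")}

def from_csv(payload: str) -> dict:
    row = next(csv.DictReader(io.StringIO(payload)))
    return {"patient_id": row["id"], "diagnosis": row["dx"]}

def from_text(payload: str) -> dict:
    # Assumes "id|dx" pipe-delimited lines for the plain-text format.
    pid, dx = payload.strip().split("|")
    return {"patient_id": pid, "diagnosis": dx}

PARSERS = {"xml": from_xml, "csv": from_csv, "text": from_text}

def normalize(fmt: str, payload: str) -> str:
    """Emit one canonical JSON record, as a Lambda-style handler might."""
    return json.dumps(PARSERS[fmt](payload))

records = [
    normalize("xml", "<rec><id>p1</id><dx>flu</dx></rec>"),
    normalize("csv", "id,dx\np2,cold"),
    normalize("text", "p3|flu"),
]
```

Routing every format through one canonical schema is what keeps the downstream pipeline simple: the warehouse load only ever sees one record shape.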

  • Optimize Data Reads from a 75TB Data Lake

    A data lake holding up to 75TB of data served as the source for a forecasting model. Queries against it were slow and missed the 15-second business SLA. The project required analyzing the queries and the data lake architecture to find optimization opportunities; by the end, I had brought query time down from over five minutes to less than three seconds.
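The actual optimization used Spark tuning techniques; purely as a conceptual illustration, this stdlib sketch shows why one such technique, partition pruning, helps: when the lake is laid out by a commonly filtered column, a query touches only the matching slice instead of scanning everything. The table layout and column names are assumptions:

```python
from collections import defaultdict

# Toy "data lake": rows partitioned by event_date, the way a Spark table
# partitioned on a date column is laid out as one directory per value.
rows = [
    {"event_date": "2021-01-01", "value": 10},
    {"event_date": "2021-01-01", "value": 20},
    {"event_date": "2021-01-02", "value": 30},
]

partitions = defaultdict(list)
for row in rows:
    partitions[row["event_date"]].append(row)

def full_scan_sum(target_date: str) -> int:
    """Unpartitioned read: every row is touched, like scanning the whole table."""
    return sum(r["value"] for r in rows if r["event_date"] == target_date)

def pruned_sum(target_date: str) -> int:
    """Partition-pruned read: only the matching partition is touched."""
    return sum(r["value"] for r in partitions.get(target_date, []))

# Same answer, but the pruned read skips every non-matching partition.
assert full_scan_sum("2021-01-01") == pruned_sum("2021-01-01") == 30
```

At 75TB the difference between "scan everything" and "scan one partition" is exactly the gap between minutes and seconds.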

  • Meta Store for ML Model Training

    The client I was working with needed to track all the information that went into training a model: the training dataset, the hyperparameters used, and the model's output parameters.

    I developed a library that saved all the model metadata to a data store and made it accessible through an API endpoint.
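A minimal stdlib sketch of what such a metadata store can look like. The class, field names, and dataset path are hypothetical; a real deployment would back this with a durable database and serve `get_run` from an HTTP API endpoint:

```python
import uuid
from dataclasses import asdict, dataclass, field

@dataclass
class ModelRun:
    """One training run's metadata (field names are illustrative)."""
    model_name: str
    dataset_uri: str
    hyperparameters: dict
    metrics: dict = field(default_factory=dict)
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)

class MetaStore:
    """In-memory stand-in for the data store behind the API endpoint."""

    def __init__(self) -> None:
        self._runs: dict = {}

    def log_run(self, run: ModelRun) -> str:
        self._runs[run.run_id] = run
        return run.run_id

    def get_run(self, run_id: str) -> dict:
        """What a GET /runs/<id> endpoint would return as JSON."""
        return asdict(self._runs[run_id])

store = MetaStore()
run_id = store.log_run(ModelRun(
    model_name="demand_forecast",                     # hypothetical model
    dataset_uri="s3://bucket/train/2020-05.parquet",  # hypothetical path
    hyperparameters={"max_depth": 6, "learning_rate": 0.1},
    metrics={"rmse": 4.2},
))
record = store.get_run(run_id)
```

Keying every run by an immutable `run_id` is what makes past trainings reproducible: given the ID, the dataset URI and hyperparameters can always be recovered.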


  • Languages

    SQL, Python, Scala
  • Frameworks

    Spark, AWS EMR
  • Platforms

    Databricks, Apache Kafka, Azure, Azure Event Hubs, Pentaho, Amazon Web Services (AWS), Google Cloud Platform (GCP)
  • Other

    Data Warehouse Design, Machine Learning, Data Engineering, Azure Data Factory, Lambda Functions, Data Warehousing, AWS, Google BigQuery
  • Tools

    Apache Airflow, BigQuery
  • Storage

    Data Lakes, DataWare


  • Master's Degree in Machine Learning
    2017 - 2019
    Southern Methodist University - Dallas, TX, USA
  • Bachelor's Degree in Aerospace Engineering
    2010 - 2010
    Kwame Nkrumah University of Science and Technology - Kumasi, Ghana


  • Spark Certification
