Festus Asare Yeboah, Data Engineer and Developer in Plano, TX, United States

Member since April 2, 2020
Festus is a data and machine learning engineer with in-depth, hands-on technical expertise in architecting data pipelines. He excels at designing and implementing big data technologies (Spark, Kafka, data lakes) and has a proven track record in consulting on architecture design and implementation.

Portfolio

  • Databricks
    Azure, Amazon Web Services (AWS), Scala, Python, SQL, Spark
  • Copart
    Azure, Apache Kafka, Pentaho, Python, SQL
  • Brocks Solution
    Azure, DataWare, SQL, Python

Experience

Location

Plano, TX, United States

Availability

Part-time

Preferred Environment

Data Lakes, Data Warehouse Design, Data Warehousing, Machine Learning, Spark

The most amazing...

...thing I've built is a data engineering pipeline that streams data from an IoT device like a bag scanner in an airport to a data lake.
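The core of that pattern — landing scanner events in a date-partitioned, data-lake-style layout — can be sketched in a few lines. This is an illustrative toy, not the production system; the event schema and directory names are assumptions:

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

def land_event(lake_root: Path, event: dict) -> Path:
    """Append one scanner event under <lake_root>/scans/date=YYYY-MM-DD/,
    the Hive-style partitioning scheme that downstream Spark readers
    can prune on."""
    ts = datetime.fromisoformat(event["scanned_at"])
    partition = lake_root / "scans" / f"date={ts:%Y-%m-%d}"
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "events.jsonl"
    with out.open("a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return out

# Hypothetical bag-tag scan event
evt = {"bag_tag": "0014123456", "scanner_id": "T3-07",
       "scanned_at": "2020-01-15T08:30:00+00:00"}
out_path = land_event(Path(tempfile.mkdtemp()), evt)
```

A real pipeline would consume these events from a broker such as Kafka or Event Hubs; the partitioned layout is what lets the lake stay queryable as it grows.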

Employment

  • Data/ML Engineer

    2019 - PRESENT
    Databricks
    • Developed an application to store and track changes in the hyperparameters used to train models, as well as the data each model was trained on. The application saves model metadata and exposes it through API calls.
    • Built an optical character recognition pipeline that converted images into tabular data.
    • Increased the querying performance of a 75TB data lake table. Reports that pulled from this table had a 30-second SLA; by applying Spark performance tuning techniques, I reduced query time to less than five seconds.
    Technologies: Azure, Amazon Web Services (AWS), Scala, Python, SQL, Spark
  • Senior Data Engineer

    2017 - 2018
    Copart
    • Developed a real-time data pipeline to move application logs to a more consumable form for reporting.
    • Built a global data warehouse to serve as a single source of truth for company-wide operational metrics.
    • Migrated the company's ETL architecture to the cloud.
    Technologies: Azure, Apache Kafka, Pentaho, Python, SQL
  • Software Developer

    2015 - 2018
    Brocks Solution
    • Developed a real-time data pipeline to stream data from IoT devices (bag tag scanners) at airports to create baggage handling reports for business executives.
    • Led the implementation of analytics in the company's enterprise baggage handling software.
    • Created dashboards to report data on baggage handling operations.
    Technologies: Azure, DataWare, SQL, Python
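One of the Spark tuning techniques alluded to above, partition pruning, can be illustrated outside Spark: when a large table is laid out in Hive-style partitions, a query filtering on the partition column only touches matching directories. A toy sketch (illustrative names, not the actual Databricks work):

```python
import tempfile
from pathlib import Path

# Build a toy partitioned table: scans/date=YYYY-MM-DD/part.csv
root = Path(tempfile.mkdtemp()) / "scans"
for day in ["2020-01-01", "2020-01-02", "2020-01-03"]:
    part = root / f"date={day}"
    part.mkdir(parents=True)
    (part / "part.csv").write_text(f"bag,scanned_on\nB1,{day}\n")

def prune_partitions(table_root: Path, column: str, value: str) -> list:
    """Return only partition directories matching column=value — the
    directory-skipping that makes a partition-column filter fast."""
    return sorted(d for d in table_root.iterdir()
                  if d.is_dir() and d.name == f"{column}={value}")

# Only 1 of the 3 partition directories is scanned for this filter.
hit = prune_partitions(root, "date", "2020-01-02")
```

In Spark itself this happens automatically when the filter predicate references the partition column, which is why choosing partition keys to match query patterns is one of the highest-leverage tuning steps.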

Experience

  • Pipeline Medical Records into a Scalable Data Store (Development)

    A company receives patient medical records in three formats (XML, CSV, and plain text) and develops reports on them. The company initially moved this data into a data warehouse using SSIS pipelines, but that approach could not scale: it was slow, and it was complex enough that troubleshooting took hours. They needed a new solution that scaled and ran entirely in the cloud.

    Using AWS Kinesis, Lambda, Airflow, and Databricks, I rearchitected the pipeline into a simpler, scalable one, cutting its runtime from 30 minutes to two minutes.

  • Optimize Data Reads from a 75TB Data Lake (Development)

    A 75TB data lake served as a data source for a forecasting model. Queries against it were slow and missed the 15-second business SLA. The project was to analyze the queries and the data lake architecture for optimization opportunities; by the end, I had brought query time down from over five minutes to less than three seconds.

  • Meta Store for ML Model Training (Development)

    The client I was working with needed to track all the information that went into training a model, including the training dataset, the hyperparameters used, and the output parameters from the model.

    I developed a library that saved all the model metadata to a data store and made it accessible through an API endpoint.
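A pared-down sketch of that idea — a store keyed by run ID that records the dataset reference, hyperparameters, and output metrics, with a lookup method standing in for the API endpoint. All names and the storage backend here are illustrative assumptions, not the client's actual library:

```python
from dataclasses import asdict, dataclass, field

@dataclass
class ModelRun:
    run_id: str
    dataset: str           # reference to the training data snapshot
    hyperparameters: dict  # e.g. {"max_depth": 6, "lr": 0.1}
    metrics: dict = field(default_factory=dict)  # model output parameters

class MetaStore:
    """In-memory stand-in for the metadata data store."""
    def __init__(self):
        self._runs = {}

    def save(self, run: ModelRun) -> None:
        self._runs[run.run_id] = run

    def get(self, run_id: str) -> dict:
        # What an API endpoint would return, as JSON-serializable data.
        return asdict(self._runs[run_id])

store = MetaStore()
store.save(ModelRun("run-001", "s3://example-bucket/train/v3",
                    {"max_depth": 6, "lr": 0.1}, {"auc": 0.91}))
```

Keying every run this way makes any model reproducible after the fact: the dataset snapshot, hyperparameters, and resulting metrics are retrieved together from a single call.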

Skills

  • Languages

    SQL, Python, Scala
  • Frameworks

    Spark, AWS EMR
  • Platforms

    Databricks, Apache Kafka, Azure, Azure Event Hubs, Pentaho, Amazon Web Services (AWS)
  • Other

    Data Warehouse Design, Machine Learning, Data Engineering, Azure Data Factory, Lambda Functions, Data Warehousing
  • Tools

    Apache Airflow
  • Paradigms

    ETL
  • Storage

    Data Lakes, DataWare

Education

  • Master's degree in Machine Learning
    2017 - 2019
    Southern Methodist University - Dallas, TX, USA
  • Bachelor's degree in Aerospace Engineering
    2010 - 2010
    Kwame Nkrumah University of Science and Technology - Kumasi, Ghana

Certifications

  • Spark Certification
    FEBRUARY 2020 - PRESENT
    Databricks
