Abdullah Tarek Farag, Developer in Cairo, Cairo Governorate, Egypt

Abdullah Tarek Farag

Verified Expert in Engineering

Data Scientist and Developer

Location
Cairo, Cairo Governorate, Egypt
Toptal Member Since
March 11, 2022

Abdullah Tarek is a data scientist and engineer with more than four years of experience across data engineering, machine learning, data science, data analysis, computer vision, and NLP. He helps businesses build and deploy AI systems that increase revenue and improve operations, and he is looking for interesting AI projects around the globe.

Portfolio

Inkitt
Python, Recommendation Systems, SQL, Data Science, Redshift, A/B Testing...
Malbek
Python, Data Science, Generative Pre-trained Transformers (GPT)...
Capiter
Python, Data Science, Docker, Software Engineering, Machine Learning, BigQuery...

Experience

Availability

Full-time

Preferred Environment

Python, Machine Learning, Data Science, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Computer Vision, Data Analysis

The most amazing...

...thing I've done is develop systems that increased revenue and improved operations for some of the best companies in the MENA region.

Work Experience

Senior Data Engineer | Data Scientist

2022 - PRESENT
Inkitt
  • Developed a collaborative-based recommender system that increased the average chapters read per user by 4%.
  • Created an NLP content-based recommender system that increased the average chapters read per user by 5%.
  • Built ETL pipelines using Python, Airflow, and dbt to extract data from various sources, transform and join transactional entities, apply complex business logic, and load the results into the target OLAP system (e.g., Redshift, Snowflake).
  • Designed and implemented an A/B testing framework that runs statistical tests on all our A/B tests and surfaces the results in an informative Redash dashboard.
  • Developed a Bayesian ranking algorithm that personalized users' home screens according to their interests.
  • Created multiple data dashboards using Redash to show the progress of features and KPIs that are valuable for business decisions.
  • Led a significant refactoring effort of our Airflow codebase to make it more modular, testable, and reusable.
  • Trained Stable Diffusion on book covers to help book writers generate book covers easily.
Technologies: Python, Recommendation Systems, SQL, Data Science, Redshift, A/B Testing, Machine Learning, Collaborative Filtering, Apache Airflow, Data Engineering, Data Analysis, Data Visualization, Data Analytics, Redash, BERT, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Build Tool (dbt), Artificial Intelligence (AI), ETL, Software Architecture, Data Architecture, Machine Learning Operations (MLOps), Data Pipelines
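As an illustration of the collaborative-filtering approach behind the first recommender, here is a minimal item-item sketch in NumPy; the toy interaction matrix and the cosine-similarity scoring are illustrative assumptions, not the production system:

```python
import numpy as np

# Toy user-book interaction matrix (rows: users, columns: books); 1 = read.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

def item_similarity(matrix):
    """Cosine similarity between every pair of item columns."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for unseen items
    unit = matrix / norms
    return unit.T @ unit

def recommend(matrix, user_idx, top_k=2):
    """Score unseen items by their similarity to the user's read items."""
    scores = item_similarity(matrix) @ matrix[user_idx]
    scores[matrix[user_idx] > 0] = -np.inf  # never re-recommend read items
    return np.argsort(scores)[::-1][:top_k]

print(recommend(interactions, user_idx=0))  # user 0's top candidates
```

At production scale, a system like this would factorize a sparse interaction matrix (e.g., implicit-feedback ALS) rather than compute dense pairwise similarities.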

Senior Data Scientist

2021 - PRESENT
Malbek
  • Developed a legal contract classifier using BERT that reached 92% accuracy.
  • Created a clause classifier using BERT that achieved 93% accuracy.
  • Built an NER system that captures the important aspects of a contract, like the parties and effective dates, with an 87% F1 score.
  • Developed an extractive QA system that answered questions about the contract, like extracting laws.
  • Ported the trained models to Java and Kotlin using ONNX to integrate them with the back-end systems.
  • Deployed a batch-inference ML model with Kotlin and ONNX on AWS Graviton, following software design principles to keep the code scalable and testable.
Technologies: Python, Data Science, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Java, Kotlin, Pandas, NumPy, Scikit-learn, XGBoost, SciPy, BERT, Artificial Intelligence (AI), Open Neural Network Exchange (ONNX), Software Engineering, Machine Learning Operations (MLOps)
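The production classifiers used BERT, but the contract-clause classification task itself can be sketched with a lightweight scikit-learn pipeline; the clause texts and labels below are invented for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set: clause text -> clause type.
clauses = [
    "This Agreement shall terminate upon thirty days written notice.",
    "Either party may terminate this contract for material breach.",
    "Each party shall keep Confidential Information strictly confidential.",
    "The receiving party agrees not to disclose confidential data.",
]
labels = ["termination", "termination", "confidentiality", "confidentiality"]

# TF-IDF features plus logistic regression stand in for BERT fine-tuning.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(clauses, labels)

print(model.predict(["The parties shall not disclose any confidential information."]))
```

Swapping the TF-IDF stage for BERT embeddings keeps the same train/predict interface while capturing context that bag-of-words features miss.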

Senior Data Engineer | Data Scientist

2021 - 2022
Capiter
  • Developed and deployed a product recommendation engine that increased the average basket value by 5%.
  • Developed batch ETL pipelines using Python, Airflow, and dbt that transfer data from our Postgres databases into BigQuery.
  • Built a data streaming pipeline using Dataproc and PySpark to stream time-sensitive data into BigQuery.
  • Developed and deployed a demand estimation model that predicts demand for 2,000 products in our warehouses over the upcoming days; it reduced stock-outs by 50% and increased revenue by 17%.
  • Created an A/B testing framework that helped the business and data science teams make decisions more confidently.
  • Developed and deployed a product discount manager using machine learning and the price elasticity of demand to increase revenue and sell stocks before a specific time frame.
  • Analyzed product sales data to make crucial decisions about pricing and outsourcing to optimize revenue.
  • Built and deployed a stock management system that reduced manual labor by 80%.
  • Built multiple Data Studio dashboards for the operations teams that were essential for decision-making and KPI tracking.
  • Onboarded semi-structured data sources to the data lake (GCP buckets), allowing data scientists to run ad-hoc analytics and train predictive models.
Technologies: Python, Data Science, Docker, Software Engineering, Machine Learning, BigQuery, Google Cloud Platform (GCP), NumPy, Pandas, Scikit-learn, SciPy, Recommendation Systems, Data Analytics, Data Analysis, Data Engineering, Apache Airflow, PySpark, ETL, Streaming Data, Google Data Studio, Time Series Analysis, Time Series, Artificial Intelligence (AI), Data Build Tool (dbt), Data Visualization, Data Lakes, Machine Learning Operations (MLOps), Data Pipelines
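The discount manager's core idea, choosing a discount from the price elasticity of demand, can be sketched as a small search over candidate discounts; the constant-elasticity demand curve and every number below are illustrative assumptions:

```python
def demand_at_price(base_demand, base_price, price, elasticity):
    """Constant-elasticity demand curve: Q = Q0 * (P / P0) ** e."""
    return base_demand * (price / base_price) ** elasticity

def best_discount(base_demand, base_price, stock, days_left, elasticity,
                  candidates=(0.0, 0.05, 0.10, 0.15, 0.20)):
    """Return the (discount, projected_revenue) pair that maximizes revenue,
    with units sold capped by the stock available before the deadline."""
    best = None
    for discount in candidates:
        price = base_price * (1 - discount)
        daily_demand = demand_at_price(base_demand, base_price, price, elasticity)
        units_sold = min(stock, daily_demand * days_left)
        revenue = price * units_sold
        if best is None or revenue > best[1]:
            best = (discount, revenue)
    return best

# For an elastic product (|e| > 1), cutting the price raises projected revenue.
print(best_discount(base_demand=10, base_price=100, stock=100,
                    days_left=5, elasticity=-2.0))
```

With elasticity -2.0, each price cut grows demand faster than it erodes the margin, so the search selects the deepest discount until the stock cap binds.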

Senior Data Engineer | Data Scientist

2020 - 2021
Speakol
  • Developed an ad click-through rate prediction model that increased the ad click-through rate by 7%.
  • Created an A/B testing framework that helped determine whether test results were statistically significant using probability and statistics rules.
  • Trained and fine-tuned masked language models like BERT and its distilled variants to extract features from articles for a recommendation system.
  • Developed an article content-based recommendation system using Go and NLP that served millions of users daily.
  • Architected and built scalable, serverless, and event-driven ETL pipelines from scratch, bringing thousands of raw data files to production per day by leveraging EC2, S3, EFS, Step Functions, Lambda, Glue, and Redshift.
  • Trained and tuned NER models to extract named entities from articles and trained article classification models used in ad targeting.
  • Conducted a SWOT analysis of the CPA system that improved conversion by 20%; this analysis helped the business make essential decisions and uncover the sources of problems.
  • Implemented data dashboards using Redash and Tableau that contained different marketing KPIs and insights that were crucial to making day-to-day decisions.
  • Identified and planned upcoming data and AI projects to move the business forward.
  • Deployed a recommendation engine using Go as a RESTful API by using software engineering best practices.
Technologies: Python, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, Machine Learning, Deep Learning, Docker, Kubernetes, Software Engineering, Data Science, Recommendation Systems, NumPy, Amazon Web Services (AWS), Pandas, XGBoost, Scikit-learn, SciPy, Natural Language Toolkit (NLTK), Data Analytics, Data Analysis, Data Visualization, Artificial Intelligence (AI), Data Engineering, Amazon Athena, Tableau, Machine Learning Operations (MLOps), Amazon S3 (AWS S3), AWS Glue, Amazon EC2 API, Amazon EFS, Data Pipelines
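The statistical core of such an A/B testing framework is often a two-proportion z-test on conversion (here, click-through) rates; the traffic numbers below are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Variant B improved CTR from 2.0% to 2.6% over 10,000 views per arm.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A framework built around this test would add guardrails the sketch omits: minimum sample sizes, sequential-peeking corrections, and multiple-comparison adjustments.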

Machine Learning | BI Engineer

2018 - 2020
The D. GmbH
  • Trained and deployed the YOLO deep learning object detector.
  • Developed a motion-blur augmentation technique, published as a paper, that raised the detection rate for fast-moving objects from 30% to 78%.
  • Integrated and enhanced DeepSORT, a tracking module that runs on top of the object detector.
  • Developed an optical flow-based system that further raised the detection rate for fast-moving objects from 78% to 93%.
  • Trained and tuned a pose model to detect human joints, which improved the localization of joints.
  • Developed object motion analysis systems that take object bounding boxes and pose outputs to classify what action is taking place in a video snippet.
  • Designed and built real-time dashboards to provide insights on market trends and user behaviors with Tableau.
Technologies: Python, Machine Learning, Computer Vision, Deep Learning, Docker, Kubernetes, NumPy, Pandas, Scikit-learn, SciPy, Image Recognition, Object Detection, Object Tracking, Video Analysis, Artificial Intelligence (AI), Tableau
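Detection work like the above is typically scored with intersection-over-union (IoU) between predicted and ground-truth boxes; a minimal sketch with made-up boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    intersection = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return intersection / (area_a + area_b - intersection)

# Two 10x10 boxes overlapping in a 5x5 corner: IoU = 25 / 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A "detection rate" like the percentages above usually counts a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.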

Projects

MultiCheXNet

https://arxiv.org/abs/2008.01973
MultiCheXNet is an end-to-end multi-task learning model that leverages different X-ray datasets of pneumonia-like diseases in one neural architecture, performing three tasks simultaneously: diagnosis, segmentation, and localization.

Retrieval-augmented Generation for Question Answering Systems

http://paper.ijcsns.org/07_book/202206/20220644.pdf
A QA system using BERT models that answers common questions about religion. Since the field is very broad, we retrieved the sources most relevant to the asked question and gave them to the QA model while generating an answer.
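The retrieval step of such a retrieval-augmented QA system can be sketched with TF-IDF ranking; the passages and question below are invented stand-ins for the paper's corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Fasting during Ramadan is one of the five pillars of Islam.",
    "Pilgrimage to Mecca is performed during the month of Dhu al-Hijjah.",
    "Daily prayers are performed five times a day.",
]

def retrieve(question, passages, top_k=1):
    """Rank passages by TF-IDF cosine similarity to the question."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(passages + [question])
    scores = cosine_similarity(matrix[-1:], matrix[:-1]).ravel()
    return [passages[i] for i in scores.argsort()[::-1][:top_k]]

# The retrieved passage is then handed to the QA model as context.
print(retrieve("When is the pilgrimage performed?", passages))
```

In the full system, a BERT reader extracts the answer span from the retrieved context instead of returning the raw passage.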

NLP Classifier

https://github.com/abdullahtarek/nlp_classifier
This is a general-purpose NLP classifier that helps machine learning engineers develop NLP classifiers quickly, efficiently, and with minimal coding. This library uses Hugging Face to utilize very powerful NLP models pre-trained on massive datasets.

Education

2014 - 2018

Bachelor's Degree in Computer Science

University of Greenwich - London, UK

Certifications

APRIL 2020 - PRESENT

Data Scientist Nanodegree

Udacity

APRIL 2018 - PRESENT

Data Analyst Nanodegree

Udacity

Libraries/APIs

TensorFlow, Keras, NumPy, Pandas, Scikit-learn, SciPy, XGBoost, Natural Language Toolkit (NLTK), PyTorch, PySpark, Amazon EC2 API

Tools

Tableau, BigQuery, Apache Airflow, Redash, Amazon Athena, AWS Glue

Languages

R, Java, Kotlin, Python, C++, SQL

Paradigms

Data Science, ETL

Platforms

Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes

Storage

Redshift, Data Lakes, Amazon S3 (AWS S3), Amazon EFS, Data Pipelines

Other

Machine Learning, Programming, Computer Vision, Deep Learning, Natural Language Processing (NLP), Data Analysis, Time Series, Recommendation Systems, Data Wrangling, Dashboards, Storytelling, BERT, Image Recognition, GPT, Generative Pre-trained Transformers (GPT), Software Engineering, Object Detection, Object Tracking, Video Analysis, Data Analytics, Data Visualization, Data Engineering, Streaming Data, A/B Testing, Collaborative Filtering, Data Build Tool (dbt), Google Data Studio, Time Series Analysis, Artificial Intelligence (AI), Open Neural Network Exchange (ONNX), Software Architecture, Data Architecture, Machine Learning Operations (MLOps), Data Warehousing
