Abdullah Tarek Farag
Verified Expert in Engineering
Data Scientist and Developer
Abdullah Tarek is a data scientist and engineer with more than four years of experience in data engineering, machine learning, data science, data analysis, computer vision, and NLP. He helps businesses build AI systems to improve their revenue, enhance operations, and build and deploy AI products. Abdullah Tarek is looking for interesting AI projects to work on across the globe using data science.
Preferred Environment
Python, Machine Learning, Data Science, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Computer Vision, Data Analysis
The most amazing...
...thing I've done is developing systems that increased companies' revenue and improved operations for some of the best companies in the MENA region.
Work Experience
Senior Data Engineer | Data Scientist
Inkitt
- Developed a collaborative filtering-based recommender system that increased the average chapters read per user by 4%.
- Created an NLP content-based recommender system that increased the average chapters read per user by 5%.
- Built ETL pipelines using Python, Airflow, and dbt to extract data from various data sources, transform and join various transactional entities, apply complex business logic, and load the results into the target OLAP system (e.g., Redshift, Snowflake).
- Prepared and implemented an A/B testing framework that runs statistical tests on all our A/B tests and displays the results in an informative Redash dashboard.
- Developed a Bayesian ranking algorithm that personalized users' home screens according to their interests.
- Created multiple data dashboards using Redash to show the progress of features and KPIs that are valuable for business decisions.
- Led a significant refactoring effort of our Airflow codebase to make it more modular, testable, and reusable.
- Trained Stable Diffusion on book covers to help book writers generate book covers easily.
Senior Data Scientist
Malbek
- Developed a legal contract classifier using BERT that reached 92% accuracy.
- Created a clause classifier using BERT that achieved 93% accuracy.
- Built a NER system that captures the important aspects of a contract, like the parties and effective dates, with an 87% F1 score.
- Developed an extractive QA system that answered questions about the contract, like extracting laws.
- Moved the trained model to Java and Kotlin using ONNX to integrate the models with the back-end systems.
- Deployed a batch inference ML model with Kotlin and ONNX on AWS Graviton, following software design principles to keep the code scalable and testable.
Senior Data Engineer | Data Scientist
Capiter
- Developed and deployed a product recommendation engine that increased the average basket value by 5%.
- Developed batch ETL pipelines using Python, Airflow, and dbt that transfer data from our Postgres databases into BigQuery.
- Built a data streaming pipeline using Dataproc and PySpark to stream time-sensitive data into BigQuery.
- Developed and deployed a demand estimation model that predicts demand for 2,000 products in our warehouses over the upcoming days, reducing stock-outs by 50% and increasing revenue by 17%.
- Created an A/B testing framework that helped the business and data science teams make decisions more confidently.
- Developed and deployed a product discount manager using machine learning and the price elasticity of demand to increase revenue and sell stock within a specific time frame.
- Analyzed product sales data to make crucial decisions about pricing and outsourcing to optimize revenue.
- Built and deployed a stock management system that reduced manual labor by 80%.
- Built multiple Data Studio dashboards for the operations teams that were essential for making decisions and tracking KPIs.
- Onboarded semi-structured data sources to the data lake (GCP buckets), allowing data scientists to run ad hoc analytics and train predictive models.
Senior Data Engineer | Data Scientist
Speakol
- Developed an ad click-through rate prediction model that increased the ad click-through rate by 7%.
- Created an A/B testing framework that helped determine whether test results were statistically significant using probability and statistics rules.
- Trained, fine-tuned, and distilled masked language models like BERT to extract features from articles for a recommendation system.
- Developed an article content-based recommendation system using Go and NLP that served millions of users daily.
- Architected and built scalable, serverless, and event-driven ETL pipelines from scratch, bringing thousands of raw data files to production per day by leveraging EC2, S3, EFS, Step Functions, Lambda, Glue, and Redshift.
- Trained and tuned NER models to extract named entities from articles, and trained article classification models used for ad targeting.
- Conducted SWOT analysis for the CPA system to improve conversion by 20%. This critical analysis helped the business make essential decisions and discover the source of problems.
- Implemented data dashboards using Redash and Tableau that contained different marketing KPIs and insights that were crucial to making day-to-day decisions.
- Identified and planned upcoming data and AI projects to move the business forward.
- Deployed a recommendation engine in Go as a RESTful API, following software engineering best practices.
Machine Learning | BI Engineer
The D. GmbH
- Trained and deployed a computer vision deep learning object detector called YOLO.
- Developed motion blur augmentation, following a research paper, to raise the detection rate for fast-moving objects from 30% to 78%.
- Integrated and enhanced a tracker module, DeepSORT, on top of the object detector.
- Developed an optical flow-based system to enhance the detection rate for fast-moving objects from 78% to 93%.
- Trained and tuned a pose model to detect human joints, which improved the localization of joints.
- Developed object motion analysis systems that take object bounding boxes and pose outputs to classify what action is taking place in a video snippet.
- Designed and built real-time dashboards to provide insights on market trends and user behaviors with Tableau.
Experience
MultiCheXNet
https://arxiv.org/abs/2008.01973
Retrieval-augmented Generation for Question Answering Systems
http://paper.ijcsns.org/07_book/202206/20220644.pdf
NLP Classifier
https://github.com/abdullahtarek/nlp_classifier
Education
Bachelor's Degree in Computer Science
University of Greenwich - London, UK
Certifications
Data Scientist Nanodegree
Udacity
Data Analyst Nanodegree
Udacity
Skills
Libraries/APIs
TensorFlow, Keras, NumPy, Pandas, Scikit-learn, SciPy, XGBoost, Natural Language Toolkit (NLTK), PyTorch, PySpark, Amazon EC2 API
Tools
Tableau, BigQuery, Apache Airflow, Redash, Amazon Athena, AWS Glue
Languages
R, Java, Kotlin, Python, C++, SQL
Paradigms
Data Science, ETL
Platforms
Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes
Storage
Redshift, Data Lakes, Amazon S3 (AWS S3), Amazon EFS, Data Pipelines
Other
Machine Learning, Programming, Computer Vision, Deep Learning, Natural Language Processing (NLP), Data Analysis, Time Series, Recommendation Systems, Data Wrangling, Dashboards, Storytelling, BERT, Image Recognition, Generative Pre-trained Transformers (GPT), Software Engineering, Object Detection, Object Tracking, Video Analysis, Data Analytics, Data Visualization, Data Engineering, Streaming Data, A/B Testing, Collaborative Filtering, Data Build Tool (dbt), Google Data Studio, Time Series Analysis, Artificial Intelligence (AI), Open Neural Network Exchange (ONNX), Software Architecture, Data Architecture, Machine Learning Operations (MLOps), Data Warehousing