Mohab Ayman, Data Scientist and AI Developer in Cairo, Cairo Governorate, Egypt
Member since December 4, 2020
Mohab is a data scientist and machine learning developer specializing in natural language processing (NLP) and computer vision. He has five years of professional experience, and his recent projects have focused on machine learning for natural language understanding (NLU), cheminformatics, and self-driving cars. Mohab stays current with cutting-edge advancements in deep learning.


  • Octimine
    Web Development, Data Modeling, JavaScript, Docker Hub, NumPy, Matplotlib...
  • Microsoft
    Web Development, Matplotlib, NumPy, .NET, Visual Studio Code, Visual Studio...
  • Self-employed
    C++, Visual Studio Code, GitLab, GitHub, Git, Jupyter Notebook, R...






Preferred Environment

Anaconda, PyTorch, Linux, Python

The most amazing...

...project I've developed is a deep learning system that pairs work partners with similar goals based on the semantic similarity of their profiles.

Work Experience


  • Data Scientist

    2019 - 2021
    Octimine
    • Conducted research in biomedical named entity recognition (NER) and developed a system in Python that extracts and normalizes chemical entities and diseases from legal text.
    • Created a monitoring system in Node.js to collect information from staging and production servers. Visualized the results and created monitoring dashboards using Grafana.
    • Used Docker to containerize external dependencies and runtimes for various system components, reducing dependency overhead and speeding up development pipelines.
    Technologies: Web Development, Data Modeling, JavaScript, Docker Hub, NumPy, Matplotlib, Machine Learning, Visual Studio Code, GitLab, Git, Jupyter Notebook, Natural Language Processing (NLP), Word2Vec, Linux, Java, Python, Data Science, Software Engineering, Neural Networks, Deep Neural Networks, NLTK, Node.js, Docker, Grafana, Pandas, Cheminformatics, Named-entity Recognition (NER), Data Visualization, Deep Learning, Transformers, HDF5, Scikit-learn, Seaborn, PyTorch, Elasticsearch, Kibana, Data Engineering, Big Data, XPath, XQuery, Scraping, Data Scraping, Text Classification, Regex, Categorization, Data Pipelines, Data Analytics, Data Analysis, Analysis, Analytics, Scientific Data Analysis, JSON, Redis, Data Mining, Pytest, Text Mining, Language Models, BERT, Artificial Intelligence (AI)
  • Research Software Development Engineer

    2018 - 2019
    Microsoft
    • Developed an automated benchmarking pipeline in Python based on various NLU evaluation metrics. The pipeline runs on a schedule and produces up-to-date evaluation metrics for the system, along with comparisons against competitor systems.
    • Worked on back-end servers in C# and the .NET Framework. Created new API endpoints and optimized existing ones, significantly reducing response latency.
    • Refactored a large legacy system component into an extensible design following established design patterns, making future extension easier while maintaining backward compatibility.
    Technologies: Web Development, Matplotlib, NumPy, .NET, Visual Studio Code, Visual Studio, Git, Natural Language Processing (NLP), Word2Vec, Anaconda, Agile Software Development, Software Engineering, Data Science, Seaborn, Scikit-learn, NLTK, Pandas, Natural Language Understanding (NLU), Named-entity Recognition (NER), Data Visualization, ASP.NET, C#, Python, Regex, Text Classification, Classification, Text Categorization, SQL, Data Analysis, Data Analytics, Data Pipelines, Analysis, Analytics, Scientific Data Analysis, Pytest, ETL, ETL Tools, ETL Testing, JavaScript, Text Mining, JSON, Code Review, Technical Hiring, Interviewing, Artificial Intelligence (AI)
  • Data Scientist

    2016 - 2016
    Self-employed
    • Collaborated with expert chemists on chemical data analysis tasks, focusing on finding patterns and relations between chemical compound structures and their usage in drugs related to specific diseases.
    • Conducted experiments in natural language understanding and created a pipeline that performs intent classification and named-entity recognition to automate the processing of client receipts.
    • Used image recognition and computer vision algorithms to enhance the capabilities of a license plate recognition system to identify non-standard, handwritten, and multilingual characters.
    Technologies: C++, Visual Studio Code, GitLab, GitHub, Git, Jupyter Notebook, R, Natural Language Processing (NLP), Word2Vec, Anaconda, Data Science, Software Engineering, Data Visualization, NLTK, Computer Vision, OpenCV, Named-entity Recognition (NER), Natural Language Understanding (NLU), Cheminformatics, NumPy, Matplotlib, Seaborn, Scikit-learn, Pandas, Python, Exploratory Data Analysis, Text Categorization, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis, Data Mining, JavaScript, JSON, Artificial Intelligence (AI)
  • Research Intern

    2015 - 2015
    Ulm University
    • Conducted research in neuroinformatics, focusing on analyzing biomedical data of patients and identifying patterns that reflect the level of pain a patient is undergoing during a medical operation.
    • Created machine learning models that predict the pain intensity of a specific patient based on visual data from their facial expressions and biopotential data from sensors recording signals in their nervous system.
    • Developed a neural network package in the R language that implements a parameterized multilayer perceptron optimized with resilient and classic backpropagation algorithms.
    Technologies: Ggplot2, C#, Data Science, Data Visualization, Deep Learning, Neuroinformatics, Neural Networks, Machine Learning, Python, R, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis, Clustering, RStudio, RStudio Shiny, Dplyr, Tidyverse, Artificial Intelligence (AI)

Projects


  • Deep Learning Helper for Annotating Pixels for Semantic Segmentation

    A system that helps human annotators label image pixels for semantic segmentation. The system uses active learning to suggest only a subset of the pixels for labeling, yet reaches accuracy comparable to labeling the full image. I built the system using PyTorch and supporting libraries. It significantly reduces the effort a human annotator needs to label an image.
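
    As a rough illustration of the active-learning idea (not the project's actual code), one common approach is uncertainty sampling: rank pixels by the entropy of the model's predicted class probabilities and send only the most uncertain ones to the annotator. The function names and the toy probabilities below are hypothetical, and the sketch uses plain Python rather than PyTorch to stay self-contained.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_pixels_to_label(prob_map, budget):
    """Return the (row, col) coordinates of the `budget` most uncertain pixels.

    prob_map is a 2-D grid (list of rows) in which each cell holds the
    model's predicted class probabilities for that pixel.
    """
    scored = [
        (pixel_entropy(probs), (r, c))
        for r, row in enumerate(prob_map)
        for c, probs in enumerate(row)
    ]
    # Highest entropy first; ties are broken by coordinate for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [coord for _, coord in scored[:budget]]


# Toy 2x2 "image" with two classes (made-up probabilities).
toy_probs = [
    [[0.99, 0.01], [0.50, 0.50]],
    [[0.60, 0.40], [0.90, 0.10]],
]
queried = select_pixels_to_label(toy_probs, budget=2)
print(queried)
```

    In this toy case the two pixels whose predictions are closest to a coin flip are selected, which is the behavior an annotator-facing tool would want: human effort goes where the model is least sure.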

  • Word Embeddings for Work Colleague Matching

    A system that matches colleagues with similar objectives and skills to facilitate collaboration. I built it using word-embedding algorithms that measure the semantic similarity between employees based on their profiles. Employees can use the system to find colleagues who can help them with their tasks.
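
    A minimal sketch of how embedding-based matching can work, assuming each profile is represented by the average of its word vectors and profiles are compared with cosine similarity. The tiny 3-dimensional vectors, the profile texts, and all function names are made up for illustration; a real system would load pre-trained embeddings such as Word2Vec or GloVe.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained model.
TOY_VECTORS = {
    "python": [0.9, 0.1, 0.0],
    "pytorch": [0.8, 0.2, 0.1],
    "nlp": [0.7, 0.3, 0.0],
    "marketing": [0.0, 0.1, 0.9],
    "sales": [0.1, 0.0, 0.8],
}


def embed_profile(text):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return None
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_name, profiles):
    """Return (name, score) of the profile most similar to query_name."""
    query_vec = embed_profile(profiles[query_name])
    if query_vec is None:
        raise ValueError("query profile contains no known words")
    candidates = []
    for name, text in profiles.items():
        if name == query_name:
            continue
        vec = embed_profile(text)
        if vec is not None:
            candidates.append((cosine_similarity(query_vec, vec), name))
    score, name = max(candidates)
    return name, score


profiles = {
    "alice": "python nlp",
    "bob": "pytorch python",
    "carol": "marketing sales",
}
match, score = best_match("alice", profiles)
print(match, round(score, 3))
```

    Averaging word vectors is the simplest way to get a profile-level embedding; document-level models such as Doc2Vec are an alternative when word order or longer context matters.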

  • Automated Data Processing and Visualization Pipeline

    An automated pipeline, built with Python and Bonobo, that runs chained data processing operations with multiple parameters and automatically produces analysis plots with Seaborn. The goal of the pipeline was to reverse engineer the parameters needed to replicate some results for the client.

  • Generative Adversarial Networks for Improving Image Quality

    A system that uses generative adversarial networks (GANs) to recover high-resolution images from the low-resolution images and semantic labels used by the client. The system is implemented in PyTorch and uses a pre-trained model of the Super-Resolution GAN (SRGAN) architecture.

  • Traffic Scene Generation Based on Graph CNNs and GANs

    A deep learning pipeline for generating traffic scene images for training self-driving cars. The pipeline uses recent advances in graph convolutional neural networks and generative adversarial networks to control the number and type of objects in the traffic scene, as well as the time of day and weather conditions of the generated images.

Skills


  • Languages

    Python, SQL, C#, R, Java, C++, JavaScript, SPARQL, RDF, XPath, XQuery, Regex, Google Apps Script
  • Libraries/APIs

    Pandas, SciPy, PyTorch, NLTK, HDF5, TensorFlow, Node.js, Scikit-learn, Matplotlib, NumPy, OpenCV, Ggplot2, Spark ML, SQLAlchemy, NetworkX, Tidyverse, Google Sheets API, Google Speech API, Google Speech-to-Text API, React, Office API, LINQ
  • Paradigms

    Data Science, Agile Software Development, MapReduce, ETL
  • Platforms

    Jupyter Notebook, Visual Studio Code, Linux, Anaconda, Docker, Amazon Web Services (AWS), RStudio, Google Cloud Platform (GCP)
  • Storage

    PostgreSQL, Cassandra, Elasticsearch, MySQL, Data Pipelines, JSON, Redis, Redshift
  • Other

    Neural Networks, Data Visualization, Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Data Analysis, Data Analytics, Analysis, Analytics, Transformers, Computer Vision, Active Learning, Deep Learning, BERT, A/B Testing, Cohort Analysis, Metabase, Language Models, Natural Language Understanding (NLU), Semantic Segmentation, Software Engineering, Cheminformatics, Word2Vec, GloVe, Convolutional Neural Networks, Recurrent Neural Networks, Neuroinformatics, Deep Neural Networks, Data Modeling, Web Development, Linear Regression, Linear Algebra, Time Series, Time Series Analysis, Social Network Analysis, Network Analysis, Data Engineering, Mathematics, Statistics, Data Processing, Bonobo, Reverse Engineering, Big Data, Scraping, Data Scraping, Text Classification, Classification, Exploratory Data Analysis, Text Categorization, Categorization, Scientific Data Analysis, Clustering, FAISS, Social Network Analytics, Image Processing, Data Build Tool (dbt), Funnel Analysis, Hypothesis Testing, Generative Adversarial Networks (GANs), Image Analysis, Shell Scripting, Web Scraping, Statistical Data Analysis, Hugging Face, Machine Learning Operations (MLOps), Google Cloud Functions, Predictive Analytics, Data Mining, ETL Tools, ETL Testing, Text Mining, Self-driving Cars, Code Review, Technical Hiring, Interviewing, Recommendation Systems, Excel 365, Experimental Design, OfficeJS, Office Add-ins, Database Analytics, Artificial Neural Networks (ANN), Search
  • Frameworks

    ASP.NET, Flask, .NET, Spark, Apache Spark, RStudio Shiny
  • Tools

    Named-entity Recognition (NER), Seaborn, Git, Visual Studio, Grafana, GitLab, Docker Hub, GitHub, Spark SQL, Kibana, Apache Airflow, Amazon SageMaker, Elastic, Dplyr, Google Sheets, Pytest, Babel, Yeoman, Doc2Vec

Education


  • Master's Degree in Data Science
    2019 - 2022
    Technical University of Munich (TUM) - Germany
  • Bachelor's Degree (Hons) in Computer Science
    2011 - 2016
    The German University in Cairo - New Cairo, Egypt

Certifications


  • C#: Advanced Practice
  • React.js: Building an Interface
  • Amazon Redshift Essentials
  • Microsoft Office Add-ins for Developers
  • React.js Essential Training
  • React: Creating and Hosting a Full-stack Site (2019)
  • The Data Science of Experimental Design
