Mohab Ayman, Developer in Cairo, Cairo Governorate, Egypt

Mohab Ayman

Verified Expert in Engineering

Data Scientist and AI Developer

Location
Cairo, Cairo Governorate, Egypt
Toptal Member Since
December 4, 2020

Mohab is a data scientist and machine learning developer specializing in natural language processing (NLP) and computer vision. He has five years of professional experience, and his recent projects have focused on machine learning for natural language understanding (NLU), cheminformatics, and self-driving cars. Mohab stays current with cutting-edge advancements in deep learning.

Portfolio

Quantum Innovation Ventures LLC
Artificial Intelligence (AI), Generative Pre-trained Transformers (GPT)...
Octimine
Web Development, Data Modeling, JavaScript, Docker Hub, NumPy, Matplotlib...
Microsoft
Web Development, Matplotlib, NumPy, .NET, Visual Studio Code (VS Code)...

Experience

Availability

Part-time

Preferred Environment

Anaconda, PyTorch, Linux, Python

The most amazing...

...project I've developed is a deep learning system that pairs work partners with similar goals, based on the semantic similarity of their profiles.

Work Experience

AI Developer

2023 - 2024
Quantum Innovation Ventures LLC
  • Developed an LLM-powered application to automate the process of investment memo creation.
  • Designed the application's LLM architecture with LangChain.
  • Wrapped the LLM app in a Django application and deployed it to Azure Cloud.
Technologies: Artificial Intelligence (AI), Generative Pre-trained Transformers (GPT), Back-end, APIs, Django, Azure, LangChain, FAISS, OpenAI, OpenAI GPT-3 API, OpenAI GPT-4 API, Large Language Models (LLMs), Prompt Engineering

Data Scientist

2019 - 2021
Octimine
  • Conducted research in biomedical named entity recognition (NER) and developed a system in Python that extracts and normalizes chemical entities and diseases from legal text.
  • Created a monitoring system in Node.js to collect information from staging and production servers. Visualized the results and made monitoring dashboards using Grafana.
  • Used Docker to containerize external dependencies and runtimes for various system components to alleviate the dependency overhead and create faster development pipelines.
Technologies: Web Development, Data Modeling, JavaScript, Docker Hub, NumPy, Matplotlib, Machine Learning, Visual Studio Code (VS Code), GitLab, Git, Jupyter Notebook, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Word2Vec, Linux, Java, Python, Data Science, Software Engineering, Neural Networks, Deep Neural Networks, Natural Language Toolkit (NLTK), Node.js, Docker, Grafana, Pandas, Cheminformatics, Named-entity Recognition (NER), Data Visualization, Deep Learning, Transformers, HDF5, Scikit-learn, Seaborn, PyTorch, Elasticsearch, Kibana, Data Engineering, Big Data, XPath, XQuery, Scraping, Data Scraping, Text Classification, Regex, Categorization, Data Pipelines, Data Analytics, Data Analysis, Analysis, Analytics, Scientific Data Analysis, JSON, Redis, Data Mining, Pytest, Text Mining, Language Models, BERT, Artificial Intelligence (AI), Text Recognition, Data Processing, Data Transformation, Word Embedding, Back-end, Jupyter, Dashboards, Software Architecture

Research Software Development Engineer

2018 - 2019
Microsoft
  • Developed an automated benchmarking pipeline in Python based on various NLU evaluation metrics. The pipeline runs periodically and produces up-to-date evaluation metrics for the system, along with comparisons against competitor systems.
  • Worked on back-end servers with C# and .NET framework. Created new API endpoints and optimized existing ones, resulting in a significant drop in response latency.
  • Refactored a large legacy system component into an extensible design following best-practice design patterns, allowing for easier future extension while maintaining backward compatibility.
Technologies: Web Development, Matplotlib, NumPy, .NET, Visual Studio Code (VS Code), Visual Studio, Git, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Word2Vec, Anaconda, Agile Software Development, Software Engineering, Data Science, Seaborn, Scikit-learn, Natural Language Toolkit (NLTK), Pandas, Natural Language Understanding (NLU), Named-entity Recognition (NER), Data Visualization, ASP.NET, C#, Python, Regex, Text Classification, Classification, Text Categorization, SQL, Data Analysis, Data Analytics, Data Pipelines, Analysis, Analytics, Scientific Data Analysis, Pytest, ETL, ETL Tools, ETL Testing, JavaScript, Text Mining, JSON, Code Review, Technical Hiring, Interviewing, Artificial Intelligence (AI), Data Processing, CSV, Data Transformation, Back-end, Software Architecture, Azure

Data Scientist

2016 - 2016
Self-employed
  • Collaborated with chemistry experts on chemical data analysis tasks, focusing on finding patterns and relations between the structures of chemical compounds and their usage in drugs for specific diseases.
  • Conducted experiments in natural language understanding and created a pipeline that performs intent classification and named-entity recognition to automate the processing of client receipts.
  • Used image recognition and computer vision algorithms to enhance the capabilities of a license plate recognition system to identify non-standard, hand-written, and multilingual characters.
Technologies: C++, Visual Studio Code (VS Code), GitLab, GitHub, Git, Jupyter Notebook, R, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Word2Vec, Anaconda, Data Science, Software Engineering, Data Visualization, Natural Language Toolkit (NLTK), Computer Vision, OpenCV, Named-entity Recognition (NER), Natural Language Understanding (NLU), Cheminformatics, NumPy, Matplotlib, Seaborn, Scikit-learn, Pandas, Python, Exploratory Data Analysis, Text Categorization, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis, Data Mining, JavaScript, JSON, Artificial Intelligence (AI), OCR, Text Recognition, Jupyter

Research Intern

2015 - 2015
Ulm University
  • Conducted research in neuroinformatics, focusing on analyzing biomedical data of patients and identifying patterns that reflect the level of pain a patient is undergoing during a medical operation.
  • Created machine learning models that predict the pain intensity of a specific patient based on visual data from their facial expressions and biopotential data from sensors recording signals in their nervous system.
  • Developed a neural network package in R that implements a parameterized multi-layer perceptron optimized with resilient and classic backpropagation algorithms.
Technologies: Ggplot2, C#, Data Science, Data Visualization, Deep Learning, Neuroinformatics, Neural Networks, Machine Learning, Python, R, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis, Clustering, RStudio, RStudio Shiny, Dplyr, Tidyverse, Artificial Intelligence (AI)

Projects

AI Assistant for Investment Memo Creation

An AI assistant that I created for venture capital firms. It automates the creation of investment memos, drawing on multiple data sources: web search, web crawls, and custom-uploaded documents. The assistant cuts the time to create an investment memo from weeks to days.
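
The production system used LangChain with a FAISS vector store and the OpenAI API; the retrieve-then-generate flow can be sketched in a dependency-free way. All names here (score, retrieve, build_memo_prompt) and the toy relevance metric are illustrative, not the system's actual interface:

```python
def score(query, doc):
    """Toy relevance score: word overlap between query and document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def retrieve(query, docs, k=2):
    """Return the k most relevant snippets from the collected sources."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_memo_prompt(company, docs):
    """Assemble the context an LLM would receive to draft a memo section."""
    context = "\n".join(f"- {d}" for d in retrieve(company, docs))
    return (
        f"Using only the sources below, draft an investment memo for {company}.\n"
        f"Sources:\n{context}"
    )

docs = [
    "Acme raised a seed round in 2022",
    "Acme sells industrial robots",
    "Weather report for Cairo",
]
print(build_memo_prompt("Acme", docs))
```

In the real pipeline, the word-overlap scorer is replaced by FAISS similarity search over embedded document chunks.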

AI Assistant for Lawyers

A system that assists lawyers in reading and writing contracts. I built a recommendation system that suggests sentences similar to those the lawyers are writing. Based on the collected data, a large language model (LLM) was developed to help lawyers generate definitions for legal terms and sections for legal contracts. The project involved creating a data pipeline to crawl and parse legal contracts in HTML.

AI Judge for Automating Customer Service Chat Evaluation

An automatic AI judge, built with GPT-4, that evaluates chat dialogues between customer service bots and clients for an eCommerce website. The AI judge assesses whether the bot's responses are factual, appropriate, and of good quality. To do so, it has access to data sources it can query to validate the correctness of the bot's answers.
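
The core of such an LLM-as-judge setup is the prompt that combines the dialogue, the reply under evaluation, and the retrieved reference facts. The rubric text and function names below are assumptions for illustration, not the production prompt:

```python
# Illustrative LLM-as-judge prompt builder; the real system sends the
# resulting prompt to GPT-4 along with facts fetched from its data sources.
RUBRIC = (
    "Rate the support bot's reply for factuality, appropriateness, and "
    "quality on a 1-5 scale each, and justify every score."
)

def build_judge_prompt(dialogue, bot_reply, reference_facts):
    """Combine rubric, verified facts, and the reply to be judged."""
    facts = "\n".join(f"- {f}" for f in reference_facts)
    return (
        f"{RUBRIC}\n\n"
        f"Verified facts:\n{facts}\n\n"
        f"Dialogue so far:\n{dialogue}\n\n"
        f"Reply to evaluate:\n{bot_reply}"
    )

prompt = build_judge_prompt(
    "Customer: Where is my order #123?",
    "Your order shipped yesterday.",
    ["Order #123 shipped on 2024-03-01"],
)
print(prompt)
```

Grounding the judge in verified facts is what lets it check factuality rather than merely fluency.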

Generating Informed Sitemaps Using Web Crawling and GPT

A Python-based pipeline that creates comprehensive sitemaps by crawling related websites, then uses GPT to analyze the crawled sitemaps and generate a unified, more complete overview of a website's structure.
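
The merging step can be sketched with the standard library alone; this hypothetical `merge_sitemaps` helper folds crawled URL lists into one nested path tree, which the GPT analysis stage would then compare and complete:

```python
from urllib.parse import urlparse

def merge_sitemaps(crawled):
    """Merge URL lists crawled from related sites into one nested path tree."""
    tree = {}
    for urls in crawled:
        for url in urls:
            node = tree
            for part in urlparse(url).path.split("/"):
                if part:  # skip empty segments from leading/trailing slashes
                    node = node.setdefault(part, {})
    return tree

print(merge_sitemaps([
    ["https://example.com/blog/post-1"],
    ["https://example.com/blog/post-2", "https://example.com/about"],
]))
```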

Word Embeddings for Work Colleague Matching

A system that matches colleagues with similar objectives and skills to facilitate collaboration. I built a recommendation system that uses word-embedding algorithms to measure the semantic similarity of employees based on their profiles. Employees can also use the system to get recommendations for colleagues who can help them with their tasks.
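
A minimal sketch of the matching idea, with toy vectors standing in for trained word embeddings (the real system used learned embeddings over full profiles); all names and values here are illustrative:

```python
import numpy as np

# Toy word vectors standing in for trained embeddings (Word2Vec/GloVe style).
VECTORS = {
    "nlp": np.array([1.0, 0.1, 0.0]),
    "vision": np.array([0.0, 1.0, 0.1]),
    "python": np.array([0.5, 0.5, 0.0]),
    "deep": np.array([0.6, 0.4, 0.2]),
}

def profile_vector(words):
    """Average the word vectors of a profile's keywords."""
    return np.mean([VECTORS[w] for w in words if w in VECTORS], axis=0)

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def best_match(my_profile, colleagues):
    """Return the colleague whose profile embedding is closest to mine."""
    mine = profile_vector(my_profile)
    return max(colleagues, key=lambda n: cosine(mine, profile_vector(colleagues[n])))

colleagues = {
    "alice": ["nlp", "python"],
    "bob": ["vision", "deep"],
}
print(best_match(["nlp", "deep", "python"], colleagues))
```

Averaging word vectors is the simplest profile representation; sentence-level embeddings are a common upgrade when profiles are longer free text.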

Deep Learning Helper for Annotating Pixels for Semantic Segmentation

A system that helps human annotators label image pixels for semantic segmentation. It uses active learning to suggest a subset of pixels to label while arriving at comparable accuracy. I implemented the system with PyTorch and supporting libraries; it significantly reduces the effort a human annotator needs to label an image.
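
The uncertainty-sampling step can be sketched as follows. Entropy-based selection is one common active learning criterion; the shapes and function names here are illustrative rather than the system's actual interface:

```python
import numpy as np

def pixel_entropy(probs):
    """Per-pixel entropy of class probabilities, shape (C, H, W)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=0)

def select_pixels(probs, budget):
    """Flat indices of the `budget` most uncertain pixels to send to the annotator."""
    ent = pixel_entropy(probs).ravel()
    return np.argsort(ent)[::-1][:budget]
```

Pixels where the model's class distribution is nearly uniform (high entropy) are the most informative to label; confidently predicted pixels can be skipped.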

Automated Data Processing and Visualization Pipeline

An automated pipeline, created with Python and Bonobo, that runs chained data processing operations with multiple parameters and automatically produces analysis plots with Seaborn. Its goal was to reverse engineer the parameters needed to replicate certain results for the client.
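
Bonobo builds transformation graphs out of plain Python callables; a dependency-free sketch of the same chaining idea, with illustrative stages and parameters (not the client's actual operations):

```python
# Each stage is a generator; chaining them mirrors a Bonobo graph.
def extract():
    yield from [1.0, 2.0, 3.0]

def scale(rows, factor):
    for row in rows:
        yield row * factor

def shift(rows, offset):
    for row in rows:
        yield row + offset

def run_pipeline(factor, offset):
    """Run the chained stages for one parameter combination."""
    return list(shift(scale(extract(), factor), offset))

# Sweep parameters to reverse engineer which combination reproduces a target.
target = [3.0, 5.0, 7.0]
for factor in (1.0, 2.0):
    for offset in (0.0, 1.0):
        if run_pipeline(factor, offset) == target:
            print(f"factor={factor}, offset={offset}")
```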

Generative Adversarial Networks for Improving Image Quality

A system that uses generative adversarial networks to recover high-resolution images from the low-resolution images and semantic labels used by the client. It is implemented in PyTorch on top of a pretrained SRGAN (super-resolution GAN) model.

Traffic Scene Generation Based on Graph CNNs and GANs

A deep learning pipeline for generating traffic scene images for training self-driving cars. It uses recent advances in graph convolutional neural networks and generative adversarial networks to control the number and type of objects in the scene, as well as the time of day and weather conditions of the generated images.
Education

2019 - 2022

Master's Degree in Data Science

Technical University of Munich (TUM) - Germany

2011 - 2016

Bachelor's Degree (Hons) in Computer Science

The German University in Cairo - New Cairo, Egypt

Certifications

OCTOBER 2022 - PRESENT

C#: Advanced Practice

LinkedIn

OCTOBER 2022 - PRESENT

React.js: Building an Interface

LinkedIn

OCTOBER 2022 - PRESENT

Amazon Redshift Essentials

LinkedIn

OCTOBER 2022 - PRESENT

Microsoft Office Add-ins for Developers

LinkedIn

OCTOBER 2022 - PRESENT

React.js Essential Training

LinkedIn

OCTOBER 2022 - PRESENT

React: Creating and Hosting a Full-stack Site (2019)

LinkedIn

OCTOBER 2022 - PRESENT

The Data Science of Experimental Design

LinkedIn

Libraries/APIs

Pandas, Scikit-learn, NumPy, SciPy, PyTorch, Natural Language Toolkit (NLTK), HDF5, TensorFlow, Node.js, Matplotlib, OpenCV, Ggplot2, Spark ML, SQLAlchemy, NetworkX, Tidyverse, Google Sheets API, Google Speech API, Google Speech-to-Text API, React, Office API, LINQ, D3.js

Tools

ChatGPT, Celery, Named-entity Recognition (NER), Seaborn, Git, Visual Studio, Grafana, GitLab, Docker Hub, GitHub, Spark SQL, Kibana, Apache Airflow, Amazon SageMaker, Elastic, Dplyr, Google Sheets, Pytest, Babel, Yeoman, Doc2Vec, Jupyter

Languages

Python, SQL, C#, R, Java, C++, JavaScript, SPARQL, RDF, XPath, XQuery, Regex, Google Apps Script, TypeScript

Paradigms

Data Science, Agile Software Development, MapReduce, ETL, Search Engine Optimization (SEO)

Platforms

Jupyter Notebook, Visual Studio Code (VS Code), Amazon Web Services (AWS), Linux, Anaconda, Docker, RStudio, Google Cloud Platform (GCP), Azure

Storage

PostgreSQL, Cassandra, Elasticsearch, MySQL, Data Pipelines, JSON, Redis, Redshift

Frameworks

ASP.NET, Flask, .NET, Spark, Apache Spark, RStudio Shiny, Django, Jinja, Streamlit

Other

Neural Networks, Data Visualization, Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Data Analysis, Data Scraping, Data Analytics, Analysis, Analytics, Large Language Models (LLMs), OpenAI GPT-4 API, LangChain, OpenAI GPT-3 API, OpenAI, Prompt Engineering, Semantic Search, Transformers, Computer Vision, Active Learning, Deep Learning, Data Engineering, BERT, A/B Testing, Cohort Analysis, Metabase, Language Models, Natural Language Understanding (NLU), Semantic Segmentation, Software Engineering, Cheminformatics, Word2Vec, GloVe, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Neuroinformatics, Deep Neural Networks, Data Modeling, Web Development, Linear Regression, Linear Algebra, Time Series, Time Series Analysis, Social Network Analysis, Network Analysis, Mathematics, Statistics, Data Processing, Bonobo, Reverse Engineering, Big Data, Scraping, Text Classification, Classification, Exploratory Data Analysis, Text Categorization, Categorization, Scientific Data Analysis, Clustering, FAISS, Social Network Analytics, Image Processing, Data Build Tool (dbt), Funnel Analysis, Hypothesis Testing, Generative Adversarial Networks (GANs), Image Analysis, Shell Scripting, Web Scraping, Statistical Data Analysis, Hugging Face, Machine Learning Operations (MLOps), Google Cloud Functions, Predictive Analytics, Data Mining, ETL Tools, ETL Testing, Text Mining, Self-driving Cars, Code Review, Technical Hiring, Interviewing, Recommendation Systems, Excel 365, Experimental Design, OfficeJS, Office Add-ins, Database Analytics, Artificial Neural Networks (ANN), Search, Generative Pre-trained Transformers (GPT), GPT Neo, HTML Parsing, Text Generation, OCR, Text Recognition, CSV, Data Transformation, Word Embedding, Back-end, Dashboards, Gunicorn, Chatbots, Full-stack, Software Architecture, APIs, Cloud, Text to Task, Data Synthesis
