Mohab Ayman
Verified Expert in Engineering
Data Scientist and AI Developer
Cairo, Cairo Governorate, Egypt
Toptal member since December 4, 2020
Mohab is a data scientist and machine learning developer, specializing in natural language processing (NLP) and computer vision. He has five years of professional experience, and recent projects have focused on machine learning in the areas of natural language understanding (NLU), cheminformatics, and self-driving cars. Mohab stays current with cutting-edge advancements in deep learning.
Experience
- Natural Language Processing (NLP) - 8 years
- Data Science - 8 years
- Python - 8 years
- Pandas - 5 years
- Natural Language Toolkit (NLTK) - 4 years
- Neural Networks - 4 years
- Deep Learning - 4 years
- Natural Language Understanding (NLU) - 3 years
Preferred Environment
Anaconda, PyTorch, Linux, Python
The most amazing...
...project I've developed is a deep learning system for pairing work partners to cooperate on their similar goals based on semantic similarity of their profiles.
Work Experience
Machine Learning Engineer
Piggyback Inc
- Designed and developed personalized AI assistants tailored to specific use cases, such as sales assistance or customer support.
- Collected and processed data of various types to train AI assistants.
- Improved AI assistants by tuning their responses based on user feedback and user-provided documents.
Senior Product Data Scientist
ClassDojo
- Worked closely with product teams to develop yearly strategy plans. Built data models, data pipelines, and visualization dashboards to monitor progress against these plans and delivered periodic progress reports.
- Designed and maintained data pipelines with Airflow, integrating data from multiple sources and building data models in the analytics platform.
- Developed an automated fuzzy matching pipeline to align external data sources with internal product entries.
- Used generative AI to automate manual team processes, such as reviewing user applications and making quality-based decisions, significantly increasing efficiency.
- Built predictive models to forecast user behaviors over future periods based on historical data and correlative factors.
- Conducted A/B testing and applied causal inference techniques, such as propensity score matching and difference-in-differences analysis, to measure the impact of team efforts in both controlled and non-experimental settings.
Machine Learning Engineer
Quantum Innovation Ventures LLC
- Developed an LLM-powered application to automate the process of creating investment memos.
- Created the architecture for the LLM with LangChain.
- Wrapped the LLM app in a Django application and deployed it to Azure Cloud.
Data Scientist
Octimine
- Conducted research in biomedical named entity recognition (NER) and developed a system in Python that extracts and normalizes chemical entities and diseases from legal text.
- Created a monitoring system in Node.js to collect information from staging and production servers. Visualized the results and made monitoring dashboards using Grafana.
- Used Docker to containerize external dependencies and runtimes for various system components, reducing dependency overhead and speeding up development pipelines.
Research Software Development Engineer
Microsoft
- Developed an automated benchmarking pipeline in Python based on various NLU evaluation metrics. The pipeline runs periodically and produces up-to-date evaluation metrics for the system, along with comparisons against competitor systems.
- Worked on back-end servers with C# and .NET framework. Created new API endpoints and optimized existing ones, resulting in a significant drop in response latency.
- Refactored a large legacy system component into an extensible design following best-practice design patterns, making future extension easier while maintaining backward compatibility.
Data Scientist
Self-employed
- Collaborated with expert chemists on chemical data analysis tasks, focusing on finding patterns and relations between chemical compound structures and their usage in drugs related to specific diseases.
- Conducted experiments in natural language understanding and created a pipeline that performs intent classification and named-entity recognition to automate the processing of client receipts.
- Used image recognition and computer vision algorithms to enhance the capabilities of a license plate recognition system to identify non-standard, hand-written, and multilingual characters.
Research Intern
Ulm University
- Conducted research in neuroinformatics, focusing on analyzing patients' biomedical data and identifying patterns that reflect the level of pain a patient experiences during a medical operation.
- Created machine learning models that predict the pain intensity of a specific patient based on visual data from their facial expressions and biopotential data from sensors recording signals in their nervous system.
- Developed a neural network package in the R language that implements a parameterized multilayer perceptron optimized with resilient and classic backpropagation algorithms.
Experience
AI Assistant for Investment Memo Creation
AI Assistant for Lawyers
Personalized AI Assistants
AI Evaluator for Automating Customer Service Chat Evaluation
Generating Informed Sitemaps Using Web Crawling and GPT
Word Embeddings for Work Colleague Matching
Deep Learning Helper for Annotating Pixels for Semantic Segmentation
Automated Data Processing and Visualization Pipeline
Generative Adversarial Networks for Improving Image Quality
Traffic Scene Generation Based on Graph CNNs and GANs
Predicting Likelihood of Customer Purchase in eCommerce
Education
Master's Degree in Data Engineering and Analytics
Technical University of Munich - Germany
Bachelor's Degree (Hons) in Computer Science
The German University in Cairo - New Cairo, Egypt
Skills
Libraries/APIs
Pandas, Scikit-learn, NumPy, SciPy, PyTorch, Natural Language Toolkit (NLTK), REST APIs, Hugging Face Transformers, HDF5, TensorFlow, Node.js, Matplotlib, OpenCV, Ggplot2, Spark ML, SQLAlchemy, NetworkX, Tidyverse, Google Sheets API, Google Speech API, Google Speech-to-Text API, React, Office API, LINQ, D3.js, OpenAI API, OpenAI Assistants API, Redis Queue, XGBoost, SpaCy
Tools
ChatGPT, AI Prompts, Celery, Named-entity Recognition (NER), Seaborn, Git, Visual Studio, Grafana, GitLab, Docker Hub, GitHub, Spark SQL, Kibana, Apache Airflow, Amazon SageMaker, Elastic, Dplyr, Google Sheets, Pytest, Babel, Yeoman, Doc2Vec, Jupyter, Dialogflow, Azure Machine Learning
Languages
Python, SQL, TypeScript, C#, R, Java, C++, JavaScript, SPARQL, RDF, XPath, XQuery, Regex, Google Apps Script, Python 3
Paradigms
ETL, Agile Software Development, MapReduce, Search Engine Optimization (SEO)
Platforms
Jupyter Notebook, Visual Studio Code (VS Code), Docker, Amazon Web Services (AWS), Linux, Anaconda, RStudio, Google Cloud Platform (GCP), Azure, AWS Lambda
Storage
PostgreSQL, Data Pipelines, Document Databases, NoSQL, Cassandra, Elasticsearch, MySQL, JSON, Redis, Redshift, ClickHouse
Frameworks
Django, ASP.NET, Flask, .NET, Spark, Apache Spark, RStudio Shiny, Jinja, Streamlit, LangGraph, LlamaIndex, GAE
Other
Data Science, Neural Networks, Data Visualization, Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Data Analysis, Data Scraping, Data Analytics, Analysis, Analytics, Large Language Models (LLMs), OpenAI GPT-4 API, LangChain, OpenAI GPT-3 API, OpenAI, Prompt Engineering, Semantic Search, Retrieval-augmented Generation (RAG), ChatGPT API, Data Scientist, Generative Artificial Intelligence (GenAI), AI Agents, AI Chatbots, Multi-agent Systems, Back-end Development, Vector Databases, Fine-tuning, Feature Engineering, Transformers, Computer Vision, Active Learning, Deep Learning, Data Engineering, BERT, A/B Testing, Cohort Analysis, Funnel Analysis, Hypothesis Testing, Metabase, Statistical Data Analysis, Hugging Face, Machine Learning Operations (MLOps), ETL Tools, Recommendation Systems, Language Models, Chatbots, Full-stack, APIs, Statistical Analysis, Algorithms, Data Structures, AI-enabled Search, API Integration, Open Source, Data Cleansing, Data Reporting, Workflow Automation, Natural Language Understanding (NLU), Semantic Segmentation, Software Engineering, Cheminformatics, Word2Vec, GloVe, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Neuroinformatics, Deep Neural Networks (DNNs), Data Modeling, Web Development, Linear Regression, Linear Algebra, Time Series, Time Series Analysis, Social Network Analysis, Network Analysis, Mathematics, Statistics, Data Processing, Bonobo, Reverse Engineering, Big Data, Scraping, Text Classification, Classification, Exploratory Data Analysis, Text Categorization, Categorization, Scientific Data Analysis, Clustering, FAISS, Social Network Analytics, Image Processing, Data Build Tool (dbt), Generative Adversarial Networks (GANs), Image Analysis, Shell Scripting, Web Scraping, Google Cloud Functions, Predictive Analytics, Data Mining, ETL Testing, Text Mining, Self-driving Cars, Code Review, Technical Hiring, Interviewing, Excel 365, Experimental Design, OfficeJS, Office Add-ins, Database Analytics, Artificial Neural Networks (ANN), Search, Generative Pre-trained Transformers (GPT), GPT Neo, HTML Parsing, Text Generation, Optical Character Recognition (OCR), Text Recognition, CSV, Data Transformation, Word Embedding, Back-end, Dashboards, Gunicorn, Software Architecture, Cloud, Text to Task, Data Synthesis, ChatGPT Prompts, Document Processing, Multimodal Models, Weaviate, Website Data Scraping, Web Crawlers, eCommerce, Causal Inference, Data Matching, Forecasting, Star Schema, Fact Tables, Rankings, nDCG, Domain Adaptation, Large Language Model Operations (LLMOps), Transformer Models