
Khaled Abdelhamid

Verified Expert in Engineering

Machine Learning Engineer and Developer

Location

6th of October City, Giza Governorate, Egypt

Toptal Member Since

June 20, 2022

Khaled is a senior machine learning engineer with four years of experience building state-of-the-art solutions. He is passionate about natural language processing and computer vision. Khaled specializes in web scraping, data collection, and publishing competitive datasets in the Arabic language.

Portfolio

Agolo

AI Design, Machine Learning, Natural Language Processing (NLP)...

Udacity

Python 3, Data Pipelines, Advisory

Online Freelance Agency

Python 3, Natural Language Toolkit (NLTK), SpaCy, NumPy, Pandas, Git, Rasa NLU...

Experience

Python 3: 4 years, PyTorch: 3 years, Natural Language Toolkit (NLTK): 3 years, SpaCy: 3 years, TensorFlow: 3 years, Amazon Web Services (AWS): 3 years, Docker: 2 years, Rasa NLU: 2 years

Availability

Part-time

Preferred Environment

Linux, Windows, Visual Studio Code (VS Code), Slack, Notion, Python 3, Jira

The most amazing thing I've developed is a smart engine that creates a knowledge graph of jobs in different industries for an HR development company.

Work Experience

Machine Learning/NLP Engineer

2022 - PRESENT

Agolo

Led the implementation of multiple cutting-edge machine learning models for natural language processing (NLP) applications, achieving state-of-the-art accuracy and performance.
Developed custom pre- and post-processing pipelines to enhance named-entity recognition (NER) robustness and accuracy while also improving multilingual support.
Contributed to the development and optimization of a knowledge base system specifically tailored for information retrieval, streamlining access to critical information within the organization.
Created tailored evaluation pipelines to measure the accuracy of various machine learning models.
Constructed comprehensive quantitative evaluation dashboards, empowering development teams to conduct rigorous assessments and identify areas for improvement.
Actively engaged in diverse research initiatives aimed at harnessing the capabilities of large language models, integrating them into the pipeline using tools like LangChain.
Orchestrated end-to-end pipelines to facilitate seamless communication among numerous interconnected services, optimizing system performance.
Played a pivotal role in integrating machine learning models into CI/CD pipelines, ensuring scalability and performance.
Devised and executed extensive performance stress testing protocols for deployed ML services, determining optimal resource requirements under varying loads.
Managed a team of five professionals and delivered robust multilingual support in under a month.

Technologies: AI Design, Machine Learning, Natural Language Processing (NLP), OpenAI GPT-3 API, Large Language Models (LLMs), Data Cleansing, OpenAI GPT-4 API

Session Lead in Computer Science

2022 - 2023

Udacity

Led weekly sessions for a cohort of 35 students, achieving a 93% graduation rate.
Was a top-rated session lead, consistently earning 5-star ratings and outstanding feedback from students.
Mentored numerous students through Slack, fostering a supportive learning environment.

Technologies: Python 3, Data Pipelines, Advisory

Machine Learning Engineer

2019 - 2022

Online Freelance Agency

Contributed to building machine learning-based solutions for numerous customers.
Designed and established data pipelines for NLP-based projects.
Worked as a machine learning freelancer, earning top ratings and a 98% job success rate.

Technologies: Python 3, Natural Language Toolkit (NLTK), SpaCy, NumPy, Pandas, Git, Rasa NLU, PyTorch, TensorFlow, Keras, Amazon Web Services (AWS), Docker, MATLAB, Deep Learning, Machine Learning, Data Science, PostgreSQL, TensorBoard, BERT, DeepSpeech, Statistics, Visualization, A/B Testing, Matplotlib, Seaborn, Streamlit, Amazon S3 (AWS S3), Amazon EC2, Amazon EBS, SciPy, Jupyter, Jupyter Notebook, OpenCV, LabVIEW, Scrapy, Beautiful Soup, Selenium, Linux, Windows, Visual Studio Code (VS Code), Slack, Trello, Notion, English, Scikit-learn, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Regex, OCR, Data Visualization, Google Cloud Platform (GCP), Python Asyncio, LaTeX, Web Dashboards, REST, Computer Vision, Bash, Bash Script, FFmpeg, Web Scraping, Google Colaboratory (Colab), Rasa.ai, Named-entity Recognition (NER), Chatbots, Image Processing, Markdown, YAML, JSON, ELK (Elastic Stack), Data Cleaning, Data Collection, FastAPI, Data Analysis, Data Processing, Signal Processing, Digital Signal Processing, Artificial Intelligence (AI), Text Analytics, Python, GPU Computing, Cloud, Machine Vision, Neural Networks, Apache Spark, ETL, Text Generation, Language Models, Speech Recognition, APIs, Data Pipelines, Google BigQuery, AI Design, Chatbot Conversation Design, Project Consultancy, Advisory, ChatGPT, OpenAI GPT-3 API, Large Language Models (LLMs), Azure Machine Learning, Amazon Machine Learning, Google Cloud Machine Learning, Data Cleansing, OpenAI GPT-4 API, Chatbot, LlamaIndex, LangChain

Research Assistant

2020 - 2021

Zewail City

Developed a transformer-based model for Arabic text diacritization that outperformed the previous state-of-the-art method, reaching a total accuracy of 98%.
Processed Arabic speech data and built a deep learning model to perform speech-to-text on it.
Wrote comparative articles and literature reviews on natural language processing, deep learning, and their applications to Arabic for non-specialist Arabic readers.

Technologies: Python 3, TensorBoard, TensorFlow, Deep Learning, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), SpaCy, Natural Language Toolkit (NLTK), NumPy, Pandas, Git, Rasa NLU, PyTorch, Keras, Amazon Web Services (AWS), Docker, MATLAB, Machine Learning, Data Science, BERT, DeepSpeech, Statistics, Visualization, A/B Testing, Matplotlib, Seaborn, Streamlit, Amazon S3 (AWS S3), Amazon EC2, Amazon EBS, SciPy, Jupyter, Jupyter Notebook, OpenCV, LabVIEW, Linux, Windows, Visual Studio Code (VS Code), Slack, Trello, Notion, English, Scikit-learn, Regex, OCR, Data Visualization, Google Cloud Platform (GCP), LaTeX, Web Dashboards, REST, Computer Vision, Bash, Bash Script, FFmpeg, Web Scraping, NodeMCU, Google Colaboratory (Colab), Named-entity Recognition (NER), Chatbots, Image Processing, Markdown, YAML, JSON, Data Cleaning, Data Collection, FastAPI, Data Analysis, Data Processing, Signal Processing, Digital Signal Processing, Artificial Intelligence (AI), Text Analytics, Python, GPU Computing, Neural Networks, ETL, PySpark, Speech Recognition, APIs, Data Pipelines

Machine Learning Engineer

2020

Proteinea (startup)

Built training, testing, and logging pipelines for deep learning models that predict protein expression levels from DNA features.
Established systems to monitor factory production status for logging and further analysis.
Developed a computer vision system that counts moving, occluded objects on a conveyor belt with fast and robust predictions.

Technologies: Python 3, PyTorch, TensorFlow, Pandas, NumPy, Scikit-learn, SciPy, Amazon Web Services (AWS), TensorBoard, Data Science, Git, Keras, Docker, Deep Learning, Machine Learning, Statistics, Visualization, A/B Testing, Matplotlib, Seaborn, Amazon S3 (AWS S3), Amazon EC2, Amazon EBS, Jupyter, Jupyter Notebook, OpenCV, Linux, Windows, Visual Studio Code (VS Code), Slack, Trello, Notion, English, OCR, Data Visualization, LaTeX, Web Dashboards, REST, Computer Vision, Bash, Bash Script, NodeMCU, Google Colaboratory (Colab), Image Processing, Markdown, JSON, Data Cleaning, Data Collection, Data Analysis, Data Processing, Signal Processing, Digital Signal Processing, Artificial Intelligence (AI), Text Analytics, Python, GPU Computing, Cloud, Machine Vision, Neural Networks, ETL, APIs, Data Pipelines, AI Design

Experience

Event Information Extraction from Tweets for Tech Startup

The project aims to extract information like dates, phone numbers, organizations, etc., from user tweets and save them in a structured format.

I implemented the project by combining regex-based pattern matching with spaCy's named-entity recognition features. For dates, I used a temporal model to parse the text, extract the dates, and normalize them so they could be queried and analyzed.
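As an illustration of the regex half of this approach (not the original project code), the sketch below pulls phone numbers and day/month/year dates out of a tweet and normalizes the dates to ISO 8601. The patterns, the day-first date format, and the names `extract_fields` and `normalize_date` are hypothetical; the spaCy NER step and the temporal model are not shown, and `datetime.strptime` stands in for date normalization.

```python
import re
from datetime import datetime

# Hypothetical patterns: real tweets need much broader coverage than this.
PHONE_RE = re.compile(r"\+?\d[\d -]{7,}\d")
DATE_RE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")


def normalize_date(raw):
    """Return an ISO 8601 date string, or None if the text is not a valid day/month/year date."""
    try:
        return datetime.strptime(raw, "%d/%m/%Y").date().isoformat()
    except ValueError:
        return None


def extract_fields(tweet):
    """Pull phone numbers and normalized dates out of a single tweet."""
    phones = [match.group() for match in PHONE_RE.finditer(tweet)]
    dates = [normalize_date(match.group(1)) for match in DATE_RE.finditer(tweet)]
    return {"phones": phones, "dates": [d for d in dates if d is not None]}
```

For example, `extract_fields("Meetup on 25/12/2023, call +20 100 123 4567 to book")` yields the phone number `+20 100 123 4567` and the date `2023-12-25`; impossible dates such as `31/02/2023` are dropped during normalization.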

Ranking and Associating Job Descriptions Along With Their Job Families for HR Agency

The project aimed to measure the similarity between each job description and the description and level of a job family. Using NLP, I built a ranking algorithm that assigns a similarity score between each job and its job family.
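The profile does not say which NLP technique produced the similarity scores. As one plausible baseline, the sketch below ranks job families by bag-of-words cosine similarity to a job description; the function names and sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def rank_families(job_description, families):
    """Return (family name, score) pairs sorted from most to least similar."""
    job_vector = Counter(tokenize(job_description))
    scores = [(name, cosine(job_vector, Counter(tokenize(text)))) for name, text in families.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Raw counts keep the example small; TF-IDF weighting or sentence embeddings would usually give more meaningful scores on real job descriptions.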

Text Mining and Analysis for Marketing Company

The project aims to build multiple APIs over audio dialogues of focus groups and market surveys. The required modules included entity recognition, speech-to-text, text summarization, translation, key phrase extraction, and sentiment analysis.

End-to-end Chatbot for Customer Services

https://www.youtube.com/watch?v=SJRmzDfWIec

I developed a voice chatbot using the Rasa framework that contains the following features:
• Connects to REST APIs to handle users' voice and text commands from their database.
• Extracts answers from archived business contracts using deep learning and fuzzy search.
• Performs named-entity recognition, intent classification, and form filling to collect all the information needed to retrieve results for users.

AI-based OCR for Extracting Data from Electric Meters

The project aimed to build an AI-based OCR model that extracts text and numbers from a calculator-style digital screen. Trained on the provided dataset, the model extracted details with nearly 100% accuracy in a controlled indoor setup with a fixed camera position.

First, I applied gamma correction and other noise-reduction techniques to bring the images into a state the OCR engine could handle well. Then, I developed the solution using EasyOCR and PaddleOCR. To improve performance on the seven-segment display, I created a dataset with the TRDG library using custom seven-segment fonts and used it to train the OCR engines. Because the text followed expected patterns, I used NLP methods to correct the output based on the most likely predictions.
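Gamma correction is usually implemented as a 256-entry lookup table over 8-bit intensities. The sketch below shows the idea in pure Python on a grayscale image stored as nested lists, using the convention output = 255 * (input / 255) ** gamma, so gamma below 1 brightens dark regions and gamma above 1 darkens them. The function names and the chosen convention are assumptions; in practice the same table would typically be applied with NumPy or OpenCV.

```python
def gamma_lut(gamma):
    """Build a 256-entry lookup table mapping 8-bit intensities through gamma correction."""
    return [round(255 * (value / 255) ** gamma) for value in range(256)]


def apply_gamma(pixels, gamma):
    """Apply gamma correction to a 2D list of 8-bit grayscale pixel values."""
    lut = gamma_lut(gamma)
    return [[lut[p] for p in row] for row in pixels]
```

Because the table is computed once, correcting a full image costs one lookup per pixel regardless of how expensive the power function is.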

YouTube Scraper

I architected highly parallel software to scrape channel video information and timestamps. I used Selenium to scrape dynamic web pages and retrieve comments containing video timestamps, which were detected with regular expressions, and I deployed the system on an EC2 instance with continuous monitoring.
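A minimal sketch of the timestamp-detection step, assuming comments use `m:ss`, `mm:ss`, or `h:mm:ss` formats; the pattern and the function name `timestamps_in_comment` are hypothetical, and the Selenium scraping is not shown.

```python
import re

# Optional hours group, then minutes and seconds, e.g. "1:02:03" or "12:45" (assumed formats).
TIMESTAMP_RE = re.compile(r"\b(?:(\d{1,2}):)?([0-5]?\d):([0-5]\d)\b")


def timestamps_in_comment(comment):
    """Return every timestamp found in a comment, converted to seconds."""
    seconds = []
    for hours, minutes, secs in TIMESTAMP_RE.findall(comment):
        # An unmatched optional group comes back as an empty string.
        seconds.append(int(hours or 0) * 3600 + int(minutes) * 60 + int(secs))
    return seconds
```

For example, `"Best part at 1:02:03 and again at 12:45"` yields `[3723, 765]`, which can be stored alongside the video ID for later analysis.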

Traffic Analysis with CCTV Cameras

The goal was to classify and track all moving objects at a fixed traffic spot to analyze traffic movement. The video feed came from fixed-angle CCTV cameras, with almost 12 hours of coverage for each location. The footage was split into manageable chunks and fed into the model sequentially.

I used Bash scripting and FFmpeg for the video processing and the FairMOT model to classify and track the objects. The categories were pedestrians, cars, motorcycles, tricycles, and buses. The tracking data was filtered and used to create a traffic flow map, a visualization showing the most congested areas in a specific location for a given time.
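The chunking itself was done with Bash and FFmpeg; to keep one language across these examples, the sketch below uses Python only to plan the cuts, building FFmpeg argument lists (`-ss` start offset, `-t` duration, `-c copy` to avoid re-encoding) for fixed-length chunks. The chunk length and output naming are hypothetical, and running the commands (for example, with `subprocess.run`) is left out.

```python
def chunk_commands(source, total_seconds, chunk_seconds):
    """Build ffmpeg argument lists that cut a long video into fixed-length chunks."""
    commands = []
    for index, start in enumerate(range(0, total_seconds, chunk_seconds)):
        # The final chunk may be shorter than chunk_seconds.
        duration = min(chunk_seconds, total_seconds - start)
        commands.append([
            "ffmpeg", "-ss", str(start), "-i", source,
            "-t", str(duration), "-c", "copy", f"chunk_{index:03d}.mp4",
        ])
    return commands
```

With stream copying, FFmpeg cuts at keyframes, so chunk boundaries may shift slightly; re-encoding gives exact cuts at the cost of speed. A 12-hour recording split into one-hour chunks produces 12 commands.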

Email Scraper

The project aimed to extract emails from 30,000 URLs saved in a CSV file.

I used Scrapy to make asynchronous requests to speed up scraping. For each page, I extracted all the emails using regular expression patterns designed to minimize false negatives. The run completed successfully and extracted emails from 90% of the websites.
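The per-page extraction step might look like the sketch below, which uses a deliberately permissive pattern and deduplicates the results; the pattern and the function name `extract_emails` are assumptions, and the Scrapy crawling is not shown.

```python
import re

# Deliberately permissive: accepting some false positives keeps false negatives low.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text):
    """Return unique, lowercased email addresses in order of first appearance."""
    return list(dict.fromkeys(match.lower() for match in EMAIL_RE.findall(page_text)))
```

Running it on raw HTML also catches addresses inside `mailto:` links, and the trailing `\.[A-Za-z]{2,}` keeps a sentence-ending period out of the match.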

Genuine Artistic Images from Custom Tags Using GANs

The project aimed to generate artistic images from royalty-free images using deep learning. The model inputs were name tags or categories such as cats, cars, and animals, and the expected output was an artistic representation of the given image with unique features.

I built a Streamlit dashboard to interact with the user and collect the name tags. Then I scraped websites containing free high-resolution images and used the DeepDream model to process the scraped images into new artistic ones that were unique and related to the given name tags. The dashboard displayed the output and let the user refresh it and edit the parameters passed to the DeepDream model.

Named-Entity Extractor Pipeline

The project included processing a dataset of sentences and labeling all the ones that contained location names. Once I extracted the entities, the project required further processing to standardize the spelling of the extracted places and get detailed information about each location, such as its country and city.
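The profile does not say how place names were standardized or enriched. One simple approach, swapped in here for illustration, is fuzzy matching against a gazetteer with the standard library's `difflib.get_close_matches`; the tiny `GAZETTEER`, the 0.8 cutoff, and the function name `standardize_location` are hypothetical.

```python
import difflib

# Hypothetical gazetteer mapping canonical place names to (city, country).
GAZETTEER = {
    "Cairo": ("Cairo", "Egypt"),
    "Giza": ("Giza", "Egypt"),
    "Alexandria": ("Alexandria", "Egypt"),
}


def standardize_location(raw, cutoff=0.8):
    """Map a raw extracted place name to its canonical form and details, or None."""
    lookup = {name.lower(): name for name in GAZETTEER}
    matches = difflib.get_close_matches(raw.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    if not matches:
        return None
    canonical = lookup[matches[0]]
    city, country = GAZETTEER[canonical]
    return {"name": canonical, "city": city, "country": country}
```

A misspelling such as `"alexandira"` resolves to Alexandria, while an unknown place returns `None` so it can be reviewed instead of being forced onto the nearest entry.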

Text Data Augmentation

The goal was to perform text data augmentation on a given dataset. I used a combination of the following techniques:
• random replacement of characters in words based on the most likely mistakes for that word;
• substitution of synonyms, and of antonyms combined with negation;
• translation of phrases into a different language and then back into the original language (this method could chain multiple languages sequentially); and
• paraphrasing of sentences using a BERT-based model.
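The first technique can be sketched as below. The profile does not say how "most likely mistakes" were modeled, so this version assumes a hypothetical table of neighboring QWERTY keys; `LIKELY_TYPOS` and `typo_augment` are illustrative names.

```python
import random

# Hypothetical table of likely typos: neighboring keys on a QWERTY keyboard.
LIKELY_TYPOS = {
    "a": "qsz",
    "e": "wrd",
    "i": "uok",
    "o": "ipl",
    "s": "adw",
    "t": "ry",
    "n": "bm",
}


def typo_augment(sentence, rate, rng):
    """Replace roughly `rate` of the mappable characters with a plausible typo."""
    chars = list(sentence)
    for index, char in enumerate(chars):
        options = LIKELY_TYPOS.get(char.lower())
        if options and rng.random() < rate:
            typo = rng.choice(options)
            # Preserve the original character's case.
            chars[index] = typo.upper() if char.isupper() else typo
    return "".join(chars)
```

Passing the random generator in explicitly, for example `random.Random(0)`, makes augmentation runs reproducible.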

Converting Driver License Images Into Tabulated Data

The project aimed to extract all possible information from driver's license images and build a structured dataset with all the associated records. I used an OCR engine to extract the text from the cards and assigned each text block to a category, such as name, ID, or expiration date. The records were saved to a CSV file, and I built a Streamlit dashboard to display the dataset and analyze it further, including age distribution and card locations.

ECG Analysis and Cardiac Arrhythmia Detection

The project classified cardiac arrhythmia cases from ECG signal readings.

I built hardware to capture a person's ECG signal using a NodeMCU and a custom ECG kit. Then I uploaded the data to Firebase and trained and deployed a machine learning classifier on GCP. Each signal fell into one of six categories: normal or one of five cardiac arrhythmia types. The signal was processed and analyzed with LabVIEW.

Website Topic Categorization

The goal was to assign each website in a given list to a category based on its content, such as sports, economy, or news. The URLs were delivered in a CSV file.

I used Scrapy to process all the links asynchronously and collect the paragraph and header tags, which contained most of the informative text on each page. Then I used the Google Cloud Natural Language API to extract the most likely category and continuously saved the results to another CSV file.
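The tag-filtering step can be illustrated with the standard library's `html.parser` in place of Scrapy's selectors: the sketch keeps only text nested inside `<p>` and `<h1>` to `<h6>` tags. The class and function names are hypothetical, and the classification API call is not shown.

```python
from html.parser import HTMLParser

# Tags assumed to carry most of a page's informative text.
TEXT_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6"}


class InformativeTextParser(HTMLParser):
    """Collect text that appears inside paragraph and header tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # how many TEXT_TAGS we are currently nested inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def informative_text(html):
    """Return the paragraph and header text of a page as one space-joined string."""
    parser = InformativeTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Navigation menus and footers are skipped because their text sits outside the tracked tags, which keeps boilerplate out of the text sent for classification.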

JPEG Image Compression

https://github.com/Khaled-Abdelhamid/Jpeg

The project aimed to implement the basic JPEG image compression algorithm and develop metrics to monitor the data loss and compression efficiency.

I achieved 50% compression efficiency with about 0.02% data loss between the raw and compressed images.
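The linked repository is not reproduced here. As a rough illustration of the transform-and-quantize core of baseline JPEG, the sketch below level-shifts an 8x8 block, applies an orthonormal 2D DCT, and divides by a single uniform quantization step. The uniform step is a simplification (real JPEG uses 8x8 quantization tables), zigzag ordering, run-length coding, and Huffman coding are omitted, and the function names are hypothetical.

```python
import math

N = 8  # JPEG operates on 8x8 pixel blocks


def _alpha(k):
    """Orthonormal DCT scaling factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def dct2(block):
    """2D DCT-II of an 8x8 block (orthonormal)."""
    return [[_alpha(u) * _alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs):
    """Inverse of dct2 (2D DCT-III, orthonormal)."""
    return [[sum(
                _alpha(u) * _alpha(v) * coeffs[u][v]
                * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def compress_block(block, step):
    """Level-shift, transform, and uniformly quantize one 8x8 block of 8-bit pixels."""
    shifted = [[p - 128 for p in row] for row in block]
    return [[round(c / step) for c in row] for row in dct2(shifted)]


def decompress_block(quantized, step):
    """Dequantize, inverse-transform, undo the level shift, and clamp to 0..255."""
    restored = idct2([[q * step for q in row] for row in quantized])
    return [[min(255, max(0, round(p + 128))) for p in row] for row in restored]
```

On smooth image regions, a coarse step drives most coefficients to zero, which is what the later entropy-coding stages exploit. Counting nonzero quantized coefficients gives a rough sense of compressibility, and comparing original and reconstructed blocks (for example, with mean squared error) gives one measure of data loss; the profile does not say which metrics produced the figures above.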

Spectrum Analyzer Application

https://github.com/Khaled-Abdelhamid/Spectrum-analyzer-application

The project involved building an application that performed spectrum analysis on a given audio signal, including FFT, filtering, and windowing. The application also let users test and compare a variety of arbitrary signals.
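A minimal sketch of the windowing and spectrum steps, assuming a mono signal given as a list of samples. To stay dependency-free it uses a naive O(N^2) DFT where a real application would use an FFT (for example, NumPy's `numpy.fft.rfft`); filtering is not shown, and the function names are hypothetical.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(signal):
    """Windowed DFT magnitudes for bins 0..N/2 (a naive stand-in for an FFT)."""
    n = len(signal)
    windowed = [s * w for s, w in zip(signal, hann(n))]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def peak_frequency(signal, sample_rate):
    """Frequency in Hz of the strongest spectral bin."""
    spectrum = magnitude_spectrum(signal)
    peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
    return peak_bin * sample_rate / len(signal)
```

The window tapers the signal's edges to reduce spectral leakage, and bin k corresponds to k * sample_rate / N Hz; a 125 Hz tone sampled at 1,000 Hz over 256 samples therefore peaks in bin 32.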

Skills

Languages

Python 3, Regex, Markdown, Python, C++, C, Julia, C#, Bash, Bash Script, YAML

Frameworks

Streamlit, Hadoop, Spark, Scrapy, Selenium, Apache Spark, LlamaIndex

Libraries/APIs

Natural Language Toolkit (NLTK), SpaCy, NumPy, Pandas, Rasa NLU, PyTorch, SciPy, TensorFlow, Keras, Matplotlib, OpenCV, Beautiful Soup, Scikit-learn, PySpark, DeepSpeech, Python Asyncio, FFmpeg

Tools

Git, Jupyter, Notion, Named-entity Recognition (NER), Seaborn, LabVIEW, Slack, Trello, Rasa.ai, ChatGPT, Kibana, Azure Machine Learning, MATLAB, TensorBoard, Amazon EBS, LaTeX, Plotly, ELK (Elastic Stack), Jira

Paradigms

Data Science, REST, ETL

Platforms

Jupyter Notebook, Amazon Web Services (AWS), Docker, Visual Studio Code (VS Code), Amazon EC2, Linux, Windows, Google Cloud Platform (GCP), Firebase

Storage

JSON, Data Pipelines, PostgreSQL, Amazon S3 (AWS S3), Elasticsearch, HBase, SQL Server 2016, Apache Hive

Other

Deep Learning, Machine Learning, English, Natural Language Processing (NLP), Google Colaboratory (Colab), Chatbots, Image Processing, Data Cleaning, Data Processing, Data Analysis, Artificial Intelligence (AI), Text Analytics, Data Engineering, Neural Networks, Speech Recognition, APIs, AI Design, Chatbot Conversation Design, Project Consultancy, Advisory, GPT, Generative Pre-trained Transformers (GPT), Data Cleansing, BERT, Data Visualization, Web Dashboards, Computer Vision, Web Scraping, Signal Processing, Digital Signal Processing, GPU Computing, Cloud, Machine Vision, Text Generation, Language Models, Google BigQuery, OpenAI GPT-3 API, Large Language Models (LLMs), Amazon Machine Learning, Google Cloud Machine Learning, OpenAI GPT-4 API, Chatbot, LangChain, Statistics, Visualization, A/B Testing, OCR, NodeMCU, FastAPI, Data Collection

Education

2016 - 2021

Bachelor of Engineering Degree in Communications and Information Technology Engineering

University of Science and Technology, Zewail City - Giza, Egypt

Certifications

APRIL 2021 - PRESENT

Natural Language Processing Specialization

Coursera

MARCH 2021 - PRESENT

AWS Cloud Practitioner Essentials

Amazon Web Services

MARCH 2021 - PRESENT

Advanced Data Analysis Nanodegree Program

Udacity

NOVEMBER 2020 - NOVEMBER 2022

IELTS | 7.5 | C1

British Council

SEPTEMBER 2019 - PRESENT

Deep Learning Specialization

Coursera
