
Karim Foda

Verified Expert in Engineering

NLP Researcher and Developer

Location

London, United Kingdom

Toptal Member Since

July 6, 2020

Karim is an NLP researcher with in-depth, hands-on experience building machine learning (ML) models that aim to replicate specific human functions, thereby accelerating business processes. Most recently, Karim has focused on training large language models (LLMs) for natural language understanding (NLU) and natural language generation (NLG) in conversational chatbots.

Python, Machine Learning, Natural Language Processing (NLP), Statistics, Artificial Intelligence (AI), Pandas, Docker, R, Keras, MATLAB, TensorFlow, SQL, PostgreSQL, Bash, Time Series Analysis, Hugging Face, Natural Language Generation, Optical Character Recognition

Portfolio

Kaizan

Artificial Intelligence (AI), OpenAI GPT-4 API, OpenAI GPT-3 API...

Shortform

Natural Language Processing (NLP), OpenAI GPT-4 API, Elasticsearch...

Grata

JSON, Roku, Machine Learning, Deep Neural Networks...

Experience

Python - 9 years
Natural Language Processing (NLP) - 5 years
Transformers - 3 years
Hugging Face - 3 years
Generative Pre-trained Transformers (GPT) - 2 years
OpenAI - 2 years
Generative Pre-trained Transformer 3 (GPT-3) - 2 years
Web Scraping - 2 years

Availability

Full-time

Preferred Environment

Python

The most amazing...

...thing I believe I've built is a LongT5 model fine-tuned to generate automatic summaries of self-help books.

Work Experience

Lead NLP Engineer

2021 - PRESENT

Kaizan

Built a GPT-4-driven chatbot that combined factored cognition, LangChain, and Elasticsearch to augment an organization's employees with a perfect memory of all their teams' calls and emails.
Developed an internal annotation platform to increase manual annotations using weak labels and designed a data augmentation strategy that increased user data size fourfold.
Fine-tuned a Pegasus large model on video call summary data using the Hugging Face Transformers and Microsoft's DeepSpeed libraries to automatically generate meeting actions and summaries.

Technologies: Artificial Intelligence (AI), OpenAI GPT-4 API, OpenAI GPT-3 API, Language Models, Django, Hugging Face, Generative Pre-trained Transformers (GPT), Elasticsearch, PostgreSQL, Redis, Google Cloud, Docker, Causal Inference, Fine-tuning, Generative Artificial Intelligence (GenAI), Research

NLP Consultant

2021 - 2023

Shortform

Pre-trained a LongT5 XXL model on three times more data; it outperformed LongT5 XL on the BookSum dataset and was used to write coherent reading guides for fiction books with personalized commentary.
Built agents powered by language models and vector DB search to help users develop points that expand on or contradict a specific book's main theses.
Deployed a pipeline for summarizing book chapters using GPT-4 and a summary-of-summaries approach.

Technologies: Natural Language Processing (NLP), OpenAI GPT-4 API, Elasticsearch, Google Cloud, Artificial Intelligence (AI), Docker, Hugging Face, Causal Inference, Fine-tuning, Generative Artificial Intelligence (GenAI)

NLP Engineer

2021 - 2022

Grata

Fine-tuned a t5-3b model to generate descriptions of companies in a predefined format using text scraped from their websites, achieving an 89% average BERTScore precision.
Deployed a fine-tuned t5-3b model on Amazon SageMaker to automatically generate descriptions of companies from their websites.
Custom-built a question-answering dataset to fine-tune a RoBERTa-based model that automatically extracts specific company information from its website, such as trading name, location, and products.

Technologies: JSON, Roku, Machine Learning, Deep Neural Networks, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Python 3, Sequence Models, BERT, PyTorch, Hugging Face, OpenAI, Artificial Intelligence (AI), Docker, Causal Inference, Fine-tuning, Generative Artificial Intelligence (GenAI)

NLP Engineer

2018 - 2021

Lloyds Banking Group

Developed Python scripts that extracted comments from internal social media sites, analyzed how their sentiment changed over time, and visualized the findings in a Python Dash app.
Built a chatbot focused on improving colleagues' mental health through emotion logging capabilities and using a GPT-2 transformer that enabled it to have basic conversations with users.
Classified 100,000 customer cases automatically using categories identified by an LDA topic analysis model run on verbatim text commentary describing each case.
Utilized regular expressions to detect and encode personal customer data within an RDS database.

Technologies: Natural Language Generation (NLG), Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), R, Tableau, Python, Sequence Models, Hugging Face, Causal Inference

NLP Engineer

2020 - 2020

FACETITLE

Trained a BERT-based NER model to detect when a character was mentioned in TV show subtitles with 95% accuracy and displayed the character's headshot in real time on a Roku application.
Created a RoBERTa-based multiclass classification model that categorizes the sentiment of episode reviews with 92% accuracy using the Hugging Face Transformers library.
Consulted with the founding team and helped them secure an NSF seed fund grant.

Technologies: Machine Learning, Natural Language Processing (NLP), Web Scraping, GPT, Generative Pre-trained Transformers (GPT), Python

Data Scientist

2016 - 2018

Lloyds Banking Group

Built a classification model for the direction of motion of the EUR/USD rate using an aggregation of the predictions of an entropy-based random forest model and bidirectional LSTMs.
Coordinated with finance business partners and business managers to develop a transparent deal pipeline income forecasting model accurate to within 5%.
Analyzed intraday correlations between European assets over the period preceding Brexit using VECM and VAR models to promote a strategy focused on German assets.
Automated the process for calculating annual income budgets for 21 industries using a linear regression model that analyzed a time series of yearly income data.

Technologies: Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), R, Machine Learning, Visual Basic for Applications (VBA), Python

Data Engineer

2014 - 2016

Lloyds Banking Group

Built data capturing and visualization tools for digital, commercial banking, and IT support teams.
Led a service improvement initiative that resolved 52% of financial market systems' problem records and set up a dashboard for tracking daily performance.
Conducted research on the financial feasibility of two new mobile banking testing products and estimated and discounted future predicted cash flows to drive a £50 million investment decision.

Technologies: Python, Visual Basic for Applications (VBA), Tableau

Experience

Emotion Classification Using a WAME Optimizer

Implemented the recently developed WAME optimizer by Mosca et al. to improve the performance of an emotion classification convolutional neural network. I achieved higher accuracies than with baseline optimizers such as Adam and RMSProp.

Skills

Languages

Python, R, SQL, Bash, C++, Visual Basic for Applications (VBA), Python 3

Other

Dashboard Design, Transformers, Natural Language Processing (NLP), Dash, Topic Modeling, Emotion Recognition, Sentiment Analysis, Machine Learning, Statistics, Artificial Intelligence (AI), Natural Language Generation (NLG), Neural Networks, Custom BERT, OCR, Hugging Face, Generative Pre-trained Transformer 3 (GPT-3), Language Models, DeepSpeed, GPT, Generative Pre-trained Transformers (GPT), Causal Inference, Bittensor, Fine-tuning, Generative Artificial Intelligence (GenAI), Research, Chatbots, Image Recognition, Web Scraping, Econometrics, Time Series Analysis, Deep Neural Networks, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNN), Decision Tree Classification, Finite Element Analysis (FEA), Deep Learning, Generative Adversarial Networks (GANs), Roku, Voice, Sequence Models, BERT, OpenAI, OpenAI GPT-4 API, OpenAI GPT-3 API

Libraries/APIs

TensorFlow Deep Learning Library (TFLearn), Keras, TensorFlow, Pandas, DeepSpeech, PyTorch

Tools

MATLAB, Named-entity Recognition (NER), Tableau

Platforms

Docker, Google Cloud Platform (GCP)

Storage

PostgreSQL, JSON, Elasticsearch, Redis, Google Cloud

Frameworks

Django

Paradigms

Data Science

Education

2018 - 2020

Master of Research Degree in Machine Learning

Birkbeck University of London - London, United Kingdom

2016 - 2018

Master's Degree in Finance

London Business School - London, United Kingdom

2010 - 2014

Master of Science Degree in Aeronautical Engineering

Durham University - Durham, United Kingdom
