Karim Foda
Verified Expert in Engineering
NLP Researcher and Developer
London, United Kingdom
Toptal member since July 6, 2020
Karim is an NLP researcher with in-depth, hands-on experience building machine learning (ML) models that replicate specific human functions and thereby accelerate a business's processes. Most recently, Karim has focused on training large language models (LLMs) for natural language understanding (NLU) and natural language generation (NLG) in conversational chatbots.
Experience
- Python - 9 years
- Natural Language Processing (NLP) - 5 years
- Transformers - 3 years
- Hugging Face - 3 years
- OpenAI - 2 years
- Generative Pre-trained Transformer 3 (GPT-3) - 2 years
- Web Scraping - 2 years
- TensorFlow Deep Learning Library (TFLearn) - 2 years
Preferred Environment
Python
The most amazing...
...thing I believe I've built is a LongT5 model fine-tuned to generate automatic summaries of self-help books.
Work Experience
Lead NLP Engineer
Kaizan
- Built a GPT-4-driven chatbot that combined factored cognition, LangChain, and Elasticsearch to augment an organization's employees with a perfect memory of all their teams' calls and emails.
- Developed an internal annotation platform to increase manual annotations using weak labels and designed a data augmentation strategy that increased user data size fourfold.
- Fine-tuned a Pegasus large model on video call summary data using the Hugging Face Transformers and Microsoft's DeepSpeed libraries to automatically generate meeting actions and summaries.
NLP Consultant
Shortform
- Pre-trained a LongT5 XXL model on three times more data, outperforming LongT5 XL on the BookSum dataset, to write coherent reading guides for fiction books with personalized commentary.
- Built agents powered by language models and vector DB search to help users create points that expand on or contradict a specific book's main theses.
- Deployed a pipeline for summarizing book chapters using GPT-4 and a summary of summaries approach.
NLP Engineer
Grata
- Fine-tuned a t5-3b model to generate descriptions of companies in a predefined format using text scraped from their websites, achieving an 89% average BERTScore precision.
- Deployed a fine-tuned t5-3b model on Amazon SageMaker to automatically generate descriptions of companies from their websites.
- Built a custom question-answering dataset to fine-tune a RoBERTa-based model that automatically extracts a company's specific information from its website, such as trading name, location, and products.
NLP Engineer
Lloyds Banking Group
- Developed Python scripts that extracted comments from internal social media sites, analyzed their change in sentiment over time, and visualized the findings in a Python Dash app.
- Built a chatbot focused on improving colleagues' mental health through emotion logging capabilities and using a GPT-2 transformer that enabled it to have basic conversations with users.
- Classified 100,000 customer cases automatically using categories identified by an LDA topic analysis model run on verbatim text commentary describing each case.
- Utilized regular expressions to detect and encode personal customer data within an RDS database.
NLP Engineer
FACETITLE
- Trained a BERT-based NER model to detect when a character was mentioned in TV show subtitles with 95% accuracy and displayed their headshot in real time on a Roku application.
- Created a RoBERTa-based multiclass classification model that categorizes the sentiment of episode reviews with 92% accuracy using the Hugging Face Transformers library.
- Consulted with the founding team and helped them secure an NSF seed fund grant.
Data Scientist
Lloyds Banking Group
- Built a classification model for the direction of motion of the EUR/USD rate using an aggregation of the predictions of an entropy-based random forest model and bidirectional LSTMs.
- Coordinated with finance business partners and business managers to develop a transparent deal pipeline income forecasting model accurate to within 5%.
- Analyzed intraday correlations between European assets over the period preceding Brexit using VECM and VAR models to promote a strategy focused on German assets.
- Automated the process for calculating annual income budgets for 21 industries using a linear regression model that analyzed a time series of yearly income data.
Data Engineer
Lloyds Banking Group
- Built data capturing and visualization tools for digital, commercial banking, and IT support teams.
- Led a service improvement initiative that resolved 52% of financial market systems' problem records and set up a dashboard for tracking daily performance.
- Conducted research on the financial feasibility of two new mobile banking testing products and estimated and discounted future predicted cash flows to drive a £50 million investment decision.
Experience
Emotion Classification Using a WAME Optimizer
Education
Master of Research Degree in Machine Learning
Birkbeck, University of London - London, United Kingdom
Master's Degree in Finance
London Business School - London, United Kingdom
Master of Science Degree in Aeronautical Engineering
Durham University - Durham, United Kingdom
Skills
Libraries/APIs
TensorFlow Deep Learning Library (TFLearn), Keras, TensorFlow, Pandas, DeepSpeech, DeepSpeed, PyTorch
Tools
MATLAB, Named-entity Recognition (NER), Tableau
Languages
Python, R, SQL, Bash, C++, Visual Basic for Applications (VBA), Python 3
Platforms
Docker, Google Cloud Platform (GCP)
Storage
PostgreSQL, JSON, Elasticsearch, Redis, Google Cloud
Frameworks
Django
Other
Dashboard Design, Transformers, Natural Language Processing (NLP), Dash, Topic Modeling, Emotion Recognition, Sentiment Analysis, Machine Learning, Statistics, Artificial Intelligence (AI), Natural Language Generation (NLG), Neural Networks, Custom BERT, Optical Character Recognition (OCR), Hugging Face, Generative Pre-trained Transformer 3 (GPT-3), Language Models, Generative Pre-trained Transformers (GPT), Causal Inference, Bittensor, Fine-tuning, Generative Artificial Intelligence (GenAI), Research, Chatbots, Image Recognition, Web Scraping, Econometrics, Time Series Analysis, Data Science, Deep Neural Networks (DNNs), Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Decision Tree Classification, Finite Element Analysis (FEA), Deep Learning, Generative Adversarial Networks (GANs), Roku, Voice, Sequence Models, BERT, OpenAI, OpenAI GPT-4 API, OpenAI GPT-3 API, AI Content Creation