Verified Expert in Engineering
NLP Researcher and Developer
Karim is an NLP researcher with in-depth, hands-on experience building machine learning (ML) models that replicate specific human functions to accelerate business processes. Most recently, Karim has focused on training large language models (LLMs) for natural language understanding (NLU) and natural language generation (NLG) in conversational chatbots.
The most amazing thing I believe I've built is a LongT5 model fine-tuned to generate automatic summaries of self-help books.
Lead NLP Engineer
- Built a GPT-4-driven chatbot that combined factored cognition, LangChain, and Elasticsearch to augment an organization's employees with a perfect memory of all their teams' calls and emails.
- Developed an internal annotation platform to increase manual annotations using weak labels and designed a data augmentation strategy that increased user data size fourfold.
- Fine-tuned a Pegasus large model on video call summary data using the Hugging Face Transformers and Microsoft's DeepSpeed libraries to automatically generate meeting actions and summaries.
- Pre-trained a LongT5 XXL model on three times more data, outperforming LongT5 XL on the BookSum dataset, to write coherent reading guides for fiction books with personalized commentary.
- Built agents powered by language models and vector DB search to help users create points that expand on or contradict a specific book's main theses.
- Deployed a pipeline for summarizing book chapters using GPT-4 and a summary-of-summaries approach.
- Fine-tuned a t5-3b model to generate descriptions of companies in a predefined format using text scraped from their websites, achieving an 89% average BERTScore precision.
- Deployed a fine-tuned t5-3b model on Amazon SageMaker to automatically generate descriptions of companies from their websites.
- Custom-built a question-answering dataset to fine-tune a RoBERTa-based model that automatically extracts company-specific information, such as trading name, location, and products, from a company's website.
Lloyds Banking Group
- Developed Python scripts that extracted comments from internal social media sites, analyzed how their sentiment changed over time, and visualized the findings in a Python Dash app.
- Built a chatbot focused on improving colleagues' mental health, with emotion-logging capabilities and a GPT-2 transformer that enabled basic conversations with users.
- Classified 100,000 customer cases automatically using categories identified by an LDA topic analysis model run on verbatim text commentary describing each case.
- Utilized regular expressions to detect and encode personal customer data within an RDS database.
- Trained a BERT-based NER model to detect character mentions in TV show subtitles with 95% accuracy and displayed each character's headshot in real time on a Roku application.
- Created a RoBERTa-based multiclass classification model that categorizes the sentiment of episode reviews with 92% accuracy using the Hugging Face Transformers library.
- Consulted with the founding team and helped them secure an NSF seed fund grant.
Lloyds Banking Group
- Built a classification model for the direction of motion of the EUR/USD rate using an aggregation of the predictions of an entropy-based random forest model and bidirectional LSTMs.
- Coordinated with finance business partners and business managers to develop a transparent deal pipeline income forecasting model accurate to within 5%.
- Analyzed intraday correlations between European assets over the period preceding Brexit using VECM and VAR models to promote a strategy focused on German assets.
- Automated the process for calculating annual income budgets for 21 industries using a linear regression model that analyzed a time series of yearly income data.
Lloyds Banking Group
- Built data capturing and visualization tools for digital, commercial banking, and IT support teams.
- Led a service improvement initiative that resolved 52% of financial market systems' problem records and set up a dashboard for tracking daily performance.
- Researched the financial feasibility of two new mobile banking testing products, estimating and discounting predicted future cash flows to drive a £50 million investment decision.
Emotion Classification Using a WAME Optimizer
Python, R, SQL, Bash, C++, Visual Basic for Applications (VBA), Python 3
Dashboard Design, Transformers, Natural Language Processing (NLP), Dash, Topic Modeling, Emotion Recognition, Sentiment Analysis, Machine Learning, Statistics, Artificial Intelligence (AI), Natural Language Generation (NLG), Neural Networks, Custom BERT, OCR, Hugging Face, Generative Pre-trained Transformer 3 (GPT-3), Language Models, DeepSpeed, GPT, Generative Pre-trained Transformers (GPT), Causal Inference, Bittensor, Fine-tuning, Generative Artificial Intelligence (GenAI), Research, Chatbots, Image Recognition, Web Scraping, Econometrics, Time Series Analysis, Deep Neural Networks, Recurrent Neural Networks (RNN), Convolutional Neural Networks, Decision Tree Classification, Finite Element Analysis (FEA), Deep Learning, Generative Adversarial Networks (GANs), Roku, Voice, Sequence Models, BERT, OpenAI, OpenAI GPT-4 API, OpenAI GPT-3 API
TensorFlow Deep Learning Library (TFLearn), Keras, TensorFlow, Pandas, DeepSpeech, PyTorch
MATLAB, Named-entity Recognition (NER), Tableau
Docker, Google Cloud Platform (GCP)
PostgreSQL, JSON, Elasticsearch, Redis, Google Cloud
Master of Research Degree in Machine Learning
Birkbeck, University of London - London, United Kingdom
Master's Degree in Finance
London Business School - London, United Kingdom
Master of Science Degree in Aeronautical Engineering
Durham University - Durham, United Kingdom