
Thierno Ibrahima Diop
Verified Expert in Engineering
Data Scientist and Developer
Dakar, Dakar Region, Senegal
Toptal member since April 25, 2022
Thierno is a lead data scientist and is passionate about natural language processing (NLP) and everything that revolves around machine learning (ML). He has been mentoring data scientist apprentices for three years. He previously did freelance work for three years in web and mobile application development. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google developer expert in ML.
Portfolio
Experience
- Machine Learning - 4 years
- Natural Language Processing (NLP) - 4 years
- Artificial Intelligence (AI) - 4 years
- Python - 4 years
- Language Models - 4 years
- Pandas - 4 years
- Generative Artificial Intelligence (GenAI) - 3 years
- SpaCy - 2 years
Preferred Environment
Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, SpaCy, OpenAI, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs)
The most amazing...
...model I've developed is a system detecting different security issues in code. It was built using large language models, such as GPT and LLaMA.
Work Experience
Senior Interview Engineer
Karat
- Conducted more than 400 interviews and was promoted to senior interviewer in less than one year.
- Handled quality control for other interviewers before the results were shared with the clients.
- Gave live reviews for the onboarding of new interviewers.
NLP/LLM Research Engineer
FLock.io
- Built a multi-agent system using LangGraph and tools to streamline Ethereum smart contract creation.
- Combined RAG techniques and prompt engineering to enhance agent collaboration and produce higher-quality code.
- Established a dynamic development workflow that adapts to evolving requirements and best practices.
- Collaborated in creating smart contracts, tests, and execution environments, incorporating a feedback loop for continuous improvement.
NLP/LLM/Data Pipeline Engineer
Gartner - Engineering
- Integrated LLMs with vector-based embeddings to support advanced information retrieval and text generation.
- Implemented re-ranking and filtering methods to ensure contextually relevant, high-quality responses.
- Designed indexing and data pipelines to optimize query efficiency and scalability.
AI Developer via Toptal
Desert Moon Speech Services LLC
- Collected data to convert audio to phonemes. The data was then processed to handle noise, durations, and IPA conversion.
- Trained a simple phoneme-level classification model using transfer learning.
- Reframed the problem as speech recognition to gain more context and make use of more available data.
- Handled label imbalance, as some phonemes are rare.
CEO | Lead Data Scientist
NuurAI
- Led a team of machine learning engineers applying deep learning to detect a popular reciter from an audio input.
- Guided the machine learning engineers in applying deep learning to compute the similarity of a user compared to a reciter.
- Helped the team implement deep learning techniques and experiment with our use cases.
NLP Research Engineer
FLock.io
- Tested different prompt techniques (zero-shot learning, few-shot learning, chain-of-thought, and dynamic few-shot) with different LLMs on more than 20 security issues.
- Fine-tuned LLMs to solve complex security issues and prepared the data for the models.
- Created the pipeline to process code with intermediate representation and evaluate LLMs.
- Performed topic modeling with GMM and LDA using embeddings from LLMs.
- Created an agent that uses an LLM to generate fuzz-testing code for the different security issues.
- Built the API and created the releases used in production.
- Used multithreading to reduce prediction and inference time.
Lead Data Scientist
Baamtu
- Created a text-to-speech program for the Wolof language. Coordinated the data collection with two actors, using an algorithm to convert the text to phonemes in Wolof and evaluate phoneme coverage.
- Contributed to the automatic speech recognition in the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
- Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed models on-premises and on AWS Lambda functions for scalability. Built a rotation model to correct image rotation.
Data Scientist
Baamtu
- Used NLP and NLU to extract useful information from legal texts. Developed a regex tester library.
- Developed an extractive chatbot for automatic FAQ for a telecommunication company with data collection by scraping websites and Twitter.
- Performed data collection and annotation. Deployed using AWS Lambda.
- Developed a rule system with Spark to implement a flexible scoring system with job management and scheduling of the scoring system with Apache Airflow.
- Executed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models with theoretical and business metrics.
Developer
Freelance
- Acted as the full-stack web and mobile developer while working for multiple customers.
- Contributed to the conception and realization of the ProsDispo mobile and web app.
- Developed a web application for the purchase of phone credit.
- Created and worked with the WebChat application using WebSocket.
- Developed REST APIs for the dematerialization of meetings at Gainde 2000, a strategic platform of Senegalese customs centered on customs clearance management.
- Created a web app for various football competitions.
- Built a web service and a social cross-platform mobile application.
- Developed and orchestrated a news website using WordPress.
Experience
Automatic Speech Recognition for the Wolof Language.
This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.
Wolof Speech Recognition
Chatbot for Customer Support in Telecommunication
Multiple text feature extraction methods and models were tested and compared using several similarity metrics.
Education
Master's Degree in Computer Science
Ecole Superieure Polytechnique de Dakar - Dakar, Senegal
Bachelor's Degree in Computer Science
Ecole Superieure Polytechnique de Dakar - Dakar, Senegal
Certifications
Cloudera CCA 175 Spark and Hadoop Developer
Cloudera
Skills
Libraries/APIs
TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, NumPy, OpenAI API, PyTorch, SpaCy, React, SciPy, DeepSpeech
Tools
Gensim, Whisper, AI Prompts, Apache Airflow, Amazon Textract, Amazon SageMaker, ChatGPT, Kaldi, Git, Seaborn, TensorBoard, Apache Solr
Languages
Python 3, Python, Bash Script, SQL, TypeScript, PHP, Java, R
Frameworks
Flask, Spark, Streamlit, Symfony, Angular, Ionic, Scrapy, LangGraph
Platforms
Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker, Blockchain
Storage
Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases, Data Pipelines
Paradigms
Fuzz Testing
Other
Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, APIs, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, Generative Pre-trained Transformers (GPT), Speech to Text, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), API Integration, AI Agents, Hugging Face, Retrieval-augmented Generation (RAG), Vector Databases, Data Science, AI Chatbots, Open Source, Prompt Engineering, Text to Speech (TTS), Fine-tuning, Transformers, Web Scraping, Team Management, FastAPI, LangChain, Machine Learning Operations (MLOps), Algorithms, Data Structures, DVC, Optical Character Recognition (OCR), Deep Learning, Artificial Neural Networks (ANN), Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API, Transfer Learning, Multiagent Generative Systems (MAGs)