Thierno Ibrahima Diop
Verified Expert in Engineering
Data Scientist and Developer
Thierno is a lead data scientist and is passionate about natural language processing (NLP) and everything that revolves around machine learning (ML). He has been mentoring data scientist apprentices for three years. He previously did freelance work for three years in web and mobile application development. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google developer expert in ML.
Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, Flask, SpaCy, Gensim, OpenAI
The most amazing model I've developed is a system that detects different security issues in code. It was built using large language models, such as GPT and LLaMA.
CEO | Lead Data Scientist
- Led a team of machine learning engineers applying deep learning to detect a popular reciter from an audio input.
- Guided the machine learning engineers in applying deep learning to compute the similarity between a user and a reciter.
- Helped the team implement deep learning techniques and experiment with our use cases.
Senior Interview Engineer
- Conducted more than 400 interviews and became a senior interviewer in less than one year.
- Handled quality control for other interviewers before the results were shared with the clients.
- Gave live reviews for the onboarding of new interviewers.
NLP Research Engineer
- Tested different prompting techniques (zero-shot, few-shot, chain-of-thought, and dynamic few-shot) with different LLMs on more than 20 security issues.
- Fine-tuned LLMs to solve complex security issues and prepared the data for the models.
- Created the pipeline to process code with intermediate representation and evaluate LLMs.
- Performed topic modeling with GMM and LDA using embeddings from LLMs.
- Created an agent that uses an LLM to generate code for fuzz testing on the different security issues.
- Built the API and created the releases used in production.
- Used multithreading to speed up prediction and inference.
Lead Data Scientist
- Created a text-to-speech program for the Wolof language. Coordinated the data collection with two actors and used an algorithm that converts Wolof text to phonemes to evaluate phoneme coverage.
- Contributed to automatic speech recognition for the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
- Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed the models on-premises and as AWS Lambda functions for scalability. Built a rotation model to handle image rotation.
- Used NLP and NLU to extract useful information from legal texts. Developed a regex tester library.
- Developed an extractive chatbot to automate FAQ answers for a telecommunications company, collecting data by scraping websites and Twitter.
- Performed data collection and annotation. Deployed using AWS Lambda.
- Developed a rule system with Spark to implement a flexible scoring system, with job management and scheduling handled by Apache Airflow.
- Executed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models with theoretical and business metrics.
- Acted as the full-stack web and mobile developer while working for multiple customers.
- Contributed to the conception and realization of the ProsDispo mobile and web app.
- Developed a web application for the purchase of phone credit.
- Created and worked with the WebChat application using WebSocket.
- Developed REST APIs for the digitization of meetings at Gainde 2000, a strategic platform of Senegalese customs centered on customs clearance management.
- Created a web app for various football competitions.
- Built a web service and a social cross-platform mobile application.
- Developed and orchestrated a news website using WordPress.
Automatic Speech Recognition for the Wolof Language
This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.
Wolof Speech Recognition
Chatbot for Customer Support in Telecommunication
Multiple text feature extraction methods and models were tested and compared using several similarity metrics.
Python 3, Python, Bash Script, SQL, PHP, Java, R
Flask, Spark, Streamlit, Symfony, Angular, Ionic, Scrapy
TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, PyTorch, SpaCy, React, NumPy, SciPy, DeepSpeech
Gensim, Apache Airflow, Amazon SageMaker, Kaldi, Git, Seaborn, TensorBoard, Whisper
Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker
Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases
Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, GPT, Generative Pre-trained Transformers (GPT), Team Management, Amazon Textract, ChatGPT, DVC, OCR, Deep Learning, Artificial Neural Networks (ANN), APIs, Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API
Master's Degree in Computer Science
École Supérieure Polytechnique de Dakar - Dakar, Senegal
Bachelor's Degree in Computer Science
École Supérieure Polytechnique de Dakar - Dakar, Senegal
Cloudera CCA 175 Spark and Hadoop Developer