Thierno Ibrahima Diop, Developer in Dakar, Dakar Region, Senegal

Thierno Ibrahima Diop

Verified Expert in Engineering

Data Scientist and Developer

Location
Dakar, Dakar Region, Senegal
Toptal Member Since
April 25, 2022

Thierno is a lead data scientist and is passionate about natural language processing (NLP) and everything that revolves around machine learning (ML). He has been mentoring data scientist apprentices for three years. He previously did freelance work for three years in web and mobile application development. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google developer expert in ML.

Portfolio

NuurAI
GPT, Natural Language Processing (NLP)...
Karat
Code Review, Source Code Review, Hiring, Interviewing, Programming
Desert Moon Speech Services LLC
Artificial Intelligence (AI), Natural Language Processing (NLP)...

Experience

Availability

Part-time

Preferred Environment

Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, Flask, SpaCy, Gensim, OpenAI

The most amazing...

...model I've developed is a system detecting different security issues in code. It was built using large language models, such as GPT and LLaMA.

Work Experience

CEO | Lead Data Scientist

2022 - PRESENT
NuurAI
  • Led a team of machine learning engineers applying deep learning to detect a popular reciter from an audio input.
  • Guided the machine learning engineers in applying deep learning to compute the similarity between a user and a reciter.
  • Helped the team implement deep learning techniques and experiment with our use cases.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Audio, TensorFlow, PyTorch, Python 3, Artificial Intelligence (AI), Jupyter Notebook, Scikit-learn, Keras, DVC, Git, Matplotlib, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, Team Management, Interviewing, Hiring, Code Review, Programming, PostgreSQL

Senior Interview Engineer

2021 - PRESENT
Karat
  • Conducted more than 400 interviews and was promoted to senior in less than one year.
  • Handled quality control for other interviewers before the results were shared with the clients.
  • Gave live reviews for the onboarding of new interviewers.
Technologies: Code Review, Source Code Review, Hiring, Interviewing, Programming

AI Developer via Toptal

2023 - 2024
Desert Moon Speech Services LLC
  • Collected data to convert audio to phonemes. The data was then processed to handle noise, durations, and IPA conversion.
  • Trained a simple phoneme-level classification model using transfer learning.
  • Reframed the problem as speech recognition to gain more context and more available data.
  • Handled label imbalance, as some phonemes are rare.
Technologies: Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Speech to Text, Audio, Deep Learning, Transfer Learning, FastAPI, Docker, PyTorch, Python 3

NLP Research Engineer

2023 - 2023
FLock.io
  • Tested different prompt techniques (zero-shot learning, few-shot learning, chain-of-thought, and dynamic few-shot) with different LLMs on more than 20 security issues.
  • Fine-tuned LLMs to solve complex security issues and prepared the data for the models.
  • Created the pipeline to process code with intermediate representation and evaluate LLMs.
  • Performed topic modeling with GMM and LDA using embeddings from LLMs.
  • Created an agent that uses an LLM to generate code for fuzz testing on the different security issues.
  • Built the API and created the releases used in production.
  • Used multithreading to reduce prediction and inference time.
Technologies: Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Python, Artificial Intelligence (AI), Machine Learning, Deep Learning, Topic Modeling, Clustering, Fuzz Testing, Language Models, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API
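
The multithreading point above can be illustrated with a minimal sketch. It assumes each prediction is an independent call that mostly waits on I/O, such as a request to a hosted model. The `predict` function and its `eval(` check are hypothetical stand-ins invented for illustration, not the production code.

```python
from concurrent.futures import ThreadPoolExecutor


def predict(snippet):
    """Hypothetical stand-in for a model or API call on one code snippet."""
    return {"snippet": snippet, "flagged": "eval(" in snippet}


def predict_all(snippets, max_workers=4):
    """Run predict over all snippets concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(predict, snippets))


if __name__ == "__main__":
    for result in predict_all(["eval(user_input)", "print('hello')"]):
        print(result)
```

Threads tend to help when each call waits on a remote service. For CPU-bound inference running locally in pure Python, the global interpreter lock limits the gain, so a process pool or batching may be a better fit.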

Lead Data Scientist

2019 - 2021
Baamtu
  • Created a text-to-speech program for the Wolof language. Coordinated the data collection with two actors, using an algorithm to convert the text to phonemes in Wolof, and evaluated phoneme coverage.
  • Contributed to automatic speech recognition for the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
  • Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed models on-premises and as AWS Lambda functions for scalability. Built a rotation model to handle image rotation.
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Python, DVC, Bash Script, Amazon S3 (AWS S3), Amazon Web Services (AWS), Amazon EC2, Neural Networks, DeepSpeech, Deep Learning, NumPy, OCR, Seaborn, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Git, Jupyter Notebook, SpaCy, Machine Learning, Artificial Intelligence (AI), Artificial Neural Networks (ANN), APIs, SQL, Team Management, Source Code Review, Interviewing, Hiring, Code Review, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker

Data Scientist

2018 - 2019
Baamtu
  • Used NLP and NLU to extract useful information from legal text. Developed a regex tester library.
  • Developed an extractive chatbot for automatic FAQ for a telecommunication company with data collection by scraping websites and Twitter.
  • Performed data collection and annotation. Deployed using AWS Lambda.
  • Developed a rule system with Spark to implement a flexible scoring system, with job management and scheduling handled by Apache Airflow.
  • Executed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models with theoretical and business metrics.
Technologies: TensorFlow, PyTorch, Scikit-learn, Pandas, Matplotlib, Python 3, Flask, Spark, Apache Airflow, Git, DVC, Gensim, SpaCy, Kaldi, Docker, Bash Script, Audio, Artificial Intelligence (AI), Jupyter Notebook, Keras, Streamlit, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), Neural Networks, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), OCR, NumPy, SciPy, Seaborn, TensorBoard, APIs, SQL, Java, Source Code Review, Programming, Chatbots, Semantic Web, Databases, Language Models, AWS Lambda, Amazon Textract, Amazon SageMaker, Amazon DynamoDB

Developer

2015 - 2018
Freelance
  • Acted as the full-stack web and mobile developer while working for multiple customers.
  • Contributed to the conception and realization of the ProsDispo mobile and web app.
  • Developed a web application for the purchase of phone credit.
  • Created and worked with the WebChat application using WebSocket.
  • Developed REST APIs for the digitization of meetings at Gainde 2000, a strategic platform of Senegalese customs centered on customs clearance management.
  • Created a web app for various football competitions.
  • Built a web service and a social cross-platform mobile application.
  • Developed and orchestrated a news website using WordPress.
Technologies: PHP, Symfony, Angular, Ionic, React, Bash Script, Python 3, Jupyter Notebook, Git, Amazon EC2, Python, Amazon S3 (AWS S3), Machine Learning, Amazon Web Services (AWS), APIs, Programming, PostgreSQL, AWS Lambda, Amazon DynamoDB

Automatic Speech Recognition for the Wolof Language

Developed a speech recognition model for the Wolof language. This project involved audio data collection, and multiple models and methods were evaluated. Data had to be verified and cleaned with respect to diversity and correctness. I applied both transfer learning and from-scratch training with traditional and hybrid approaches.

This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.

Wolof Speech Recognition

I contributed to creating automatic speech recognition in the Wolof language. I designed a platform to collect raw Wolof audio for self-supervised learning and built and deployed the resulting model.

Chatbot for Customer Support in Telecommunication

A chatbot application to semi-automate customer support and FAQ. The data was scraped from multiple websites and cleaned to build an extractive chatbot.
Multiple text feature extraction methods and models were tested and compared using multiple similarity metrics.
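
As a minimal sketch of the extractive idea, the example below matches a user question to the closest stored FAQ question and returns its answer. It assumes a plain bag-of-words representation and cosine similarity. The features, models, and metrics actually used in the project are not stated here, and the FAQ entries are hypothetical, invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_answer(question, faq):
    """Return (answer, score) for the FAQ question most similar to the input."""
    query = Counter(tokenize(question))
    scored = [
        (cosine_similarity(query, Counter(tokenize(q))), answer)
        for q, answer in faq
    ]
    score, answer = max(scored, key=lambda pair: pair[0])
    return answer, score


# Hypothetical FAQ entries, invented for illustration only.
FAQ = [
    ("How do I check my phone credit balance?",
     "Dial the balance code shown on your SIM card sleeve."),
    ("How do I activate a data plan?",
     "Open the operator app and choose a data plan."),
]

if __name__ == "__main__":
    print(best_answer("check credit balance", FAQ))
```

Swapping `cosine_similarity` for another metric, or the `Counter` features for TF-IDF or embeddings, is one way to compare alternatives on the same FAQ set.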

Languages

Python 3, Python, Bash Script, SQL, PHP, Java, R

Frameworks

Flask, Spark, Streamlit, Symfony, Angular, Ionic, Scrapy

Libraries/APIs

TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, PyTorch, SpaCy, React, NumPy, SciPy, DeepSpeech

Tools

Gensim, Apache Airflow, Amazon Textract, Amazon SageMaker, Kaldi, Git, Seaborn, TensorBoard, Whisper

Platforms

Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker

Storage

Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases

Other

Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, GPT, Generative Pre-trained Transformers (GPT), Team Management, ChatGPT, DVC, OCR, Deep Learning, Artificial Neural Networks (ANN), APIs, Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API, Speech to Text, Transfer Learning, FastAPI

Paradigms

Fuzz Testing

2015 - 2018

Master's Degree in Computer Science

École Supérieure Polytechnique de Dakar - Dakar, Senegal

2013 - 2015

Bachelor's Degree in Computer Science

École Supérieure Polytechnique de Dakar - Dakar, Senegal

JANUARY 2018 - PRESENT

Cloudera CCA 175 Spark and Hadoop Developer

Cloudera
