
Thierno Ibrahima Diop
Verified Expert in Engineering
Data Scientist and Developer
Thierno is a lead data scientist and is passionate about natural language processing (NLP) and everything that revolves around machine learning (ML). He has been mentoring data scientist apprentices for three years. He previously did freelance work for three years in web and mobile application development. Thierno is co-founder of GalsenAI, an artificial intelligence (AI) community in Senegal, a Coursera instructor on data science, and a Google developer expert in ML.
Portfolio
Preferred Environment
Jupyter Notebook, Visual Studio Code (VS Code), TensorFlow, PyTorch, Scikit-learn, Keras, Flask, SpaCy, Gensim, OpenAI
The most amazing...
...The model I've developed is a system that detects different security issues in code. The system was built using large language models, such as GPT and Llama.
Work Experience
CEO | Lead Data Scientist
NuurAI
- Led a team of machine learning engineers applying deep learning to detect a popular reciter from an audio input.
- Guided the machine learning engineers in applying deep learning to compute the similarity between a user and a reciter.
- Helped the team implement deep learning techniques and experiment with our use cases.
Senior Interview Engineer
Karat
- Conducted more than 400 interviews and reached the senior level in less than one year.
- Handled quality control for other interviewers before the results were shared with the clients.
- Gave live reviews for the onboarding of new interviewers.
NLP Research Engineer for a Web3 Project
Toptal
- Tested different prompting techniques (zero-shot, few-shot, chain-of-thought, dynamic few-shot) with different LLMs.
- Fine-tuned LLMs to solve complex security issues and prepared the data for the models.
- Created the pipeline to process code with an intermediate representation and evaluate LLMs.
- Generated code with LLMs for fuzz testing on the different security issues by creating an agent.
- Built the API functions and created the releases used in production.
- Used multithreading to reduce prediction and inference time.
Lead Data Scientist
Baamtu
- Created a text-to-speech program for the Wolof language. Coordinated the data collection with two actors, using an algorithm to convert the text to phonemes in Wolof, and evaluated phoneme coverage.
- Contributed to automatic speech recognition for the Wolof language. Designed a platform to collect raw Wolof audio for self-supervised learning.
- Built optical character recognition (OCR) and computer vision models to extract structured data from national ID cards. Deployed models on-premises and as AWS Lambda functions for scalability. Built a rotation model to handle image rotation.
Data Scientist
Baamtu
- Used NLP and NLU to extract useful information from legal texts. Developed a regex tester library.
- Developed an extractive FAQ chatbot for a telecommunications company, collecting data by scraping websites and Twitter.
- Performed data collection and annotation. Deployed using AWS Lambda.
- Developed a rule system with Spark to implement a flexible scoring system, with job management and scheduling handled by Apache Airflow.
- Executed customer segmentation in the telecom domain using data from multiple sources. Compared clustering models with theoretical and business metrics.
Developer
Freelance
- Acted as the full-stack web and mobile developer while working for multiple customers.
- Contributed to the design and development of the ProsDispo mobile and web app.
- Developed a web application for the purchase of phone credit.
- Created and worked with the WebChat application using WebSocket.
- Developed REST APIs for the dematerialization of meetings at Gainde 2000, a strategic platform of Senegalese customs centered on customs clearance management.
- Created a web app for various football competitions.
- Built a web service and a social cross-platform mobile application.
- Developed and orchestrated a news website using WordPress.
Experience
Automatic Speech Recognition for the Wolof Language
This project was challenging due to the scarcity of data, so multiple techniques and tricks were used to make it work.
Wolof Speech Recognition
Chatbot for Customer Support in Telecommunication
Multiple text feature extraction methods and models were tested and compared using several similarity metrics.
Skills
Languages
Python 3, Python, Bash Script, SQL, PHP, Java, R
Frameworks
Flask, Spark, Symfony, Angular, Ionic, Scrapy
Libraries/APIs
TensorFlow, Scikit-learn, Keras, Pandas, Matplotlib, PyTorch, SpaCy, React, NumPy, SciPy, DeepSpeech
Tools
Gensim, Apache Airflow, Amazon SageMaker, Kaldi, Git, Seaborn, TensorBoard
Platforms
Jupyter Notebook, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Docker
Storage
Amazon S3 (AWS S3), PostgreSQL, Amazon DynamoDB, Databases
Other
Natural Language Processing (NLP), Audio, Artificial Intelligence (AI), Machine Learning, Neural Networks, Hiring, Code Review, Source Code Review, Interviewing, Programming, Chatbots, BERT, Sentiment Analysis, Language Models, GPT, Generative Pre-trained Transformers (GPT), Team Management, Amazon Textract, ChatGPT, Streamlit, DVC, OCR, Deep Learning, Artificial Neural Networks (ANN), APIs, Speech Recognition, OpenAI, Semantic Web, Topic Modeling, Clustering, Text Classification, OpenAI GPT-4 API, OpenAI GPT-3 API
Paradigms
Fuzz Testing
Education
Master's Degree in Computer Science
École Supérieure Polytechnique de Dakar - Dakar, Senegal
Bachelor's Degree in Computer Science
École Supérieure Polytechnique de Dakar - Dakar, Senegal
Certifications
Cloudera CCA 175 Spark and Hadoop Developer
Cloudera