Ilya Prokin, Developer in Bordeaux, France

Ilya Prokin

Verified Expert in Engineering

Data Science Developer

Location
Bordeaux, France
Toptal Member Since
September 7, 2022

Ilya is a data scientist, CTO, and AI tech entrepreneur with a PhD. He's an expert in applied data science, machine learning, AI, and LLM fine-tuning in the manufacturing, finance, and biotech industries. He has published scientific papers, scaled various MVPs to full-featured products, pitched startups, and built a strong AI community that spread to six cities. Ilya enjoys improving businesses with data, developing innovative ways to apply data and AI, and geeking out about optimization.

Portfolio

ImbaMed
Large Language Models (LLMs), Speech to Text, Text to Speech (TTS), Python 3...
LoanSnap - AI for US mortgage
Data Science, Python, Amazon Web Services (AWS), Google AI Platform, Docker...
Data Brunch
Data Science, Community, Communication, Biology, Machine Learning...

Availability

Full-time

Preferred Environment

Linux, Visual Studio Code (VS Code), Python, Slack

The most amazing...

...part of my ride was building and exiting startups with VC-backed, end-to-end AI products and building a strong data science community that spread across France.

Work Experience

Text-to-speech LLM Developer

2024 - PRESENT
ImbaMed
  • Developed MVP of virtual voice call sales agent, including text-to-speech, speech-to-text, and LLM sales logic.
  • Implemented the sales agent logic with OpenAI GPT and a custom open LLM.
  • Iterated over different alternative TTS solutions: ElevenLabs, Coqui, Bark, Silero, Piper, Edge TTS, Tortoise, WhisperSpeech, and OpenVoice.
  • Leveraged faster-whisper for speech-to-text and WhisperX for diarization.
Technologies: Large Language Models (LLMs), Speech to Text, Text to Speech (TTS), Python 3, Azure, Docker, Python, Natural Language Processing (NLP), Phonemes, GPU Computing, AI Chatbots, Custom Models, SWOT Analysis, Transformer Models, AI Model Training, Object-oriented Programming (OOP), AI Modeling, Databases, Cloud, Hugging Face, Open-source LLMs, Data, FastAPI

Data Science Leader

2021 - PRESENT
LoanSnap - AI for US mortgage
  • Coordinated a team of data scientists and engineers, ran daily standups, and handled project management.
  • Provided weekly reports to senior leadership, specifically the CTO and the directors of capital markets and product.
  • Led the data science section of company meeting presentations and enabled cross-company collaboration on data science initiatives.
  • Participated in planning strategic initiatives and coordinated their execution. The data team's marketing recommendations doubled lead volume.
  • Developed custom models optimizing revenue and cost-critical decisions across the entire sales pipeline and secondary market hedging activities.
  • Gathered data from multiple online sources using customized web scraping solutions and performed competitor intelligence analysis.
  • Developed empathetic LLM agents endowed with distinct personalities for personalized customer service. Used OpenAI's GPT-3.5 and GPT-4 API and customized LLMs.
Technologies: Data Science, Python, Amazon Web Services (AWS), Google AI Platform, Docker, Pandas, Scikit-learn, Data Scraping, ETL, Machine Learning, Recommendation Systems, Data Strategy, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, API Integration, OpenAI GPT-4 API, Streamlit, Data Modeling, Forecasting, Amazon SageMaker, Classification, Text Classification, Data Pipelines, GPT, Pricing Models, Data-driven Marketing, Generative Pre-trained Transformers (GPT), OCR, OpenAI GPT-3 API, Tableau, PySpark, Artificial Intelligence (AI), Jupyter, AI Programming, Programming, User Interface (UI), Integration, Language Models, Data Analysis, Machine Learning Operations (MLOps), MySQL, Team Leadership, PostgreSQL, Software Architecture, Sentiment Analysis, Large Language Models (LLMs), Data Engineering, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, NumPy, Regression Modeling, Quantitative Analysis, OpenAI, Leadership, APIs, Generative Pre-trained Transformer 3 (GPT-3), Web Scraping, Research, Generative Artificial Intelligence (GenAI), Notion, Data Reporting, Llama 2, Google Cloud Platform (GCP), Vertex, Google Cloud, Amazon Machine Learning, Google Cloud Machine Learning, Prompt Engineering, AI Design, Databricks, Data Mining, Algorithms, Reporting, Selenium, Data Matching, CSV File Processing, Bots, Pricing, Logistic Regression, PEFT, LangChain, LoRa, Applied Research, Fine-tuning, Finance, Google BigQuery, Snowflake, Generative AI, Statistical Modeling, Natural Language Processing (NLP), GPU Computing, Elasticsearch, Marketing Mix Modeling, Custom Models, AI Research, Transformer Models, Exploratory Data Analysis, AI Model Training, Causal Inference, A/B Testing, Object-oriented Programming (OOP), AI Modeling, Unsupervised Learning, Data Extraction, Databases, PDF, Cloud, Statistics, Capital Markets, Hugging Face, Open-source LLMs, Data, Multithreading, FastAPI

Founder | Community Organizer

2019 - PRESENT
Data Brunch
  • Built a strong data science community that meets weekly to discuss state-of-the-art data science.
  • Grew a data science ecosystem with access to deep expertise, including accomplished researchers, math Olympiad winners, and strong competitive data scientists.
  • Connected with experts across the country and helped data people find jobs.
Technologies: Data Science, Community, Communication, Biology, Machine Learning, Recommendation Systems, Data Visualization, Natural Language Processing (NLP), Forecasting, Classification, Text Classification, Data Pipelines, PySpark, Artificial Intelligence (AI), CTO, Programming, Data Analysis, SpaCy, Team Leadership, BERT, Custom BERT, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Pandas, Leadership, Content Writing, Research, Generative Artificial Intelligence (GenAI), Notion, Architecture, Technical Writing, Blogging, Data Reporting, AI Design, Algorithms, Selenium, CSV File Processing, LangChain, NLU, Google BigQuery, Generative AI, Statistical Modeling, Python, Text to Speech (TTS), Custom Models, AI Research, Exploratory Data Analysis, AI Model Training, uplift modeling, OpenCV, AI Modeling, Data Extraction, Databases, Statistics, Data

Data Scraper & Collector

2023 - 2023
Nixtla Inc.
  • Identified and developed new data sources by searching the internet for reliable providers of time series datasets relevant to the business objectives.
  • Scraped and processed time series data from various online sources.
  • Worked closely with data scientists and machine learning engineers to provide high-quality data and contributed to analytics and predictive modeling projects.
  • Maintained comprehensive documentation that describes data sources, data transformations, and any challenges encountered during the process.
Technologies: Python, NumPy, Data Scraping, Pandas, Data Science, Industrial Internet of Things (IIoT), Object-oriented Programming (OOP), Data Extraction, Databases, Cloud, Data

NLP Machine Learning Developer

2023 - 2023
FirmPilot AI Inc
  • Developed a complete tech strategy and detailed specs for developers to build the product, leveraging OpenAI's ChatGPT, Google's Bard, PaLM2, and Anthropic Claude2, as well as custom open LLM.
  • Researched state-of-the-art tech solutions and recommended optimal choices, maximizing business impact.
  • Developed an innovative approach that utilized adversarial LLM training and fine-tuning.
Technologies: Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Python, Support Vector Machines (SVM), pgvector, ChatGPT, Architecture, Technical Writing, Llama 2, Prompt Engineering, AI Design, Algorithms, Reporting, Applied Research, Training, Large Language Models (LLMs), Fine-tuning, Generative AI, GPU Computing, Transformer Models, AI Model Training, AI Modeling, PDF, Hugging Face, Open-source LLMs, Data

Data Science Founder in Residence

2020 - 2021
Entrepreneur First & AptaDeep
  • Joined Entrepreneur First (EF), a highly competitive program for prospective tech founders, and was selected among the top 3%.
  • Provided weekly reports to entrepreneurs in residence and VC partners and eventually pitched to the investment committee for pre-seed funding.
  • Developed a full-stack MVP of SaaS artificial intelligence aptamer development platform.
  • Coordinated with C-level executives of aptamer companies and secured POC/pilots.
  • Oversaw topics such as business models, financial modeling, B2B sales, OKRs, market sizing, competition and defensibility analysis, early-stage growth, fundraising, investor decks, venture economics, communication, and customer development.
  • Gathered online data for a 360-degree analysis of startup and news trends, using Python for scraping, data manipulation, analysis, and modeling.
Technologies: Communication, Business, Financial Modeling, Market Opportunity Analysis, Data Science, Python, Amazon Web Services (AWS), Docker, Pandas, Scikit-learn, Keras, Deep Learning, Websites, Data Scraping, Computational Biology, Biology, Genomics, ETL, Machine Learning, PyTorch, TensorFlow, Data Strategy, Data Visualization, Optimization, Statistical Analysis, API Integration, Data Modeling, Forecasting, Classification, Data Pipelines, Pricing Models, Tableau, Artificial Intelligence (AI), Neural Networks, Web Design, Jupyter, CTO, AI Programming, Programming, User Interface (UI), Integration, Data Analysis, MySQL, Team Leadership, Software Architecture, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, NumPy, Regression Modeling, Quantitative Analysis, Leadership, APIs, R&D, Quantum Computing, Content Writing, Research, Generative Artificial Intelligence (GenAI), Notion, Architecture, Data Reporting, AI Design, Healthcare, Data Mining, Algorithms, Reporting, Applied Research, Statistical Modeling, SWOT Analysis, Transformer Models, Exploratory Data Analysis, AI Model Training, Object-oriented Programming (OOP), AI Modeling, Data Extraction, Databases, Cloud, Statistics, Data

Co-founder | CTO

2019 - 2020
NewsPill (ex-Sysmo)
  • Predicted stock market volatility by applying machine learning to anomaly indicators derived from scraped internet chatter and from technical and contextual data.
  • Redesigned a legacy algorithmic trading system with a reusable, structured code architecture, best practices, and design patterns.
  • Supervised numerous data-science-powered case studies, such as Trump Mood Predictor (featured on French TV).
  • Built infrastructure with AWS, Docker, Redis, SQL, Python, Flask, Gunicorn, Nginx, and GitLab.
  • Built a chatbot framework for the easy creation of rule-based chatbots.
  • Pitched the startup and contributed to securing funding with BPI & Rockstart AI. Our startup was featured on the BFM Business TV channel (French Bloomberg).
Technologies: Data Science, Time Series, Options, Scraping, Data Engineering, Amazon Web Services (AWS), Redis, SQL, Flask, Gunicorn, GitLab, Docker, Communication, Fundraising, Chatbots, ETL, Machine Learning, Data Strategy, Data Visualization, Optimization, Statistical Analysis, Real-time Data, Natural Language Processing (NLP), API Integration, Data Modeling, Forecasting, Classification, Text Classification, Data Pipelines, Data-driven Marketing, OCR, Artificial Intelligence (AI), Neural Networks, Web Design, Financial Modeling, Jupyter, CTO, Chatbot Conversation Design, AI Programming, Programming, User Interface (UI), Integration, ChatGPT, Data Analysis, Machine Learning Operations (MLOps), Natural Language Toolkit (NLTK), MySQL, SpaCy, Team Leadership, PostgreSQL, Software Architecture, Sentiment Analysis, TensorFlow, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Data Scraping, Pandas, NumPy, Regression Modeling, Quantitative Analysis, Leadership, APIs, R&D, Web Scraping, Content Writing, Research, Architecture, Technical Writing, Data Reporting, Amazon Machine Learning, AI Design, Data Mining, Algorithms, Reporting, Backtesting Trading Strategies, Trading, Selenium, Data Matching, CSV File Processing, Bots, Logistic Regression, Applied Research, NLU, Futures & Options, Finance, Quantitative Research, Statistical Modeling, Python, AI Chatbots, Custom Models, SWOT Analysis, AI Research, Exploratory Data Analysis, AI Model Training, A/B Testing, Object-oriented Programming (OOP), AI Modeling, Signal Processing, Unsupervised Learning, Data Extraction, Databases, PDF, Cloud, Quantitative Finance, Statistics, Capital Markets, Data, Multithreading
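
To illustrate what a framework for rule-based chatbots can look like, here is a minimal sketch in Python. It is not the framework built at NewsPill; the class name `RuleBot`, the decorator-based registration, and the example rules are all assumptions made for this illustration. Each rule pairs a regular expression with a handler, and the bot answers with the first rule that matches.

```python
import re
from typing import Callable, List, Tuple


class RuleBot:
    """Hypothetical rule-based bot: first matching regex rule wins."""

    def __init__(self, fallback: str = "Sorry, I did not understand.") -> None:
        self._rules: List[Tuple[re.Pattern, Callable[[re.Match], str]]] = []
        self._fallback = fallback

    def rule(self, pattern: str):
        """Decorator that registers a handler for a regex pattern."""
        def register(handler):
            self._rules.append((re.compile(pattern, re.IGNORECASE), handler))
            return handler
        return register

    def reply(self, message: str) -> str:
        """Return the response of the first matching rule, else the fallback."""
        for pattern, handler in self._rules:
            match = pattern.search(message)
            if match:
                return handler(match)
        return self._fallback


# Example rules (illustrative only).
bot = RuleBot()


@bot.rule(r"\b(hi|hello)\b")
def greet(match):
    return "Hello! How can I help?"


@bot.rule(r"price of (\w+)")
def price(match):
    return f"Looking up the price of {match.group(1)}."


if __name__ == "__main__":
    print(bot.reply("Hello there"))
    print(bot.reply("What is the price of AAPL?"))
```

Registering rules through a decorator keeps each rule next to its handler, so adding a new behavior means adding one small function.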

Senior Data Scientist

2018 - 2019
Dataswati AI for Manufacturing
  • Built predictive models with uncertainty quantification on unevenly sampled time series for large French manufacturers.
  • Built various automated data pipelines from raw data to automated cross-validation-based feature generation and selection to predictions.
  • Integrated SOTA deep learning: CNN, LSTM, auto-encoders, and transfer learning.
  • Served as a technology evangelist by writing a blog on Medium, giving talks at meetups, and collaborating with the French Institute for Research in Computer Science and Automation (Inria).
  • Built custom algorithm implementations, including optimization by differential evolution, a causal model of regime change, Wasserstein distance-based anomaly detection, and a new method of multi-domain transfer learning.
  • Collected and scraped data from diverse online sources to intelligently augment data and enhance machine learning models with essential external data.
Technologies: Deep Learning, Time Series Analysis, Convolutional Neural Networks (CNN), LSTM, ETL, Machine Learning, PyTorch, TensorFlow, Azure, Time Series, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, API Integration, Data Modeling, Forecasting, Classification, Text Classification, Data Pipelines, OCR, Artificial Intelligence (AI), Neural Networks, Jupyter, AI Programming, Programming, User Interface (UI), Integration, Data Analysis, Natural Language Toolkit (NLTK), MySQL, Software Architecture, Computer Vision, Image Processing, Image Analysis, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Data Scraping, Pandas, NumPy, Regression Modeling, Quantitative Analysis, APIs, Generative Adversarial Networks (GANs), R&D, Web Scraping, Content Writing, Research, Technical Writing, Blogging, Data Reporting, Google Cloud Platform (GCP), Google Cloud, AI Design, Data Mining, Algorithms, Reporting, Selenium, Data Matching, CSV File Processing, Pricing, Logistic Regression, Applied Research, Fine-tuning, Statistical Modeling, Python, GPU Computing, Custom Models, SWOT Analysis, AI Research, Exploratory Data Analysis, AI Model Training, Causal Inference, uplift modeling, A/B Testing, OpenCV, Industrial Internet of Things (IIoT), Object-oriented Programming (OOP), AI Modeling, Signal Processing, Unsupervised Learning, Data Extraction, Databases, PDF, Cloud, Statistics, Data
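
As an illustration of the Wasserstein distance-based anomaly detection mentioned in this role, here is a minimal sketch in pure Python. It is not the implementation used at Dataswati; the function names (`wasserstein_1d`, `anomaly_scores`), the window size, the threshold, and the toy data are assumptions made for this example. For two equal-size one-dimensional samples, the Wasserstein-1 distance equals the mean absolute difference between their sorted values, and each window of the series is scored by its distance to a reference window.

```python
from typing import List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-size 1D samples."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    # With equal sample sizes, the optimal transport plan pairs sorted values.
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def anomaly_scores(series: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Score each non-overlapping window of `series` against `reference`."""
    w = len(reference)
    return [
        wasserstein_1d(series[i:i + w], reference)
        for i in range(0, len(series) - w + 1, w)
    ]


# Toy data (hypothetical): one normal window followed by a shifted regime.
reference = [1.0, 1.1, 0.9, 1.0]
series = [1.0, 0.9, 1.1, 1.0, 3.0, 3.2, 2.9, 3.1]
scores = anomaly_scores(series, reference)
threshold = 0.5  # assumed value; in practice tuned per use case
flags = [s > threshold for s in scores]

if __name__ == "__main__":
    print(flags)
```

Comparing whole windows as distributions, rather than point by point, makes the score sensitive to shifts in level or spread even when individual values look plausible.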

Researcher in Computational Biology and Neuroscience

2013 - 2017
Inria
  • Developed a data-driven model of how biological neurons learn using various datasets, data cleaning, parsing, transformation, and modeling. Conducted numerical simulations of differential equations, optimization, and sensitivity analysis.
  • Published five scientific papers in top journals: eLife, Scientific Reports, Nature.
  • Used Python for data analysis (NumPy, SciPy, Pandas, scikit-learn, matplotlib, etc.) and numerical optimization (PyGMO). Redesigned the calculation module to use Python with F2PY (100x faster than Python + SciPy + NumPy).
Technologies: Python, Pandas, Scikit-learn, F2PY, Sensitivity Analysis, Data Cleaning, Numerical Optimization, Science, Writing & Editing, Matplotlib, Machine Learning, Time Series Analysis, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, Data Modeling, Forecasting, Classification, Data Pipelines, Neural Networks, Web Design, Jupyter, Programming, Data Analysis, Natural Language Toolkit (NLTK), MySQL, Image Processing, Image Analysis, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Data Analytics, Data Manipulation, Analytics, Data Scraping, NumPy, Regression Modeling, Quantitative Analysis, R&D, Content Writing, Research, Technical Writing, Blogging, Data Reporting, Healthcare, Data Mining, Algorithms, Reporting, Data Matching, CSV File Processing, Logistic Regression, Applied Research, Statistical Modeling, Custom Models, AI Research, Exploratory Data Analysis, AI Model Training, A/B Testing, State Machines, AI Modeling, Signal Processing, Unsupervised Learning, Data Extraction, Statistics, Data, Multithreading

3D Reconstruction and Computer Vision Engineer

2012 - 2012
Riken
  • Developed computer vision algorithms for two-photon microscopy images.
  • Architected 3D reconstruction algorithms from a stack of two-photon microscopy images.
  • Collaborated with researchers and management to adjust software and adapt it to various use cases.
Technologies: 3D Reconstruction, Point Cloud Data, Point Clouds, Exploratory Data Analysis, Data

Trump Mood Predictor

A fun web app that predicts the mood of the next Trump tweet.

My first startup used it as a marketing tool and as an illustration of the power of sentiment analysis for the stock market. Markets are often said to be driven by the so-called animal spirits of fear and greed. During the Trump presidency, his actions and tweets moved the markets and rippled through the economy. We built this web app to illustrate some of the unstructured data processing and modeling techniques that we used to predict stock market volatility.

AptaDeep

Developed the POC of a SaaS platform for an AI drug discovery startup, combining molecular and AI approaches to replace expensive antibodies with aptamers. The AI predicts aptamer properties and helps to:
• Develop 10x better aptamers (affinity, specificity, stability, or conformational changes)
• Optimize pre-SELEX, SELEX, post-SELEX, and post-production of aptamers, as well as custom non-SELEX processes

DeepProPhoto

DeepProPhoto is an AI tool that transforms regular photos into professional ones in one minute. This app helps users to increase professional visibility and find a dream job while saving money and time.

In this project, I worked on the back and front end, AI model training, and data scraping.

PsyTrainer

https://t.me/psychotrainerbot
Unleash your full communication potential with PsyTrainer, your personal AI psychologist. Sculpted with OpenAI's tech and Falcon 7B LLM fine-tuned on real psychologist-client dialogues.

I contributed to the full-stack AI development. Technologies used are Telegram, Python, SQL, Metabase dashboards, Heroku/AWS, Falcon fine-tuned with LoRA, and OpenAI's tech.

PsyTrainer—evolve your conversations, transform your beliefs, unlock your potential, and unfold the power of communication.

Personalized Books for Kids

I redefined personalized children's books, taking customization to new heights through AI-driven content creation. Drawing inspiration from platforms like Wonderbly.com, I harnessed an advanced tech stack to deliver an unparalleled experience.

CONTRIBUTIONS
• Full-stack Development: I employed this to ensure a seamless user experience.
• Cloud Infrastructure: I relied on AWS for scalability and reliability.
• AI-powered Content Creation: I used Python, PyTorch, TensorFlow, spaCy, and scikit-learn for AI-driven text and illustration generation.
• Data Insights: Metabase facilitated data visualization and business intelligence.
• Marketing: Google Ads enhanced marketing strategy for customer outreach.

KEY ADVANCEMENTS
• AI Illustrations: AI-generated personalized, captivating illustrations—your kid placed within a book.
• AI-generated Text: NLP models crafted engaging, educational narratives.
• Recommendations: ML algorithms offered tailored book suggestions.

Languages

Python, SQL, R, Snowflake, C++, Python 3

Frameworks

Selenium, Streamlit, Flask

Libraries/APIs

Pandas, Scikit-learn, Natural Language Toolkit (NLTK), PyTorch, TensorFlow, SpaCy, NumPy, OpenCV, PySpark, LSTM, Keras, Matplotlib

Tools

Jupyter, Notion, Amazon SageMaker, Tableau, Slack, MATLAB, GitLab, Google AI Platform, AWS CLI

Paradigms

Data Science, ETL, Quantitative Research, Object-oriented Programming (OOP)

Platforms

Amazon Web Services (AWS), Docker, Google Cloud Platform (GCP), Databricks, Azure, Linux, Visual Studio Code (VS Code), Heroku

Storage

Data Pipelines, MySQL, PostgreSQL, Google Cloud, Databases, Redis, Elasticsearch

Other

Optimization, Data Cleaning, Scientific Computing, Science, Deep Learning, Time Series Analysis, Time Series, Chatbots, Data Scraping, Research, Machine Learning, Data Analysis, Data Visualization, Computational Biology, Data Analytics, Artificial Intelligence (AI), Data Reporting, Linear Optimization, Statistical Analysis, Natural Language Processing (NLP), API Integration, OpenAI GPT-4 API, Data Modeling, Forecasting, Classification, Text Classification, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Neural Networks, CTO, Chatbot Conversation Design, AI Programming, Programming, User Interface (UI), Integration, Machine Learning Operations (MLOps), Language Models, ChatGPT, Team Leadership, Software Architecture, Computer Vision, Sentiment Analysis, BERT, Image Processing, Image Analysis, Large Language Models (LLMs), Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Manipulation, Analytics, Regression Modeling, Quantitative Analysis, OpenAI, APIs, Generative Pre-trained Transformer 3 (GPT-3), Generative Adversarial Networks (GANs), R&D, Generative Artificial Intelligence (GenAI), Architecture, Technical Writing, Blogging, Llama 2, Vertex, Amazon Machine Learning, Google Cloud Machine Learning, Prompt Engineering, AI Design, Data Mining, Algorithms, Reporting, Backtesting Trading Strategies, Trading, Data Matching, CSV File Processing, Bots, Pricing, Logistic Regression, PEFT, LangChain, LoRa, Applied Research, NLU, Fine-tuning, Training, Generative AI, Elementor, Text to Speech (TTS), Statistical Modeling, GPU Computing, Marketing Mix Modeling, AI Chatbots, Custom Models, Image Generation, SWOT Analysis, AI Research, Text to Image, Point Cloud Data, Point Clouds, Transformer Models, Exploratory Data Analysis, AI Model Training, Causal Inference, uplift modeling, A/B Testing, Industrial Internet of Things (IIoT), AI Modeling, Signal Processing, Unsupervised Learning, Data Extraction, PDF, 
Cloud, Quantitative Finance, Statistics, Capital Markets, Hugging Face, Open-source LLMs, Data, Multithreading, FastAPI, Convolutional Neural Networks (CNN), Data Engineering, Financial Modeling, Biology, Genomics, Recommendation Systems, Data Strategy, Dashboards, Web Scraping, Real-time Data, PDF Scraping, GPT, Pricing Models, Data-driven Marketing, OCR, Metabase, Custom BERT, Leadership, Content Writing, Futures & Options, Finance, Google BigQuery, Phonemes, eCommerce, State Machines, Physics, 3D Reconstruction, F2PY, Sensitivity Analysis, Numerical Optimization, Options, Scraping, Gunicorn, Communication, Fundraising, Community, Business, Market Opportunity Analysis, Websites, Writing & Editing, Telegram Bots, Google Ads, Quantum Computing, Support Vector Machines (SVM), pgvector, Speech to Text

Industry Expertise

Healthcare, Web Design

Education

2013 - 2016

Ph.D. in Computer Science

Inria Rhône-Alpes︱INSA - Lyon, France

2009 - 2013

Master's Degree in Physics

University of Nizhny Novgorod - Nizhny Novgorod, Russia
