Ilya Prokin, Developer in Bordeaux, France

Ilya Prokin

Verified Expert in Engineering

Data Science Developer

Location
Bordeaux, France
Toptal Member Since
September 7, 2022

Ilya is a researcher (Ph.D.), data scientist, CTO, and entrepreneur with expertise in applied data science and machine learning in manufacturing, finance, and biotech. He has published five scientific papers, improved stock market volatility prediction, developed MVPs, pitched startups, and built a strong data science community to discuss state-of-the-art DS topics. Ilya enjoys improving businesses with data, developing innovative ways to apply data science, and geeking out about optimization.

Portfolio

LoanSnap - AI for US mortgage
Data Science, Python, Amazon Web Services (AWS), Google Cloud AI, Docker...
Data Breakfast France
Data Science, Community, Communication, Biology, Machine Learning...
Entrepreneur First & AptaDeep
Communication, Business, Financial Modeling, Market Opportunity Analysis...

Availability

Part-time

Preferred Environment

Linux, Visual Studio Code (VS Code), Python, Slack

The most amazing...

...part of my ride was building and exiting VC-backed startups, creating end-to-end AI products, and building a strong data science community that spread across France.

Work Experience

Lead Data Scientist

2021 - PRESENT
LoanSnap - AI for US mortgage
  • Coordinated a team of data scientists and engineers. Conducted daily standups and project management.
  • Provided weekly reports to senior leadership (CTO, directors of capital markets, and product).
  • Led the data science section of company meeting presentations and enabled cross-company collaboration on data science initiatives.
  • Participated in planning strategic initiatives and coordinated their execution. The data team's marketing recommendations doubled lead volume.
  • Developed custom models optimizing revenue and cost-critical decisions across the entire sales pipeline and secondary market hedging activities.
  • Gathered data from multiple online sources using customized web scraping solutions and performed competitor intelligence analysis.
Technologies: Data Science, Python, Amazon Web Services (AWS), Google Cloud AI, Docker, Pandas, Scikit-learn, Data Scraping, ETL, Machine Learning, Recommendation Systems, Data Strategy, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, API Integration, OpenAI GPT-4 API, Streamlit, Data Modeling, Forecasting, Amazon SageMaker, Classification, Text Classification, Data Pipelines, GPT, Pricing Models, Data-driven Marketing, Generative Pre-trained Transformers (GPT), OCR, OpenAI GPT-3 API, Tableau, PySpark, Artificial Intelligence (AI), Jupyter, AI Programming, Programming, User Interface (UI), Integration, Language Models, Data Analysis, Machine Learning Operations (MLOps), MySQL, Team Leadership, PostgreSQL, Software Architecture, Sentiment Analysis, Large Language Model (LLM), Data Engineering, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, NumPy

Founder and Community Organizer

2019 - PRESENT
Data Breakfast France
  • Built a strong data science community that meets every week to discuss state-of-the-art data science.
  • Grew a data science ecosystem with access to deep expertise from accomplished researchers, math Olympiad winners, and strong competitive data scientists.
  • Connected with experts across the country and helped data people find jobs.
Technologies: Data Science, Community, Communication, Biology, Machine Learning, Recommendation Systems, Data Visualization, Natural Language Processing (NLP), Forecasting, Classification, Text Classification, Data Pipelines, PySpark, Artificial Intelligence (AI), CTO, Programming, Data Analysis, SpaCy, Team Leadership, BERT, Custom BERT, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Pandas

Data Science Founder in Residence

2020 - 2021
Entrepreneur First & AptaDeep
  • Chosen as one of the top 3% to join EF, a highly competitive program that only selects potential tech founders with top-notch skills.
  • Provided weekly reports to entrepreneurs in residence and VC partners and eventually pitched to the investment committee for pre-seed funding.
  • Developed an MVP using Python, HTML, CSS, and Bootstrap to create a SaaS artificial intelligence aptamer development platform.
  • Coordinated with C-level executives of aptamer companies and secured POC/pilots.
  • Oversaw topics such as business models, financial modeling, B2B sales, OKRs, market sizing, competition and defensibility analysis, early-stage growth, fundraising, investor decks, venture economics, communication, and customer development.
  • Gathered online data for a 360-degree analysis of startup and news trends, using Python for data manipulation, scraping, analysis, and modeling.
Technologies: Communication, Business, Financial Modeling, Market Opportunity Analysis, Data Science, Python, Amazon Web Services (AWS), Docker, Pandas, Scikit-learn, Keras, Deep Learning, Websites, Data Scraping, Computational Biology, Biology, Genomics, ETL, Machine Learning, PyTorch, TensorFlow, Data Strategy, Data Visualization, Optimization, Statistical Analysis, API Integration, Data Modeling, Forecasting, Classification, Data Pipelines, Pricing Models, Tableau, Artificial Intelligence (AI), Neural Networks, Web Design, Jupyter, CTO, AI Programming, Programming, User Interface (UI), Integration, Data Analysis, MySQL, Team Leadership, Software Architecture, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, NumPy

Co-founder and CTO

2019 - 2020
NewsPill (ex-Sysmo)
  • Improved stock market volatility prediction by applying machine learning to anomaly indicators derived from scraped internet chatter and from technical and contextual data.
  • Redesigned a legacy algorithmic trading system with a reusable, structured code architecture, best practices, and design patterns.
  • Supervised numerous data science-powered case studies, such as the Trump Mood Predictor (featured on French TV).
  • Built infrastructure with AWS, Docker, Redis, SQL, Python, Flask, Gunicorn, Nginx, and GitLab.
  • Built a chatbot framework for the easy creation of rule-based chatbots.
  • Pitched the startup and contributed to securing funding with BPI & Rockstart AI. Our startup was featured on the BFM Business TV channel (French Bloomberg).
Technologies: Data Science, Time Series, Options, Scraping, Data Engineering, Amazon Web Services (AWS), Redis, SQL, Flask, Gunicorn, GitLab, Docker, Communication, Fundraising, Chatbots, ETL, Machine Learning, Data Strategy, Data Visualization, Optimization, Statistical Analysis, Real-time Data, Natural Language Processing (NLP), API Integration, Data Modeling, Forecasting, Classification, Text Classification, Data Pipelines, Data-driven Marketing, OCR, Artificial Intelligence (AI), Neural Networks, Web Design, Financial Modeling, Jupyter, CTO, Chatbot Conversation Design, AI Programming, Programming, User Interface (UI), Integration, ChatGPT, Data Analysis, Machine Learning Operations (MLOps), Natural Language Toolkit (NLTK), MySQL, SpaCy, Team Leadership, PostgreSQL, Software Architecture, Sentiment Analysis, TensorFlow, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Data Scraping, Pandas, NumPy

Senior Data Scientist

2018 - 2019
Dataswati AI for Manufacturing
  • Built predictive models with uncertainty quantification on unevenly sampled time series for large French manufacturers.
  • Built various automated data pipelines from raw data to automated cross-validation-based feature generation and selection to predictions.
  • Integrated SOTA deep learning: CNN, LSTM, auto-encoders, and transfer learning.
  • Served as a technology evangelist by writing a blog on medium.com, giving talks at meetups, and collaborating with the French Institute for Research in Computer Science and Automation (Inria).
  • Implemented custom algorithms, including optimization by Differential Evolution, a causal model of regime change, Wasserstein distance-based anomaly detection, and a new method of multi-domain transfer learning.
  • Collected and scraped data from diverse online sources to augment datasets and enhance machine learning models with essential external data.
Technologies: Deep Learning, Time Series Analysis, Convolutional Neural Networks, LSTM, ETL, Machine Learning, PyTorch, TensorFlow, Azure, Time Series, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, API Integration, Data Modeling, Forecasting, Classification, Text Classification, Data Pipelines, OCR, Artificial Intelligence (AI), Neural Networks, Jupyter, AI Programming, Programming, User Interface (UI), Integration, Data Analysis, Natural Language Toolkit (NLTK), MySQL, Software Architecture, Computer Vision, Image Processing, Image Analysis, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Analytics, Data Manipulation, Analytics, Data Scraping, Pandas, NumPy

Researcher in Computational Biology and Neuroscience

2013 - 2017
Inria
  • Developed a data-driven model of how biological neurons learn using various datasets, data cleaning, parsing, transformation, and modeling. Conducted numerical simulations of differential equations, optimization, and sensitivity analysis.
  • Published five scientific papers in top journals: eLife, Scientific Reports, Nature.
  • Used Python for data analysis (NumPy, SciPy, Pandas, scikit-learn, matplotlib, etc.) and numerical optimization (PyGMO). Redesigned the calculation module to use Python with F2PY (100x faster than Python + SciPy + NumPy).
Technologies: Python, Pandas, Scikit-learn, F2PY, Sensitivity Analysis, Data Cleaning, Numerical Optimization, Writing & Editing, Science, Matplotlib, Machine Learning, Time Series Analysis, Data Visualization, Optimization, Linear Optimization, Statistical Analysis, Data Modeling, Forecasting, Classification, Data Pipelines, Neural Networks, Web Design, Jupyter, Programming, Data Analysis, Natural Language Toolkit (NLTK), MySQL, Image Processing, Image Analysis, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Data Analytics, Data Manipulation, Analytics, Data Scraping, NumPy

Trump Mood Predictor

A fun web app that predicts the mood of the next Trump tweet.

My first startup used it as a marketing tool and as an illustration of the power of sentiment analysis for the stock market. Markets are said to be driven by the so-called animal spirits of fear and greed. During the Trump presidency, his actions and tweets moved the markets and rippled through the economy. We built this web app to illustrate some of the unstructured data processing and modeling techniques that we used to predict stock market volatility.

AptaDeep

Developed the POC of a SaaS platform combining molecular biology and AI to replace expensive antibodies with aptamers for an AI drug discovery startup. The AI predicts aptamer properties and helps to:
• Develop 10x better aptamers (affinity, specificity, stability, or conformational changes)
• Optimize pre-SELEX, SELEX, post-SELEX, and post-production of aptamers, as well as custom non-SELEX processes

DeepProPhoto

DeepProPhoto is an AI tool to transform regular photos into professional ones in 1 minute. This app helps users to increase professional visibility and find a dream job while saving money and time.

In this project, I worked on the back end and front end, AI model training, and data scraping.

PsyTrainer

https://t.me/psychotrainerbot
Unleash your full communication potential with PsyTrainer, your personal AI psychologist. It is built with OpenAI's tech and a Falcon 7B LLM fine-tuned on real psychologist-client dialogues.

I contributed to the full-stack AI development. The technologies used are Telegram, Python, SQL, Metabase dashboards, Heroku/AWS, OpenAI's tech, and Falcon fine-tuned with LoRA.

PsyTrainer—evolve your conversations, transform your beliefs, unlock your potential, and unfold the power of communication.

Personalized Books for Kids

I redefined personalized children's books, taking customization to new heights through AI-driven content creation. Drawing inspiration from platforms like Wonderbly.com, I harnessed an advanced tech stack to deliver an unparalleled experience.

CONTRIBUTIONS
• Full-stack Development: I built both the front end and back end to ensure a seamless user experience.
• Cloud Infrastructure: I relied on AWS for scalability and reliability.
• AI-powered Content Creation: I used Python, PyTorch, TensorFlow, spaCy, and scikit-learn for AI-driven text and illustration generation.
• Data Insights: Metabase facilitated data visualization and business intelligence.
• Marketing: Google Ads enhanced marketing strategy for customer outreach.

KEY ADVANCEMENTS
• AI Illustrations: AI-generated personalized, captivating illustrations—your kid placed within a book.
• AI-generated Text: NLP models crafted engaging, educational narratives.
• Recommendations: ML algorithms offered tailored book suggestions.

Languages

Python, SQL, R, C++, Python 3

Libraries/APIs

Pandas, Scikit-learn, Natural Language Toolkit (NLTK), TensorFlow, SpaCy, NumPy, PyTorch, PySpark, LSTM, Keras, Matplotlib

Tools

Jupyter, Amazon SageMaker, Tableau, Slack, MATLAB, GitLab, Google Cloud AI, AWS CLI

Paradigms

Data Science, ETL

Storage

Data Pipelines, MySQL, PostgreSQL, Redis

Other

Optimization, Data Cleaning, Scientific Computing, Science, Deep Learning, Time Series Analysis, Time Series, Chatbots, Data Scraping, Research, Machine Learning, Data Analysis, Data Visualization, Computational Biology, Data Analytics, Artificial Intelligence (AI), Data Reporting, Linear Optimization, Statistical Analysis, Natural Language Processing (NLP), API Integration, OpenAI GPT-4 API, Data Modeling, Forecasting, Classification, Text Classification, OpenAI GPT-3 API, Neural Networks, CTO, Chatbot Conversation Design, AI Programming, Programming, User Interface (UI), Integration, Machine Learning Operations (MLOps), Language Models, ChatGPT, Team Leadership, Software Architecture, Computer Vision, Sentiment Analysis, Image Processing, Image Analysis, Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Manipulation, Analytics, Convolutional Neural Networks, Data Engineering, Financial Modeling, Biology, Genomics, Recommendation Systems, Data Strategy, Dashboards, Web Scraping, Real-time Data, PDF Scraping, Streamlit, GPT, Pricing Models, Data-driven Marketing, Generative Pre-trained Transformers (GPT), OCR, Metabase, BERT, Custom BERT, Large Language Model (LLM), Physics, 3D Reconstruction, F2PY, Sensitivity Analysis, Numerical Optimization, Options, Scraping, Gunicorn, Communication, Fundraising, Community, Business, Market Opportunity Analysis, Websites, Writing & Editing, Telegram Bots, Google Ads

Platforms

Azure, Linux, Amazon Web Services (AWS), Docker, Visual Studio Code (VS Code), Heroku

Frameworks

Flask

Industry Expertise

Web Design

Education

2013 - 2016

Ph.D. in Computer Science

Inria Rhône-Alpes︱INSA - Lyon, France

2009 - 2013

Master's Degree in Physics

University of Nizhny Novgorod - Nizhny Novgorod, Russia