Ilya Prokin
Verified Expert in Engineering
Data Science Developer
Bordeaux, France
Toptal member since September 7, 2022
Ilya is an AI and LLM RAG advisor, AI strategist and architect, researcher, data scientist, CTO, and AI tech entrepreneur with a PhD. An expert in applied data science, machine learning, AI, and LLM fine-tuning, he has published scientific papers, scaled various MVPs to full-featured products, pitched startups, and built a strong AI community that spread to six cities. Ilya enjoys improving businesses with data, developing innovative ways to apply data and AI, and geeking out about optimization.
Preferred Environment
Linux, Visual Studio Code (VS Code), Python, Slack
The most amazing...
...part of my ride was building and exiting startups with VC-backed, end-to-end AI products and building a strong data science community that spread across France.
Work Experience
Text-to-speech and Speech-to-text LLM Developer
ImbaMed
- Developed an MVP of a virtual voice call sales agent, including text-to-speech, speech-to-text, and LLM sales logic.
- Used OpenAI GPT and a custom open LLM to implement the sales agent logic.
- Iterated over different alternative TTS solutions: ElevenLabs, Coqui, Bark, Silero, Piper, Edge TTS, Tortoise, WhisperSpeech, and OpenVoice.
- Leveraged faster-whisper for speech-to-text and WhisperX for diarization.
AI Tech Lead
Coeus Metis Labs
- Created, iterated, and refined prompts for AI email marketing assistants and live chat support, ensuring high-quality and relevant responses.
- Designed and built model-agnostic large language model (LLM) pipelines leveraging Gemini, GPT-3.5, GPT-4, GPT-4o, and Claude 3, as well as various custom open-source LLMs.
- Collaborated with cross-functional teams and various experts to align AI capabilities with business objectives.
Data Science Leader
LoanSnap - AI for US mortgage
- Coordinated a team of data scientists and engineers. Ran daily standups and handled project management.
- Provided weekly reports to senior leadership, specifically CTO, directors of capital markets, and product.
- Drove the data science section of company meeting presentations and enabled cross-company collaborations on data science initiatives.
- Participated in planning strategic initiatives and coordinated their execution. The data team's marketing recommendations doubled lead volume.
- Developed custom models optimizing revenue and cost-critical decisions across the entire sales pipeline and secondary market hedging activities.
- Gathered data from multiple online sources using customized web scraping solutions and performed competitor intelligence analysis.
- Developed empathetic LLM agents endowed with distinct personalities for personalized customer service. Used OpenAI's GPT-3.5 and GPT-4 API and customized LLMs.
Founder | Community Organizer
Data Brunch
- Built a strong data science community that meets weekly to discuss state-of-the-art data science.
- Grew an excellent data science ecosystem with access to deep and varied expertise, including accomplished researchers, math Olympiad winners, and strong, competitive data scientists.
- Connected with experts across the country and helped data people find jobs.
AI Architect
Stacks, Inc
- Consulted on competitive AI tech strategy on Google Cloud as well as future-proof design, avoiding vendor lock-in.
- Designed an LLM system based on knowledge/cognitive graph RAG.
- Developed a scalable AI architecture spanning MVP to enterprise version, from concept to detailed product plan.
Data Scraper & Collector
Nixtla Inc.
- Identified and developed new sources: searched for reliable data sources on the Internet that provide time series datasets relevant to the business objectives.
- Scraped various online data sources and processed the resulting time series data.
- Worked closely with data scientists and machine learning engineers to provide high-quality data and contributed to analytics and predictive modeling projects.
- Maintained comprehensive documentation that describes data sources, data transformations, and any challenges encountered during the process.
NLP Machine Learning Developer
FirmPilot AI Inc
- Developed a complete tech strategy and detailed specs for developers to build the product, leveraging OpenAI's ChatGPT, Google's Bard and PaLM 2, and Anthropic's Claude 2, as well as custom open LLMs.
- Researched state-of-the-art tech solutions and recommended optimal choices, maximizing business impact.
- Developed an innovative approach that utilized adversarial LLM training and fine-tuning.
Data Science Founder in Residence
Entrepreneur First & AptaDeep
- Joined EF, a highly competitive program that only selects potential tech founders with top-notch skills, as one of the top 3%.
- Provided weekly reports to entrepreneurs in residence and VC partners and eventually pitched to the investment committee for pre-seed funding.
- Developed a full-stack MVP of a SaaS artificial intelligence platform for aptamer development.
- Coordinated with C-level executives of aptamer companies and secured POC/pilots.
- Oversaw topics such as business models, financial modeling, B2B sales, OKRs, market sizing, competition and defensibility analysis, early-stage growth, fundraising, investor decks, venture economics, communication, and customer development.
- Performed online data gathering for a 360-degree analysis of various startup and news trends, leveraging Python for data manipulation, scraping, data analysis, and modeling.
Co-founder | CTO
NewsPill (ex-Sysmo)
- Built stock market volatility prediction using machine learning applied to anomaly indicators on scraped internet chatter, together with technical and contextual data.
- Redesigned a legacy algorithmic trading system with a reusable, structured code architecture, best practices, and design patterns.
- Supervised numerous data-science-powered case studies, such as Trump Mood Predictor (featured on French TV).
- Built infrastructure with AWS, Docker, Redis, SQL, Python, Flask, Gunicorn, Nginx, and GitLab.
- Built a chatbot framework for the easy creation of rule-based chatbots.
- Pitched the startup and contributed to securing funding with BPI & Rockstart AI. Our startup was featured on the BFM Business TV channel (French Bloomberg).
Senior Data Scientist
Dataswati AI for Manufacturing
- Built predictive models with uncertainty quantification for large French manufacturers, working with unevenly sampled time series.
- Built various automated data pipelines from raw data to automated cross-validation-based feature generation and selection to predictions.
- Integrated SOTA deep learning: CNN, LSTM, auto-encoders, and transfer learning.
- Served as a technology evangelist by writing a blog on medium.com, giving talks at meetups, and collaborating with the French Institute for Research in Computer Science and Automation (Inria).
- Implemented custom algorithms, including optimization by Differential Evolution, a causal model of regime change, Wasserstein distance-based anomaly detection, and a new method of multi-domain transfer learning.
- Collected and scraped data from diverse online sources to intelligently augment data and enhance machine learning models with essential external data.
Researcher in Computational Biology and Neuroscience
Inria
- Developed a data-driven model of how biological neurons learn using various datasets, data cleaning, parsing, transformation, and modeling. Conducted numerical simulations of differential equations, optimization, and sensitivity analysis.
- Published five scientific papers in top journals: eLife, Scientific Reports, Nature.
- Used Python for data analysis (NumPy, SciPy, Pandas, scikit-learn, matplotlib, etc.) and numerical optimization (PyGMO). Redesigned the calculation module to use Python with F2PY (100x faster than Python + SciPy + NumPy).
3D Reconstruction and Computer Vision Engineer
Riken
- Developed computer vision algorithms for two-photon microscopy images.
- Architected 3D reconstruction algorithms from a stack of two-photon microscopy images.
- Collaborated with researchers and management to adjust software and adapt it to various use cases.
Experience
Trump Mood Predictor
This web app served as a marketing tool for my first startup and as an illustration of the power of sentiment analysis for the stock market. It is known that markets are driven by the so-called animal spirits of fear and greed. During the Trump presidency, his actions and tweets were moving the markets and rippling throughout the economy. We built the app to illustrate some of the unstructured data processing and modeling techniques that we used to predict stock market volatility.
AptaDeep
• Develop 10x better aptamers (affinity, specificity, stability, or conformational changes)
• Optimize pre-SELEX, SELEX, post-SELEX, and post-production of aptamers, as well as custom non-SELEX processes
DeepProPhoto
In this project, I worked on the back and front end, AI model training, and data scraping.
PsyTrainer
https://t.me/psychotrainerbot
I contributed to the full-stack AI development. Technologies used are Telegram, Python, SQL, Metabase dashboards, Heroku/AWS, Falcon fine-tuned with LoRA, and OpenAI's tech.
PsyTrainer—evolve your conversations, transform your beliefs, unlock your potential, and unfold the power of communication.
Personalized Books for Kids
CONTRIBUTIONS
• Full-stack Development: I employed this to ensure a seamless user experience.
• Cloud Infrastructure: I relied on AWS for scalability and reliability.
• AI-powered Content Creation: I used Python, PyTorch, TensorFlow, spaCy, and scikit-learn for AI-driven text and illustration generation.
• Data Insights: Metabase facilitated data visualization and business intelligence.
• Marketing: Google Ads enhanced marketing strategy for customer outreach.
KEY ADVANCEMENTS
• AI Illustrations: AI-generated personalized, captivating illustrations—your kid placed within a book.
• AI-generated Text: NLP models crafted engaging, educational narratives.
• Recommendations: ML algorithms offered tailored book suggestions.
Education
Ph.D. in Computer Science
Inria Rhône-Alpes | INSA - Lyon, France
Master's Degree in Physics
University of Nizhny Novgorod - Nizhny Novgorod, Russia
Skills
Libraries/APIs
Pandas, Scikit-learn, Natural Language Toolkit (NLTK), PyTorch, TensorFlow, SpaCy, NumPy, OpenCV, PySpark, LSTM, Keras, Matplotlib
Tools
Jupyter, ChatGPT, Notion, Azure OpenAI Service, Google Bard, Microsoft Copilot, Whisper, Amazon SageMaker, Tableau, Slack, MATLAB, GitLab, Google AI Platform, AWS CLI
Languages
Python, SQL, C#, CSS, JavaScript, HTML, R, Snowflake, C++, Python 3
Frameworks
Selenium, Streamlit, DSPy, Flask, LlamaIndex
Paradigms
ETL, Quantitative Research, Object-oriented Programming (OOP), Anomaly Detection
Platforms
Amazon Web Services (AWS), Docker, Google Cloud Platform (GCP), Databricks, Azure Functions, Azure, Azure AI Studio, Linux, Visual Studio Code (VS Code), Heroku, Google Ads
Storage
Data Pipelines, MySQL, PostgreSQL, Google Cloud, Databases, Graph Databases, Neo4j, Redis, Elasticsearch
Industry Expertise
Bioinformatics, Healthcare, Web Design
Other
Optimization, Data Science, Data Cleaning, Scientific Computing, Science, Deep Learning, Time Series Analysis, Time Series, Chatbots, Data Scraping, Research, Machine Learning, Data Analysis, Data Visualization, Computational Biology, Web Scraping, Data Analytics, Artificial Intelligence (AI), Data Reporting, Linear Optimization, Statistical Analysis, Natural Language Processing (NLP), API Integration, OpenAI GPT-4 API, Data Modeling, Forecasting, Classification, Text Classification, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Neural Networks, CTO, Chatbot Conversation Design, AI Programming, Programming, User Interface (UI), Integration, Machine Learning Operations (MLOps), Language Models, Team Leadership, Software Architecture, Computer Vision, Sentiment Analysis, BERT, Image Processing, Image Analysis, Large Language Models (LLMs), Deep Reinforcement Learning, Predictive Modeling, Probability Theory, Predictive Analytics, Frameworks, Data Manipulation, Analytics, Regression Modeling, Quantitative Analysis, OpenAI, APIs, Generative Pre-trained Transformer 3 (GPT-3), Generative Adversarial Networks (GANs), R&D, Generative Artificial Intelligence (GenAI), Architecture, Technical Writing, Blogging, Llama 2, Vertex, Amazon Machine Learning, Google Cloud Machine Learning, Prompt Engineering, AI Design, Data Mining, Algorithms, Reporting, Backtesting Trading Strategies, Trading, Data Matching, CSV File Processing, Bots, Pricing, Logistic Regression, PEFT, LangChain, LoRa, Applied Research, NLU, Fine-tuning, Training, Elementor, Text to Speech (TTS), Statistical Modeling, Speech to Text, GPU Computing, Marketing Mix Modeling, AI Chatbots, Custom Models, Image Generation, SWOT Analysis, AI Research, Text to Image, Point Cloud Data, Point Clouds, Transformer Models, Exploratory Data Analysis, AI Model Training, Causal Inference, Uplift Modeling, A/B Testing, Industrial Internet of Things (IIoT), AI Modeling, Signal Processing, Unsupervised Learning, Data Extraction, PDF, Cloud, Quantitative Finance, Statistics, Capital Markets, Hugging Face, Open-source LLMs, Data, Multithreading, FastAPI, Retrieval-augmented Generation (RAG), Scalable Web Services, Minimum Viable Product (MVP), Multimodal Models, Embeddings from Language Models (ELMo), Llama 3, Mistral AI, CSV Export, Pinecone, Product Ownership, Scalable Vector Databases, Product Design, Speech to Text AI, Algorithmic Trading, DNA Sequencing, Workshops, Coaching, Stock Market, Facial Recognition, Medical Imaging, Biostatistics, Active Learning, English, Gemini, AI Agents, Technical Leadership, Convolutional Neural Networks (CNNs), Data Engineering, Financial Modeling, Websites, Biology, Genomics, Recommendation Systems, Data Strategy, Dashboards, Real-time Data, PDF Scraping, Pricing Models, Data-driven Marketing, OCR, Metabase, Custom BERT, Leadership, Content Writing, Futures & Options, Finance, Google BigQuery, Phonemes, eCommerce, State Machines, Semantic Search, Stable Diffusion, Mathematics, Claude, Outbound Marketing, Physics, 3D Reconstruction, F2PY, Sensitivity Analysis, Numerical Optimization, Options, Scraping, Gunicorn, Communication, Fundraising, Community, Business, Market Opportunity Analysis, Writing & Editing, Telegram Bots, Quantum Computing, Support Vector Machines (SVM), Pgvector, Advanced Analytics, Groq