Yaroslav Kopotilov

Verified Expert in Engineering

Data Scientist and Developer

Belgrade, Serbia

Toptal member since April 9, 2020

Expertise

Artificial Intelligence, Machine Learning, Data Science, Data Analysis, Algorithms, Python, NumPy, Database, Mathematics, Data Engineering, Quantitative Development, Data Visualization

Bio

Yaroslav is a senior data scientist with extensive experience in business analysis, predictive modeling, data visualization, data orchestration, and deployment. He has a proven track record of managing complex data science projects and leading small, agile developer teams.

Portfolio

Data Sanity

Lean Project Management, Team Leadership, Business Development, Public Speaking...

US Fintech Startup

Python, Machine Learning, Rust, Forecasting, Statistical Modeling...

Inteleos, Inc. - Main

Statistics, Data Science, Python, Snowflake, SQL...

Experience

Python - 8 years
Machine Learning - 8 years
Time Series Analysis - 6 years
Statistics - 5 years
Data Engineering - 4 years
SQL - 4 years
Data Visualization - 3 years
Stakeholder Engagement - 3 years

Preferred Environment

Git, Jupyter, Linux, Visual Studio Code (VS Code), SQL, Python, MacOS, NoSQL, Docker

The most amazing...

...thing I designed and built is an algorithmic strategy combining multiple data pipelines, capable of sub-second execution (Python, SQL, AMQP, Docker)

Work Experience

Founder | CEO | Speaker

2025 - PRESENT

Data Sanity

Organized four international AI conferences in Serbia and the UK, attracting up to 200 participants each, plus numerous smaller events.
Co-authored and lectured in the open course “Intro to AI Agents” at the University of Belgrade, attracting 100 students and professionals.
Assembled and managed a team of 10+ contractors and volunteers.

Technologies: Lean Project Management, Team Leadership, Business Development, Public Speaking, Website Design, Artificial Intelligence (AI), LangGraph, RAG Systems, AI Agents, AI Model Training, Training, LangChain, Community, Claude Code, Agentic AI

Senior ML Engineer | Quant Researcher

2026 - 2026

US Fintech Startup

Analyzed and compared trading signals from different data sources in US prediction markets.
Independently implemented and backtested an ML-driven trading strategy to verify the previous backtest results.
Advised on the ML infrastructure and the company's long-term development.

Technologies: Python, Machine Learning, Rust, Forecasting, Statistical Modeling, Random Forests, XGBoost, Trading, Algorithmic Trading, Time Series Forecasting

ML/AI Architect

2025 - 2026

Inteleos, Inc. - Main

Architected AWS-based pipelines (S3, DynamoDB, Lambda, ECR, and Snowflake) to train and deploy 20+ time series forecasting models.
Created guidelines for machine learning (ML) research and deployment, helping to translate findings into scalable production systems.
Mentored a junior data scientist, guiding code reviews, model evaluation, and best practices for reproducible research.
Designed a proof-of-concept AI assistant for data search and summarization (AWS and Snowflake).

Technologies: Statistics, Data Science, Python, Snowflake, SQL, AWS Command Line Interface (CLI), Amazon Web Services (AWS), System Design, Mentorship, Time Series, Time Series Analysis, Machine Learning Operations (MLOps), Data Architecture, AWS Deployment, CI/CD Pipelines, RAG Systems, AI Agents, AI Architecture, Amazon Bedrock, Time Series Forecasting, AWS IoT, Time Series Data

Python and Machine Learning Developer

2025 - 2025

codeValet Inc.

Contributed to the redesign of the ML architecture, drastically simplifying and improving the efficiency of an AI platform that integrates NLP and mathematical reasoning.
Streamlined asynchronous graph and optimization computations using PyTorch, accelerating AI pipelines by over 50x.
Developed a GraphRAG extension extracting relevant nodes from large codebases in under a second.

Technologies: Python, SQL, Machine Learning Operations (MLOps), Natural Language Processing (NLP), Machine Learning, SpaCy, Algorithms, NumPy, Data Structures, Ubuntu, Hugging Face, AI Algorithms, Cython, Mathematics, PyTorch, Architecture, Solution Architecture, AI-generated Code, Containerization, Artificial Intelligence (AI), Retrieval-augmented Generation (RAG), Vector Search, Graphs, Optimization, Combinatorial Optimization, RAG Systems

Prompt and Software Engineer (via Toptal)

2024 - 2024

Invisible Technologies Inc

Developed a Python pipeline for large-scale prompt prototyping, reducing experimentation time severalfold.
Architected and refined methods for evaluating large language model (LLM) responses based on correctness, safety, and relevance.
Tested a variety of closed-source and open-source LLMs.

Technologies: Prompt Engineering, Artificial Intelligence (AI), Machine Learning, Data Science, Python, Software Engineering, Asyncio, Large Language Models (LLMs), Data Structures, Natural Language Processing (NLP), Minimum Viable Product (MVP), AI Prompts, LangChain

Senior Data Scientist | Python Developer

2024 - 2024

Bumbee Labs Ab

Designed an improved version of a visit count algorithm that reduced the out-of-sample model error by 50%.
Developed a framework for robust machine learning (ML) model prototyping and evaluation by the data science team.
Accelerated historical sample data processing in Python by more than 50x through the use of more efficient functions and just-in-time compilation. This reduced the processing time for one day of sample data from one hour to one minute.
Built a 24/7 data pipeline that consumes Wi-Fi sample data from multiple sensor installations and saves it in an SQL database for historical data analysis and model evaluation.

Technologies: Data Science, Python, Machine Learning, Applied Physics, PostGIS, AMQP, Asyncio, Data Pipelines, WiFi, Time Series Analysis, Heroku, PostgreSQL, AI Consulting, Integration, Data Structures, Architecture, Feature Engineering, GIS, Geospatial Data, Data Cleansing, Algorithm Design, Containerization, Software Architecture, Random Forests, Spatial Analysis, Optimization, Real-time Systems, Time Series Forecasting

Lead Data Scientist and Developer | CEO | Founder

2022 - 2024

YAFinData

Designed and built a financial data and data analytics platform. The data is shipped in a unified, user-friendly format and can be accessed via a web app and REST API.
Managed a remote team of up to five developers. Determined the overall direction of product development.
Analyzed trading opportunities in the UK electricity markets. Backtested several short-term algorithmic strategies. Estimated PnL and risks, accounting for slippage and market impact.
Created several 24/7 ETL pipelines that collect, clean, and save data for the UK electricity market. Implemented downstream features that are continuously computed from the data feeds in less than 10 ms.
Developed CI/CD, backup raw file storage, parallel redundancy, and a monitoring system to keep data collection running smoothly 24/7.

Technologies: Python, SQL, IT Project Management, IT Product Management, Data Analytics, Data Science, Machine Learning, Web Dashboards, Deep Learning, Data Pipelines, Metrics, Dashboards, Team Leadership, Leadership, Technical Leadership, Finance, Time Series, CTO, PostgreSQL, Algorithmic Trading, Market Risk, Data Structures, Ubuntu, Machine Learning Operations (MLOps), Minimum Viable Product (MVP), Architecture, Finance APIs, PyTorch, Feature Engineering, API Integration, Agile Software Development, REST APIs, Financial Market Data, Real-time Data, Data Cleansing, Trading Systems, Solution Architecture, Cloud, DevOps, GitOps, Containerization, Trading, Software Architecture, Back-end Development, XGBoost, Data Architecture, CI/CD Pipelines, Infrastructure, Prefect, Hetzner, AI Architecture, Time Series Forecasting, Remote Team Leadership, High-frequency Trading (HFT)

Systematic Trading - Data Scientist and Developer

2020 - 2021

TickUp AB

Analyzed and unified multiple datasets for US equity markets.
Developed an ML model and several data pipelines for an algorithmic trading strategy.
Wrote and reviewed both research notebooks and production code.
Organized a 7-day company meetup, which helped boost team productivity and collaboration.

Technologies: Algorithms, Python, Statistics, Trading, Financial Markets, Data Mining, Algorithmic Trading, Time Series Analysis, Equity Market Data, Docker, Jupyter Notebook, Data Visualization, Financial Data, Code Review, SQL, Git, GitHub, Data Analysis, Regression, Statistical Analysis, Data Science, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Numba, NVIDIA CUDA, Data Pipelines, Metrics, Quantitative Research, Finance, Time Series, PostgreSQL, Stock Market, Data Structures, Minimum Viable Product (MVP), Finance APIs, Feature Engineering, Financial Market Data, Data Cleansing, Trading Systems, Statistical Modeling, XGBoost, Random Forests, Time Series Forecasting

Equity Trading - Quant Researcher

2020 - 2020

Independent Client

Analyzed financial and fundamental data on publicly traded companies.
Identified and improved a trading signal for a daily equity strategy.
Presented strategy backtest results and handed off research for implementation.

Technologies: Trading, Data Science, Machine Learning, Python, Systematic Trading, Equities, Time Series Forecasting, Financial Markets

Energy Trading - Data Scientist

2019 - 2020

Vitol

Created market analysis tools and systematic strategies for coal, power, and crude desks. Covered all phases of a data science project, including project setup, data pipelines, modeling, and deployment.
Analyzed the firm-wide trading market impact under different execution styles.
Worked with both small (50 data points) and large (several terabytes) datasets.
Contributed individually and in collaboration with the data science and IT teams.
Helped train Vitol's employees in Python and machine learning.

Technologies: ActiveBatch, Kibana, Amazon Athena, Amazon S3 (AWS S3), Git, Oracle SQL, Python, Time Series Analysis, Machine Learning, Data Science, Software Development, Data Engineering, Jupyter Notebook, Pandas, Algorithmic Trading, Data Visualization, Bitbucket, Dashboards, Amazon Web Services (AWS), Dash, Web Dashboards, Big Data, Data Analysis, Financial Data, Regression, Statistical Analysis, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Data Pipelines, Metrics, Finance, Time Series, Mentorship & Coaching, Bayesian Statistics, Bayesian Inference & Modeling, Coaching, Workshops, PyCharm, Data Structures, AI Algorithms, Minimum Viable Product (MVP), Mathematical Modeling, Monte Carlo, Monte Carlo Simulations, Finance APIs, Feature Engineering, Financial Market Data, Real-time Data, Data Cleansing, Trading Systems, Trading, Statistical Modeling, XGBoost, Random Forests, Parquet, Seaborn, Training, Probabilistic Modeling, Time Series Forecasting, Logistic Regression, AWS IoT

Model Validation, Commodities - Associate

2017 - 2018

JPMorgan

Implemented a custom version of the extended Kalman filter from scratch to calibrate exotic option pricing models that outperformed the existing calibration methods.
Reviewed ten option pricing models and their implementations in commodities and credit.
Measured and mitigated numerous model risks in collaboration with the desk and developers.
Mentored junior employees during their review work.

Technologies: Python, Derivative Pricing, Stochastic Modeling, Time Series Analysis, Machine Learning, Quantitative Analysis, Quantitative Modeling, Quantitative Finance, Quantitative Risk Analysis, Data Analysis, Financial Data, Forecasting, Data Analytics, Financial Modeling, NumPy, Reports, Quantitative Research, Finance, Mentorship & Coaching, AI Algorithms, Mathematical Modeling, Monte Carlo, Monte Carlo Simulations, Financial Market Data, Risk Management, Time Series Forecasting, Financial Markets

Algorithmic Trading - Quant Researcher

2016 - 2016

Credit Suisse

Designed and implemented two mid-frequency trading strategies for the commodity desk.
Analyzed portfolio hedging strategies using risk factors for the equity desk.
Implemented a data pipeline that cleaned and transformed tabular data for the equity desk.

Technologies: MATLAB, R, SQL, Python, Machine Learning, Time Series Analysis, Data Analysis, Financial Data, Regression, Statistical Analysis, Data Science, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Data Pipelines, Quantitative Research, Finance, Time Series, Stock Market, Finance APIs, Feature Engineering, Financial Market Data, Data Cleansing, Trading Systems, Trading, XGBoost, Time Series Forecasting, Logistic Regression, Financial Markets

ML Research (Intern)

2015 - 2015

Novosibirsk State University

Wrote a research paper describing a metric that uses Fourier descriptors to compare shapes with internal gaps.
Implemented a classification algorithm that achieved 98% accuracy on a dataset with 19 classes of images.
Presented the results at the scientific conference MNSK 2015, Novosibirsk.

Technologies: OpenCV, Python, Computer Vision, Mathematics, Machine Learning, Jupyter Notebook, Data Analysis, NumPy, Feature Engineering

Experience

Cancer Treatment Research

https://www.milner.cam.ac.uk/machinelearning/

Research exploring the potential of metal-organic frameworks to enhance the targeted cellular delivery of small molecule and large macromolecule drug combinations. I contributed to the initial stage of the research by leveraging embeddings and large language models (LLMs) to identify the drug candidates with the greatest potential to save lives.

OpenAI GPT Telegram Bot

A Telegram bot powered by GPT with additional features.

I contributed as a data scientist (GPT model benchmarking, prompt engineering, and text embeddings), software developer (asynchronous Python code and the OpenAI API), project manager, and mentor to junior data scientists.

Stranger News

A website with news about events in a fictional universe, where the fictional news articles and images are generated daily based on real-world news using the OpenAI GPT model. The readers can influence how the story is told as the news unfolds.

Top 1 in Time Series Forecast Competition on Kaggle

https://www.kaggle.com/myster/eda-prophet-winning-solution-3-0

In 2018, before starting my career as a data scientist, I was diving into machine learning textbooks and experimenting with newly learned methods through various mini-projects. That’s when I discovered a Kaggle competition about predicting store sales. Time series analysis has always been one of my favorite topics, so I decided to jump in.

Exploring and visualizing the dataset was both fun and rewarding, as I uncovered interesting quirks in the data. Notably, I soon realized the dataset had been synthetically generated, which provided a crucial clue for solving the problem. In the end, my analysis paid off — my team secured first place!

Data Sanity Talks Website

https://datasanity.dev/

I designed and created a website for Data Sanity Talks, a series of data science events that aim to unite experts from diverse communities, industries, and nations and encourage the exchange of unique experiences and insights.

Interactive Website

https://datascienceforhire.net/

This is a simple personal website powered by Flask and Dash. It runs in a Docker container and has monitoring systems that track web activity and errors. While I don't specialize in web development, the ability to create a simple web interface to visualize data or machine learning model predictions can be very handy.

Data Science Examples

https://github.com/mysterious-ben/ds-examples/

A collection of Jupyter notebooks exploring various data science topics. I created multiple examples of how to apply machine learning, deep learning, experiment design, efficient numerical computation, and visualization. This is a work in progress.

Python Data Pipelining Tools

https://github.com/mysterious-ben/apipe

An open-source Python package to create data pipelines based on the Dask package. It features lazy computation and cache loading, pickle and parquet serialization, and support for hashing of NumPy arrays and pandas DataFrames.

Please check out my GitHub page to see other data science and data engineering packages.

Publication

Embeddings in Machine Learning: Making Complex Data Simple

https://www.toptal.com/developers/machine-learning/embeddings-in-machine-learning

Education

2015 - 2016

Master's Degree in Financial Mathematics

Université Pierre et Marie Curie - Paris, France

2012 - 2016

Master's Degree in Applied Mathematics

École Polytechnique - Paris, France

2012 - 2015

Master's Degree in Mathematics and Computer Science

Novosibirsk State University - Novosibirsk, Russia

2008 - 2012

Bachelor's Degree in Probability and Statistics

Novosibirsk State University - Novosibirsk, Russia

Skills

Libraries/APIs

Scikit-learn, Pandas, NumPy, Matplotlib, XGBoost, OpenCV, REST APIs, SQLAlchemy, SciPy, Python Asyncio, Dask, PyTorch, TensorFlow, Asyncio, AMQP, SpaCy, OpenAI API

Tools

Jupyter, Git, StatsModels, PyCharm, ChatGPT, Algorithm Design, AI Prompts, Claude, Claude Code, Amazon Athena, ActiveBatch, MATLAB, Kibana, Plotly, Boto 3, Ansible, GitHub, Bitbucket, Grafana, GIS, Tableau, AWS Command Line Interface (CLI), AWS Deployment, Prefect, Seaborn

Languages

Python, SQL, R, C++, Java, HTML, CSS, XML, JavaScript, Snowflake, Rust

Storage

Data Pipelines, Oracle SQL, PostgreSQL, Amazon S3 (AWS S3), SQLite, MongoDB, PostGIS, NoSQL

Frameworks

LightGBM, Spark, Flask, LangGraph

Paradigms

Object-oriented Programming (OOP), Quantitative Research, Agile Software Development, Functional Analysis, STOMP, DevOps, Real-time Systems

Platforms

Amazon Web Services (AWS), Jupyter Notebook, AWS IoT, Docker, Linux, MacOS, Visual Studio Code (VS Code), NVIDIA CUDA, Heroku, Ubuntu

Industry Expertise

Project Management, Trading Systems, High-frequency Trading (HFT)

Other

Predictive Modeling, Forecasting, Artificial Intelligence (AI), Data Analysis, Predictive Analytics, Data Science, Statistics, Machine Learning, Supervised Learning, Algorithmic Trading, Regression, Data Analytics, Backtesting Trading Strategies, Minimum Viable Product (MVP), Feature Engineering, Data Cleansing, Statistical Modeling, Time Series, Web Dashboards, Machine Learning Operations (MLOps), Algorithms, Time Series Analysis, Mathematics, Data Visualization, Stakeholder Engagement, Data Engineering, Option Pricing, Unsupervised Learning, Finance, Trading, Financial Markets, Remote Team Leadership, Financial Data, Dashboards, Quantitative Analysis, Quantitative Modeling, Quantitative Finance, Quantitative Risk Analysis, Statistical Analysis, Natural Language Processing (NLP), Financial Modeling, Numba, Financial Software, OpenAI, Metrics, Prompt Engineering, Technical Leadership, Bayesian Statistics, Large Language Models (LLMs), Stock Market, Retrieval-augmented Generation (RAG), AI Consulting, Data Structures, AI Algorithms, Mathematical Modeling, Monte Carlo, Monte Carlo Simulations, Architecture, Finance APIs, Financial Market Data, Real-time Data, Solution Architecture, LangChain, Agentic AI, Random Forests, Optimization, RAG Systems, AI Agents, Risk Management, AI Architecture, AI Model Training, Training, Time Series Forecasting, Logistic Regression, Code Deployment, Futures & Options, Energy, Systematic Trading, Deep Learning, Probability Theory, Mathematical Analysis, Applied Mathematics, Derivative Pricing, Chemistry, Stochastic Modeling, Stochastic Differential Equations, Econometrics, Economics, Computer Vision, Software Development, Genetic Algorithms, Dash, Data Mining, Equity Market Data, Cloud Services, Technical Hiring, Code Review, IT Project Management, Team Leadership, Big Data, APIs, OpenAI GPT-3 API, OpenAI GPT-4 API, Telegram Bots, IT Product Management, Trade Finance, Audio Processing, Numerical Methods, Reports, Applied Physics, Mentorship, Leadership, Mentorship & Coaching, Bayesian Inference & Modeling, CTO, ChatGPT Prompts, Coaching, Workshops, WiFi, Market Risk, Software Engineering, Open-source LLMs, Biotechnology, Investment Banking, Integration, Vector Databases, Geospatial Data, API Integration, AI Chatbots, Chatbots, Product Management, Lean Project Management, Communication, Community, Public Speaking, Hugging Face, Cython, Gemini, Cloud, Front-end Development, AI-generated Code, GitOps, Containerization, Business Development, Website Design, Web Development, Vector Search, Graphs, Linear Regression, System Design, Software Architecture, Back-end Development, Equities, Data Architecture, Spatial Analysis, Combinatorial Optimization, CI/CD Pipelines, Infrastructure, Parquet, Hetzner, Amazon Bedrock, Probabilistic Modeling, Time Series Data
