Yaroslav Kopotilov, Developer in Belgrade, Serbia

Yaroslav Kopotilov

Verified Expert in Engineering

Bio

Yaroslav is a full-stack data scientist with experience in business analysis, predictive modeling, data visualization, data orchestration, and deployment. He leverages a wide range of machine learning methods, statistics, and business insights to find just the right solution for a problem. Above everything else, Yaroslav aims to deliver a project that would be truly useful for his clients.

Portfolio

Invisible Technologies Inc
Prompt Engineering, Artificial Intelligence (AI), Machine Learning...
Bumbee Labs Ab
Data Science, Python, Machine Learning, Applied Physics, PostGIS, AMQP, Asyncio...
YAFinData
Python, SQL, IT Project Management, IT Product Management, Data Analytics...

Experience

Availability

Part-time

Preferred Environment

Git, Jupyter, PyCharm, Linux, Visual Studio Code (VS Code), SQL, Python

The most amazing...

...thing I've developed is an algorithmic trading strategy powered by multiple data pipelines and one ML model running 24/7.

Work Experience

Prompt and Software Engineer (via Toptal)

2024 - 2024
Invisible Technologies Inc
  • Developed an internal Python tool for prompt prototyping at scale.
  • Architected and refined several methods for the evaluation of LLM responses.
  • Worked with a variety of closed-source and open-source LLMs.
Technologies: Prompt Engineering, Artificial Intelligence (AI), Machine Learning, Data Science, Python, Software Engineering, Asyncio, Large Language Models (LLMs)

Senior Data Scientist

2024 - 2024
Bumbee Labs Ab
  • Analyzed visit count computation and suggested an algorithm that halves the out-of-sample model error.
  • Accelerated historical sample data processing in Python by more than 50x using more efficient functions and just-in-time compilation. Reduced the processing time for one day of sample data from one hour to one minute.
  • Built a 24/7 data pipeline that consumed WiFi sample data from multiple installations via Advanced Message Queuing Protocol (AMQP).
Technologies: Data Science, Python, Machine Learning, Applied Physics, PostGIS, AMQP, Asyncio, Data Pipelines, WiFi, Time Series Analysis, Heroku, PostgreSQL

Founder | Lead Developer

2022 - 2024
YAFinData
  • Designed and built a financial data and data analytics platform. The data is shipped in a unified, user-friendly format and can be accessed via a web app and REST API.
  • Managed a remote team of up to five developers. Determined the overall direction of product development.
  • Analyzed trading opportunities in the UK electricity markets. Backtested several short-term algorithmic strategies. Estimated PnL and risks, accounting for slippage and market impact.
  • Created several 24/7 ETL pipelines that collect, clean, and save data for the UK electricity market. Implemented downstream features that are continuously computed from the data feeds in less than 10 ms.
  • Developed CI/CD, backup raw file storage, parallel redundancy, and a monitoring system to ensure that data collection runs smoothly 24/7.
Technologies: Python, SQL, IT Project Management, IT Product Management, Data Analytics, Data Science, Machine Learning, Web Dashboards, Deep Learning, Data Pipelines, Metrics, Dashboards, Team Leadership, Leadership, Technical Leadership, Finance, Time Series, CTO, PostgreSQL, Algorithmic Trading, Market Risk

Developer | Analyst

2020 - 2021
TickUp AB
  • Analyzed and unified multiple datasets for US equity markets.
  • Developed an ML model and several data pipelines for an algorithmic trading strategy.
  • Wrote and reviewed both research notebooks and production code.
  • Organized a seven-day company meetup, which helped boost team productivity and collaboration.
Technologies: Algorithms, Python, Statisticians, Trading, Financial Markets, Data Mining, Algorithmic Trading, Time Series Analysis, Equity Market Data, Docker, Jupyter Notebook, Data Visualization, Financial Data, Code Review, SQL, Git, GitHub, Data Analysis, Regression, Statistical Analysis, Data Science, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Numba, NVIDIA CUDA, Data Pipelines, Metrics, Quantitative Research, Finance, Time Series, PostgreSQL, Stock Market

Energy Trading - Data Scientist

2019 - 2020
Vitol
  • Created market analysis tools and systematic strategies for coal, power, and crude desks. Covered all phases of a data science project, including project setup, data pipelines, modeling, and deployment.
  • Analyzed the firm-wide trading market impact under different execution styles.
  • Worked with both small (50 data points) and large (several terabytes) datasets.
  • Contributed individually and in collaboration with the data science and IT teams.
  • Assisted with Python and machine learning training for Vitol's employees.
Technologies: ActiveBatch, Kibana, Amazon Athena, Amazon S3 (AWS S3), Git, Oracle SQL, Python, Time Series Analysis, Machine Learning, Data Science, Software Development, Data Engineering, Jupyter Notebook, Pandas, Algorithmic Trading, Data Visualization, Bitbucket, Dashboards, Amazon Web Services (AWS), Dash, Web Dashboards, Big Data, Data Analysis, Financial Data, Regression, Statistical Analysis, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Data Pipelines, Metrics, Finance, Time Series, Mentorship & Coaching, Bayesian Statistics, Bayesian Inference & Modeling, Coaching, Workshops

Model Validation, Commodities - Associate

2017 - 2018
JPMorgan
  • Implemented a custom version of the extended Kalman filter from scratch to calibrate exotic option pricing models; it outperformed the existing calibration methods.
  • Reviewed ten option pricing models and their implementations in commodities and credit.
  • Measured and mitigated numerous model risks in collaboration with the desk and developers.
  • Mentored junior employees during their review work.
Technologies: Python, Derivative Pricing, Stochastic Modeling, Time Series Analysis, Machine Learning, Quantitative Analysis, Quantitative Modeling, Quantitative Finance, Quantitative Risk Analysis, Data Analysis, Financial Data, Forecasting, Data Analytics, Financial Modeling, NumPy, Reports, Quantitative Research, Finance, Mentorship & Coaching

Algorithmic Trading (Intern)

2016 - 2016
Credit Suisse
  • Designed and implemented two mid-frequency trading strategies for the commodity desk.
  • Analyzed portfolio hedging strategies using risk factors for the equity desk.
  • Implemented a data pipeline that cleaned and transformed tabular data for the equity desk.
Technologies: MATLAB, R, SQL, Python, Machine Learning, Time Series Analysis, Data Analysis, Financial Data, Regression, Statistical Analysis, Data Science, Forecasting, Data Analytics, Backtesting Trading Strategies, Trade Finance, NumPy, Data Pipelines, Quantitative Research, Finance, Time Series, Stock Market

Research (Intern)

2015 - 2015
Novosibirsk State University
  • Wrote a research paper describing a metric that uses Fourier descriptors to compare shapes with internal gaps.
  • Implemented a classification algorithm that achieved 98% accuracy on a dataset with 19 classes of images.
  • Presented the results at the scientific conference MNSK 2015, Novosibirsk.
Technologies: OpenCV, Python, Computer Vision, Mathematics, Machine Learning, Jupyter Notebook, Data Analysis, NumPy

Interactive Website

https://datascienceforhire.net/
This is a simple personal website powered by Flask and Dash. It runs in a Docker container and has monitoring systems that track web activity and errors. While I don't specialize in web development, the ability to create a simple web interface to visualize data or machine learning model predictions can be very handy.

Yet Another XML Parser

https://github.com/mysterious-ben/xmlrecords
This is a simple yet efficient Python package to parse XML. The package is written specifically for the fast extraction of tabular data (unlike xmltodict, which handles XML of any structure but is slower). XML is not the most data science-friendly format, so the ability to transform it into pandas DataFrames or SQL tables can be very handy.
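To illustrate the kind of transformation described above, here is a minimal sketch that flattens repeated XML elements into row dictionaries using only Python's standard library. It does not use the xmlrecords API; the function name `xml_to_rows`, the tag names, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each <trade> element is one table row.
SAMPLE_XML = """
<trades>
  <trade id="1"><symbol>ABC</symbol><price>10.5</price></trade>
  <trade id="2"><symbol>XYZ</symbol><price>7.25</price></trade>
</trades>
"""


def xml_to_rows(xml_text, row_tag):
    """Return one dict per <row_tag> element: attributes plus child-element text."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for elem in root.iter(row_tag):
        row = dict(elem.attrib)          # attributes become columns
        for child in elem:
            row[child.tag] = child.text  # child elements become columns
        rows.append(row)
    return rows


rows = xml_to_rows(SAMPLE_XML, "trade")
```

A list of dictionaries like `rows` can then be passed to `pandas.DataFrame(rows)` or inserted into a SQL table. All values stay strings here; type conversion is left out of the sketch.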

Top 1 in Time Series Forecast Competition on Kaggle

https://www.kaggle.com/myster/eda-prophet-winning-solution-3-0
In 2018, before I started to work as a data scientist, I was studying textbooks on machine learning and testing the newly learned methods in various mini-projects. That's when I found this competition about predicting store sales on Kaggle. Time series is one of my favorite subjects, so I jumped in.
It was very fun to explore and visualize the dataset and to find interesting quirks in it. In particular, it soon became clear that the data had been synthetically generated, which gave an important clue on how to solve the problem. And it was very exciting that, in the end, my analysis paid off and I took first place!
Also, I worked on this project with an ex-colleague, so it was a good collaborative experience with just a touch of project management. Of course, it was far from the complexity of managing a real data science project. Still, it gave me at least some sense of what might be waiting ahead.

Python Data Pipelining Tools

https://github.com/mysterious-ben/apipe
An open-source Python package to create data pipelines based on the Dask package. It features lazy computation and cache loading, pickle and parquet serialization, and support for hashing of NumPy arrays and pandas DataFrames.
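The idea of lazy computation with cache loading can be sketched with the standard library alone. This is not the apipe API and does not use Dask; the class `CachedStep`, its `load` method, and the example function `total` are hypothetical names chosen for illustration. The cache key here is a hash of the pickled arguments, which is a simplification compared with dedicated hashing of NumPy arrays and pandas DataFrames.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class CachedStep:
    """Hypothetical pipeline step: computes only on request, then reuses a pickle cache."""

    def __init__(self, func, cache_dir):
        self.func = func
        self.cache_dir = Path(cache_dir)
        self.calls = 0  # counts real computations, for illustration only

    def _cache_path(self, args):
        # Key the cache on the function name and a hash of the pickled arguments.
        digest = hashlib.sha256(pickle.dumps(args)).hexdigest()[:16]
        return self.cache_dir / f"{self.func.__name__}_{digest}.pkl"

    def load(self, *args):
        path = self._cache_path(args)
        if path.exists():  # cache hit: skip the computation
            return pickle.loads(path.read_bytes())
        result = self.func(*args)  # cache miss: compute now
        self.calls += 1
        path.write_bytes(pickle.dumps(result))
        return result


def total(values):
    return sum(values)


with tempfile.TemporaryDirectory() as tmp:
    step = CachedStep(total, tmp)
    first = step.load((1, 2, 3))   # computed and written to the cache
    second = step.load((1, 2, 3))  # read back from the cache
    n_calls = step.calls
```

Nothing is computed when the step is constructed; work happens only on the first `load` call for a given set of arguments, and later calls read the serialized result from disk.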

GPT Telegram Bot

https://t.me/ok_gpt_bot
A Telegram bot powered by GPT with additional features such as custom roles.

I contributed as a data scientist (GPT model benchmarking, prompt engineering, and text embeddings), software developer (asynchronous Python code and the OpenAI API), project manager, and mentor to junior data scientists.

Data Science Examples

https://github.com/mysterious-ben/ds-examples/
A collection of Jupyter notebooks exploring various data science topics. It includes multiple examples of how to apply machine learning, deep learning, experiment design, efficient numerical computation, and visualization. This is a work in progress.

Stranger News

https://stranger.news/
A website with news about events in a fictional universe.

The fictional news articles and images are generated daily based on real-world news using the OpenAI GPT model. The readers can influence how the story is told as the news unfolds.

Story-driven Text-based Game

A story-driven game where the player's decisions influence how the events in a fictional kingdom play out.

LLMs generate events and possible player choices. The player's decisions determine how the world will change and, eventually, what destiny awaits the kingdom.

Education

2015 - 2016

Master's Degree in Financial Mathematics

Université Pierre et Marie Curie - Paris, France

2012 - 2016

Master's Degree in Applied Mathematics

École Polytechnique - Paris, France

2012 - 2015

Master's Degree in Mathematics and Computer Science

Novosibirsk State University - Novosibirsk, Russia

2008 - 2012

Bachelor's Degree in Probability and Statistics

Novosibirsk State University - Novosibirsk, Russia

Libraries/APIs

Scikit-learn, Pandas, NumPy, Matplotlib, OpenCV, REST APIs, SQLAlchemy, SciPy, Python Asyncio, Dask, PyTorch, TensorFlow, Asyncio, AMQP

Tools

Jupyter, Git, StatsModels, PyCharm, ChatGPT, Amazon Athena, ActiveBatch, MATLAB, Kibana, Plotly, Boto 3, Ansible, GitHub, Bitbucket, Grafana

Languages

Python, SQL, R, C++, Java, HTML, CSS, XML

Storage

Data Pipelines, Oracle SQL, PostgreSQL, Amazon S3 (AWS S3), SQLite, MongoDB, PostGIS

Frameworks

LightGBM, Spark, Flask

Paradigms

Object-oriented Programming (OOP), Quantitative Research, Agile Software Development, Functional Analysis, STOMP

Platforms

Jupyter Notebook, Docker, Linux, MacOS, Amazon Web Services (AWS), Visual Studio Code (VS Code), NVIDIA CUDA, Heroku

Industry Expertise

Project Management

Other

Predictive Modeling, Forecasting, Data Analysis, Predictive Analytics, Data Science, Statisticians, Machine Learning, Supervised Learning, Algorithmic Trading, Regression, Data Analytics, Backtesting Trading Strategies, Time Series, Web Dashboards, Artificial Intelligence (AI), Time Series Analysis, Mathematics, Data Visualization, Stakeholder Engagement, Data Engineering, Option Pricing, Unsupervised Learning, Finance, Trading, Financial Data, Dashboards, Quantitative Analysis, Quantitative Finance, Quantitative Risk Analysis, Statistical Analysis, Financial Modeling, Numba, Financial Software, OpenAI, Metrics, Prompt Engineering, Bayesian Statistics, Stock Market, Machine Learning Operations (MLOps), Code Deployment, Algorithms, Futures & Options, Energy, Systematic Trading, Deep Learning, Probability Theory, Mathematical Analysis, Applied Mathematics, Derivative Pricing, Chemistry, Stochastic Modeling, Stochastic Differential Equations, Econometrics, Economics, Computer Vision, Software Development, Genetic Algorithms, Dash, Financial Markets, Data Mining, Equity Market Data, Cloud Services, Remote Team Leadership, Technical Hiring, Code Review, IT Project Management, Team Leadership, Quantitative Modeling, Big Data, APIs, OpenAI GPT-3 API, OpenAI GPT-4 API, Telegram Bots, Natural Language Processing (NLP), IT Product Management, Trade Finance, Audio Processing, Numerical Methods, Reports, Applied Physics, Mentorship, Leadership, Technical Leadership, Mentorship & Coaching, Bayesian Inference & Modeling, CTO, Large Language Models (LLMs), ChatGPT Prompts, Coaching, Workshops, WiFi, Market Risk, Software Engineering
