Sergei Markochev, Developer in London, United Kingdom

Sergei Markochev

Verified Expert in Engineering

Bio

Sergei is a lead data science and AI/ML developer with over 15 years of experience. He has led end-to-end project delivery and provided technical expertise on complex decision problems for FTSE 100 companies and SMEs. Sergei holds a PhD in Physics, has one patent and six academic papers, and recently won first place in an international data science competition.

Portfolio

Ultraspeed Digital Limited
Data Science, Data Analysis, Python, SQL, Microsoft Power BI, ETL...
12435136 Canada Inc.
Data Science, Predictive Modeling, Pricing Models, SQL, Python, Dynamic Pricing...
CI&T
Amplitude, Jupyter Notebook, Snowflake, Data Analytics, A/B Testing, SQL, Jira...

Experience

  • Data Analysis - 14 years
  • Applied Mathematics - 14 years
  • Data Science - 10 years
  • Digital Signal Processing - 10 years
  • Software Development - 10 years
  • Machine Learning - 10 years
  • Nonlinear Optimization - 6 years
  • Deep Learning - 4 years

Availability

Part-time

Preferred Environment

Jupyter Notebook, Windows, Linux, Git, Python, Amazon Web Services (AWS), Visual Studio Code (VS Code)

The most amazing...

...algorithm I've developed was ranked number one at an aircraft localization data science competition hosted by AIcrowd.

Work Experience

Data Scientist/Data Analyst

2024 - PRESENT
Ultraspeed Digital Limited
  • Developed software and algorithms to simulate sensors' response to a person's footsteps.
  • Conducted research and carried out mathematical modeling of sprint athletes' kinematics.
  • Ran experiments with sensor equipment, then preprocessed and analyzed the resulting data.
Technologies: Data Science, Data Analysis, Python, SQL, Microsoft Power BI, ETL, Data Analytics, Modeling, Applied Mathematics, Applied Physics, Analytics

Data Scientist

2022 - PRESENT
12435136 Canada Inc.
  • Developed a revenue management system that predicts optimal prices and revenue for real estate properties.
  • Built a pricing engine for predicting the market price and elasticity of real estate properties.
  • Created data processing pipelines to increase data quality.
  • Implemented a chat interface using the ChatGPT model to allow users to retrieve modeling results.
Technologies: Data Science, Predictive Modeling, Pricing Models, SQL, Python, Dynamic Pricing, Forecasting, Machine Learning, ChatGPT, AI Agents, Large Language Models (LLMs), AWS Serverless Application Model (SAM), Amazon Web Services (AWS), Software Development, Monte Carlo Simulations, Mathematical Modeling, GitLab CI/CD, Data Preprocessing, Applied Mathematics, Revenue Modeling, LangChain, Transformer Models, Docker, FastAPI, AWS Lambda, Random Number Generation, AI Chatbots, OpenAI GPT-3 API, OpenAI GPT-4 API, AI Model Training, OpenAI, Prompt Engineering, Conversational AI, APIs, Data Architecture, Data Engineering, Reporting, Analytics, Unstructured Data Analysis, Artificial Intelligence (AI)

Data Science and Analytics Manager

2022 - PRESENT
CI&T
  • Developed a personalized upsell recommendation system for the mobile app of one of the world's top five fast food chains.
  • Deployed the new recommendation system for upsells into the production environment using Terraform and AWS services.
  • Planned and performed A/B testing of the new recommendation system for upsells, which showed over 50% improvement.
  • Led data analytics and data quality projects for CI&T international clients.
  • Championed data science in CI&T UK and carried out PoC development using generative AI models.
Technologies: Amplitude, Jupyter Notebook, Snowflake, Data Analytics, A/B Testing, SQL, Jira, Management, AWS Serverless Application Model (SAM), Recommendation Systems, AWS Step Functions, Amazon Web Services (AWS), Big Data, Data Scraping, Terraform, Python, Time Series, CI/CD Pipelines, Data Quality Analysis, ARIMA, Algorithms, Solution Architecture, XGBoost, Machine Learning Operations (MLOps), Docker, Transformer Models, FastAPI, AWS Lambda, Collaborative Filtering, AWS IoT, Data-informed Recommendations, Amazon SageMaker, Random Number Generation, OpenAI GPT-3 API, OpenAI GPT-4 API, AI Model Training, OpenAI, Prompt Engineering, APIs, Data Architecture, Data Engineering, Reporting, Analytics, Unstructured Data Analysis, Artificial Intelligence (AI), Computer Vision

Software Developer

2021 - 2023
Tellusant
  • Improved data quality and filled data gaps using machine learning and custom modeling.
  • Reviewed code and helped to build an MVP. Implemented time-series prediction.
  • Investigated opportunities to predict audiences for specific products through analysis of global data.
Technologies: Python, Mathematics, Algorithms, Azure, PostgreSQL, ARIMA, Time Series, Time Series Analysis, Analytics

Senior Data Scientist

2022 - 2022
Kainos
  • Developed a data-informed recommendation system for extracting, manipulating, and searching useful information in employees' resumes using natural language processing (NLP) techniques.
  • Led data investigation and prototype model development for the client (a construction company).
  • Presented advanced topics on deploying applications on AWS at an internal deep-dive session.
Technologies: Python, Python 3, Azure, Azure SQL, Jupyter Notebook, Machine Learning, Statistics, Dash, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Software, SpaCy, Amazon Web Services (AWS), Data Scraping, Recommendation Systems, Data-informed Recommendations, Document Parsing, Unstructured Data Analysis, Artificial Intelligence (AI), Computer Vision

Machine Learning (ML) Engineer

2021 - 2022
Bowen & Associates Ltd.
  • Developed a state-of-the-art ML model to predict commercial property prices.
  • Deployed the ML model on AWS to test its predictions.
  • Advised the client on the advantages and limitations of the model, data quality, and deployment for testing.
Technologies: Machine Learning, Classification Algorithms, Regression Modeling, AI Model Training, APIs, Reporting, Analytics, Artificial Intelligence (AI)

Lead Data Scientist

2018 - 2022
GroupM
  • Productionized three apps related to investigating and optimizing global TV ad schedules.
  • Developed a cross-media data fusion model with an external deduplication data set.
  • Predicted digital behavior for target audiences defined by TV show viewership and vice versa using ML techniques.
  • Created Looker dashboards to present PoCs and data insights.
  • Developed deep learning models of reach curves for individual TV channels and channel combinations.
  • Carried out a bespoke analysis for multibillion-dollar stakeholders.
  • Communicated results to stakeholders and product managers. Managed and hired data scientists.
Technologies: Machine Learning, Data Analysis, SQL, Data Cleaning, Cython, Software Development, Agile, R, Looker, Deep Learning, Nonlinear Optimization, Clustering, Git, Pandas, Keras, Artificial Intelligence (AI), Data Science, Quantitative Research, Amazon Web Services (AWS), Data Visualization, ETL, MySQL, Python, Data Analytics, Scikit-learn, Dashboard Development, Time Series, NumPy, SciPy, Jupyter Notebook, Statistics, Unsupervised Learning, Big Data, XGBoost, AWS Lambda, Collaborative Filtering, Random Number Generation, AI Model Training, Data Engineering, Reporting, Analytics

Data Scientist (Python)

2021 - 2021
Applied AI LLC
  • Developed an ML model to classify the content of industry-specific PDF documents.
  • Investigated different approaches (ML, NLP, and statistical) to the modeling of document content.
  • Advised the client on best practices and model choices during the project.
Technologies: Python, PDF Scraping, Data Science, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Analytics, Machine Learning, Deep Learning, Transformer Models, Document Parsing, AI Model Training, Unstructured Data Analysis, Artificial Intelligence (AI), Computer Vision

Battery Analytics Scientist

2015 - 2018
BBOXX LTD
  • Invented and deployed a patented state-of-the-art algorithm for remote capacity estimation of lead-acid batteries from their telemetry.
  • Produced insights on battery performance and customer usage patterns to reduce maintenance caused by battery failures.
  • Developed advanced alerting and anomaly detection systems to monitor the performance of over 100,000 solar panels (broken sensors, tampering, heavy usage, and so on).
  • Developed a Bayesian survival model to predict future battery failure rates.
Technologies: Digital Signal Processing, Data Analysis, Machine Learning, Nonlinear Optimization, Data Cleaning, Cython, Software Development, Agile, Linux, Monte Carlo Simulations, SQL, Bayesian Inference & Modeling, Clustering, Git, Pandas, Object-oriented Programming (OOP), Mathematics, Data Science, Amazon Web Services (AWS), Data Visualization, MySQL, Python, Data Analytics, Scikit-learn, Dashboard Development, Time Series, NumPy, SciPy, PostgreSQL, Jupyter Notebook, Unsupervised Learning, Big Data, Random Number Generation, Analytics, Artificial Intelligence (AI)

Assistant

2009 - 2014
Moscow Institute of Physics and Technology
  • Supported and organized the educational process, conducted courses, and supervised bachelor's degree routes.
  • Organized and ran the department’s section at the annual university conference.
  • Led laboratory courses and seminars on atomic physics and optics.
Technologies: University Teaching, LaTeX, Applied Physics

Senior Research Associate

2007 - 2014
Central Institute of Chemistry and Mechanics
  • Led the experimental research on rare nuclear decays (published in five academic papers and presented at four international conferences).
  • Developed a fully automated digital spectroscopic system for the investigation of rare nuclear decays (Ph.D. thesis).
  • Carried out data analyses and Monte Carlo simulations.
Technologies: Data Analysis, Monte Carlo Simulations, University Teaching, C++, Digital Signal Processing, Software Development, Data Cleaning, Applied Mathematics, Applied Physics, MATLAB, Object-oriented Programming (OOP), Mathematics, Statistics

Aircraft Localization Competition

https://github.com/smarkochev/Aircraft_localization_competition_round_2
In this competition, participants determine aircraft positions from time-of-arrival and signal-strength measurements reported by many low-cost crowdsourced sensors. Only some receivers provide GPS-synchronized timestamps, while others suffer from strong clock drift or report completely broken timestamps.

The competition was organized by the Swiss Cyber-Defence Campus of Armasuisse Science and Technology. The data was collected by the OpenSky Network, a large-scale ADS-B sensor network for research.

• https://www.aicrowd.com/challenges/cyd-campus-aircraft-localization-competition/leaderboards
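
To illustrate the underlying idea of locating an aircraft from time-of-arrival measurements, here is a minimal textbook sketch of linearized multilateration. It is not the competition solution: it assumes a flat 2D plane, perfectly synchronized and noise-free sensors, and an unknown emission time, whereas the real data involves clock drift and broken timestamps. The names `locate`, `solve_linear`, and the sensor layout are hypothetical.

```python
import math

C_KM_PER_US = 0.299792458  # speed of light in km per microsecond


def solve_linear(a, b):
    """Solve a small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate sensor geometry")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def locate(sensors, arrival_us):
    """Estimate (x_km, y_km, emission_time_us) from >= 4 sensors.

    sensors: list of (x, y) positions in km (assumed flat plane).
    arrival_us: signal arrival time at each sensor in microseconds.
    """
    tau = [C_KM_PER_US * t for t in arrival_us]  # arrival times expressed as km
    (x0, y0), tau0 = sensors[0], tau[0]
    rows, rhs = [], []
    # Subtracting the reference sensor's range equation cancels the quadratic
    # terms, leaving equations linear in x, y, and b = c * emission_time.
    for (sx, sy), ti in zip(sensors[1:], tau[1:]):
        rows.append([2 * (sx - x0), 2 * (sy - y0), -2 * (ti - tau0)])
        rhs.append((sx**2 + sy**2 - x0**2 - y0**2) - (ti**2 - tau0**2))
    # Least squares via the normal equations (A^T A) p = A^T r.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atr = [sum(r[i] * v for r, v in zip(rows, rhs)) for i in range(3)]
    x, y, b = solve_linear(ata, atr)
    return x, y, b / C_KM_PER_US


if __name__ == "__main__":
    # Hypothetical sensor layout and aircraft position, for illustration only.
    sensors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (80.0, 120.0), (150.0, 60.0)]
    aircraft, t_emit = (30.0, 70.0), 1000.0
    arrivals = [t_emit + math.hypot(aircraft[0] - sx, aircraft[1] - sy) / C_KM_PER_US
                for sx, sy in sensors]
    print(locate(sensors, arrivals))
```

With noise-free inputs the linear system is satisfied exactly by the true position, so the sketch recovers it up to floating-point rounding. Noisy or drifting timestamps would call for a more robust estimator.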

Prediction of Customer Spending

https://github.com/smarkochev/ds_notebooks/
A data analysis of customer purchase history and a prediction of customers' future total spending using Bayesian modeling and Monte Carlo simulation.

Notebook:
• Prediction of customer spending.ipynb
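
As a rough illustration of how Bayesian modeling and Monte Carlo simulation can be combined for this kind of forecast, the sketch below uses a Gamma-Poisson model for purchase frequency and resamples past purchase amounts. This is an assumed, simplified setup and not necessarily the approach taken in the notebook. The function names, prior values, and data are hypothetical.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (suitable for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_spending(amounts, months_observed, horizon_months,
                      prior_shape=1.0, prior_rate=1.0, n_sims=20000, seed=0):
    """Monte Carlo forecast of one customer's total spending over a future horizon.

    Assumes purchases arrive as a Poisson process whose monthly rate has a
    Gamma(prior_shape, prior_rate) prior, and that future purchase amounts
    resemble past ones (drawn by resampling the history).
    """
    rng = random.Random(seed)
    # Conjugate update: Gamma prior + Poisson counts -> Gamma posterior.
    post_shape = prior_shape + len(amounts)
    post_rate = prior_rate + months_observed
    totals = []
    for _ in range(n_sims):
        # gammavariate takes (shape, scale), so scale = 1 / rate.
        monthly_rate = rng.gammavariate(post_shape, 1.0 / post_rate)
        n_purchases = sample_poisson(rng, monthly_rate * horizon_months)
        totals.append(sum(rng.choice(amounts) for _ in range(n_purchases)))
    totals.sort()
    return {
        "mean": sum(totals) / n_sims,
        "p05": totals[int(0.05 * n_sims)],
        "p95": totals[int(0.95 * n_sims)],
    }


if __name__ == "__main__":
    # Hypothetical purchase history: 12 purchases over 6 months.
    history = [25.0, 40.0, 55.0, 30.0, 80.0, 45.0, 60.0, 35.0, 70.0, 50.0, 65.0, 45.0]
    print(simulate_spending(history, months_observed=6, horizon_months=3))
```

The simulated totals give a full predictive distribution, so the 5th and 95th percentiles can be reported as an uncertainty band alongside the expected spending.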

Expedia Hotel Sales | Kaggle Competition

https://www.kaggle.com/c/hotelsales/
A Kaggle indoor competition aimed at predicting hotel sales for the first 10 days for a subset of new Expedia hotels (for which Expedia has no historical data).

I ranked #1 among 19 teams with a solution that combined several machine learning models.

Rail-ticket Price Prediction

https://github.com/smarkochev/ds_notebooks
Ticket prices change based on demand and time, and there can be a significant difference in price. In these two notebooks, I investigated the possibility of developing a pricing monitoring system for Spanish high-speed trains using data from Kaggle datasets.

Notebooks:
• Rail_ticket_price_prediction_IDE.ipynb
• Rail_ticket_price_prediction_modelling.ipynb

Statoil Kaggle Competition

https://github.com/smarkochev/ds_notebooks
Drifting icebergs present threats to navigation and activities in areas such as the waters off the East Coast of Canada. In this competition, the challenge was to build an algorithm that automatically identifies whether a remotely sensed target is a ship or an iceberg.

Notebooks:
• Statoil_Kaggle_competition_main.ipynb
• Statoil_Kaggle_competition_google_colab_notebook.ipynb
• Statoil_Kaggle_competition_DL_comparison.ipynb

Education

2008 - 2013

Ph.D. in Nuclear Physics

Moscow Institute of Physics and Technology - Moscow, Russia

2006 - 2008

Master's Degree in Applied Mathematics and Physics

Moscow Institute of Physics and Technology - Moscow, Russia

2002 - 2006

Bachelor's Degree in Applied Mathematics and Physics

Moscow Institute of Physics and Technology - Moscow, Russia

Certifications

OCTOBER 2019 - PRESENT

Probabilistic Graphical Models Specialization

Stanford University | via Coursera

JULY 2019 - PRESENT

Advanced Data Science with IBM Specialization

IBM | via Coursera

Libraries/APIs

Pandas, Scikit-learn, NumPy, SciPy, XGBoost, Matplotlib, Keras, PyMC, Spark ML, TensorFlow, PySpark, SpaCy

Tools

ARIMA, AWS Step Functions, Terraform, ChatGPT, GitLab CI/CD, MATLAB, Looker, Git, LaTeX, Microsoft Power BI, Spark SQL, Amazon SageMaker, LaunchDarkly, Jira

Languages

SQL, Python, Python 3, Snowflake, Octave, R, C++

Paradigms

Quantitative Research, Management, Agile, Object-oriented Programming (OOP), ETL

Platforms

Jupyter Notebook, Amazon Web Services (AWS), Visual Studio Code (VS Code), AWS IoT, Linux, Docker, AWS Lambda, Azure

Storage

MySQL, PostgreSQL, Azure SQL

Frameworks

AWS Serverless Application Model (SAM), Spark

Other

Applied Mathematics, Data Analysis, Digital Signal Processing, Machine Learning, Data Cleaning, Nonlinear Optimization, University Teaching, Software Development, Clustering, Applied Physics, Mathematics, Scientific Data Analysis, Data Analytics, Data Science, Data Visualization, Time Series, Artificial Intelligence (AI), Dash, Software, Classification Algorithms, Regression Modeling, Algorithms, Recommendation Systems, Revenue Management, Pricing, Big Data, Data Quality Analysis, Solution Architecture, Pricing Models, Dynamic Pricing, Forecasting, Large Language Models (LLMs), Mathematical Modeling, Data Preprocessing, Revenue Modeling, Modeling, Time Series Analysis, Random Number Generation, Data-informed Recommendations, Statistical Modeling, Analytics, Monte Carlo Simulations, Deep Learning, Cython, Bayesian Inference & Modeling, Predictive Modeling, Dashboard Development, Computer Vision, Multithreading, Unsupervised Learning, Statistics, Natural Language Processing (NLP), A/B Testing, Generative Pre-trained Transformers (GPT), Data Scraping, CI/CD Pipelines, AI Agents, LangChain, Machine Learning Operations (MLOps), Transformer Models, Collaborative Filtering, AI Chatbots, Principal Component Analysis (PCA), Document Parsing, OpenAI GPT-3 API, OpenAI GPT-4 API, AI Model Training, OpenAI, Prompt Engineering, APIs, Data Architecture, Data Engineering, Reporting, Unstructured Data Analysis, PDF Scraping, Amplitude, mParticle, FastAPI, Content-based Filtering, Sensor Data, Conversational AI
