
Eugene Balkind
Verified Expert in Engineering
Data Scientist and ML Developer
London, United Kingdom
Toptal member since June 13, 2022
Eugene is a skilled data scientist with a strong academic and industrial background in time series analysis, LLMs, and other ML technologies. Eugene has created classification models that predict positive or negative COVID-19 test outcomes and models that determine whether a company is a good acquisition target. He has also built data hubs, performed cross-validation testing, and adapted and improved models to meet quickly changing requirements, and he is proficient with the OpenAI API.
Experience
- Python - 9 years
- Machine Learning - 9 years
- Time Series Forecasting - 8 years
- Artificial Intelligence (AI) - 5 years
- Marketing Mix Modeling - 5 years
- Large Language Models (LLMs) - 4 years
- Agentic AI - 3 years
- Databricks - 2 years
Preferred Environment
Python 3, TensorFlow, Pandas, Mathematics, Regression, Amazon Web Services (AWS), SQL, ChatGPT, Amadeus, Azure
The most amazing...
...project I've done was COVID-19 testing automation. Lab performance improved from 300 analyzed samples a day to 30,000, with the ability to go up to 100,000.
Work Experience
Senior Data Scientist
McKinsey & Company
- Owned end-to-end delivery and technical architecture for a firm-wide AI agent, deployed to internal users across enterprise workflows.
- Built LangChain-based agentic workflows combining RAG, multi-step reasoning, database querying, and tool orchestration.
- Designed retrieval patterns, database interaction, orchestration logic, evaluation flows, and reliability controls for complex enterprise use cases.
- Implemented observability, logging, guardrails, fallback behaviors, and Opik-based quality tracking to improve reliability and support continuous iteration.
- Built automated evaluation and testing workflows to compare model and agent variants, detect regressions, assess reasoning quality, and identify hallucination risks, edge cases, and failure modes.
- Partnered with engineering teams to deploy the system using Docker, GitHub CI/CD, APIs, and maintainable software engineering practices.
Marketing Mix Modeling Data Scientist
Wheelhouse Interactive, LLC
- Migrated an existing MMM solution from PyMC to Google Meridian, adapting model structure, workflow logic, and implementation patterns for production use.
- Partnered with DevOps to deploy the solution within the client’s infrastructure and engineering standards.
- Improved maintainability, reproducibility, and production readiness of the MMM codebase for ongoing marketing effectiveness analysis.
Senior Data Scientist
Next
- Designed and delivered a modular Bayesian Marketing Mix Modeling suite using PyMC and Google Meridian, enabling rapid iteration across brands, channels, and planning scenarios.
- Built production-ready MMM workflows in Databricks with MLflow experiment tracking and model registry, improving reproducibility, auditability, and model comparison.
- Owned end-to-end MMM delivery from data preparation and model diagnostics through to scenario planning, budget optimization, and stakeholder recommendations.
- Applied Causal Impact analysis where data volume or campaign structure was insufficient for robust MMM, enabling pragmatic measurement of campaign uplift and business impact.
- Partnered with marketing and finance stakeholders to embed MMM outputs into budget planning, ROI analysis, and post-campaign evaluation.
Lead Data Scientist
Tropicana Brands - Main
- Led the machine learning function, setting strategic direction across different business areas.
- Improved supply chain planning accuracy by 13% by building a demand forecasting system on Azure Databricks using deep learning, LightGBM, and Prophet, reducing operational costs and environmental impact.
- Launched NLP customer-complaints analytics using SQL, FastAPI, and Hugging Face to identify churn drivers, product-quality issues, and sources of customer dissatisfaction in unstructured feedback.
- Delivered a planogram extraction tool using the OpenAI API to automate in-store layout checks and compliance review.
- Led development, evaluation, and deployment of an enterprise LLM assistant for Tropicana Brands Group using Llama, LangChain, RAG, pgvector, Chroma, and Elasticsearch.
- Designed prompts, retrieval patterns, source-reference behaviors, and evaluation criteria for correctness, reasoning quality, and robustness.
- Used Opik to compare LLM and retrieval variants, identify hallucination risks, weak retrieval behaviors, reasoning gaps, and failure modes, and then iterated on system design to improve reliability.
- Packaged the LLM assistant as a FastAPI service and deployed it on Azure using production-focused engineering practices.
- Set technical priorities and coordinated delivery across applied AI, commercial analytics, and stakeholder-facing data science workstreams.
Marketing Mix Modeling Data Scientist
Minoro LTD
- Reviewed and validated TensorFlow-based Marketing Mix Models to confirm model integrity, improve confidence in performance outputs, and support data-driven marketing investment decisions.
- Designed a new suite of Marketing Mix Models using Bayesian statistical methods and PyMC to improve model interpretability, uncertainty quantification, and decision reliability.
- Modeled full-funnel sales performance to evaluate the impact of marketing activity across the customer journey, from awareness through to conversion.
- Applied adstock and saturation transformations to capture lagged media effects, diminishing returns, and the non-linear relationship between investment and sales response.
- Built repeatable modeling workflows and documentation to improve transparency, governance, and scalability of Marketing Mix Modeling capabilities.
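The adstock and saturation transformations mentioned above can be illustrated with a short NumPy sketch. It assumes geometric adstock (a fixed fraction `decay` of each period's effect carries into the next) and a Hill saturation curve (`half_sat` is the spend level that yields half the maximum response). These are common textbook choices; the function names and parameter values are hypothetical and not taken from the models described, and Bayesian MMM libraries typically provide their own parameterized versions.

```python
import numpy as np


def geometric_adstock(spend, decay):
    """Geometric adstock: each period keeps `decay` times the previous period's adstock."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    spend = np.asarray(spend, dtype=float)
    out = np.empty_like(spend)
    carry = 0.0
    for t, x in enumerate(spend):
        carry = x + decay * carry  # today's spend plus the decayed carry-over
        out[t] = carry
    return out


def hill_saturation(x, half_sat, shape):
    """Hill curve: 0 at x=0, 0.5 at x=half_sat, approaching 1 for large x (diminishing returns)."""
    x = np.asarray(x, dtype=float)
    return x**shape / (x**shape + half_sat**shape)


# Hypothetical weekly spend for one channel: a burst, two dark weeks, then a smaller burst.
spend = np.array([100.0, 0.0, 0.0, 50.0])
adstocked = geometric_adstock(spend, decay=0.5)  # [100.0, 50.0, 25.0, 62.5]
response = hill_saturation(adstocked, half_sat=50.0, shape=2.0)  # second week's response is 0.5
```

Applying adstock before saturation, as here, models carry-over first and diminishing returns second; the opposite order is also used in practice and changes how the two parameters interact.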
Senior AI/ML Predictive Modeling Engineer
What Are the Chances
- Developed an NLP algorithm using Transformers and PyTorch that identifies rude and bullying responses, capturing the nuances of language needed to flag harmful interactions.
- Created an algorithm based on the OpenAI API (GPT-3.5-Turbo, the model behind ChatGPT at the time) that estimates the approximate probability of any event.
- Designed an ecosystem to process and store data using SQL, pandas, and AWS, streamlining data management.
- Deployed the model on AWS using both Lambda and Flask.
Assistant Director in Data Science and Machine Learning
EY
- Devised a classification model, built with scikit-learn, imbalanced-learn, TPOT, and TensorFlow via the Keras interface, that predicts from imbalanced financial data whether a company is a good acquisition candidate.
- Increased the number of potential M&A clients identified by approximately 80% compared with the previous approach, which relied on personal experience.
- Deployed the model with Azure, Databricks, and MLflow.
- Collaborated with data engineers and DevOps to ensure correct data handling, using SQL and PySpark to pull and format data from local and external sources.
- Formulated external data requests for the data manager.
- Validated the model using recall and F1 score, with cross-validation for further testing.
- Participated in regular meetings with stakeholders to formulate and reformulate the problem.
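The validation approach above (recall and F1 under cross-validation on imbalanced data) can be sketched with scikit-learn. This assumes synthetic data from `make_classification` with a roughly 5% positive rate and a class-weighted logistic regression standing in for the original model and for imbalanced-learn resampling; it illustrates only the evaluation setup, not the actual acquisition model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic, heavily imbalanced data standing in for real (confidential) financial features.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)

# class_weight="balanced" upweights the rare positive class during training.
model = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratified folds preserve the positive rate in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, X, y, cv=cv, scoring=["recall", "f1"])

print("recall per fold:", scores["test_recall"].round(3))
print("F1 per fold:    ", scores["test_f1"].round(3))
```

Stratification matters here: with plain k-fold splits, a fold could contain very few positives, making per-fold recall and F1 unstable.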
Online Tutor
University of London
- Tutored data analysis in Python using Pandas, Matplotlib, Seaborn, and Scikit-learn.
- Taught a theoretical course in artificial intelligence.
- Tutored neural networks with TensorFlow and Keras, helping students with their technical queries while working closely with a senior lecturer.
Data Scientist
University of Southampton
- Sped up testing in the first UK lab where COVID-19 testing could be fully automated, scaling it from a prototype processing several hundred tests a day to 30,000 a day, with capacity to reach 100,000.
- Built a model (classification with scikit-learn, imblearn, and TensorFlow via Keras interface) that predicts positive or negative outcomes of a COVID-19 test.
- Developed SQL database solutions to store and retrieve data. Migrated data from legacy systems (local file systems) to new solutions (PostgreSQL and AWS), leading to significant performance improvements.
- Improved the existing Python codebase responsible for the automation of the laboratory information management system (LIMS) and data collection from the robots and biomedical professionals to support larger data volumes—up to 100,000 items per day.
- Contributed to the LIMS back end and Flask app endpoints.
- Collaborated closely with testers and biomedical scientists to adjust the LIMS app and model to their changing requirements.
Online Lecturer
StackwisR
- Created and filmed several online courses in machine learning with Python, covering regression, classification, clustering, deep learning, time series, marketing mix modeling, and computer vision.
- Included basic courses in NumPy, Pandas, Scikit-Learn, Matplotlib, and TensorFlow with Keras.
Co-founder
EUCOIN
- Built an ecosystem to analyze cryptocurrency exchange data streams.
- Created algorithmic trading strategies for cryptocurrencies.
- Used machine learning to analyze cryptocurrency data.
Data Scientist
MC&C Media
- Built marketing mix models (time series regression with scikit-learn) to analyze the performance of clients' advertising and optimize their advertising budgets.
- Created a data hub, using SQL, Python, and R, that stores all company and client data and makes analysis easier.
- Collected and analyzed data from various sources (clients' databases) using exploratory data analysis (EDA) with SQL, Pandas, Matplotlib, and Seaborn.
- Collaborated closely with the marketing team and advertising consultants.
PhD Student
Royal Holloway
- Tutored university mathematics across all three undergraduate years through example classes, lecturing, and marking. Received a Teaching Commendation award for excellence in teaching in 2014.
- Created a mathematical model of magnetic skyrmions on a Fourier lattice in Python and deployed it on AWS.
Experience
Marketing Mix Modeling for Advertising
To build the linear regression model, I performed feature engineering, hyperparameter tuning, and lag and adstock adjustments to ensure that the model accurately predicted the client's ROI. Once the model worked, I used it to answer the client's questions about ROI and provide actionable insights.
I regularly updated the model with new data to provide valuable long-term insights to the client. Through this project, I demonstrated my expertise in data analysis and statistical modeling and my ability to apply this knowledge to real-world business problems.
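A lag-adjusted linear regression of this kind, and the ROI it implies, might be sketched as follows on synthetic weekly data. The sketch assumes one extra lag for a hypothetical TV channel, an immediate-only effect for a hypothetical search channel, an ordinary least squares fit with NumPy (rather than scikit-learn, to stay self-contained), and ROI defined as modeled incremental sales per unit of spend. All channel names, coefficients, and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_weeks = 104
tv = rng.uniform(0, 100, n_weeks)      # hypothetical weekly TV spend
search = rng.uniform(0, 50, n_weeks)   # hypothetical weekly search spend


def lag(x, k):
    """Shift a series forward by k periods, padding the start with zeros."""
    return np.concatenate([np.zeros(k), x[:-k]]) if k > 0 else x.copy()


# Synthetic sales: baseline + TV effect this week and next + immediate search effect + noise.
sales = 500 + 2.0 * tv + 1.0 * lag(tv, 1) + 3.0 * search + rng.normal(0, 5, n_weeks)

# Design matrix: intercept, TV at lags 0 and 1, search at lag 0.
X = np.column_stack([np.ones(n_weeks), tv, lag(tv, 1), search])
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)

# ROI per channel: modeled incremental sales (including lagged effects) divided by spend.
tv_contribution = coef[1] * tv + coef[2] * lag(tv, 1)
roi_tv = tv_contribution.sum() / tv.sum()
roi_search = (coef[3] * search).sum() / search.sum()
print(coef.round(2), round(float(roi_tv), 2), round(float(roi_search), 2))
```

Including the lagged term in the ROI numerator credits TV with sales that arrive the week after the spend, which a lag-free model would miss or misattribute to the baseline.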
Cryptocurrency Stream Analysis and Arbitrage Bot
My responsibilities included collecting and formatting the data from various cryptocurrency streams to ensure the data was compatible with the algorithm. I then conducted extensive data analysis to identify trends and patterns in the data and used this information to suggest optimal trading strategies.
The algorithm was designed to identify arbitrage opportunities between different cryptocurrencies, including BTC (or ETH), altcoins, and USDT.
In addition to the stream-analysis algorithm described above, I used an LSTM network to predict future cryptocurrency rates. Incorporating the LSTM produced a more sophisticated model that could make more accurate predictions from historical data.
The LSTM model was trained on historical cryptocurrency data, allowing it to learn patterns and trends in the data. This information was then used to predict the future values of the cryptocurrencies, allowing for more informed trading decisions.
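The arbitrage detection described in this project can be illustrated with a minimal triangular-arbitrage check: multiply the conversion rates around a closed cycle (stablecoin to BTC to altcoin and back), net of fees, and flag cycles that return more than the starting amount. The rates, the 0.1% fee, the `ALT` ticker, and the helper names are invented for illustration; a real bot would also need to account for order-book depth, slippage, and latency.

```python
from itertools import permutations

# Hypothetical rates: rates[(a, b)] = units of b received for 1 unit of a.
rates = {
    ("USDT", "BTC"): 1 / 60000,
    ("BTC", "ALT"): 2_000_000,
    ("ALT", "USDT"): 0.0302,
}

FEE = 0.001  # assumed 0.1% fee per trade


def cycle_return(path, rates, fee=FEE):
    """Amount held after trading 1 unit around the closed cycle `path`, or None if a leg has no market."""
    amount = 1.0
    for a, b in zip(path, path[1:] + path[:1]):
        rate = rates.get((a, b))
        if rate is None:
            return None
        amount *= rate * (1 - fee)
    return amount


def find_triangular_arbitrage(rates, start="USDT", min_return=1.0):
    """Return profitable three-leg cycles starting and ending in `start`, best first."""
    coins = {c for pair in rates for c in pair} - {start}
    opportunities = []
    for mid1, mid2 in permutations(coins, 2):
        path = [start, mid1, mid2]
        r = cycle_return(path, rates)
        if r is not None and r > min_return:
            opportunities.append((path, r))
    return sorted(opportunities, key=lambda item: -item[1])


for path, r in find_triangular_arbitrage(rates):
    print(" -> ".join(path + path[:1]), f"{r:.5f}")
```

With these made-up rates, the USDT to BTC to ALT cycle returns about 1.0037 after fees, while the reverse direction is skipped because no markets exist for its legs.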
Recommendation System for a Building Company
To begin the project, I collected and formatted client data to ensure compatibility with the recommendation system. I then conducted extensive feature engineering to identify key features that could be used in the clustering model.
Using the identified features, I built a clustering model that accurately grouped clients by their needs and preferences. I then recommended projects to existing clients based on the needs and preferences of similar clients in the same cluster.
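The cluster-then-recommend approach described above might look like the following sketch. It assumes scikit-learn's `KMeans` on standardized client features and recommends the projects most often commissioned by other clients in the same cluster that the target client has not yet commissioned. The features, project labels, and the `recommend` helper are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical client features: [budget (thousands), floor area (sq m), past projects]
features = np.array([
    [50, 120, 1], [55, 130, 2], [48, 110, 1],        # smaller residential clients
    [500, 2000, 8], [520, 2100, 9], [480, 1900, 7],  # larger commercial clients
])
# Hypothetical projects each client has already commissioned.
history = [
    {"loft"}, {"loft", "extension"}, {"extension"},
    {"office_fitout"}, {"office_fitout", "car_park"}, {"warehouse"},
]

# Standardize so budget and floor area do not dominate the distance metric.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)


def recommend(client, labels, history, top_n=3):
    """Rank projects popular among same-cluster peers that `client` has not commissioned."""
    peers = [i for i, lab in enumerate(labels) if lab == labels[client] and i != client]
    counts = {}
    for i in peers:
        for project in history[i] - history[client]:
            counts[project] = counts.get(project, 0) + 1
    # Most common first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda p: (-counts[p], p))[:top_n]


print(recommend(0, labels, history))
```

Recommending from cluster peers rather than from all clients keeps suggestions within a similar scale of project, so a small residential client is not offered a warehouse build.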
Job Search App
I incorporated NLP techniques to improve skills matching and further enhance the script's accuracy. By analyzing the job descriptions and identifying keywords related to data science skills, the script was able to identify suitable job postings that matched the skills and requirements of the client.
Once the suitable jobs were identified, they were added to the database for future analysis. This allowed for easier tracking of suitable job postings and ensured clients were quickly informed of potential job opportunities.
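The keyword-based skill matching described above can be approximated with whole-phrase matching against a skill vocabulary, sketched below. The vocabulary, the `extract_skills` and `match_score` helpers, and the scoring rule (the share of a posting's detected skills that the candidate has) are illustrative assumptions; a more robust version might add lemmatization, synonyms, or embedding similarity.

```python
import re

# Hypothetical skill vocabulary; multi-word terms are matched as whole phrases.
SKILLS = {"python", "sql", "machine learning", "pytorch", "tensorflow", "nlp", "time series"}


def extract_skills(text, vocabulary=SKILLS):
    """Return the vocabulary terms that appear as whole words or phrases in `text`."""
    lowered = text.lower()
    return {s for s in vocabulary if re.search(r"\b" + re.escape(s) + r"\b", lowered)}


def match_score(job_text, candidate_skills):
    """Fraction of the job's detected skills that the candidate has (0.0 if none detected)."""
    required = extract_skills(job_text)
    if not required:
        return 0.0
    return len(required & candidate_skills) / len(required)


candidate = {"python", "sql", "machine learning", "time series"}
job = "Data Scientist: Python, SQL and PyTorch; experience with time series forecasting."
print(sorted(extract_skills(job)), match_score(job, candidate))
```

Word-boundary matching avoids false hits such as "sql" inside a longer token, and postings above a chosen score threshold could then be saved to the database for tracking.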
App to Find All Connections from Point A to Point B
I collected and processed data from various sources, including the Amadeus API, other APIs, and web scraping with Selenium. This provided a wide range of transportation options in the app.
Although the app was initially developed as a prototype, there is potential to expand it and make it available to a broader audience. This would require further development and data collection, but the initial prototype provides a solid foundation for future work in this area.
Education
PhD in Computational Theoretical Physics
Royal Holloway University of London - London, UK
Master's Degree in Theoretical Physics
University of Manchester - Manchester, UK
Skills
Libraries/APIs
Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, TensorFlow, Imbalanced-learn, PySpark, PyBrain, XGBoost, Keras, REST APIs, Google Analytics API, Spark ML, PyTorch, OpenAI API, Llama API, Claude API, PyMC, JAX
Tools
LaTeX, Git, Mathematica, Pytest, Jira, Tree-Based Pipeline Optimization Tool (TPOT), Seaborn, MATLAB, gnuplot, Hidden Markov Model, AutoML, Amazon SageMaker, Jupyter, Google Analytics, ChatGPT, GitHub Copilot, Claude, Terminal
Languages
Python 3, Python, SQL, C++11, C++, Bash, R, YAML, Bash Script, Snowflake
Platforms
Ubuntu, Linux, Azure, Databricks, Amazon Web Services (AWS), Jupyter Notebook, Blockchain, Docker
Paradigms
Object-oriented Programming (OOP), Testing, Automation, Agile, Agile Software Development, B2B, ETL, Model Context Protocol (MCP)
Storage
Database Migration, Databases, Database Modeling, PostgreSQL, Amazon S3 (AWS S3), Redis
Frameworks
Flask, Selenium, Spark, Apache Spark, LightGBM, Agentic Frameworks
Industry Expertise
Bioinformatics
Other
Mathematics, Regression, Physics, University Teaching, Mathematical Modeling, Marketing Mix Modeling, Machine Learning, Linear Regression, Advanced Physics, Calculus, Quantitative Calculus, Statistics, Statistical Methods, Probability Theory, Differential Equations, Partial Differential Equations, Computational Physics, Eigenvectors, Linear Algebra, Mathematical Analysis, Applied Mathematics, Mathematical Programming, Matrix Algebra, Time Series, Time Series Analysis, Data Visualization, Data Analysis, EDA, Data Science, Artificial Intelligence (AI), Neural Networks, Deep Neural Networks (DNNs), Artificial Neural Networks (ANN), Predictive Modeling, Large Language Models (LLMs), Agentic AI, Agentic AI Systems, Deep Learning, Data Migration, Data Governance, Data Management, Computational Biological Physics, Markov Model, Geolocation, MLflow, Algorithmic Trading, Algorithmic Trading Analysis, Cryptocurrency, Bitcoin, Quantum Computing, Stochastic Differential Equations, Computational Biology, Fluid Dynamics, Electrodynamics, Complex Networks, Statistical Significance, Statistical Analysis, Econometrics, Pitch Presentations, Client Presentations, Random Forests, APIs, Trading, Arbitrage, Data Engineering, Cross-selling, Clustering, Recommendation Systems, Data Modeling, Natural Language Processing (NLP), Image Recognition, Computer Vision, Classification, Videos, Recording, Tutoring, Online Tutoring, Fourier Analysis, Training, Research, Software Development, Big Data, Cloud, Data Processing, Data Processing Automation, Version Control, Feature Engineering, Marketing Attribution, Attribution Modeling, Business to Business (B2B), Data Analytics, Web Scraping, Amadeus, Data Reporting, Generative Pre-trained Transformers (GPT), Data Cleaning, Data Cleansing, Amazon RDS, Scientific Data Analysis, Scientific Computing, Dashboards, OpenAI GPT-3 API, Chatbots, Hugging Face, OpenAI GPT-4 API, Generative Pre-trained Transformer 3 (GPT-3), Data Scraping, Text Classification, 
Classification Algorithms, OpenAI, Large Language Model Operations (LLMOps), Forecasting, Bayesian Machine Learning, Open-source LLMs, Meta Llama, Llama 3, LangChain, Azure Databricks, FastAPI, Time Series Data, Time Series Forecasting, Predictive Analytics, Statistical Modeling, Scenario Analysis, Demand Forecasting, Healthcare Data Science, Light LLMs, Cursor AI, Opik, RAG Systems, RAG Pipelines, RAG Architecture, Agentic RAG Systems, Retrieval-augmented Generation (RAG), AI Agents, AI Compliance Agents, Vector Databases, Prompt Engineering, Benchmarking, Agentic Coding, Agentic Workflow Design, AI Tools, AI Testing, Generative Artificial Intelligence (GenAI), Anthropic, ChatGPT API, ChatGPT Prompts, YAML Pipelines, Agent Evaluation, API Integration, Optical Character Recognition (OCR), Bayesian Inference & Modeling, Marketing Science, Marketing Mix, Probabilistic Modeling, Bayesian Statistics, Regression Modeling, A/B Testing, Multivariate Statistical Modeling, Logistics & Supply Chain, Supply Chain, Supply Chain Optimization, SAP Supply Chain Management (SCM), Inventory Management, Inventory, Sales Forecasting, Gradient Boosting, Decision Trees, lightgbm, Healthcare Services, Healthcare IT, K-means Clustering, Clustering Algorithms, Google Meridian, Causal Inference, CRM, ROI, Incrementality Testing, Marketing Analytics, Data Analytics (Marketing), Analysis of Variance (ANOVA), Funnel Marketing, Hyperparameter Tuning, Model Evaluation, Model Validation, Machine Learning Operations (MLOps), Xarray, SSH, AWS SSH Keys, DuckDB