Eugene Balkind, Developer in London, United Kingdom

Eugene Balkind

Verified Expert in Engineering

Data Scientist and ML Developer

Location
London, United Kingdom
Toptal Member Since
June 13, 2022

Eugene is a skilled data scientist with a strong academic and industrial background in time series analysis, LLMs, and other ML technologies. He has created classification models that predict positive or negative outcomes of COVID-19 tests and models that determine whether a company is a good acquisition candidate. He has also built data hubs, carried out cross-validation testing, and adjusted and improved models to adapt to quickly changing requirements. He is also proficient with the OpenAI API.

Portfolio

What Are the Chances
Python, Machine Learning, Predictive Modeling, TensorFlow, Pandas, Scikit-learn...
University of London
Data Analysis, Data Visualization, Neural Networks, Machine Learning, Tutoring...
University of Southampton
Python, Pandas, SQL, Scikit-learn, TensorFlow, Linux, Bash, Flask, Git, Pytest...

Experience

Availability

Part-time

Preferred Environment

Python 3, TensorFlow, Pandas, Mathematics, Regression, Amazon Web Services (AWS), SQL, ChatGPT, Amadeus, Azure

The most amazing...

...project I've done was COVID-19 testing automation. Lab performance improved from 300 analyzed samples a day to 30,000, with the ability to go up to 100,000.

Work Experience

Senior AI/ML Predictive Modeling Engineer

2023 - 2023
What Are the Chances
  • Developed an NLP algorithm using Transformers and PyTorch that identifies rude and bullying responses. This involved understanding the nuances of language and identifying harmful interactions.
  • Created an algorithm based on the OpenAI API (GPT-3.5-Turbo, which powers ChatGPT) that predicts the approximate probability of any event.
  • Designed an ecosystem to process and store data using SQL, pandas, and AWS. This allowed for streamlined data management.
  • Deployed the model on AWS using both Lambda and Flask.
Technologies: Python, Machine Learning, Predictive Modeling, TensorFlow, Pandas, Scikit-learn, Natural Language Processing (NLP), OpenAI GPT-3 API, Chatbots, Generative Pre-trained Transformers (GPT), Hugging Face, PyTorch, SQL, Amazon Web Services (AWS), Artificial Intelligence (AI), OpenAI GPT-4 API, GPT, Generative Pre-trained Transformer 3 (GPT-3), Data Science, Text Classification, Classification, Classification Algorithms, ChatGPT, OpenAI, Large Language Models (LLMs)

Online Tutor

2021 - 2022
University of London
  • Tutored data analysis with Python employing Pandas, Matplotlib, Seaborn, and Scikit-Learn.
  • Taught a theoretical course in artificial intelligence.
  • Tutored students in neural networks with TensorFlow and Keras, answering their technical queries and keeping in close contact with a senior lecturer.
Technologies: Data Analysis, Data Visualization, Neural Networks, Machine Learning, Tutoring, Online Tutoring, Training, Jupyter, Jupyter Notebook, Python 3, Artificial Intelligence (AI), Data Science, Classification, Classification Algorithms

Data Scientist

2021 - 2022
University of Southampton
  • Sped up the testing process in the first lab in the UK where COVID-19 testing can be fully automated. We moved the lab from a prototype processing several hundred tests daily to 30,000—potentially increasing to 100,000 daily.
  • Built a model (classification with scikit-learn, imblearn, and TensorFlow via Keras interface) that predicts positive or negative outcomes of a COVID-19 test.
  • Developed SQL database solutions to store and retrieve data. Migrated data from legacy systems (local file systems) to new solutions (PostgreSQL and AWS), leading to significant performance improvements.
  • Improved the existing Python codebase responsible for the automation of the laboratory information management system (LIMS) and data collection from the robots and biomedical professionals to support larger data volumes—up to 100,000 items per day.
  • Contributed to the LIMS' back end and Flask app endpoints.
  • Collaborated closely with testers and biomedical scientists to adjust the LIMS app and model to their changing requirements.
Technologies: Python, Pandas, SQL, Scikit-learn, TensorFlow, Linux, Bash, Flask, Git, Pytest, Imbalanced-learn, Machine Learning, Selenium, APIs, Object-oriented Programming (OOP), Data Science, Data Visualization, Matplotlib, Time Series, Jira, REST APIs, Testing, Artificial Intelligence (AI), Artificial Neural Networks (ANN), Neural Networks, Deep Neural Networks, Deep Learning, Data Engineering, Data Analysis, Data Modeling, Databases, Amazon Web Services (AWS), PostgreSQL, Research, Automation, Software Development, Agile, Agile Software Development, Big Data, Cloud, Data Processing, Data Processing Automation, Version Control, Time Series Analysis, Feature Engineering, Data Analytics, Data Reporting, Scientific Data Analysis, Data Migration, Database Migration, Data Governance, Data Management, Python 3, ETL, Classification, Classification Algorithms

Online Lecturer

2020 - 2020
StackwisR
  • Created and filmed several online courses in machine learning (regression, classification, clustering, deep learning, time series, marketing mix modeling, and computer vision) with Python.
  • Included basic courses in NumPy, Pandas, Scikit-Learn, Matplotlib, and TensorFlow with Keras.
Technologies: Machine Learning, Python, Regression, Linear Regression, Pandas, Scikit-learn, TensorFlow, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Image Recognition, Computer Vision, Amazon Web Services (AWS), Neural Networks, Clustering, Classification, Data Visualization, Data Analysis, LaTeX, Videos, Recording, Tutoring, University Teaching, Training, Jupyter, Jupyter Notebook, Python 3, Classification Algorithms

Assistant Director in Data Science and Machine Learning

2019 - 2019
EY
  • Devised a classification model for imbalanced financial data that predicted whether a company is a good acquisition candidate using scikit-learn, imbalanced-learn, TPOT, and TensorFlow via the Keras interface.
  • Increased the number of potential M&A clients by approximately 80% compared with the previous approach, which relied on personal experience.
  • Deployed the model with Azure, Databricks, and MLflow.
  • Collaborated with data engineers and DevOps to handle data correctly. Used SQL and PySpark to pull and format data from local and external sources.
  • Formulated external data requests for the data manager.
  • Validated the model with recall and F1 metrics. Employed cross-validation for further tests.
  • Participated in regular meetings with stakeholders to formulate and reformulate the problem.
Technologies: Python, Scikit-learn, TensorFlow, Pandas, NumPy, SciPy, SQL, PySpark, Flask, Object-oriented Programming (OOP), Bash, Azure, Databricks, Git, Jira, Tree-Based Pipeline Optimization Tool (TPOT), Imbalanced-learn, MLflow, Machine Learning, Seaborn, AutoML, Data Science, Data Visualization, EDA, Client Presentations, XGBoost, Random Forests, Keras, Pytest, REST APIs, Testing, Artificial Intelligence (AI), Neural Networks, Deep Neural Networks, Artificial Neural Networks (ANN), Deep Learning, Predictive Modeling, Data Modeling, Databases, Database Modeling, Data Analysis, Feature Engineering, Agile, Version Control, Spark, Data Analytics, Data Cleaning, Data Cleansing, Data Governance, Data Management, Python 3, Data Processing, ETL, Spark ML, Apache Spark, Classification, Classification Algorithms

Co-founder

2018 - 2019
EUCOIN
  • Built an ecosystem to analyze the crypto exchange stream.
  • Created algorithmic trading strategies for cryptocurrencies.
  • Used machine learning to analyze cryptocurrency data.
Technologies: Python, Machine Learning, Regression, Pandas, Matplotlib, NumPy, SciPy, Scikit-learn, TensorFlow, Algorithmic Trading, Algorithmic Trading Analysis, Cryptocurrency, Bitcoin, Mathematics, Time Series, Object-oriented Programming (OOP), Flask, Pytest, Git, Data Science, Data Visualization, EDA, Data Analysis, Time Series Analysis, Statistical Methods, Seaborn, Testing, Trading, Arbitrage, Artificial Intelligence (AI), Predictive Modeling, Data Engineering, Data Modeling, Amazon Web Services (AWS), Data Governance, Data Management, Python 3, Data Processing, Amazon S3 (AWS S3), ETL, Blockchain, PySpark, Apache Spark

Data Scientist

2017 - 2018
MC&C Media
  • Built machine learning models (time series analysis via marketing mix modeling regression with scikit-learn) to analyze the performance of the clients' advertising and optimize their advertising budget.
  • Created a data hub that now stores all the company and clients' data, making the analysis process easier using SQL, Python, and R.
  • Collected and analyzed data from various sources (clients' databases) using exploratory data analysis (EDA) with SQL, Pandas, Matplotlib, and Seaborn.
  • Collaborated closely with the marketing team and advertising consultants.
Technologies: Python, Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Seaborn, PyBrain, SQL, R, Marketing Mix Modeling, Regression, Markov Model, Geolocation, Machine Learning, Linear Regression, Statistics, Hidden Markov Model, Statistical Methods, Statistical Significance, Statistical Analysis, Time Series, Time Series Analysis, Econometrics, Applied Mathematics, Data Visualization, Data Analysis, EDA, Pitch Presentations, Client Presentations, Data Science, Pytest, Artificial Intelligence (AI), Predictive Modeling, Data Engineering, Data Modeling, Marketing Attribution, Attribution Modeling, Google Analytics, Google Analytics API, B2B, Business to Business (B2B), Data Analytics, Data Reporting, Data Cleaning, Data Cleansing, Dashboards, Data Migration, Database Migration, Data Governance, Data Management, Python 3, Data Processing Automation

PhD Student

2012 - 2017
Royal Holloway
  • Tutored first-, second-, and third-year students in university mathematics. Tutoring included example classes, lecturing, and marking. Obtained a Teaching Commendation award for excellence in teaching in 2014.
  • Created a mathematical model of magnetic skyrmions on a Fourier lattice with Python.
  • Deployed the mathematical model of magnetic skyrmions on a Fourier lattice with AWS.
Technologies: Tutoring, University Teaching, Mathematica, Python, NumPy, SciPy, Pandas, Linear Algebra, Calculus, Computational Physics, Mathematics, Applied Mathematics, Fourier Analysis, Amazon Web Services (AWS), Linux, LaTeX, Training, Scientific Data Analysis, Scientific Computing, Python 3

Marketing Mix Modeling for Advertising

I worked as a data scientist on a client's advertising analysis project. I collected data from various sources, such as the client's CSVs, SQL databases, and public data. I converted the data to a single, consistent time series format and conducted extensive data analysis to identify potential lag, adstock (carry-over effect), and diminishing returns.

To build the linear regression model, I performed feature engineering, hyperparameter tuning, and lag and adstock adjustments to ensure that the model accurately predicted the client's ROI. Once the model worked, I used it to answer the client's questions about ROI and provided actionable insights.
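The lag, adstock, and diminishing-returns adjustments mentioned above can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the geometric form of the adstock, the saturation curve, and the sample figures are all assumptions chosen for illustration.

```python
def geometric_adstock(spend, decay):
    """Carry-over effect: each period keeps a share `decay` of the
    previous period's adstocked value (assumed geometric form)."""
    carried = 0.0
    result = []
    for x in spend:
        carried = x + decay * carried
        result.append(carried)
    return result


def lag(series, periods):
    """Delay a series by `periods` steps, padding the start with zeros
    and keeping the original length."""
    if periods <= 0:
        return list(series)
    padded = [0.0] * periods + list(series)
    return padded[:len(series)]


def saturate(x, half_saturation):
    """Diminishing returns: response is 0.5 when x == half_saturation
    and flattens as x grows (one of several possible curve shapes)."""
    return x / (x + half_saturation)


# Hypothetical weekly advertising spend for one channel.
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
print(adstocked)  # [100.0, 50.0, 25.0, 62.5]
```

In a regression setting, transformed columns like these would typically replace raw spend as features, with the decay, lag, and saturation parameters tuned alongside the model.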

I regularly updated the model with new data to provide valuable long-term insights to the client. Through this project, I demonstrated my expertise in data analysis and statistical modeling and my ability to apply this knowledge to real-world business problems.

Cryptocurrency Stream Analysis and Arbitrage Bot

I developed an algorithm that analyzed cryptocurrency streams from a crypto exchange (Binance), formatted the data, and suggested an optimal trading (arbitrage) strategy. The algorithm focused on BTC, ETH, altcoins, and USDT.

My responsibilities included collecting and formatting the data from various cryptocurrency streams to ensure the data was compatible with the algorithm. I then conducted extensive data analysis to identify trends and patterns in the data and used this information to suggest optimal trading strategies.

The algorithm was designed to identify arbitrage opportunities between different cryptocurrencies, including BTC (or ETH), altcoins, and USDT.
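One common way to check for an opportunity across three such markets is a triangular round trip. The sketch below is only an illustration of that idea under stated assumptions: the function name, the flat per-trade fee, and the example prices are hypothetical, and it ignores bid/ask spread, order book depth, and slippage.

```python
def triangular_return(btc_usdt, alt_btc, alt_usdt, fee=0.001):
    """Multiple of the starting USDT left after the round trip
    USDT -> BTC -> ALT -> USDT, paying `fee` on every trade.

    btc_usdt: price of 1 BTC in USDT
    alt_btc:  price of 1 ALT in BTC
    alt_usdt: price of 1 ALT in USDT
    """
    btc = (1.0 / btc_usdt) * (1.0 - fee)  # buy BTC with 1 USDT
    alt = (btc / alt_btc) * (1.0 - fee)   # buy ALT with that BTC
    usdt = (alt * alt_usdt) * (1.0 - fee)  # sell ALT back to USDT
    return usdt


# Hypothetical prices: the ALT/USDT market is 1% above the implied
# cross rate of 20000 * 0.05 = 1000 USDT per ALT.
multiple = triangular_return(20000.0, 0.05, 1010.0)
if multiple > 1.0:
    print("round trip is profitable after fees")
```

A value above 1.0 means the loop ends with more USDT than it started with; at fair prices the fees alone push the result below 1.0.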

In addition to the aforementioned algorithm that analyzed cryptocurrency streams, I used LSTM to predict future rates of cryptocurrencies. By incorporating LSTM into the algorithm, I created a more sophisticated model that could make more accurate predictions based on historical data.

The LSTM model was trained on historical cryptocurrency data, allowing it to learn patterns and trends in the data. This information was then used to predict the future values of the cryptocurrencies, allowing for more informed trading decisions.

Recommendation System for a Building Company

As a data scientist, I developed a cutting-edge recommendation system based on clustering that suggested projects to existing clients. My responsibilities included collecting and formatting client data, performing feature engineering, and building a clustering model and recommendation system.

To begin the project, I collected and formatted client data to ensure compatibility with the recommendation system. I then conducted extensive feature engineering to identify key features that could be used in the clustering model.

Using the identified features, I built a clustering model capable of accurately identifying and grouping clients based on their needs and preferences. Once the clustering model was working, I suggested recommended projects to the existing clients based on the needs and preferences of similar clients in the cluster.
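The recommendation step described above can be sketched as follows, assuming cluster labels have already been produced by some clustering model. The function name, the ranking rule (most frequent projects among cluster peers), and the sample data are illustrative assumptions, not the system's actual logic.

```python
from collections import Counter


def recommend(client, clusters, history, top_n=2):
    """Suggest projects that are popular among clients in the same
    cluster and that `client` has not already had.

    clusters: mapping of client -> cluster label (assumed precomputed)
    history:  mapping of client -> set of past projects
    """
    label = clusters[client]
    peers = [c for c, l in clusters.items() if l == label and c != client]
    counts = Counter(p for peer in peers for p in history.get(peer, ()))
    own = set(history.get(client, ()))
    # Most frequent first; ties broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [project for project, _ in ranked if project not in own][:top_n]


# Hypothetical cluster assignments and project histories.
clusters = {"A": 0, "B": 0, "C": 0, "D": 1}
history = {
    "A": {"extension"},
    "B": {"extension", "loft conversion"},
    "C": {"loft conversion", "garage"},
    "D": {"office fit-out"},
}
print(recommend("A", clusters, history))  # ['loft conversion', 'garage']
```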

Job Search App

I created a powerful script that scraped major UK job boards and filtered for suitable data science contracts. The script was written in Python using the Selenium library, allowing efficient and automated web scraping. The script was designed to scrape job boards like Indeed and TotalJobs and filter for data science contracts matching specific criteria.

To further improve the script's accuracy, I incorporated NLP techniques for skills matching. By analyzing the job descriptions and identifying keywords related to data science skills, the script was able to identify job postings that matched the skills and requirements of the client.
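A very simple form of keyword-based skills matching might look like the sketch below. It is an assumption-laden illustration rather than the script's real approach: the function name, the whole-word regex rule, and the sample posting are hypothetical.

```python
import re


def match_skills(description, skills):
    """Return the subset of `skills` that appear in `description` as
    whole words or phrases, ignoring case."""
    text = description.lower()
    found = set()
    for skill in skills:
        # Lookarounds stop short names such as "R" from matching
        # inside longer words such as "required".
        pattern = r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)"
        if re.search(pattern, text):
            found.add(skill)
    return found


# Hypothetical job posting and skill list.
posting = "Contract Data Scientist: Python, SQL and scikit-learn required. R is a plus."
wanted = {"Python", "SQL", "scikit-learn", "R", "TensorFlow"}
matched = match_skills(posting, wanted)
print(sorted(matched))  # ['Python', 'R', 'SQL', 'scikit-learn']
```

The share of wanted skills found in a posting could then serve as a simple score for filtering.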

Once the suitable jobs were identified, they were added to the database for future analysis. This allowed for easier tracking of suitable job postings and ensured clients were quickly informed of potential job opportunities.

App to Find All Connections from Point A to Point B

I created a powerful app that allowed users to find all possible connections from one postcode to another. This included flights, trains, buses, and intercity connections, making it a comprehensive and valuable tool for travelers.

I collected and processed data from various sources, including public APIs, the Amadeus API, and web scraping with Selenium. This allowed the app to cover a wide range of transportation options.

Although the app was initially developed as a prototype, there is potential to expand it and make it available to a broader audience. This would require further development and data collection, but the initial prototype provides a solid foundation for future work in this area.

Education

2012 - 2016

PhD in Computational Theoretical Physics

Royal Holloway University of London - London, UK

2008 - 2012

Master's Degree in Theoretical Physics

University of Manchester - Manchester, UK

Languages

Python 3, Python, SQL, C++11, C++, Bash, R

Libraries/APIs

Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, TensorFlow, PySpark, PyBrain, XGBoost, Keras, REST APIs, Google Analytics API, Spark ML, PyTorch

Tools

LaTeX, Git, Mathematica, Pytest, Jira, Tree-Based Pipeline Optimization Tool (TPOT), Seaborn, MATLAB, gnuplot, Hidden Markov Model, AutoML, Amazon SageMaker, Jupyter, Google Analytics, ChatGPT

Paradigms

Data Science, Object-oriented Programming (OOP), Testing, Automation, Agile, Agile Software Development, B2B, ETL

Other

Mathematics, Regression, Physics, University Teaching, Mathematical Modeling, Marketing Mix Modeling, Machine Learning, Linear Regression, Advanced Physics, Calculus, Quantitative Calculus, Statistics, Statistical Methods, Probability Theory, Differential Equations, Partial Differential Equations, Computational Physics, Eigenvectors, Linear Algebra, Mathematical Analysis, Applied Mathematics, Mathematical Programming, Matrix Algebra, Time Series, Time Series Analysis, Data Visualization, Data Analysis, EDA, Artificial Intelligence (AI), Neural Networks, Deep Neural Networks, Artificial Neural Networks (ANN), Predictive Modeling, Deep Learning, Imbalanced-learn, Data Migration, Data Governance, Data Management, Computational Biological Physics, Markov Model, Geolocation, MLflow, Algorithmic Trading, Algorithmic Trading Analysis, Cryptocurrency, Bitcoin, Quantum Computing, Stochastic Differential Equations, Computational Biology, Fluid Dynamics, Electrodynamics, Complex Networks, Statistical Significance, Statistical Analysis, Econometrics, Pitch Presentations, Client Presentations, Random Forests, APIs, Trading, Arbitrage, Data Engineering, Cross-selling, Clustering, Recommendation Systems, Data Modeling, Natural Language Processing (NLP), Image Recognition, Computer Vision, Classification, Videos, Recording, Tutoring, Online Tutoring, Fourier Analysis, Training, Research, Software Development, Big Data, Cloud, Data Processing, Data Processing Automation, Version Control, Feature Engineering, Marketing Attribution, Attribution Modeling, Business to Business (B2B), Data Analytics, Web Scraping, Amadeus, Data Reporting, GPT, Generative Pre-trained Transformers (GPT), Data Cleaning, Data Cleansing, Amazon RDS, Scientific Data Analysis, Scientific Computing, Dashboards, OpenAI GPT-3 API, Chatbots, Hugging Face, OpenAI GPT-4 API, Generative Pre-trained Transformer 3 (GPT-3), Data Scraping, Text Classification, Classification Algorithms, OpenAI, Large Language Models (LLMs)

Storage

Database Migration, Databases, Database Modeling, PostgreSQL, Amazon S3 (AWS S3)

Frameworks

Flask, Selenium, Spark, Apache Spark

Platforms

Linux, Azure, Databricks, Amazon Web Services (AWS), Jupyter Notebook, Blockchain
