Eugene Balkind
Verified Expert in Engineering
Data Scientist and ML Developer
London, United Kingdom
Toptal member since June 13, 2022
Eugene is a skilled data scientist with a strong academic and industrial background in time series analysis, LLMs, and other ML technologies. He has created classification models that predict positive or negative outcomes of COVID-19 tests and models that determine whether a company is a good acquisition candidate. He has also built data hubs, completed cross-validation testing, and adjusted and improved models to adapt to quickly changing requirements. He is also proficient with the OpenAI API.
Preferred Environment
Python 3, TensorFlow, Pandas, Mathematics, Regression, Amazon Web Services (AWS), SQL, ChatGPT, Amadeus, Azure
The most amazing...
...project I've done was COVID-19 testing automation. Lab performance improved from 300 analyzed samples a day to 30,000, with the ability to go up to 100,000.
Work Experience
Senior AI/ML Predictive Modeling Engineer
What Are the Chances
- Developed an NLP algorithm using Transformers and PyTorch that identifies rude and bullying responses. This involved understanding the nuances of language and identifying harmful interactions.
- Created an algorithm based on the OpenAI API (GPT-3.5-Turbo, which powers ChatGPT) that predicts the approximate probability of any event.
- Designed an ecosystem to process and store data using SQL, pandas, and AWS. This allowed for streamlined data management.
- Deployed the model on AWS using both Lambda and Flask.
Online Tutor
University of London
- Tutored data analysis with Python employing Pandas, Matplotlib, Seaborn, and Scikit-Learn.
- Taught a theoretical course in artificial intelligence.
- Tutored students in neural networks with TensorFlow and Keras. Tutoring involved assisting students with their technical queries while keeping close contact with a senior lecturer.
Data Scientist
University of Southampton
- Sped up the testing process in the first lab in the UK where COVID-19 testing can be fully automated. We moved the lab from a prototype processing several hundred tests daily to 30,000—potentially increasing to 100,000 daily.
- Built a model (classification with scikit-learn, imblearn, and TensorFlow via Keras interface) that predicts positive or negative outcomes of a COVID-19 test.
- Developed SQL database solutions to store and retrieve data. Migrated data from legacy systems (local file systems) to new solutions (PostgreSQL and AWS), leading to significant performance improvements.
- Improved the existing Python codebase responsible for the automation of the laboratory information management system (LIMS) and data collection from the robots and biomedical professionals to support larger data volumes—up to 100,000 items per day.
- Contributed to the LIMS' back end and Flask app endpoints.
- Collaborated closely with testers and biomedical scientists to adjust the LIMS app and model to their changing requirements.
Online Lecturer
StackwisR
- Created and filmed several online courses in machine learning (regression, classification, clustering, deep learning, time series, marketing mix modeling, and computer vision) with Python.
- Included basic courses in NumPy, Pandas, Scikit-Learn, Matplotlib, and TensorFlow with Keras.
Assistant Director in Data Science and Machine Learning
EY
- Devised a classification model for imbalanced financial data that predicted whether a company is a good acquisition candidate using scikit-learn, imbalanced-learn, TPOT, and TensorFlow via the Keras interface.
- Increased the number of potential M&A clients by approximately 80% compared to the previous approach, which relied on personal experience.
- Deployed the model with Azure, Databricks, and MLflow.
- Collaborated with data engineers and DevOps to handle data correctly. Used SQL and PySpark to pull and format data from local and external sources.
- Formulated external data requests for the data manager.
- Validated the model with recall and F1 metrics. Employed cross-validation for further tests.
- Participated in regular meetings with stakeholders to formulate and reformulate the problem.
Co-founder
EUCOIN
- Built an ecosystem to analyze the crypto exchange stream.
- Created algorithms for automated cryptocurrency trading.
- Used machine learning to analyze cryptocurrency data.
Data Scientist
MC&C Media
- Built machine learning models (time series analysis via marketing mix modeling regression with scikit-learn) to analyze the performance of the clients' advertising and optimize their advertising budget.
- Created a data hub using SQL, Python, and R that now stores all the company's and clients' data, making the analysis process easier.
- Collected and analyzed data from various sources (clients' databases) using exploratory data analysis (EDA) with SQL, Pandas, Matplotlib, and Seaborn.
- Collaborated closely with the marketing team and advertising consultants.
PhD Student
Royal Holloway
- Tutored university mathematics for first-, second-, and third-year students. Tutoring included example classes, lecturing, and marking. Obtained a Teaching Commendation award for excellence in teaching in 2014.
- Created a mathematical model of magnetic skyrmions on a Fourier lattice with Python.
- Deployed the mathematical model of magnetic skyrmions on a Fourier lattice with AWS.
Experience
Marketing Mix Modeling for Advertising
To build the linear regression model, I performed feature engineering, hyperparameter tuning, and lag and adstock adjustments to ensure that the model accurately predicted the client's ROI. Once the model worked, I used it to answer clients' questions about ROI and provided them with actionable insights.
I regularly updated the model with new data to provide valuable long-term insights to the client. Through this project, I demonstrated my expertise in data analysis and statistical modeling and my ability to apply this knowledge to real-world business problems.
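As a rough illustration of the lag and adstock adjustments mentioned above, the sketch below applies a simple lag and a geometric adstock to a spend series before it would be fed into a regression. It is not the project's code: the function names, the decay value, and the spend figures are assumptions made up for this example, and geometric decay is only one common adstock form.

```python
def lag(series, periods):
    """Shift a series forward by `periods` steps, padding the start with zeros."""
    if periods <= 0:
        return list(series)
    padded = [0.0] * periods + list(series)
    return padded[:len(series)]


def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of the previous period's accumulated effect."""
    carried = 0.0
    result = []
    for x in spend:
        carried = x + decay * carried
        result.append(carried)
    return result


# Hypothetical weekly advertising spend (illustrative numbers only).
weekly_spend = [100.0, 0.0, 0.0, 50.0]
adstocked = geometric_adstock(weekly_spend, decay=0.5)
lagged = lag(weekly_spend, periods=1)
print(adstocked)
print(lagged)
```

The transformed series, rather than raw spend, would then serve as regression features, with the decay and lag treated as tunable parameters.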
Cryptocurrency Stream Analysis and Arbitrage Bot
My responsibilities included collecting and formatting the data from various cryptocurrency streams to ensure the data was compatible with the algorithm. I then conducted extensive data analysis to identify trends and patterns in the data and used this information to suggest optimal trading strategies.
The algorithm was designed to identify arbitrage opportunities between different cryptocurrencies, including BTC (or ETH), altcoins, and USDT.
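A minimal sketch of how such a three-way arbitrage check can work, assuming a USDT to BTC to altcoin to USDT cycle. The function name, the flat per-trade fee, and the prices are hypothetical, and the sketch ignores order book depth, slippage, and latency, all of which matter in practice.

```python
def triangular_return(btc_usdt, alt_btc, alt_usdt, fee=0.001):
    """Return the fractional profit of a USDT -> BTC -> ALT -> USDT cycle.

    btc_usdt: price of 1 BTC in USDT
    alt_btc:  price of 1 ALT in BTC
    alt_usdt: price of 1 ALT in USDT
    fee:      assumed proportional fee charged on each of the three trades
    """
    keep = 1.0 - fee
    btc = (1.0 / btc_usdt) * keep   # spend 1 USDT on BTC
    alt = (btc / alt_btc) * keep    # spend that BTC on the altcoin
    usdt = alt * alt_usdt * keep    # sell the altcoin back to USDT
    return usdt - 1.0


# Hypothetical quotes: the altcoin is 1% more expensive in USDT than via BTC.
profit = triangular_return(btc_usdt=20000.0, alt_btc=0.05, alt_usdt=1010.0)
if profit > 0:
    print("Cycle is profitable after fees")
```

A positive return after fees flags a candidate opportunity; whether it can actually be executed depends on available volume and timing.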
In addition to the aforementioned algorithm that analyzed cryptocurrency streams, I used LSTM to predict future rates of cryptocurrencies. By incorporating LSTM into the algorithm, I created a more sophisticated model that could make more accurate predictions based on historical data.
The LSTM model was trained on historical cryptocurrency data, allowing it to learn patterns and trends in the data. This information was then used to predict the future values of the cryptocurrencies, allowing for more informed trading decisions.
Recommendation System for a Building Company
To begin the project, I collected and formatted client data to ensure compatibility with the recommendation system. I then conducted extensive feature engineering to identify key features that could be used in the clustering model.
Using the identified features, I built a clustering model capable of accurately identifying and grouping clients based on their needs and preferences. Once the clustering model was working, I suggested recommended projects to the existing clients based on the needs and preferences of similar clients in the cluster.
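The recommendation step can be sketched as follows, assuming cluster labels have already been produced by some clustering model (the source does not name the algorithm). The function, client names, and project names are illustrative assumptions: projects are ranked by how many cluster peers have them, skipping those the client already has.

```python
from collections import Counter


def recommend_projects(client, cluster_of, projects_of, top_n=2):
    """Suggest projects popular among the client's cluster peers."""
    peers = [c for c, label in cluster_of.items()
             if label == cluster_of[client] and c != client]
    counts = Counter(p for peer in peers for p in projects_of.get(peer, ()))
    already = set(projects_of.get(client, ()))
    # Sort by popularity, then by name so that ties are deterministic.
    ranked = sorted((p for p in counts if p not in already),
                    key=lambda p: (-counts[p], p))
    return ranked[:top_n]


# Hypothetical cluster assignments and past projects.
cluster_of = {"A": 0, "B": 0, "C": 0, "D": 1}
projects_of = {
    "A": ["extension"],
    "B": ["extension", "loft"],
    "C": ["loft", "garage"],
    "D": ["pool"],
}
print(recommend_projects("A", cluster_of, projects_of))
```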
Job Search App
To further improve the script's accuracy, I incorporated NLP techniques for skills matching. By analyzing the job descriptions and identifying keywords related to data science skills, the script was able to identify suitable job postings that matched the skills and requirements of the client.
Once the suitable jobs were identified, they were added to the database for future analysis. This allowed for easier tracking of suitable job postings and ensured clients were quickly informed of potential job opportunities.
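The keyword-identification step could look something like the sketch below. It is a deliberately simplified stand-in for the NLP techniques mentioned above: plain case-insensitive keyword matching with word boundaries, where the function names, the skill list, and the match threshold are all assumptions for illustration.

```python
import re


def match_skills(description, skills):
    """Return the subset of `skills` mentioned as whole words in `description`."""
    text = description.lower()
    found = set()
    for skill in skills:
        # Lookarounds prevent short skills like "R" matching inside other words.
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.add(skill)
    return found


def is_suitable(description, skills, min_matches=2):
    """Treat a posting as suitable if enough of the client's skills appear."""
    return len(match_skills(description, skills)) >= min_matches


# Hypothetical client skills and job posting.
client_skills = ["Python", "SQL", "R", "TensorFlow"]
posting = "Looking for a Python developer with strong SQL."
print(match_skills(posting, client_skills))
print(is_suitable(posting, client_skills))
```

Postings that pass the threshold would then be written to the database for later review.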
App to Find All Connections from Point A to Point B
I collected and processed data from various sources, including APIs such as the Amadeus API and web scraping with Selenium. This allowed for a wide range of transportation options in the app.
Although the app was initially developed as a prototype, there is potential to expand it and make it available to a broader audience. This would require further development and data collection, but the initial prototype provides a solid foundation for future work in this area.
Education
PhD in Computational Theoretical Physics
Royal Holloway University of London - London, UK
Master's Degree in Theoretical Physics
University of Manchester - Manchester, UK
Skills
Libraries/APIs
Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, TensorFlow, PySpark, PyBrain, XGBoost, Keras, REST APIs, Google Analytics API, Spark ML, PyTorch
Tools
LaTeX, Git, Mathematica, Pytest, Jira, Tree-Based Pipeline Optimization Tool (TPOT), Seaborn, MATLAB, gnuplot, Hidden Markov Model, AutoML, Amazon SageMaker, Jupyter, Google Analytics, ChatGPT
Languages
Python 3, Python, SQL, C++11, C++, Bash, R
Paradigms
Object-oriented Programming (OOP), Testing, Automation, Agile, Agile Software Development, B2B, ETL
Storage
Database Migration, Databases, Database Modeling, PostgreSQL, Amazon S3 (AWS S3)
Frameworks
Flask, Selenium, Spark, Apache Spark
Platforms
Linux, Azure, Databricks, Amazon Web Services (AWS), Jupyter Notebook, Blockchain
Other
Mathematics, Regression, Physics, University Teaching, Mathematical Modeling, Marketing Mix Modeling, Machine Learning, Linear Regression, Advanced Physics, Calculus, Quantitative Calculus, Statistics, Statistical Methods, Probability Theory, Differential Equations, Partial Differential Equations, Computational Physics, Eigenvectors, Linear Algebra, Mathematical Analysis, Applied Mathematics, Mathematical Programming, Matrix Algebra, Time Series, Time Series Analysis, Data Visualization, Data Analysis, EDA, Data Science, Artificial Intelligence (AI), Neural Networks, Deep Neural Networks (DNNs), Artificial Neural Networks (ANN), Predictive Modeling, Deep Learning, Imbalanced-learn, Data Migration, Data Governance, Data Management, Computational Biological Physics, Markov Model, Geolocation, MLflow, Algorithmic Trading, Algorithmic Trading Analysis, Cryptocurrency, Bitcoin, Quantum Computing, Stochastic Differential Equations, Computational Biology, Fluid Dynamics, Electrodynamics, Complex Networks, Statistical Significance, Statistical Analysis, Econometrics, Pitch Presentations, Client Presentations, Random Forests, APIs, Trading, Arbitrage, Data Engineering, Cross-selling, Clustering, Recommendation Systems, Data Modeling, Natural Language Processing (NLP), Image Recognition, Computer Vision, Classification, Videos, Recording, Tutoring, Online Tutoring, Fourier Analysis, Training, Research, Software Development, Big Data, Cloud, Data Processing, Data Processing Automation, Version Control, Feature Engineering, Marketing Attribution, Attribution Modeling, Business to Business (B2B), Data Analytics, Web Scraping, Amadeus, Data Reporting, Generative Pre-trained Transformers (GPT), Data Cleaning, Data Cleansing, Amazon RDS, Scientific Data Analysis, Scientific Computing, Dashboards, OpenAI GPT-3 API, Chatbots, Hugging Face, OpenAI GPT-4 API, Generative Pre-trained Transformer 3 (GPT-3), Data Scraping, Text Classification, Classification Algorithms, OpenAI, Large Language Models (LLMs)