Raisa Dzhamtyrova, PhD, Developer in Cardiff, United Kingdom

Verified Expert in Engineering

Machine Learning Developer

Location
Cardiff, United Kingdom
Toptal Member Since
October 7, 2022

Raisa is a seasoned full-stack data scientist with over six years of experience developing and deploying machine learning models. She holds a PhD in machine learning and has several publications in leading journals. Raisa is skilled in a wide range of technologies, including Python (Polars, pandas, NumPy, scikit-learn, LightGBM, XGBoost, PyTorch), R, SQL, Flyte, Git, Databricks, MLflow, and AWS.

Portfolio

RVA Energy LLC
Machine Learning, Statistical Analysis, Statistical Modeling, Python 3, Python...
Royal Holloway, University of London
Databases, SQL, Machine Learning, Data Science, Mathematics, Python 3...
The Alan Turing Institute
Machine Learning, Anomaly Detection, Risk Models, Python, Data Analysis...

Experience

Availability

Full-time

Preferred Environment

Python 3, R, SQL, Scikit-learn, Machine Learning, Polars, Pandas, NumPy, PyTorch, XGBoost

The most amazing project I've developed is a new method of aggregating anomaly detection algorithms that improved the cybersecurity practices of digital identity providers.

Work Experience

Data Analyst | Scientist

2023 - 2024
RVA Energy LLC
  • Developed forecasting models for energy futures using statistical, machine learning, and deep learning techniques, and used unsupervised learning to identify outliers in the data and make the models more robust.
  • Worked on models that outperformed the existing approach by a few percentage points on several metrics (MAE, RMSE, and sMAPE) and contributed to portfolio decision-making and risk management.
  • Built reproducible machine learning pipelines with Flyte, a cloud-native platform for data processing, automating the entire data science workflow from data ingestion to model training and deployment.
  • Deployed the models on AWS for scalable, efficient serving, using Docker to containerize the models and their dependencies. The models retrain daily on the latest data to stay up to date.
Technologies: Machine Learning, Statistical Analysis, Statistical Modeling, Python 3, Python, Data Modeling, SQL, PostgreSQL, R, Deep Learning, Amazon Web Services (AWS), Amazon S3 (AWS S3), Amazon EC2, Amazon Elastic Container Registry (ECR), Docker, Flyte, Data Pipelines, Data Science, Scikit-learn, Visualization, Databases, Data Analysis, Predictive Modeling, Predictive Analytics, Time Series, Time Series Analysis, Regression Modeling, Regression, Version Control, Git, Statistics, Jupyter, Data Analytics, CI/CD Pipelines, GitHub Actions, Data Reporting, PyTorch, Polars, Artificial Intelligence (AI), Forecasting, Statistical Data Analysis, Data Engineering, Futures & Options, Trading, Quantitative Research, Linux, Quantitative Finance, Analytics, Data Manipulation, Data Scientist, Bash Script, LightGBM, Automation, Unsupervised Learning
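The three error metrics cited above (MAE, RMSE, and sMAPE) can be sketched in plain Python as follows. sMAPE has several competing definitions; the variant assumed here is the common 0 to 200% form, 100 * mean(2|y - p| / (|y| + |p|)), which may differ from the exact definition used on the project.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    # All three metrics need non-empty inputs of equal length.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average size of the errors, in the target's units."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: penalises large errors more heavily than MAE."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def smape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Symmetric MAPE in percent (0-200 variant).

    Pairs where both the actual and the forecast are 0 count as zero error.
    """
    n = _check(y_true, y_pred)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        denom = abs(t) + abs(p)
        if denom > 0:
            total += 2 * abs(t - p) / denom
    return 100 * total / n
```

Because sMAPE is scale-free while MAE and RMSE are in the target's units, reporting all three gives both an absolute and a relative view of forecast quality.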

Computer Science Lecturer

2021 - 2022
Royal Holloway, University of London
  • Conducted and published research focused on probabilistic prediction and ensemble methods, and presented the work to help attract industry collaboration.
  • Taught CS database systems (SQL/PostgreSQL) to MSc students and supervised final-year projects.
  • Co-organized research seminars on advanced topics in Big Data.
Technologies: Databases, SQL, Machine Learning, Data Science, Mathematics, Python 3, Scikit-learn, Real-time Data, Python, Data Analysis, Data Visualization, NumPy, Pandas, Ensemble Methods, Predictive Modeling, Predictive Analytics, Time Series, Time Series Analysis, Regression Modeling, Classification, Regression, Statistical Analysis, Version Control, Git, Statistics, Jupyter, Research, Technical Writing, Statistical Modeling, Data Analytics, Predictive Learning, Linear Regression, PostgreSQL, MySQL, PyTorch, Artificial Intelligence (AI), Forecasting, Statistical Data Analysis, Linux, Generative AI, Analytics, Data Manipulation, Data Scientist, Bash Script, LightGBM, Unsupervised Learning

Postdoctoral Research Associate

2020 - 2021
The Alan Turing Institute
  • Collaborated with digital identity providers to improve their cybersecurity practices.
  • Conducted research on identifying anomalous activity, which identity providers later adopted. The research was published by Springer.
  • Presented the work at conferences, raising the project's profile and attracting further collaboration.
Technologies: Machine Learning, Anomaly Detection, Risk Models, Python, Data Analysis, Data Visualization, Matplotlib, StatsModels, Visualization, Scikit-learn, Mathematics, Python 3, Real-time Data, Risk, Amazon S3 (AWS S3), NumPy, Pandas, Ensemble Methods, Predictive Modeling, Predictive Analytics, Time Series, Time Series Analysis, Regression Modeling, Classification, Regression, Statistical Analysis, Version Control, Git, Statistics, Jupyter, Amazon EC2, AWS CloudTrail, Amazon QuickSight, Boto 3, Research, Technical Writing, Amazon Athena, Statistical Modeling, Data Analytics, Predictive Learning, Linear Regression, Docker, Big Data, Data Reporting, Amazon Web Services (AWS), Amazon Machine Learning, Data Modeling, Artificial Intelligence (AI), Forecasting, Statistical Data Analysis, Linux, Analytics, Data Manipulation, Excel VBA, Data Scientist, Bash Script, Unsupervised Fraud Detection, Unsupervised Learning

Teaching Assistant

2016 - 2019
Royal Holloway
  • Assisted with the CS Data Analysis, Machine Learning, and Data Visualization MSc courses, and taught across machine learning courses covering supervised and unsupervised learning, both shallow and deep.
  • Marked assignments and prepared coursework materials.
  • Supervised final-year MSc students and helped them resolve issues with their dissertations.
Technologies: Data Analysis, Machine Learning, Data Visualization, Artificial Intelligence (AI), Statistical Data Analysis, Linux, Analytics, Excel VBA, LightGBM, Unsupervised Learning

Data Scientist

2016 - 2018
Lindgren Laboratories Limited
  • Developed models to predict the outcomes of football matches in online mode, using statistical and machine learning methods in R and Python.
  • Improved on the existing model and state-of-the-art methods, increasing revenue by several percentage points.
  • Managed a team member, guiding his work and overseeing project timelines.
Technologies: Data Science, Databases, Machine Learning, Deep Learning, SQL, Python 3, R, Python, Data Analysis, Data Visualization, Matplotlib, Visualization, Scikit-learn, Real-time Data, NumPy, Pandas, Ensemble Methods, Predictive Modeling, Predictive Analytics, Time Series, Time Series Analysis, Regression Modeling, Classification, TensorFlow, Regression, Statistical Analysis, Version Control, Git, Statistics, Jupyter, Research, Statistical Modeling, Data Analytics, Predictive Learning, Sports, Linear Regression, Microsoft Excel, Spreadsheets, PostgreSQL, XGBoost, Big Data, Data Reporting, Tableau, MySQL, ETL, Data Modeling, Artificial Intelligence (AI), Forecasting, Statistical Data Analysis, Linux, Analytics, Data Manipulation, Excel VBA, Data Scientist, Logistic Regression, Excel 365, LightGBM, Automation

Chief Risk Specialist | Data Scientist

2014 - 2015
Promsvyazbank
  • Developed new methods for collection, fraud, and application risk models that improved the bank's collection strategies and decreased loan default rates.
  • Helped increase collaboration between the risk and collection departments, leading to an improved risk-based collection strategy.
  • Performed mathematical and financial analyses for the risk committee and senior management that shaped the bank's future policies.
Technologies: Credit Risk, Databases, SAS, Risk Models, SQL, Data Analysis, Real-time Data, Finance, Risk, Data Visualization, Predictive Modeling, Predictive Analytics, Time Series Analysis, Regression Modeling, Classification, Economics, Financial Data, Regression, Statistical Analysis, Version Control, Statistics, Statistical Modeling, Data Analytics, Predictive Learning, Linear Regression, Microsoft Excel, Spreadsheets, Software Engineering, PostgreSQL, Data Reporting, MySQL, ETL, Financial Modeling, Risk Analysis, Data Modeling, Statistical Data Analysis, Data Engineering, Analytics, Data Manipulation, Excel VBA, Data Scientist, Logistic Regression, Excel 365, Automation, Unsupervised Learning

Risk Analyst

2012 - 2014
National Bank TRUST
  • Improved the bank's marketing campaigns through efficient client segmentation by applying unsupervised clustering methods.
  • Developed credit and fraud detection scoring models, decreasing the default and fraud rates on the bank's loans.
  • Delivered analytics reports on the prevailing financial situation that helped guide the department's strategy.
Technologies: Credit Risk, Databases, Risk Models, SQL, SAS, Data Science, Data Analysis, Real-time Data, Finance, Risk, Data Visualization, Predictive Modeling, Predictive Analytics, Time Series Analysis, Regression Modeling, Classification, Economics, Financial Data, Regression, Statistical Analysis, Version Control, Statistics, Statistical Modeling, Data Analytics, Predictive Learning, Linear Regression, Microsoft Excel, Spreadsheets, Software Engineering, PostgreSQL, Data Reporting, MySQL, Financial Modeling, Risk Analysis, Data Modeling, Statistical Data Analysis, Analytics, Data Manipulation, Excel VBA, Logistic Regression, Excel 365, Unsupervised Learning

LLM-powered application consulting

I consulted on a project to build an LLM-powered application, advising on the use of streaming, memory, and LangChain with the OpenAI GPT model to improve the user experience and potentially reduce costs.

Competitive Online Algorithms for Probabilistic Prediction

https://www.researchgate.net/profile/Raisa_Dzhamtyrova/research
Most of my research focused on developing adaptive ensembles of machine learning models that operate in real time. A key property of these ensembles is that, at any point in time, their performance is close to that of the best model in the ensemble. They adapt quickly to newly arrived data, which is particularly important in real-time settings. My research was published in leading journals such as Machine Learning, Data Mining and Knowledge Discovery, and Neurocomputing.
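The property described above, that an ensemble's cumulative loss stays close to the best member's, is the defining guarantee of competitive online prediction. As a rough illustration only, the sketch below implements the textbook exponentially weighted average forecaster under square loss; it is not the specific algorithm from the publications, and the learning rate `eta`, the [0, 1] outcome range, and the toy experts are assumptions. For square loss on [0, 1] with `eta <= 0.5`, this forecaster's cumulative loss exceeds the best expert's by at most ln(N) / eta.

```python
import math
import random
from typing import List, Sequence


class ExpWeightedEnsemble:
    """Exponentially weighted average of N expert predictions under square loss.

    Assumes outcomes and predictions lie in [0, 1]. Square loss is exp-concave
    on that range for eta <= 0.5, which gives the ln(N) / eta regret bound.
    """

    def __init__(self, n_experts: int, eta: float = 0.5) -> None:
        self.eta = eta
        # Cumulative square loss of each expert so far.
        self.losses = [0.0] * n_experts

    def weights(self) -> List[float]:
        # Shift by the smallest loss before exponentiating, for numerical stability.
        best = min(self.losses)
        raw = [math.exp(-self.eta * (loss - best)) for loss in self.losses]
        total = sum(raw)
        return [w / total for w in raw]

    def predict(self, expert_preds: Sequence[float]) -> float:
        # Weighted average of the experts' current predictions.
        return sum(w * p for w, p in zip(self.weights(), expert_preds))

    def update(self, expert_preds: Sequence[float], outcome: float) -> None:
        # Charge each expert its square loss once the outcome is revealed.
        for i, p in enumerate(expert_preds):
            self.losses[i] += (p - outcome) ** 2


# Toy run with three hypothetical experts: accurate, constant, and adversarial.
random.seed(0)
ensemble = ExpWeightedEnsemble(n_experts=3, eta=0.5)
ensemble_loss = 0.0
for _ in range(500):
    y = random.random()
    preds = [min(1.0, y + 0.05), 0.5, 1.0 - y]
    p = ensemble.predict(preds)  # predict before the outcome is known
    ensemble_loss += (p - y) ** 2
    ensemble.update(preds, y)
```

The weights concentrate on whichever expert has the lowest cumulative loss so far, so if the best expert changes as new data arrives, the ensemble shifts toward it without any retraining.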

Open-source Contributions

https://github.com/pandas-dev/pandas/commits?author=raisadz
I am proficient with the Git workflow and have made open-source contributions to pandas and scikit-learn. In pandas, I identified unclear error messages, read the contributing guidelines, added tests, fixed the messages, passed all CI checks, and opened a pull request that was merged. In scikit-learn, I updated the repository's examples to engineer lagged time-series features with Polars instead of pandas (https://github.com/scikit-learn/scikit-learn/commits?author=raisadz).

Deploying a Machine Learning Model on Heroku with FastAPI

https://github.com/raisadz/deployment_project
The project deploys an ML classification model on Heroku using FastAPI, with Data Version Control (DVC) backed by AWS S3 for data versioning. API tests and unit tests that monitor model performance on various data slices run in a CI/CD pipeline built with GitHub Actions.

Real-time Anomaly Detection

https://github.com/alan-turing-institute/anomaly_with_experts
The increasing connectivity of data and cyber-physical systems has resulted in a growing number of cyber-attacks. Real-time detection of such attacks, through the identification of anomalous activity, is required so that mitigation and contingent actions can be effectively and rapidly deployed.

I created a new approach for aggregating unsupervised anomaly detection algorithms, intended for use by digital identity providers, and developed the prototype in Python. The preprint is available at arxiv.org/pdf/2010.03857.pdf.
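To make the idea of aggregating unsupervised anomaly detectors concrete: different detectors emit scores on incompatible scales, so any combination first has to make them comparable. The sketch below shows one simple, widely used batch baseline, converting each detector's scores to normalised ranks and averaging them. It is an illustrative assumption, not the method in the preprint, and the detector names and scores in the example are hypothetical.

```python
from typing import Dict, List, Sequence


def to_ranks(scores: Sequence[float]) -> List[float]:
    """Map scores to ranks scaled to [0, 1]; higher score gives higher rank.

    Tied scores share the average of their positions.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_pos = (i + j) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = avg_pos / (n - 1) if n > 1 else 0.0
        i = j + 1
    return ranks


def aggregate(detector_scores: Dict[str, Sequence[float]]) -> List[float]:
    """Average each observation's normalised rank across all detectors."""
    if not detector_scores:
        raise ValueError("need at least one detector")
    rank_lists = [to_ranks(s) for s in detector_scores.values()]
    n = len(rank_lists[0])
    if any(len(r) != n for r in rank_lists):
        raise ValueError("all detectors must score the same observations")
    return [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]


# Hypothetical detectors scoring the same four events on very different scales.
scores = {
    "detector_a": [0.1, 0.2, 0.9, 0.15],
    "detector_b": [1.0, 2.5, 40.0, 1.2],
}
combined = aggregate(scores)  # event 2 gets the highest combined score
```

Rank averaging is robust to scale differences but treats every detector as equally reliable; a real-time aggregator would instead need to reweight detectors as evidence about their reliability arrives.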

Building ML Pipeline for Short-term Rental Prices in NYC

https://github.com/raisadz/build-ml-pipeline-for-short-term-rental-prices
The project builds a reproducible ML pipeline with MLflow and Weights & Biases to estimate property rental prices. Because new data arrives every week and the model needs regular retraining, an end-to-end reusable pipeline simplifies retraining and reduces time to production.

Dynamic Risk Assessment System

https://github.com/raisadz/model_diagnostics
The goal of the project is to set up processes and scripts to retrain, redeploy, monitor, and report on the ML model that estimates the attrition risk of a company. The project implements automatic data ingestion, training, scoring, deploying, and diagnostics.

Dynamic Cyber Risk Estimation

https://github.com/alan-turing-institute/dynamic_cyber_risk
I developed a new approach to dynamic cyber risk estimation that assesses the maximum number of hacking attempts at a desired confidence level. I built the prototype in R and published the research with Springer; it can be read at doi.org/10.1007/s10618-021-00814-z.
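Estimating the maximum number of hacking attempts at a desired confidence level amounts to forecasting an upper quantile of the attempt count. The published method is described in the linked paper; purely as a generic illustration of the idea, the sketch below tracks a running alpha-quantile online with subgradient steps on the pinball (quantile) loss. The function name, learning rate, and simulated counts are assumptions.

```python
import random
from typing import Iterable, List


def track_quantile(
    stream: Iterable[float], alpha: float = 0.95, lr: float = 0.5, q0: float = 0.0
) -> List[float]:
    """Online estimate of the alpha-quantile via pinball-loss subgradient steps.

    At each step the estimate q is recorded *before* the new observation y is
    seen. q then moves up by lr * alpha if y exceeded it, and down by
    lr * (1 - alpha) otherwise. As long as q stays bounded, the fraction of
    observations exceeding it converges to 1 - alpha.
    """
    q = q0
    history = []
    for y in stream:
        history.append(q)
        if y > q:
            q += lr * alpha
        else:
            q -= lr * (1 - alpha)
    return history


# Toy usage: hypothetical daily counts of hacking attempts.
random.seed(42)
counts = [random.randint(0, 30) for _ in range(20_000)]
estimates = track_quantile(counts, alpha=0.95, lr=0.5)
```

The update rule needs no distributional assumption and adapts if the attack rate drifts, which is why quantile tracking of this kind suits dynamic risk estimation; the step size trades responsiveness against stability of the estimate.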

Languages

Python 3, R, SQL, Python, Excel VBA, Bash Script, SAS

Frameworks

Data Lakehouse, Apache Spark, LightGBM, Flask, Streamlit

Libraries/APIs

Scikit-learn, Matplotlib, NumPy, Pandas, XGBoost, PyTorch, PySpark, TensorFlow, REST APIs

Tools

Git, Jupyter, Boto 3, Microsoft Excel, Spreadsheets, StatsModels, AWS CloudTrail, Amazon QuickSight, Amazon Athena, Pytest, Tableau, Cron, Amazon Elastic Container Registry (ECR)

Paradigms

Data Science, Testing, Quantitative Research, Automation, Anomaly Detection, Unit Testing, ETL

Platforms

Databricks, Amazon EC2, Docker, Weights & Biases, Amazon Web Services (AWS), Linux, Azure, Heroku

Storage

Databases, PostgreSQL, MySQL, Amazon S3 (AWS S3), Data Pipelines

Other

Machine Learning, Visualization, Real-time Data, Risk Models, Credit Risk, Finance, Deep Learning, Data Analysis, Data Visualization, Ensemble Methods, Predictive Modeling, Predictive Analytics, Time Series, Time Series Analysis, Regression Modeling, Classification, Regression, Statistical Analysis, Version Control, Statistics, Research, Technical Writing, Statistical Modeling, Data Analytics, Predictive Learning, Linear Regression, Big Data, Data Reporting, Financial Modeling, Risk Analysis, Polars, Artificial Intelligence (AI), Forecasting, Statistical Data Analysis, Data Engineering, Production, Deployment, Security, HyperOpt, Azure Databricks, Analytics, Data Manipulation, Data Scientist, Logistic Regression, Excel 365, Unsupervised Fraud Detection, Unsupervised Learning, Outlier Detection, Mathematics, Risk, Economics, Financial Data, Sports, Software Engineering, MLflow, CI/CD Pipelines, Amazon Machine Learning, Flyte, Futures & Options, Trading, Quantitative Finance, Generative AI, GitHub Actions, FastAPI, DVC, Machine Learning Operations (MLOps), Data Modeling, Consulting, ChatGPT, Large Language Models (LLMs), LangChain, OpenAI GPT-4 API, OpenAI GPT-3 API, OpenAI

Education

2016 - 2020

PhD in Machine Learning

Royal Holloway, University of London - Egham, United Kingdom

2015 - 2016

Master's Degree (Outstanding Thesis Award) in Computational Finance

Royal Holloway, University of London - Egham, United Kingdom

2011 - 2013

Master's Degree in Applied Mathematics and Physics

Moscow Institute of Physics and Technology - Moscow, Russia

2007 - 2011

Bachelor's Degree in Applied Mathematics and Physics

Moscow Institute of Physics and Technology - Moscow, Russia

Certifications

JANUARY 2024 - JANUARY 2026

Databricks Certified Machine Learning Professional

Databricks

NOVEMBER 2022 - PRESENT

Machine Learning DevOps Engineer

Udacity
