Kirill Yakunin, Developer in Almaty, Almaty Province, Kazakhstan

Kirill Yakunin

Verified Expert in Engineering

Bio

Kirill has a PhD in data science and natural language processing. He has broad experience in academic, scientific, and enterprise environments, including developing full-scale data-driven applications from architectural decisions, storage, ETL processes, and back- and front-end development. For several years, Kirill has been working with large language models, both for enterprise customers and as an independent researcher.

Portfolio

Metaculus
Python, Transformers, Hugging Face, Dagster, Streamlit...
JSC UpMetric
Data Science, Scraping, Scrapy, Financial Data, Scikit-learn, Python, Django...
JSC Frontier KZ
Python, Data Science, Scikit-learn, Pandas, Team Management, Churn Analysis...

Experience

  • Machine Learning - 10 years
  • Python - 9 years
  • Software Engineering - 7 years
  • Generative Pre-trained Transformers (GPT) - 3 years
  • Natural Language Processing (NLP) - 3 years
  • Large Language Models (LLMs) - 2 years
  • GPU Computing - 2 years
  • Kubernetes - 2 years

Availability

Part-time

Preferred Environment

Python, Dagster, Transformers, PyTorch, Kubernetes, Google Kubernetes Engine (GKE)

The most amazing...

...project I've developed was a mass-media monitoring and analysis system that satisfied the client's requirements and yielded publishable scientific results.

Work Experience

Machine Learning Engineer

2023 - 2024
Metaculus
  • Built infrastructure for an LLM-based information retrieval system that automates the retrieval of relevant information from public sources.
  • Designed and implemented experiments to improve information retrieval, including a unique point cloud-based candidate generation algorithm, chunk skimming, and summarization.
  • Maintained existing solutions hosted in GKE and migrated them to Dagster and GKE.
  • Worked on the rapid evaluation of numerous AI-related R&D hypotheses.
  • Optimized the search and discoverability of the Metaculus platform using LLMs.
Technologies: Python, Transformers, Hugging Face, Dagster, Streamlit, Google Kubernetes Engine (GKE), Kubernetes, PyTorch, Large Language Models (LLMs), Open-source LLMs, Deep Neural Networks (DNNs), Neural Networks, FastAPI

CEO

2020 - 2023
JSC UpMetric
  • Started integrating machine learning methods for geological research data interpretation for the largest uranium producer in the world, NAC Kazatomprom JSC.
  • Developed a system for the National Bank of Kazakhstan that predicts inflation expectations and the level of trust in the national currency based on textual information. The indicators show a high correlation (0.6 to 0.8) with survey results.
  • Assembled a remote team of experienced specialists capable of solving various tasks, from consulting to system development, integration, and technical support.
Technologies: Data Science, Scraping, Scrapy, Financial Data, Scikit-learn, Python, Django, Docker, Elasticsearch, Pandas, PyCharm, Back-end, Front-end, Vue, Nuxt.js, Material Design, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Topic Modeling, Machine Learning, Consulting, Research, Data Research, R&D, Git, Jupyter Notebook, Apache Airflow, Software Engineering, Web Development, Databases, SQL, System Design, Software Development Management, Computer Vision, Geology, Data Processing, Artificial Intelligence (AI), PostgreSQL, Django REST Framework, Linux, NGINX, Server Administration, Management, Team Management, JavaScript, Server Management, Text Classification, Web Scraping, BERT, Text Processing, Exploratory Data Analysis, Classification, Gradient Boosting, XGBoost, Data Analysis, Expert Systems, ETL, Data Engineering, Data Analytics, Analytics, Data Cleaning, Data Cleansing, Data Pipelines, Streaming Data, NumPy, Seaborn, Statistical Analysis, Parsers, JSON, NoSQL, Unstructured Data Analysis, REST APIs, APIs, Jupyter, Deep Learning, CSV

Data Science Team Lead

2019 - 2023
JSC Frontier KZ
  • Built a churn prediction model for the largest telecommunication company in Kazakhstan, which improved their current process tenfold.
  • Provided data science-related consulting to major financial institutions, telecommunication companies, and FMCG retailers in Kazakhstan.
  • Led a team of data scientists to explore and utilize fiscalization data from the largest fiscalization provider of Kazakhstan. The work includes predictive analysis, text classification, market segmentation, and other ad-hoc research and solutions.
  • Built a lookalike model, which integrates data from JSC Magnum, the biggest FMCG retailer in Kazakhstan, and telecommunication companies, in order to predict subscribers' potential interest in one of a few dozen product segments.
  • Analyzed mobile geodata for the city hall to gain valuable insights into the COVID-19 situation in the city of Almaty.
Technologies: Python, Data Science, Scikit-learn, Pandas, Team Management, Churn Analysis, Text Classification, Exploratory Data Analysis, Customer Segmentation, Customer Analysis, Predictive Modeling, Classification, Gradient Boosting, XGBoost, Git, Jupyter Notebook, Machine Learning, Databases, SQL, Software Development Management, Research, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPU Computing, R&D, Data Processing, Artificial Intelligence (AI), Linux, Management, Web Scraping, Scrapy, BERT, Text Processing, Scraping, Financial Data, Consulting, Data Research, Data Analysis, Recommendation Systems, ETL, Data Engineering, Spark, Apache Spark, Data Analytics, Analytics, Data Cleaning, Data Cleansing, Data Pipelines, NumPy, Matplotlib, Seaborn, Statistical Analysis, Parsers, JSON, NoSQL, Unstructured Data Analysis, REST APIs, APIs, Jupyter, Deep Learning, PySpark, CSV, Apache Impala

Full-stack Developer

2017 - 2022
JSC Sagrad
  • Developed a time-tracking information system for Shymbulak Ski and Snowboard School, which improved trainers' time utilization by 30%.
  • Developed and supported payment kiosk software for JSC KMF, which provides loans for small businesses and individuals. The kiosk network contains over 150 kiosks all over Kazakhstan that serve over 200 thousand clients.
  • Participated in all stages of project development, from sales to project requirements, design, development, integration, and support. I've also hired and managed other developers and designers on several projects.
Technologies: Python, Django, Django REST Framework, JavaScript, PostgreSQL, PyCharm, Git, Software Engineering, Web Development, Databases, SQL, System Design, Linux, NGINX, Server Administration, Front-end, React, API Integration, Management, Team Management, Server Management, Back-end, Analytics, Statistical Analysis, JSON, NoSQL, Unstructured Data Analysis, REST APIs, APIs, Jupyter, Deep Learning, CSV

Lead Researcher

2018 - 2020
Institute of Information and Computational Technologies
  • Built a distributed information system for scraping and analyzing mass-media sources. I built it from scratch, starting from containerization, storage system, and ETL to scrapers, text preprocessing, analysis, and visualization.
  • Published one Q1, two Q2, and two Q3 Scopus-indexed articles and other indexed publications, in many of which I am either the first or corresponding author.
  • Developed a novel text classification approach based on topic modeling, which in many tasks requires little to no manual labeling. I validated this approach and proposed several high-level labeling methods on several problems.
  • Collected a corpus of over 7 million publications and over 2 terabytes of experimental data. The system was officially integrated in the Ministry of Education and Science of Kazakhstan.
  • Led a team of one to four data scientists and engineers and conducted scientific seminars and technical workshops.
Technologies: Docker, Python, Apache Airflow, Linux, Server Management, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Text Classification, Topic Modeling, Elasticsearch, Artificial Intelligence (AI), Machine Learning, Data Science, Science, Writing & Editing, Team Management, Research, Web Scraping, Scrapy, BERT, Front-end, Text Processing, Data Processing, PyCharm, Git, Jupyter Notebook, Software Engineering, Databases, Neural Networks, System Design, Software Development Management, GPU Computing, Scientific Data Analysis, R&D, Scikit-learn, Django, PostgreSQL, Django REST Framework, NGINX, Server Administration, Management, JavaScript, Exploratory Data Analysis, Predictive Modeling, Classification, Gradient Boosting, XGBoost, Scraping, Back-end, Data Research, GIS, Data Analysis, Expert Systems, Decision Support Systems, Geographic Information Systems, ETL, Data Engineering, Data Analytics, Analytics, Data Cleaning, Data Cleansing, Data Pipelines, Streaming Data, NumPy, Matplotlib, Seaborn, Statistical Analysis, Parsers, JSON, NoSQL, Unstructured Data Analysis, REST APIs, APIs, Jupyter, Deep Learning, CSV

Software Development Team Lead/Product Owner

2015 - 2019
JSC Epigraph
  • Built two commercial information systems used by over 20 thousand students from over 30 Kazakhstani universities.
  • Led a team of two to three HTML layout coders and JavaScript developers who were creating electronic educational content.
  • Led a team of one to three developers, including a React developer, a mobile developer (React Native) and a back-end developer.
  • Managed a codebase of over 20 thousand lines of code with up to 10 different external integrations, including SMS services, cloud file and video storage, monitoring services, etc.
  • Took full ownership of the project along with all strategic and architectural decisions regarding the project.
  • Introduced new features and improvements regularly, which increased customer satisfaction and user activity by 10 to 30% yearly.
Technologies: Python, Django, PostgreSQL, Django REST Framework, Elasticsearch, Linux, NGINX, Server Administration, Front-end, React, API Integration, Management, PyCharm, Git, Software Engineering, Web Development, Databases, SQL, System Design, Software Development Management, Team Management, JavaScript, Server Management, Back-end, Analytics, Data Pipelines, Streaming Data, JSON, Unstructured Data Analysis, REST APIs, APIs

Senior Lecturer

2016 - 2018
Kazakh-British Technical University
  • Hired by a top Kazakhstani technical university to teach graduate-level students.
  • Created educational programs for undergraduate and graduate students, including artificial intelligence, algorithms and data structures, and applied probability theory.
  • Participated in educational and scientific work with professors from the University of Southampton.
Technologies: Python, Data Science, Artificial Intelligence (AI), Jupyter Notebook, Machine Learning, Data Structures, Neural Networks, Scientific Data Analysis, Scikit-learn, Science, Data Analytics, Deep Learning, Data Analysis

Programmer | Engineer

2012 - 2016
Institute of Information and Computational Technologies
  • Hired by the Institute of National Academy of Sciences as a second-year undergraduate student.
  • Performed computational experiments on the interpretation of geological research data. The data was provided by the world's largest uranium producer and seller, JSC KazAtomProm. The experiments were performed in C++ and Python.
  • Built a C++ library for similarity-based classification of geological research data.
  • Contributed to the writing of several scientific articles, including works in Springer and IEEE-indexed publications.
  • Built machine learning models, which exceeded the interpretation quality of human experts by 5 to 10% while reducing interpretation time by orders of magnitude.
Technologies: C++, Machine Learning, Data Science, Geology, Python, Scikit-learn, Pandas, Research, Writing & Editing, Data Structures, Java EE, Neural Networks, Scientific Data Analysis, Data Processing, Artificial Intelligence (AI), Science, Exploratory Data Analysis, Classification, Gradient Boosting, XGBoost, Java, GIS, PostGIS, Data Analysis, Expert Systems, Decision Support Systems, Geographic Information Systems, Data Analytics, Analytics, Data Cleaning, Data Cleansing, Data Pipelines, NumPy, Matplotlib, Seaborn, Statistical Analysis, Jupyter, CSV

Junior Researcher

2011 - 2012
International Information Technology University
  • Hired by my university as a first-year undergraduate student.
  • Performed geological research data preprocessing using C++, including building a wavelet-based analysis and smoothing module from scratch.
  • Worked under the supervision of a leading geophysicist from a JSC KazAtomProm subsidiary (JSC Volkovgeologiya, formerly GeoTechnoService).
Technologies: C++, Machine Learning, Geology, Data Processing, Python, Data Structures, Data Science, Neural Networks, Research, Scientific Data Analysis, Writing & Editing, Artificial Intelligence (AI), Science, Exploratory Data Analysis, Classification, Data Analysis, Expert Systems, Decision Support Systems, Statistical Analysis, JSON, Jupyter, CSV

Experience

NLPMonitor

NLPMonitor is a mass-media monitoring information system, which scrapes news publications from mass media and social networks in order to perform informational trend analysis as well as document classification using multiple evaluation criteria.

I’ve built the system from scratch, including:

1) Containerization (Docker), server administration
2) Storage – a hybrid of Elasticsearch and PostgreSQL
3) Scraping using Scrapy
4) Apache Airflow
5) Django web application
6) Text preprocessing, topic modeling, and classification pipeline

I proposed a method for creating topic-based text representations, which makes it possible to build computationally efficient text classification models. I also proposed a high-level labeling method, which makes it possible to train classifiers with minimal manual labeling. I conducted a number of experiments, including:

1) Sentiment analysis based on expert labeling of 100-200 topics
2) Propaganda identification based on expert labeling of several dozen media sources
3) News popularity identification using automatic labeling derived from objective user engagement data

The project was integrated into the Ministry of Education and Science of Kazakhstan, and the results were published in a Q1 Scopus-indexed article.
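
As a rough illustration of the high-level labeling idea, here is a minimal Python sketch in which experts label topics rather than documents, and each document inherits a score from its topic mixture. This is not the NLPMonitor implementation: the function name `propagate_topic_labels`, the input shapes, and the weighted-mean rule are all assumptions made for the example.

```python
from typing import Dict, List


def propagate_topic_labels(
    doc_topics: List[Dict[int, float]],
    topic_labels: Dict[int, float],
) -> List[float]:
    """Score each document as the weighted mean of expert topic labels.

    doc_topics: per-document topic weights from any topic model (assumed input).
    topic_labels: expert label per topic, e.g. -1.0 (negative) to 1.0 (positive).
    """
    scores = []
    for weights in doc_topics:
        # Only topics that an expert has labeled contribute to the score.
        labeled = {t: w for t, w in weights.items() if t in topic_labels}
        total = sum(labeled.values())
        if total == 0:
            scores.append(0.0)  # no labeled topic present: treat as neutral
            continue
        scores.append(sum(w * topic_labels[t] for t, w in labeled.items()) / total)
    return scores


# Hypothetical data: topic 0 was labeled positive, topic 1 negative, topic 2 unlabeled.
doc_topics = [{0: 0.75, 1: 0.25}, {2: 1.0}]
topic_labels = {0: 1.0, 1: -1.0}
doc_scores = propagate_topic_labels(doc_topics, topic_labels)
```

Under these assumptions, labeling a few hundred topics yields scores for every document in the corpus, and those scores can then serve as training targets for a conventional classifier.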

Magnum RecSys

https://github.com/web-ai-services/rec-sys
Magnum is the largest FMCG retailer in Kazakhstan and recently started developing digital products, such as eCommerce (delivery) and loyalty programs. A recommendation system is a vital part of such infrastructure. In this project, I built a recommendation system from scratch, including ETL and model training pipelines, an analytical web tool, and an API for integration. The recommendation algorithm is based on the novel Item2Vec approach, which applies ideas from distributional semantics to find contextually similar items.

However, this approach on its own is not flexible enough and doesn't intrinsically create value. For example, it may simply recommend commonly bought items such as bread, milk, and bananas. To solve this problem, I applied a number of additional algorithms and business rules to create three modes:

1) Complementary items recommender
2) Upsell recommender
3) Category recommender

The system was integrated with eCommerce and loyalty applications and showed a 50 to 70% increase in user engagement compared to a popular but simple item recommender.
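
The popularity problem mentioned above can be illustrated with a small Python sketch. It is not the project's algorithm and not Item2Vec: it swaps in plain basket co-occurrence with a lift score (observed co-purchase rate divided by the rate expected if purchases were independent) as one simple way to keep items that appear in almost every basket from dominating. The function names and the sample baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def lift_scores(baskets: List[List[str]]) -> Dict[Tuple[str, str], float]:
    """Return lift = P(a, b) / (P(a) * P(b)) for every co-purchased pair."""
    n = len(baskets)
    item_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # deduplicate; sorting fixes the pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


def recommend(item: str, baskets: List[List[str]], top_k: int = 3) -> List[str]:
    """Rank the items co-purchased with `item` by lift, highest first."""
    candidates = []
    for (a, b), score in lift_scores(baskets).items():
        if a == item:
            candidates.append((b, score))
        elif b == item:
            candidates.append((a, score))
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in candidates[:top_k]]


# Hypothetical baskets: bread is in every basket, so it carries little signal.
baskets = [
    ["bread", "pasta", "sauce"],
    ["bread", "pasta", "sauce"],
    ["bread", "milk"],
    ["bread", "milk"],
]
top_for_pasta = recommend("pasta", baskets, top_k=1)
```

In this toy data, bread co-occurs with pasta as often as sauce does, but its lift is lower because bread is bought with everything, so sauce ranks first.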

MIGIS

https://ieeexplore.ieee.org/document/8813086
This was a scientific project that required building a geoinformation system for decision-making support for the installation of renewable energy sources (RES) generators. A number of factors need to be taken into account:

1) Technological limitations, such as slope, available area, etc.
2) Economic feasibility – the cost of infrastructure, demand for energy in nearby areas, etc.
3) Energy potential – some integrated indicator of solar/wind/hydro/biomass energy potential

It’s also possible to install different types of RES generators with different characteristics, capacity and prices.

I helped develop the Bayesian fuzzy AHP (BaFAHP) MCDM method, which calculates the fitness of a given piece of land based on spatially distributed data (both vector and raster) and expert assessments of factor importance. I also ran a number of computational experiments, prepared data from open sources (NASA SSE, OpenStreetMap, etc.), and integrated the developed algorithm into a Java Spring web application.

I also performed research with colleagues from Moscow State University, including comprehensive model result validation.

The results were published in IEEE Access, a Q1 Scopus-indexed journal.
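
For readers unfamiliar with AHP-style scoring, the Python sketch below shows the classical (crisp) version of the idea: expert pairwise comparisons of factor importance are turned into weights, and a land cell's fitness is the weighted sum of its normalized factor scores. This is plain AHP with the row geometric mean approximation, not the Bayesian fuzzy BaFAHP method from the paper. The function names, the comparison matrix, and the cell scores are hypothetical.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate AHP priority weights with the row geometric mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def land_fitness(factor_scores: List[float], weights: List[float]) -> float:
    """Weighted sum of factor scores that are already normalized to [0, 1]."""
    if len(factor_scores) != len(weights):
        raise ValueError("one score per factor is required")
    return sum(s * w for s, w in zip(factor_scores, weights))


# Hypothetical expert judgments for three factors, in this order:
# technological limitations, economic feasibility, energy potential.
# pairwise[i][j] states how much more important factor i is than factor j.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized factor scores for a single land cell.
cell_score = land_fitness([1.0, 0.5, 0.0], weights)
```

With this perfectly consistent matrix the weights come out as 4/7, 2/7, and 1/7. Real expert judgments are rarely consistent, which is one reason fuzzy and Bayesian extensions of AHP exist.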

Education

2018 - 2021

PhD in Data Science

Satbayev University - Almaty, Kazakhstan

2014 - 2016

Master's Degree in Software Engineering

International Information Technology University - Almaty, Kazakhstan

2010 - 2014

Bachelor's Degree in Computer Science

International Information Technology University - Almaty, Kazakhstan

Skills

Libraries/APIs

Scikit-learn, Pandas, REST APIs, React, XGBoost, Vue, NumPy, Matplotlib, PySpark, PyTorch

Tools

Apache Airflow, NGINX, PyCharm, GIS, Git, Seaborn, Jupyter, Apache Impala, Google Kubernetes Engine (GKE)

Languages

Python, C++, SQL, JavaScript, Java

Frameworks

Django, Django REST Framework, Scrapy, Streamlit, Nuxt.js, Spark, Apache Spark

Platforms

Docker, Linux, Jupyter Notebook, Java EE, Kubernetes

Storage

Elasticsearch, Data Pipelines, JSON, Databases, PostgreSQL, PostGIS, NoSQL

Paradigms

Management, ETL

Other

Machine Learning, Data Science, Research, Natural Language Processing (NLP), R&D, Data Processing, Artificial Intelligence (AI), Classification, Data Analysis, Data Analytics, Analytics, Hugging Face, Large Language Models (LLMs), Open-source LLMs, Deep Neural Networks (DNNs), Software Engineering, Web Development, Data Structures, Neural Networks, Topic Modeling, Scientific Data Analysis, Writing & Editing, Front-end, API Integration, Web Scraping, Text Processing, Churn Analysis, Exploratory Data Analysis, Scraping, Recommendation Systems, Data Cleaning, Data Cleansing, APIs, Deep Learning, CSV, Generative Pre-trained Transformers (GPT), Transformers, System Design, Software Development Management, GPU Computing, Computer Vision, Geology, Server Administration, Server Management, Text Classification, Science, Team Management, BERT, Customer Segmentation, Customer Analysis, Predictive Modeling, Gradient Boosting, Financial Data, Back-end, Material Design, Consulting, Data Research, Expert Systems, Decision Support Systems, Geographic Information Systems, Data Engineering, Streaming Data, Statistical Analysis, Parsers, Unstructured Data Analysis, Dagster, FastAPI
