
Kirill Yakunin
Verified Expert in Engineering
Machine Learning Developer
Almaty, Almaty Province, Kazakhstan
Toptal member since August 19, 2021
Kirill has a PhD in data science and natural language processing. He has broad experience in academic, scientific, and enterprise environments, including developing full-scale data-driven applications from architectural decisions, storage, ETL processes, and back- and front-end development. For several years, Kirill has been working with large language models, both for enterprise customers and as an independent researcher.
Experience
- Machine Learning - 10 years
- Python - 9 years
- Software Engineering - 7 years
- Generative Pre-trained Transformers (GPT) - 3 years
- Natural Language Processing (NLP) - 3 years
- Large Language Models (LLMs) - 2 years
- GPU Computing - 2 years
- Kubernetes - 2 years
Preferred Environment
Python, Dagster, Transformers, PyTorch, Kubernetes, Google Kubernetes Engine (GKE)
The most amazing...
...project I've developed was a mass-media monitoring and analysis system that satisfied the client's requirements and yielded publishable scientific results.
Work Experience
Machine Learning Engineer
Metaculus
- Built infrastructure for an LLM-based information retrieval system that automates the retrieval of relevant information from public sources.
- Designed and implemented experiments to improve information retrieval, including a unique point cloud-based candidate generation algorithm, chunk skimming, and summarization.
- Maintained existing solutions hosted in GKE and migrated them to Dagster and GKE.
- Worked on the rapid evaluation of numerous AI-related R&D hypotheses.
- Optimized the search and discoverability of the Metaculus platform using LLMs.
CEO
JSC UpMetric
- Started integrating machine learning methods for geological research data interpretation for the largest uranium producer in the world, NAC Kazatomprom JSC.
- Developed a system for the National Bank of Kazakhstan that predicts inflation expectations and the level of trust in the national currency from textual information. The indicators show a high correlation (0.6 to 0.8) with survey results.
- Assembled a remote team of experienced specialists capable of solving various tasks, from consulting to system development, integration, and technical support.
Data Science Team Lead
JSC Frontier KZ
- Built a churn prediction model for the largest telecommunication company in Kazakhstan, which improved their current process tenfold.
- Provided data science-related consulting to major financial institutions, telecommunication companies, and FMCG retailers in Kazakhstan.
- Led a team of data scientists to explore and utilize fiscalization data from the largest fiscalization provider of Kazakhstan. The work includes predictive analysis, text classification, market segmentation, and other ad-hoc research and solutions.
- Built a lookalike model, which integrates data from JSC Magnum, the biggest FMCG retailer in Kazakhstan, and telecommunication companies, in order to predict subscribers' potential interest in one of a few dozen product segments.
- Analyzed mobile geodata for the city hall to gain valuable insights into the COVID-19 situation in the city of Almaty.
Full-stack Developer
JSC Sagrad
- Developed a time-tracking information system for Shymbulak Ski and Snowboard School, which improved trainers' time utilization by 30%.
- Developed and supported payment kiosk software for JSC KMF, which provides loans for small businesses and individuals. The kiosk network contains over 150 kiosks all over Kazakhstan that serve over 200 thousand clients.
- Participated in all stages of project development, from sales to project requirements, design, development, integration, and support. I've also hired and managed other developers and designers on several projects.
Lead Researcher
Institute of Information and Computational Technologies
- Built a distributed information system for scraping and analyzing mass-media sources. I built it from scratch, starting from containerization, storage system, and ETL to scrapers, text preprocessing, analysis, and visualization.
- Published one Q1, two Q2, and two Q3 Scopus-indexed articles and other indexed publications, in many of which I am either the first or corresponding author.
- Developed a novel text classification approach based on topic modeling, which in many tasks requires little to no manual labeling. I validated this approach and proposed several high-level labeling methods on several problems.
- Collected a corpus of over 7 million publications and over 2 terabytes of experimental data. The system was officially integrated into the Ministry of Education and Science of Kazakhstan.
- Led a team of one to four data scientists and engineers and conducted scientific seminars and technical workshops.
Software Development Team Lead/Product Owner
JSC Epigraph
- Built two commercial informational systems used by over 20 thousand students from over 30 Kazakhstani universities.
- Led a team of two to three HTML layout coders and JavaScript developers who were creating electronic educational content.
- Led a team of one to three developers, including a React developer, a mobile developer (React Native) and a back-end developer.
- Managed a codebase of over 20 thousand lines of code with up to 10 different external integrations, including SMS services, cloud file and video storage, monitoring services, etc.
- Took full ownership of the project along with all strategic and architectural decisions regarding the project.
- Introduced new features and improvements regularly, which increased customer satisfaction and user activity by 10 to 30% yearly.
Senior Lecturer
Kazakh-British Technical University
- Hired by a top Kazakhstani technical university to teach graduate-level students.
- Created educational programs for undergraduate and graduate students, including artificial intelligence, algorithms and data structures, and applied probability theory.
- Participated in educational and scientific work with professors from the University of Southampton.
Programmer | Engineer
Institute of Information and Computational Technologies
- Hired by the Institute of National Academy of Sciences as a second-year undergraduate student.
- Performed computational experiments on the interpretation of geological research. The data was provided by the world's largest uranium producer and seller, JSC KazAtomProm. The experiments were performed in C++ and Python.
- Built a C++ library for similarity-based classification of geological research data.
- Contributed to the writing of several scientific articles, including works in Springer and IEEE-indexed publications.
- Built machine learning models, which exceeded the interpretation quality of human experts by 5 to 10% while reducing interpretation time by orders of magnitude.
Junior Researcher
International Information Technology University
- Hired by my university as a first-year undergraduate student.
- Performed geological research data preprocessing using C++, including building a wavelet-based analysis and smoothing module from scratch.
- Worked under the supervision of a leading geophysicist from a JSC KazAtomProm subsidiary (JSC Volkovgeologiya, formerly GeoTechnoService).
Experience
NLPMonitor
I built the system from scratch, including:
1) Containerization (Docker), server administration
2) Storage – a hybrid of ElasticSearch and PostgreSQL
3) Scraping using Scrapy
4) Apache Airflow
5) Django web application
6) Text preprocessing, topic modeling, and classification pipeline
I proposed a method for creating a topic-based text representation, which makes it possible to build computationally efficient text classification models. I also proposed a method for high-level labeling, which allows classifiers to be trained with minimal manual labeling. A number of experiments were conducted, including:
1) Sentiment analysis based on expert labeling of 100-200 topics
2) Propaganda identification based on expert labeling of several dozen media sources
3) News popularity identification using automatic labeling derived from objective user engagement data
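One possible reading of the high-level labeling idea can be sketched as follows. This is a minimal illustration, not the published method: it assumes each document already has a topic distribution from some topic model, that experts assign a numeric label to a subset of topics, and that a document's score is the weighted average of those topic labels. The function name `score_document` and all the data are hypothetical.

```python
from typing import Dict, List


def score_document(topic_weights: List[float], topic_labels: Dict[int, float]) -> float:
    """Weighted average of expert topic labels over the labeled topics only.

    topic_weights: document-topic distribution (one weight per topic).
    topic_labels:  expert labels keyed by topic index (hypothetical scale -1..1).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for topic_id, label in topic_labels.items():
        weight = topic_weights[topic_id]
        weighted_sum += weight * label
        total_weight += weight
    # A document with no mass on any labeled topic gets a neutral score.
    return weighted_sum / total_weight if total_weight > 0 else 0.0


# Hypothetical example: 4 topics, experts labeled topics 0, 1, and 3.
topic_labels = {0: 1.0, 1: -1.0, 3: 0.0}
doc_topics = [0.6, 0.2, 0.1, 0.1]
score = score_document(doc_topics, topic_labels)
```

Under these assumptions, labeling a few hundred topics once gives a score for every document in the corpus, which is where the saving in manual labeling would come from.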
The project was integrated into the Ministry of Education and Science of Kazakhstan, and the results were published in a Q1 Scopus-indexed article.
Magnum RecSys
https://github.com/web-ai-services/rec-sys
However, this approach on its own is not flexible enough and doesn’t intrinsically create value. For example, it can end up recommending only commonly bought items such as bread, milk, and bananas. To solve this problem, a number of additional algorithms and business rules were applied to create three modes:
1) Complementary items recommender
2) Upsell recommender
3) Category recommender
The system was integrated with eCommerce and loyalty applications and showed a 50 to 70% increase in user engagement compared to a popular but simple item recommender.
MIGIS
https://ieeexplore.ieee.org/document/8813086
1) Technological limitations, such as slope, available area, etc.
2) Economic feasibility – the cost of infrastructure, demand for energy in nearby areas, etc.
3) Energy potential – some integrated indicator of solar/wind/hydro/biomass energy potential
It’s also possible to install different types of RES generators with different characteristics, capacity and prices.
I helped develop the Bayesian fuzzy AHP (BaFAHP) MCDM method, which calculates the fitness of a given piece of land based on spatially distributed data (both vector and raster) and expert assessments of factor importance. I also ran a number of computational experiments, prepared data from open sources (NASA SSE, OpenStreetMap, etc.), and integrated the developed algorithm into a Java Spring web application.
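To give a rough sense of how expert assessments of factor importance can turn into a fitness score, the sketch below uses plain (crisp) AHP with the row geometric-mean approximation. It is not the BaFAHP method itself, which adds Bayesian and fuzzy components not described here. The pairwise judgments, the per-cell factor scores, and the function names are all hypothetical.

```python
import math
from typing import List


def ahp_weights(pairwise: List[List[float]]) -> List[float]:
    """Approximate the AHP priority vector with the row geometric-mean method.

    pairwise[i][j] states how much more important factor i is than factor j.
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def fitness(factor_scores: List[float], weights: List[float]) -> float:
    """Weighted sum of normalized factor scores for one piece of land."""
    return sum(s * w for s, w in zip(factor_scores, weights))


# Hypothetical expert judgments for three factors, in this order:
# technological limitations, economic feasibility, energy potential.
pairwise = [
    [1.0, 1 / 2, 1 / 4],
    [2.0, 1.0, 1 / 2],
    [4.0, 2.0, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores (0..1) of one raster cell on the same factors.
cell_fitness = fitness([0.9, 0.5, 0.8], weights)
```

With this perfectly consistent matrix, the weights come out in the ratio 1 : 2 : 4, so energy potential dominates the fitness of the cell.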
I also performed research with colleagues from Moscow State University, including comprehensive model result validation.
The results were published in the Q1 Scopus-indexed journal IEEE Access.
Education
PhD in Data Science
Satbayev University - Almaty, Kazakhstan
Master's Degree in Software Engineering
International Information Technology University - Almaty, Kazakhstan
Bachelor's Degree in Computer Science
International Information Technology University - Almaty, Kazakhstan
Skills
Libraries/APIs
Scikit-learn, Pandas, REST APIs, React, XGBoost, Vue, NumPy, Matplotlib, PySpark, PyTorch
Tools
Apache Airflow, NGINX, PyCharm, GIS, Git, Seaborn, Jupyter, Apache Impala, Google Kubernetes Engine (GKE)
Languages
Python, C++, SQL, JavaScript, Java
Frameworks
Django, Django REST Framework, Scrapy, Streamlit, Nuxt.js, Spark, Apache Spark
Platforms
Docker, Linux, Jupyter Notebook, Java EE, Kubernetes
Storage
Elasticsearch, Data Pipelines, JSON, Databases, PostgreSQL, PostGIS, NoSQL
Paradigms
Management, ETL
Other
Machine Learning, Data Science, Research, Natural Language Processing (NLP), R&D, Data Processing, Artificial Intelligence (AI), Classification, Data Analysis, Data Analytics, Analytics, Hugging Face, Large Language Models (LLMs), Open-source LLMs, Deep Neural Networks (DNNs), Software Engineering, Web Development, Data Structures, Neural Networks, Topic Modeling, Scientific Data Analysis, Writing & Editing, Front-end, API Integration, Web Scraping, Text Processing, Churn Analysis, Exploratory Data Analysis, Scraping, Recommendation Systems, Data Cleaning, Data Cleansing, APIs, Deep Learning, CSV, Generative Pre-trained Transformers (GPT), Transformers, System Design, Software Development Management, GPU Computing, Computer Vision, Geology, Server Administration, Server Management, Text Classification, Science, Team Management, BERT, Customer Segmentation, Customer Analysis, Predictive Modeling, Gradient Boosting, Financial Data, Back-end, Material Design, Consulting, Data Research, Expert Systems, Decision Support Systems, Geographic Information Systems, Data Engineering, Streaming Data, Statistical Analysis, Parsers, Unstructured Data Analysis, Dagster, FastAPI