Daniyar Bakir, Developer in Astana, Kazakhstan

Daniyar Bakir

Verified Expert in Engineering

Bio

Daniyar is a data scientist and back-end engineer motivated to build software that improves service response time and overall health. He has trained and developed speech-to-text (STT) systems with limited transcribed audio data for model training. Daniyar improved STT metrics by 5-10% by developing a new training strategy.

Portfolio

Business & Finance Consulting
Python 3, Django, JavaScript, HTMX, PostgreSQL, GitLab CI/CD, Bootstrap...
Btsdigital
Python 3, PyTorch, Django, Flask, Docker, C++, Bash, Speech Recognition...
Nazarbayev University
Python 3, JavaScript, Flask, Web Development, PostgreSQL, SQL, APIs, HTTP, Bash...

Experience

  • Python 3 - 5 years
  • Docker - 4 years
  • Machine Learning - 4 years
  • Django - 3 years
  • Speech to Text - 3 years
  • Pandas - 3 years
  • PyTorch - 3 years
  • Flask - 1 year

Availability

Part-time

Preferred Environment

Vim Text Editor, Spacemacs, Visual Studio Code (VS Code)

The most amazing...

...project I've developed is related to data science: a model training method that vastly improved our metrics and is now used as the main training script.

Work Experience

Python Developer

2023 - 2024
Business & Finance Consulting
  • Replaced an old system of .docx and .xlsx files with a structured web app by designing and building its database, front end, and back end.
  • Halved CI/CD pipeline build time and cut image size to one-third by optimizing the Dockerfile and GitLab CI/CD configuration and introducing a more sophisticated package manager.
  • Designed and developed the web scraper and a report generation system that replaced the previous manual system of data search and report preparation.
  • Reduced search query response time by 30% by optimizing the BFS and DFS search algorithms.
Technologies: Python 3, Django, JavaScript, HTMX, PostgreSQL, GitLab CI/CD, Bootstrap, CI/CD Pipelines, Selenium, Scraping

Data Scientist

2018 - 2023
Btsdigital
  • Implemented speech-to-text (STT) code as an easy plug-in service, independent of work mode and architecture, within a single STT framework.
  • Designed and implemented the scalable architecture of a face recognition project for an MVP.
  • Increased the speed of speech audio data transcription pipeline by 30% by implementing a spoken language identification model, replacing the manual language annotation with an automated system. Used PyTorch, Python, and Bash.
  • Saved up to six hours of the STT training time by restructuring the data storage system and data preparation pipeline.
  • Parsed image and text data of 33 million users and over 1,000 public groups from social networks by developing a message queue system. Used Python, Python-requests, PostgreSQL, and Celery.
  • Improved speech-to-text model metrics by developing a new training script without using any additional training data.
Technologies: Python 3, PyTorch, Django, Flask, Docker, C++, Bash, Speech Recognition, Speech to Text, Web Development, HTTP REST, PostgreSQL, SQL, RabbitMQ, Apache Kafka, APIs, HTTP, JavaScript, jQuery, Pandas, Machine Learning, Python, HTML, ETL, Git, Deep Learning, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Data Engineering, Scripting, Back-end, NumPy, Jupyter Notebook, Web Scraping, Apache Airflow, Artificial Intelligence (AI), Selenium, Scraping, GitHub

Teaching and Research Assistant

2017 - 2018
Nazarbayev University
  • Launched and developed a3ranking.com, a website that presents and visualizes research on a novel academic ranking system. Used Python, Flask, JavaScript, Bootstrap, and D3.
  • Parsed and matched academic data for more than 5,000 universities from four sources by incorporating Selenium and HTTP requests tools through a self-constructed message-queue system.
  • Developed, in R, a grid-search optimization technique that estimates the regularization parameter of regularized linear discriminant analysis 10-fold faster than the conventional cross-validation optimization technique.
  • Studied, implemented, and performed an extensive comparative analysis of high-dimensional statistical learning methods under different sample and feature size ratios.
Technologies: Python 3, JavaScript, Flask, Web Development, PostgreSQL, SQL, APIs, HTTP, Bash, jQuery, D3.js, Pandas, Bootstrap, LaTeX, Machine Learning, Python, HTML, Git, Embedded Systems, Deep Learning, Web Scraping, Artificial Intelligence (AI), Selenium, GitHub

Experience

CRM Web Tool

A Python DRF and React-based CRM tool. The tool should help automate invoice generation and structure the internal processes of the business.

I am the sole developer working on this project. The back end is based on the Django REST Framework and PostgreSQL and utilizes the Service-Controller-Repository pattern. The front end is based on the React Refine framework. CI/CD and deployment are done via GitLab CI/CD and DigitalOcean Droplet.
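The layering named above can be illustrated with a minimal, framework-free sketch. It is not the project's code: the `Invoice` model, the class names, and the in-memory dictionary standing in for PostgreSQL are all assumptions made for illustration, and a real Django REST Framework app would use views, serializers, and the ORM instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class Invoice:
    """Hypothetical domain object; the field names are illustrative."""
    invoice_id: int
    customer: str
    amount: Decimal


class InvoiceRepository:
    """Data access only. An in-memory dict stands in for PostgreSQL here."""

    def __init__(self) -> None:
        self._rows: dict[int, Invoice] = {}
        self._next_id = 1

    def add(self, customer: str, amount: Decimal) -> Invoice:
        invoice = Invoice(self._next_id, customer, amount)
        self._rows[invoice.invoice_id] = invoice
        self._next_id += 1
        return invoice

    def get(self, invoice_id: int) -> Optional[Invoice]:
        return self._rows.get(invoice_id)


class InvoiceService:
    """Business rules only. Knows nothing about HTTP or storage details."""

    def __init__(self, repository: InvoiceRepository) -> None:
        self._repository = repository

    def create_invoice(self, customer: str, amount: Decimal) -> Invoice:
        if amount <= 0:
            raise ValueError("invoice amount must be positive")
        return self._repository.add(customer.strip(), amount)


class InvoiceController:
    """Request handling only. Maps plain dicts to service calls and back."""

    def __init__(self, service: InvoiceService) -> None:
        self._service = service

    def post(self, payload: dict) -> tuple[int, dict]:
        try:
            invoice = self._service.create_invoice(
                payload["customer"], Decimal(str(payload["amount"]))
            )
        except (KeyError, ValueError, ArithmeticError) as error:
            # Missing fields, rule violations, and unparsable amounts become 400s.
            return 400, {"error": str(error)}
        return 201, {
            "id": invoice.invoice_id,
            "customer": invoice.customer,
            "amount": str(invoice.amount),
        }
```

Wiring the layers as `InvoiceController(InvoiceService(InvoiceRepository()))` keeps each one replaceable, so the service rules can be tested without a database or an HTTP client.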

Audio Data Preparation Pipeline

An automation pipeline for processing the raw audio data for the Speech Data Annotation team. This tool speeds up the batch preparation process, eliminates human interaction, and visualizes the whole process.

I was the initiator and sole developer of this project, which sped up the annotation process from 3.5–4 hours per batch to 45–50 minutes per batch. It was completed using Apache Airflow, FFmpeg, Docker, in-house STT (for pseudo annotation), Python, and Flask.

Speech-to-text Model Optimization

The project involved optimizing the training time of the speech-to-text (STT) pipeline by restructuring scripts and storage. The goals included speeding up hypothesis checking on STT and reducing scheduling problems on the GPU server. As an ML engineer, I had to find optimization points and implement them, restructure the training pipeline, and reorganize the data storage structure. Splitting the training pipeline into two parts, CPU and GPU, significantly reduced training time and removed idle GPU server operation during the initial CPU computation steps.
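The CPU/GPU split described above can be sketched as two independent stages that communicate only through cached files. This is an illustrative outline, not the actual pipeline: the function names, the JSON cache format, and the placeholder feature computation are assumptions, and the "training" step is a stand-in that would be a PyTorch loop in practice.

```python
from __future__ import annotations

import json
from pathlib import Path


def prepare_features(raw_items: list[str], cache_dir: Path) -> list[Path]:
    """CPU stage: turn raw inputs into cached feature files.

    Files that already exist are skipped, so the stage can be re-run cheaply
    and scheduled on a machine without a GPU.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for index, item in enumerate(raw_items):
        target = cache_dir / f"{index:06d}.json"
        if not target.exists():
            # Placeholder for real work such as decoding audio and
            # computing acoustic features.
            features = [float(ord(char)) for char in item]
            target.write_text(json.dumps(features))
        outputs.append(target)
    return outputs


def train(feature_files: list[Path]) -> float:
    """GPU stage: reads only cached features, so the accelerator does not
    sit idle while preprocessing runs. A mean over all values stands in
    for a real training step."""
    total, count = 0.0, 0
    for path in feature_files:
        values = json.loads(path.read_text())
        total += sum(values)
        count += len(values)
    return total / count if count else 0.0
```

Because the second stage depends only on the cache directory, a scheduler can reserve the GPU server after `prepare_features` has finished rather than for the whole run.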

Virtual Assistant

A virtual assistant can improve users' overall experience with everyday life tasks by completing simple tasks. The service needs the users' commands and completes their requests by passing them through a technical pipeline that consists of speech-to-text, natural language processing, and text-to-speech modules. I acted as an ML engineer, designing and implementing a specific speech-to-text system for the virtual assistant and implementing integration with the virtual assistant pipeline.

Admin Tool

A web project for internal usage only. It was developed to replace a 3rd-party vendor project with an in-house customized application.

In this project, I was the sole developer responsible for the data scraping, back end, front end, CI/CD, and deployment.

Speech-to-text Service

A simple-to-use speech-to-text training pipeline and service for a two-language environment (Kazakh and Russian). For example, the STT service can be retrained to match a particular vocabulary used in a financial call center. As an ML engineer, I researched and implemented various STT architectures, trained and validated models on new data, developed and improved the existing STT models, and optimized the data preparation pipeline.

Cold Calling Automation

The project's goal was to increase the client's involvement in the market, optimize the sales managers' working time, and increase the company's product presence among businesses. I was an ML engineer contributing to the design of a call center pipeline that consisted of STT, NLP, TTS, telephony, and a message queue. We also created an STT system suited to the company's jargon. It resulted in a more than four-fold increase in cold calls.

Audio Annotation Tool

It is an internal web tool for large-scale audio transcription. Audio transcription required a lot of human resources, and the existing open-source audio annotation tools were slow and difficult to use, and they required work on the user's machine, such as installation or sharing physical data. This project aimed to develop a web tool that would remove the major downsides of open-source tools by eliminating the manual distribution and collection of data to and from annotators, introducing automatic data engineering, and customizing the tool for all annotators simultaneously.

As a full-stack developer, I designed and developed the architecture of the back-end system, data collection pipeline between the database and ML engineers, and UI. I also developed the front-end system and transferred the project to a new team.

Spoken Language Identification (LID)

The project involved developing an ML service to determine spoken language in an audio file by developing a spoken language identification model and integrating this model into the data preparation pipeline. As an ML engineer, I designed, implemented, validated, automated, and improved the language identification model in audio files. I significantly reduced the time spent annotating audio file language and prepared and curated audio datasets.

Computer Vision Web Application

A fun service for users to find a celebrity face. For example, a user uploads their face and selects a target celebrity. The app displays the face of a celebrity that looks like the user, then the next face that looks like the previous celebrity, and so on until the target celebrity is shown. I acted as a back-end developer and designed a scalable microservice architecture connecting several ML microservices. I also developed the app's back end and front end and implemented production-ready code.
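The face-to-face chain described above can be sketched as a greedy walk over face embeddings. This is a guess at one workable approach, not the app's algorithm: the `face_chain` function, the use of Euclidean distance, and the rule that each step must move closer to the target (added here so the walk always terminates) are assumptions for illustration, and the embeddings would come from a face recognition model in practice.

```python
from __future__ import annotations

import math


def face_chain(
    user: tuple[float, ...],
    target_name: str,
    celebrities: dict[str, tuple[float, ...]],
) -> list[str]:
    """Return the celebrity names shown on the way from the user to the target.

    At each step, pick the not-yet-shown celebrity most similar to the current
    face, restricted to faces that are closer to the target than the current
    one. If no such face exists, jump straight to the target.
    """
    target = celebrities[target_name]
    current = user
    chain: list[str] = []
    while not chain or chain[-1] != target_name:
        limit = math.dist(current, target)
        candidates = [
            name
            for name, embedding in celebrities.items()
            if name not in chain and math.dist(embedding, target) < limit
        ] or [target_name]
        nearest = min(
            candidates, key=lambda name: math.dist(celebrities[name], current)
        )
        chain.append(nearest)
        current = celebrities[nearest]
    return chain
```

With a large gallery, the linear scan in each step would typically be replaced by an approximate nearest-neighbor index.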

Social Network Parsing App

The business idea of the project was to collect user data from external sources to improve the company's products, such as the recommendation system. As the back-end developer, I explored various ways to retrieve data bypassing pagination limitations, developed the API to update existing data in the database, and implemented a single fetch of many users' data.

University Ranking System

A web project based on Python and Flask for a university ranking system. The business idea was to create an academic ranking system for universities and present it as a website. I acted as a full-stack developer managing data crawling for 4,000+ universities, developing the back-end and mobile-first front-end systems, creating an interactive map to visualize top universities per country, and deploying the project on DigitalOcean.

Education

2015 - 2017

Master's Degree in Electrical and Electronics Engineering

Nazarbayev University - Astana, Kazakhstan

2011 - 2015

Bachelor's Degree in Electrical and Electronics Engineering

Nazarbayev University - Astana, Kazakhstan

Skills

Libraries/APIs

NumPy, PyTorch, Pandas, D3.js, jQuery, Flask-RESTful, OpenCV, Vue, HTMX, React, PostgREST

Tools

MATLAB, LaTeX, Vim Text Editor, Git, Spacemacs, RabbitMQ, Asterisk, TensorBoard, Apache Airflow, Docker Compose, GitLab CI/CD, GitHub

Languages

Python, Python 3, Bash, SQL, R, C++, JavaScript, HTML, Go, TypeScript

Frameworks

Django, Flask, Bootstrap, Cypress, Selenium, Django REST Framework

Platforms

Docker, Jupyter Notebook, Arduino, Raspberry Pi, Apache Kafka, Visual Studio Code (VS Code), Kubernetes

Storage

PostgreSQL, Google Cloud

Paradigms

Microservices Architecture, ETL

Other

Machine Learning, HTTP, APIs, HTTP REST, Web Development, Speech to Text, Speech Recognition, Deep Learning, Data Engineering, Scripting, Back-end, Web Scraping, Data Scraping, Artificial Intelligence (AI), Computer Vision, Embedded Systems, Natural Language Processing (NLP), FAISS, Generative Pre-trained Transformers (GPT), CI/CD Pipelines, Scraping
