
Daniyar Bakir
Verified Expert in Engineering
Software Developer
Astana, Kazakhstan
Toptal member since May 17, 2022
Daniyar is a data scientist and back-end engineer motivated to build software that improves service response time and overall health. He has trained and developed speech-to-text (STT) systems with limited transcribed audio data for model training. Daniyar improved STT metrics by 5-10% by developing a new training strategy.
Experience
- Python 3 - 5 years
- Docker - 4 years
- Machine Learning - 4 years
- Django - 3 years
- Speech to Text - 3 years
- Pandas - 3 years
- PyTorch - 3 years
- Flask - 1 year
Preferred Environment
Vim Text Editor, Spacemacs, Visual Studio Code (VS Code)
The most amazing...
...project I've developed is related to data science: a model training method that vastly improved our metrics and is now used as the main training script.
Work Experience
Python Developer
Business & Finance Consulting
- Replaced an old system of .docx and .xlsx files with a structured web app by designing and building its database, front end, and back end.
- Halved CI/CD pipeline build time and cut image size to a third by optimizing the Dockerfile and GitLab CI/CD configuration and introducing a more sophisticated package manager.
- Designed and developed a web scraper and a report generation system that replaced the previous manual process of data search and report preparation.
- Reduced search query response time by 30% by optimizing the BFS and DFS search algorithms.
Data Scientist
Btsdigital
- Implemented speech-to-text (STT) code as an easy plug-in service within a single STT framework, regardless of work mode and architecture.
- Designed and implemented the scalable architecture of a face recognition project for an MVP.
- Sped up the speech audio transcription pipeline by 30% by implementing a spoken language identification model that replaced manual language annotation with an automated system. Used PyTorch, Python, and Bash.
- Saved up to six hours of STT training time by restructuring the data storage system and data preparation pipeline.
- Parsed image and text data of 33 million users and over 1,000 public groups from social networks by developing a message queue system. Used Python, Python-requests, PostgreSQL, and Celery.
- Improved speech-to-text model metrics by developing a new training script without using any additional training data.
Teaching and Research Assistant
Nazarbayev University
- Launched and developed a3ranking.com, a website that presents and visualizes research on a novel academic ranking system. Used Python, Flask, JavaScript, Bootstrap, and D3.
- Parsed and matched academic data for more than 5,000 universities from four sources, using Selenium and HTTP request tools coordinated through a self-built message queue system.
- Developed, in R, a grid-search optimization technique that estimates the regularization parameter of regularized linear discriminant analysis ten times faster than the conventional cross-validation technique.
- Studied, implemented, and performed an extensive comparative analysis of high-dimensional statistical learning methods under different sample and feature size ratios.
Experience
CRM Web Tool
I am the sole developer working on this project. The back end is based on the Django REST Framework and PostgreSQL and utilizes the Service-Controller-Repository pattern. The front end is based on the React Refine framework. CI/CD and deployment are done via GitLab CI/CD and DigitalOcean Droplet.
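As an illustration of the Service-Controller-Repository layering named above, here is a minimal plain-Python sketch. It is not the project's code: it uses an in-memory store instead of Django REST Framework and PostgreSQL, and every name in it (`Customer`, `InMemoryCustomerRepository`, `CustomerService`, `create_customer_controller`) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Customer:
    id: int
    name: str
    email: str


class InMemoryCustomerRepository:
    """Repository layer: the only place that touches storage (assumed in-memory here)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Customer] = {}
        self._next_id = 1

    def add(self, name: str, email: str) -> Customer:
        customer = Customer(self._next_id, name, email)
        self._rows[customer.id] = customer
        self._next_id += 1
        return customer

    def find_by_email(self, email: str) -> Optional[Customer]:
        return next((c for c in self._rows.values() if c.email == email), None)


class CustomerService:
    """Service layer: business rules only, with no HTTP or storage details."""

    def __init__(self, repository: InMemoryCustomerRepository) -> None:
        self._repository = repository

    def register(self, name: str, email: str) -> Customer:
        email = email.strip().lower()  # normalize before the uniqueness check
        if self._repository.find_by_email(email) is not None:
            raise ValueError("email already registered")
        return self._repository.add(name, email)


def create_customer_controller(service: CustomerService, payload: dict) -> tuple:
    """Controller layer: turn a request payload into a (status, body) pair."""
    try:
        customer = service.register(payload["name"], payload["email"])
    except KeyError as missing:
        return 400, {"error": f"missing field: {missing.args[0]}"}
    except ValueError as error:
        return 409, {"error": str(error)}
    return 201, {"id": customer.id, "name": customer.name, "email": customer.email}
```

One reason to split the layers this way is that the service can be tested without a web server or a database, and the repository can be swapped for a database-backed one without changing the business rules.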
Audio Data Preparation Pipeline
I was the initiator and sole developer of this project, which sped up the annotation process from 3.5–4 hours per batch to 45–50 minutes per batch. It was built with Apache Airflow, FFmpeg, Docker, an in-house STT system (for pseudo-annotation), Python, and Flask.
Speech-to-text Model Optimization
Virtual Assistant
Admin Tool
In this project, I was the sole developer responsible for the data scraping, back end, front end, CI/CD, and deployment.
Speech-to-text Service
Cold Calling Automation
Audio Annotation Tool
As a full-stack developer, I designed and developed the architecture of the back-end system, data collection pipeline between the database and ML engineers, and UI. I also developed the front-end system and transferred the project to a new team.
Spoken Language Identification (LID)
Computer Vision Web Application
Social Network Parsing App
University Ranking System
Education
Master's Degree in Electrical and Electronics Engineering
Nazarbayev University - Astana, Kazakhstan
Bachelor's Degree in Electrical and Electronics Engineering
Nazarbayev University - Astana, Kazakhstan
Skills
Libraries/APIs
NumPy, PyTorch, Pandas, D3.js, jQuery, Flask-RESTful, OpenCV, Vue, HTMX, React, PostgREST
Tools
MATLAB, LaTeX, Vim Text Editor, Git, Spacemacs, RabbitMQ, Asterisk, TensorBoard, Apache Airflow, Docker Compose, GitLab CI/CD, GitHub
Languages
Python, Python 3, Bash, SQL, R, C++, JavaScript, HTML, Go, TypeScript
Frameworks
Django, Flask, Bootstrap, Cypress, Selenium, Django REST Framework
Platforms
Docker, Jupyter Notebook, Arduino, Raspberry Pi, Apache Kafka, Visual Studio Code (VS Code), Kubernetes
Storage
PostgreSQL, Google Cloud
Paradigms
Microservices Architecture, ETL
Other
Machine Learning, HTTP, APIs, HTTP REST, Web Development, Speech to Text, Speech Recognition, Deep Learning, Data Engineering, Scripting, Back-end, Web Scraping, Data Scraping, Artificial Intelligence (AI), Computer Vision, Embedded Systems, Natural Language Processing (NLP), FAISS, Generative Pre-trained Transformers (GPT), CI/CD Pipelines, Scraping