Dmytro Babych
Verified Expert in Engineering
Data Scientist and Software Developer
Dmytro is a data scientist with seven years of experience. He has deep expertise in natural language processing (NLP) tools and is well-versed in Python and Java. His track record includes optimizing neural networks at Samsung Electronics, where he reduced RAM usage by up to a factor of 10 without loss of accuracy. Dmytro has worked in a range of application domains, including healthcare, electronics, lead generation, and finance.
Preferred Environment
Git, Python, PyTorch, TensorFlow, Amazon Web Services (AWS), NumPy, Scikit-learn, Plotly, Android, Java
The most amazing project I've finished is the optimization of neural networks for mobile devices. We reduced RAM usage by up to 10 times without any loss in accuracy.
Work Experience
Senior Data Scientist
Eagle Genomics
- Delivered a knowledge graph (KG)-based application that helps scientists find the effects of microbiome-related entities (such as species, genes, proteins, and metabolites) on human and animal health.
- Built a copilot based on the GPT-4 API for querying and summarizing the KG.
- Enriched the KG through text mining, using a large language model fine-tuned with LoRA to classify the relationships between entities mentioned in the same sentence.
- Implemented a human-in-the-loop process: user feedback on relationship classifications is stored, and the model is retrained on it.
- Built a pipeline that parses new scientific articles and adds them to the graph databases (TypeDB and Neo4j).
- Contributed to the retrieval-augmented generation (RAG) pipeline: when the answer to a question is not in the KG, the application searches for it in a text database built in Qdrant.
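The KG-first, text-fallback lookup described in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: a Python dict stands in for the graph databases, an in-memory list of (embedding, passage) pairs stands in for the Qdrant collection, and the names `KG`, `TEXT_DB`, `cosine`, and `answer`, as well as all entities, passages, and vectors, are hypothetical toy data.

```python
import math

# Hypothetical stand-ins for the real stores: a dict plays the role of the
# graph databases, and a list of (embedding, passage) pairs plays the role of
# the Qdrant text collection. All entities, passages, and vectors are toy data.
KG = {("species_x", "increases"): "metabolite_y"}
TEXT_DB = [
    ([0.9, 0.1, 0.0], "placeholder passage about species_z"),
    ([0.1, 0.8, 0.3], "placeholder passage about gene_w"),
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(entity, relation, query_vec):
    """Try the KG first; if it has no fact, fall back to the most similar passage."""
    fact = KG.get((entity.lower(), relation))
    if fact is not None:
        return "kg", fact
    # Fallback: nearest-neighbour search over the text store by cosine similarity
    _, passage = max(TEXT_DB, key=lambda item: cosine(query_vec, item[0]))
    return "text", passage


# query_vec is assumed to be an embedding of the user's question
print(answer("species_x", "increases", [0.0, 0.0, 1.0]))  # answered from the KG
print(answer("species_z", "increases", [1.0, 0.0, 0.0]))  # answered from the text store
```

In a full RAG pipeline, the retrieved passage would then be passed to the language model as context for generating the answer; that generation step is omitted here.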
Senior/Lead Data Scientist
LITSLINK
- Delivered multiple projects, including automatic speech recognition (ASR) for a banking application, a job matching platform, a lead generation engine, text generation for compliments and product descriptions, and a travel application chatbot.
- Mentored junior colleagues and gave feedback on their work on less complex projects.
- Contributed extensively to presales activities with potential clients.
Machine Learning Engineer
SoftServe
- Developed a working proof of concept (PoC) for a semantic textual similarity project, deployed the TensorFlow model with AWS Batch, and created a web UI application for the client's needs.
- Created a pipeline for an information retrieval project covering PDF document processing, a UI application for annotating the documents, model training, and information extraction from new documents.
- Set up CI/CD infrastructure in GitHub to support the pipeline.
NLP Engineer
Samsung
- Implemented state-of-the-art neural network solutions in the NLP field.
- Optimized and deployed neural networks on mobile devices, reducing RAM usage by up to 10 times without any drop in accuracy.
- Contributed to semantic similarity, document categorization, named-entity recognition (NER), part-of-speech (POS) tagging, question answering, and text summarization tasks.
Software Developer
NeoDesign
- Developed internal tools for payment administration, including a Django-based dashboard web app.
- Scraped web pages and extracted information from social media.
- Built dashboards visualizing customer activity.
Experience
Literature-based Knowledge Graph
The project aimed to deepen researchers' understanding of biomedical concepts by building a knowledge graph of the relationships among entities and capturing the opinions and attitudes expressed about them in the literature. Automating this process made knowledge discovery in biomedical research more efficient.
As the lead developer on this project, I implemented key components, including biomedical named-entity recognition, relationship classification, targeted sentiment classification, and sub-graph visualization.
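One way to picture the sub-graph step: relationship classification yields (head, relation, tail) triples, and a view for visualization can be cut from the neighborhood of a chosen entity. The sketch below assumes that triple representation; the triples, entity names, and the functions `build_graph` and `subgraph` are hypothetical and not taken from the project.

```python
from collections import defaultdict, deque

# Hypothetical (head, relation, tail) triples, as a relationship classifier
# might emit them; entity names are placeholders.
TRIPLES = [
    ("species_a", "produces", "metabolite_b"),
    ("metabolite_b", "modulates", "protein_c"),
    ("protein_c", "encoded_by", "gene_d"),
    ("species_e", "inhibits", "species_a"),
]


def build_graph(triples):
    """Adjacency list; edges are treated as undirected for neighborhood search."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))
        adj[tail].append((rel, head))
    return adj


def subgraph(triples, center, hops):
    """Return the triples whose endpoints are both within `hops` edges of `center`."""
    adj = build_graph(triples)
    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search, stopping at the hop limit
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for _, neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return [(h, r, t) for h, r, t in triples if h in dist and t in dist]


print(subgraph(TRIPLES, "species_a", 1))
```

With one hop, only the two triples that touch `species_a` are kept; raising `hops` to 2 also pulls in the `metabolite_b`–`protein_c` edge. The resulting triple list could then be handed to any graph-plotting library.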
Neural Network Optimization for Mobile Devices
Text Classification:
Model size on disk: 80 MB → 12 MB (85% reduction)
RAM usage: 85 MB → 25 MB (71% reduction)
Accuracy drop: none
Named-Entity Recognition:
Model size on disk: 323 MB → 43 MB (87% reduction)
RAM usage: 400 MB → 40 MB (90% reduction)
Accuracy drop: none
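The percentages above follow from the before/after figures as (before - after) / before. The short calculation below (the helper `reduction` and the `MEASUREMENTS` dict are illustrative names) reproduces them and also prints the corresponding reduction factors. This shows that the tenfold RAM figure comes from the NER model, while RAM usage for text classification dropped about 3.4-fold.

```python
def reduction(before, after):
    """Return (percentage reduction rounded to an integer, reduction factor to 1 decimal)."""
    return round(100 * (before - after) / before), round(before / after, 1)


# Before/after figures in MB, taken from the measurements listed above
MEASUREMENTS = {
    "text classification, disk": (80, 12),
    "text classification, RAM": (85, 25),
    "NER, disk": (323, 43),
    "NER, RAM": (400, 40),
}

for name, (before, after) in MEASUREMENTS.items():
    pct, factor = reduction(before, after)
    print(f"{name}: {pct}% smaller, {factor}x")
```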
The workflow included:
1. Training the model the same way as with standard embeddings;
2. Implementing quantization in the TensorFlow C++ source code:
2.1 Finding the relevant (biggest) tensor in the model's graph;
2.2 Quantizing the tensor's -0.33 and +0.33 values to 1s and 0s;
2.3 Packing the 1s and 0s into the int8 data type so they take less space;
2.4 Implementing an unpacking operation for inference and replacing the original operation with it;
2.5 Saving the transformed graph.
3. Implementing a Java package for model inference and performance measurement.
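Steps 2.2 to 2.4 can be illustrated in Python. The original work was done in TensorFlow's C++ source, so this is only a sketch of the idea. The following assumptions are not taken from the source: values are thresholded at zero, bits are packed eight per byte with the most significant bit first, unpacking restores ±0.33, and `SCALE`, `quantize`, `pack`, and `unpack` are hypothetical names. Python `bytes` are unsigned, whereas the original used int8; the stored bit pattern is the same either way.

```python
SCALE = 0.33  # assumption: tensor values are approximately -0.33 or +0.33


def quantize(values):
    """Step 2.2 (sketch): 1 for non-negative values, 0 for negative (assumed threshold)."""
    return [1 if v >= 0 else 0 for v in values]


def pack(bits):
    """Step 2.3 (sketch): pack bits 8 per byte, MSB first; pad the last byte with 0s."""
    packed = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


def unpack(packed, n):
    """Step 2.4 (sketch): recover n values of +/-SCALE from the packed bytes."""
    out = []
    for i in range(n):
        bit = (packed[i // 8] >> (7 - i % 8)) & 1
        out.append(SCALE if bit else -SCALE)
    return out


weights = [0.33, -0.33, 0.33, 0.33, -0.33, -0.33, -0.33, 0.33, 0.33, -0.33]
packed = pack(quantize(weights))       # 10 values fit in 2 bytes
restored = unpack(packed, len(weights))
assert restored == weights             # lossless for values that are exactly +/-0.33
```

Storing one bit instead of a 32-bit float shrinks the quantized tensor by a factor of about 32 (assuming float32 weights). The whole-model reductions reported above are smaller, presumably because only the largest tensor is quantized (step 2.1).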
Education
Master's Degree in Computer Science
Vistula University - Warsaw, Poland
Bachelor's Degree in Physical and Biomedical Electronics
Igor Sikorsky Kyiv Polytechnic Institute - Kyiv, Ukraine
Skills
Libraries/APIs
PyTorch, TensorFlow, NumPy, Scikit-learn, Pandas, Beautiful Soup, Matplotlib
Tools
Git, Plotly, ChatGPT
Languages
Python, Java
Platforms
Ubuntu Linux, Amazon Web Services (AWS), Android
Frameworks
Flutter, Django
Other
Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Models (LLMs), Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Machine Learning, OpenAI GPT-3 API, Language Models, Algorithms, Programming, OCR, Recurrent Neural Networks (RNNs), Optimization