Dmytro Babych, Developer in London, United Kingdom

Dmytro Babych

Verified Expert in Engineering

Data Scientist and Software Developer

Location
London, United Kingdom
Toptal Member Since
July 18, 2023

Dmytro is a data scientist with seven years of experience. He possesses exceptional expertise in utilizing natural language processing tools and is well-versed in Python and Java. His impressive track record includes a significant accomplishment at Samsung Electronics, where he optimized neural networks to reduce RAM usage by a factor of 10. Dmytro has worked in different application domains, such as healthcare, electronics, lead generation, and finance.

Portfolio

Eagle Genomics
Python, PyTorch, Pandas, NumPy, Scikit-learn, Plotly...
LITSLINK
Python, PyTorch, Plotly, Amazon Web Services (AWS)...
SoftServe
Python, PyTorch, Pandas, Scikit-learn, NumPy, Amazon Web Services (AWS), Plotly...

Availability

Part-time

Preferred Environment

Git, Python, PyTorch, TensorFlow, Amazon Web Services (AWS), NumPy, Scikit-learn, Plotly, Android, Java

The most amazing...

...project I've finished is the optimization of neural networks for mobile devices. We reduced RAM utilization tenfold with no loss in accuracy.

Work Experience

Senior Data Scientist

2021 - PRESENT
Eagle Genomics
  • Delivered a knowledge graph (KG)-based application to help scientists find the effects of different microbiome-related entities (such as species, genes, proteins, and metabolites) on human and animal health.
  • Co-piloted the application to query and summarize the KG (based on GPT-4 API).
  • Handled KG enrichment using text-mining, utilizing a fine-tuned large language model (based on LoRA) for relationship classification between these entities within sentences.
  • Implemented the human-in-the-loop process. Feedback from users on relationship classification is stored, and the model is further re-trained on that.
  • Constructed a pipeline to parse new scientific articles and add them to the graph database (TypeDB and Neo4j).
  • Contributed to the retrieval-augmented generation pipeline. When the answer to the question is not present in the KG, the application looks for it in the text database built in Qdrant.
Technologies: Python, PyTorch, Pandas, NumPy, Scikit-learn, Plotly, Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Models (LLMs), Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Machine Learning, OpenAI GPT-3 API, ChatGPT, Language Models
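
The fallback described in the last bullet above (answer from the KG when possible, otherwise search the text database) can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in: the in-memory dictionary replaces the graph database, the naive word-overlap search replaces the Qdrant vector search, and the entity names are placeholders. It is not the production code or any real client API.

```python
from typing import List, Optional

# Hypothetical toy stores (assumptions, not real data or real database clients).
TOY_KG = {("species_a", "outcome_x"): "increases"}
TOY_PASSAGES = [
    "species_b was associated with outcome_y in a toy study.",
    "metabolite_c is mentioned here without any outcome.",
]


def lookup_kg(subject: str, obj: str) -> Optional[str]:
    """Return the stored relation between two entities, or None if absent."""
    return TOY_KG.get((subject.lower(), obj.lower()))


def search_passages(query: str, passages: List[str]) -> Optional[str]:
    """Return the passage sharing the most words with the query (naive overlap)."""
    query_words = set(query.lower().split())
    best, best_score = None, 0
    for passage in passages:
        words = set(passage.lower().rstrip(".").split())
        score = len(query_words & words)
        if score > best_score:
            best, best_score = passage, score
    return best


def answer(subject: str, obj: str) -> dict:
    """Try the knowledge graph first; fall back to text retrieval on a miss."""
    relation = lookup_kg(subject, obj)
    if relation is not None:
        return {"source": "kg", "evidence": f"{subject} {relation} {obj}"}
    passage = search_passages(f"{subject} {obj}", TOY_PASSAGES)
    return {"source": "text" if passage else "none", "evidence": passage}
```

In a real pipeline the retrieved passage would presumably be passed to a language model as context; that step is omitted here.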

Senior/Lead Data Scientist

2020 - 2021
LITSLINK
  • Delivered multiple projects, including ASR for a banking application, a job matching platform, a lead generation engine, text generation for compliments and product descriptions, and a travel application chatbot.
  • Guided and provided feedback to junior colleagues on less complicated projects.
  • Contributed extensively to presales activities with potential clients.
Technologies: Python, PyTorch, Plotly, Amazon Web Services (AWS), Artificial Intelligence (AI), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Language Models

Machine Learning Engineer

2019 - 2020
SoftServe
  • Developed a working PoC for a semantic textual similarity project, deployed a TensorFlow model using AWS Batch, and created a web UI application for client needs.
  • Created a pipeline for an information retrieval project, which included processing PDF documents, a UI application for annotating them, model training, and extraction from new documents.
  • Supported the pipeline by creating CI/CD infrastructure in GitHub.
Technologies: Python, PyTorch, Pandas, Scikit-learn, NumPy, Amazon Web Services (AWS), Plotly, Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, OCR, Language Models

NLP Engineer

2018 - 2019
Samsung
  • Implemented state-of-the-art neural network solutions in the NLP field.
  • Optimized and deployed neural networks on mobile devices, reducing RAM usage tenfold with no drop in accuracy.
  • Contributed to semantic similarity tasks, document categorization, named-entity recognition (NER) and part-of-speech (POS) tagging, question answering, and text summarization.
Technologies: Python, TensorFlow, PyTorch, Java, Android, Ubuntu Linux, Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Language Models

Software Developer

2016 - 2017
NeoDesign
  • Developed internal tools for payment administration, including a Django-based dashboard web app.
  • Scraped web pages and extracted information from social media.
  • Prepared visualization dashboards with customer activity.
Technologies: Python, Django, Beautiful Soup, Matplotlib, Artificial Intelligence (AI), Natural Language Processing (NLP), Machine Learning, Language Models

Literature-based Knowledge Graph

By utilizing advanced natural language processing techniques, the application extracted relevant information from the text, established relationships between entities, and classified sentiment toward extracted entities and pre-defined terms.

The project aimed to enhance researchers' understanding of biomedical concepts by constructing a knowledge graph representing interrelationships among entities and providing valuable insights into opinions and attitudes expressed in the literature. Ultimately, this automated approach facilitated efficient knowledge discovery and contributed to advancements in the field of biomedical research.

As the lead developer on this project, I implemented key components, including biomedical named-entity recognition, relationship classification, targeted sentiment classification, and sub-graph visualization.

Neural Network Optimization for Mobile Devices

To process text on-device without sending data to external servers (for privacy reasons), we needed to adapt neural networks (NNs) for smartphone use. Using techniques such as pruning, quantization, and Word2Bits, we achieved the following results:

Text Classification:
Model size on disk: 80 MB => 12 MB, 85% reduced
RAM usage: 85 MB => 25 MB, 71% reduced
Accuracy drop: None

Named Entity Recognition:
Model size on disk: 323 MB => 43 MB, 87% reduced
RAM usage: 400 MB => 40 MB, 90% reduced
Accuracy drop: None
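
The percentages above follow directly from the before and after sizes. As a quick check, here is a short sketch that recomputes them; the function name and dictionary labels are illustrative, and the numbers are copied from the results listed above.

```python
def percent_reduced(before_mb: float, after_mb: float) -> int:
    """Percentage reduction from before to after, rounded to the nearest integer."""
    return round(100 * (before_mb - after_mb) / before_mb)


# (before, after) sizes in MB, taken from the results listed above.
results = {
    "Text classification, disk": (80, 12),
    "Text classification, RAM": (85, 25),
    "NER, disk": (323, 43),
    "NER, RAM": (400, 40),
}

for name, (before, after) in results.items():
    ratio = before / after
    print(f"{name}: {percent_reduced(before, after)}% reduced ({ratio:.1f}x smaller)")
```

The tenfold RAM reduction quoted elsewhere in this profile corresponds to the named-entity recognition model (400 MB to 40 MB).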

Workflow included:
1. Training the model in the same way as with regular embeddings;
2. Implementing quantization in the C++ TensorFlow source code:
2.1 Finding the relevant (largest) tensor in the model's graph;
2.2 Quantizing the -0.33 and +0.33 values to 1s and 0s;
2.3 Packing the ones and zeros into the int8 data type so they take up less space;
2.4 Implementing an unpacking operation for inference and replacing the old operation with it;
2.5 Saving the transformed graph.
3. Implementing a Java package for model inference and performance measurement.
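
Steps 2.2 to 2.4 can be illustrated with a minimal pure-Python sketch. The original work was done inside the C++ TensorFlow source, so this is only an illustration of the idea, not that implementation. The function names, the sign threshold at zero, the choice of which level maps to 1, and the most-significant-bit-first layout are all assumptions.

```python
from typing import List

SCALE = 1 / 3  # the two 1-bit levels are -1/3 and +1/3 (about -0.33 and +0.33)


def quantize_to_bits(values: List[float]) -> List[int]:
    """Map each weight to 1 if it is non-negative, else 0 (assumed convention)."""
    return [1 if v >= 0 else 0 for v in values]


def pack_bits(bits: List[int]) -> bytes:
    """Pack bits into bytes, 8 per byte, most significant bit first."""
    packed = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        byte = 0
        for bit in chunk:
            byte = (byte << 1) | bit
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        packed.append(byte)
    return bytes(packed)


def unpack_to_values(packed: bytes, count: int) -> List[float]:
    """Recover `count` quantized weights (+SCALE or -SCALE) from packed bytes."""
    values = []
    for index in range(count):
        byte = packed[index // 8]
        bit = (byte >> (7 - index % 8)) & 1
        values.append(SCALE if bit else -SCALE)
    return values
```

Each packed byte holds eight weights, so a tensor of 32-bit floats shrinks by roughly a factor of 32 in this sketch, before any other model overhead is counted.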

Education

2017 - 2019

Master's Degree in Computer Science

Vistula University - Warsaw, Poland

2013 - 2017

Bachelor's Degree in Physical and Biomedical Electronics

Igor Sikorsky Kyiv Polytechnic Institute - Kyiv, Ukraine

Libraries/APIs

PyTorch, TensorFlow, NumPy, Scikit-learn, Pandas, Beautiful Soup, Matplotlib

Tools

Git, Plotly, ChatGPT

Languages

Python, Java

Platforms

Ubuntu Linux, Amazon Web Services (AWS), Android

Frameworks

Flutter, Django

Other

Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Models (LLMs), Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Machine Learning, OpenAI GPT-3 API, Language Models, Algorithms, Programming, OCR, Recurrent Neural Networks (RNNs), Optimization
