Vladimir Kotrovskiy, Developer in Yerevan, Armenia

Vladimir Kotrovskiy

Verified Expert in Engineering

Machine Learning Engineer and Developer

Location
Yerevan, Armenia
Toptal Member Since
June 17, 2020

Vladimir is an experienced ML engineer who has completed cross-disciplinary projects in areas ranging from NLP and CV to fintech and medicine. He develops and maintains products from ideation to production in close contact with business stakeholders and collaborating teams, solves problems as they emerge, and aims for excellent results. Vladimir has extensive experience with all major ML frameworks, writes clean code, and builds scalable and maintainable solutions.

Portfolio

Berry Appleman & Leiden - Dunasi
Data Science, SQL, Python, Azure, Predictive Modeling, Predictive Analytics...
LoyaltyLoop, LLC
Python, Generative Pre-trained Transformers (GPT), GPT...
WorldQuant
Algorithmic Trading, Python, Graphs, Neo4j, JanusGraph, TigerGraph...

Availability

Full-time

Preferred Environment

Docker, Google Cloud, Linux, Amazon Web Services (AWS), Kubernetes

The most amazing...

...thing I've built is a flexible voice assistant SDK, allowing clients to create all types of dialogs and interactions, ASR activation, QA, and Wiki integration.

Work Experience

Data Scientist for legal SaaS

2022 - 2024
Berry Appleman & Leiden - Dunasi
  • Designed and developed a complex product for real-time gathering, processing, and analysis of constantly updated time series data.
  • Developed system architecture and several SQL and NoSQL databases.
  • Developed a fully automated, distributed Redis queue system for parsing, with multiple instances and several workers per instance, along with monitoring tools.
  • Worked on whitepapers and multiple predictors, including time series predictors, analytical pipelines, and R&D.
  • Developed several backend microservices for mobile and web applications.
  • Worked with Databricks ETLs and pipelines and on the integration between databases <-> Databricks <-> back/front ends.
Technologies: Data Science, SQL, Python, Azure, Predictive Modeling, Predictive Analytics, Data Mining, Amazon Web Services (AWS), Time Series, Redis, Redis Queue, Distributed Systems, Parsers, AWS Lambda, Databricks, PySpark, PostgreSQL, Amazon SageMaker, Architecture

Data Scientist, NLP

2022 - 2022
LoyaltyLoop, LLC
  • Developed several variants of topic modeling, including transformer-based, statistical, and hybrid approaches.
  • Carried out sentiment analysis, emotion detection, and scoring models.
  • Found long-term trends and time-related changes in customers' attitudes.
  • Made visual representations of the clustering process and topics, e.g., word clouds.
  • Developed custom quality metrics and statistics for clusters.
  • Completed back-end development with AWS S3, Instances, DynamoDB, Lambdas, VPC, and other services.
Technologies: Python, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Transformers, Statistics, Amazon Web Services (AWS), Clustering, Sentiment Analysis, JSON, Convolutional Neural Networks (CNN), TensorFlow, PyTorch, AWS Lambda, Large Language Models (LLMs), Hugging Face, Generative AI, Topic Modeling, Amazon SageMaker

Senior Machine Learning Engineer

2021 - 2022
WorldQuant
  • Built knowledge graphs from multiple datasets and developed the graph architecture.
  • Developed and applied machine learning models to graphs, using tools such as PyTorch Geometric.
  • Developed graph scripts, algorithms, and parallelizations.
Technologies: Algorithmic Trading, Python, Graphs, Neo4j, JanusGraph, TigerGraph, Machine Learning, Business Models, Data Modeling

Machine Learning Engineer / Data Scientist

2020 - 2021
MySky
  • Built an ML, CV, and NLP system for recognizing tabular data from invoices and other documents, with a pipeline for quickly training models for a specific invoice type.
  • Developed multiple NLP/ML predictive systems for intelligent document processing, including named-entity recognition (NER), invoice and cost classification, text summarization, missing text prediction, and fraud detection.
  • Worked on AWS services and back-end development, including Textract, S3, Lambda, SNS, PostgreSQL, DynamoDB, CloudTrail, EventBus, and CloudWatch integrations.
Technologies: GPT, Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Python, Data Science, Pandas, Computer Vision, Amazon Web Services (AWS), Git, SQL, PyTorch, Distributed Systems, Bash, Natural Language Toolkit (NLTK), OpenCV, Windows, Linux, Docker, SciPy, Data Visualization, Quantitative Analysis, Matplotlib, TensorBoard, SpaCy, Gensim, Scikit-learn, Kubernetes, Transformers, Deep Learning, Podman, BERT, Generative Pre-trained Transformers (GPT), Data, Big Data, Image Recognition, Data Engineering, Data Pipelines, ETL, Google Cloud Console, NumPy, Google Cloud Platform (GCP), Neural Networks, Statistical Analysis, Data Analytics, Startups, Apache Kafka, RabbitMQ, Amazon Simple Queue Service (SQS), NoSQL, XGBoost, OpenAI Gym, Time Series, Keras, Predictive Modeling, Business Models, Data Modeling, Image Processing, JSON, Convolutional Neural Networks (CNN), TensorFlow, Large Language Models (LLMs), Hugging Face, Generative AI, PostgreSQL, Amazon SageMaker, Architecture

Data Scientist

2019 - 2020
MTS AI
  • Built several NLP parts, including the NER, of a multi-agent medical AI advice system designed to dynamically make predictions based on brief initial claims, available laboratory data, and a series of consecutive follow-up questions.
  • Created parts of the advising system responsible for generating relevant questions based on the current and previous state and integrated them with knowledge graphs.
  • Developed and trained several transformer-based classification models (BERT, GPT) for patient diagnosis classification.
  • Oversaw the integration of computer vision models into the general pipeline and designed and trained alternative encoders.
Technologies: SpaCy, TensorFlow, Machine Learning, GPT, Natural Language Processing (NLP), Artificial Intelligence (AI), Python, Data Science, Pandas, Computer Vision, Git, SQL, Natural Language Toolkit (NLTK), C++, PyTorch, Distributed Systems, Bash, OpenCV, Windows, Linux, Docker, Data Visualization, Healthcare, Quantitative Analysis, Matplotlib, TensorBoard, Rasa.ai, DeepPavlov, Gensim, Scikit-learn, GCC, Transformers, Deep Learning, Podman, BERT, Generative Pre-trained Transformers (GPT), Data, Big Data, Image Recognition, Data Engineering, Data Pipelines, ETL, NumPy, Neural Networks, Chatbots, Statistical Analysis, Data Analytics, Apache Kafka, RabbitMQ, Amazon Simple Queue Service (SQS), XGBoost, OpenAI Gym, Time Series, Keras, Predictive Modeling, Data Modeling, Image Processing, JSON, Convolutional Neural Networks (CNN), Large Language Models (LLMs), Hugging Face, Generative AI, PostgreSQL

Data Scientist, NLP

2018 - 2019
Alan, Inc.
  • Developed crucial parts of the voice-assistant SDK, including the dialog flow engine, domain-specific NER models, sentiment analysis, and web and mobile screen/states integration.
  • Created a full pipeline for automated NER training—from MTurk data gathering to production, in-training boosting, and a custom tool and templates for dialog creation.
  • Implemented several alternative intent-matching models and optimizations, including BERT and a custom CRF-based classifier.
  • Designed and developed all of the small-footprint keyword spotting models used to activate and stop customer applications by voice (similar to "OK Google"). The models are integrated into existing applications.
  • Developed a Wikidata integration used to answer general questions and another QA model created as an option for certain cases.
  • Worked on a custom user-scripting language for customers to easily define their dialogs and adjustable ways to control the flow for a better experience.
  • Added production monitoring, gathered cases and statistics, and automated NER model retraining based on them.
Technologies: Amazon Web Services (AWS), Rasa.ai, Natural Language Toolkit (NLTK), SpaCy, PyTorch, TensorFlow, Machine Learning, Python, Git, Pandas, SciPy, Google Cloud Console, Data Pipelines, ETL, Data, Transformers, BERT, Generative Pre-trained Transformers (GPT), Gensim, Scikit-learn, TensorBoard, Artificial Intelligence (AI), Data Science, Bash, JavaScript, Windows, Linux, Docker, Natural Language Processing (NLP), GPT, Data Visualization, Matplotlib, DeepPavlov, GCC, Deep Learning, Data Engineering, NumPy, Google Cloud Platform (GCP), Neural Networks, Chatbots, Statistical Analysis, Data Analytics, Startups, Amazon Simple Queue Service (SQS), NoSQL, XGBoost, Keras, Data Modeling, JSON, Convolutional Neural Networks (CNN), AWS Lambda, Large Language Models (LLMs), Hugging Face, Generative AI, PostgreSQL, Architecture

Data Scientist, Computer Vision

2017 - 2018
SmaSS Technologies
  • Developed a face-recognition product capable of working in various surroundings and lighting conditions, including with people moving and wearing occluding objects.
  • Created high-speed algorithms used to select the most promising frames from the stream; also optimized and adjusted 3D ConvNets for the same purpose.
  • Oversaw and was responsible for 3D face reconstruction (mostly SFS).
  • Developed a search-and-comparison routine for reconstructed models.
  • Carried out model optimization and compression for small devices.
Technologies: OpenCV, TensorFlow, Object Detection, Machine Learning, Artificial Intelligence (AI), Python, Data Science, Pandas, Computer Vision, Git, C++, PyTorch, Bash, Windows, Linux, Docker, Data Visualization, Matplotlib, TensorBoard, Deep Learning, Data, Image Recognition, Data Engineering, NumPy, Neural Networks, Startups, Keras, Data Modeling, Image Processing, Convolutional Neural Networks (CNN)

Data Scientist

2014 - 2017
First Moscow Medical University, Novartis
  • Worked on a highly collaborative medical project and was responsible for the overall architecture, design, and execution of R&D, as well as the interactions with stakeholders and colleagues from diverse pharmaceutical companies and the university.
  • Developed ML models for predicting a patient's response to oncological “targeted drugs” based on genetic, clinical, and other information, using classical ML (XGBoost and AdaBoost) and extensive experimentation with DNN architectures.
  • Implemented patient clustering and searches for groups of patients related to the response based on genetic expression arrays, previous therapy, etc.
  • Developed models that evaluated a patient's clinical status during treatment.
  • Searched for possible interconnections among physiological metrics, specific genes, the success of drug treatment, and disease progression using a more statistical approach (Gaussian mixture models, LDA, LSH, SVD).
  • Defined novel training metrics describing a successful treatment based on patient and drug data.
  • Completed the ML part of a mostly standalone research project on obstetrics, covering its correlations with genes, cellular pathways, nutrition, and other data.
  • Developed computer vision (classification and object detection) models for CT and X-ray scans in several medical taxonomies and applied CNNs to microarray scans for cancer classification.
Technologies: TensorFlow, Octave, Pandas, Scikit-learn, Machine Learning, Artificial Intelligence (AI), Python, Data Science, Computer Vision, Amazon Web Services (AWS), Git, SQL, Natural Language Toolkit (NLTK), C++, PyTorch, Bash, JavaScript, OpenCV, MATLAB, Windows, Linux, Qt, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, SciPy, Data Visualization, Healthcare, Quantitative Analysis, Matplotlib, TensorBoard, Deep Learning, Java, R, Data, Image Recognition, Data Engineering, Caffe2, NumPy, Neural Networks, Statistical Analysis, Data Analytics, XGBoost, Time Series, Keras, Predictive Modeling, Business Models, Image Processing, JSON, Convolutional Neural Networks (CNN), Large Language Models (LLMs)

Research Scientist

2013 - 2014
Keldysh Institute of Applied Mathematics (Russian Academy of Sciences)
  • Developed and supported parts of a complex software package used to calculate radiation-induced processes in layers of materials and various objects.
  • Developed intricate algorithms in close collaboration with mathematicians, statisticians, and physicists.
  • Completed the parallelization of computations and algorithms.
  • Created a Qt-based GUI for some existing applications and added new features and ways for users to interact with them.
Technologies: MATLAB, Octave, Qt, C++, Python, Git, SQL, Bash, Windows, Linux, SciPy, Data Visualization, Matplotlib, Scikit-learn, Java, R, Data, Data Engineering, NumPy

Java Developer

2011 - 2012
Prime (previously Prime-Tass)
  • Developed a client-server application for trading.
  • Parallelized existing scripts and developed a general user interface in Swift.
  • Analyzed and developed algorithms used in the company's software.
  • Developed a program for real-time traffic analysis.
Technologies: Java, Swift, Qt

ML Invoice Table Recognition System

http://mysky.com
I was responsible for building an invoice recognition system to replace the company's existing manual processing. Documents, ranging from a single receipt to several hundred invoice pages, came from multiple providers and contained tables (or something close to a table) and header fields. The system recognizes a provider by logo, header information, or persistent table features, and an invoice-type-specific computer vision model extracts the tables, which are then merged algorithmically. The data is provided to other services through an API call. For some document types, additional NLP models are used for classification and NER tasks. Some parts of the ETL were done with AWS to provide seamless document flow.

Voice Assistant SDK

http://alan.app
I built a voice-assistant SDK that lets companies create custom voice chatbots. It combined several NLP models providing retrieval and generative functionality, a QA module, and an advanced ETL for NER models, starting from MTurk with in-training boosting and automated evaluation/deployment/retraining. Another interesting part was a pipeline with all integrations of a small-footprint audio model, allowing a client to activate/deactivate the application by a voice command of their choice (like "OK Google"). I worked on this project with the CTO, who mainly worked on the back end, while I developed the ML, GCC, and AWS integrations and the dialog framework.

Medical Diagnostic AI System

I developed a flexible multi-agent medical AI advising system capable of providing diagnostics/predictions based on initial brief claims, available lab data, and a series of consecutive follow-up questions generated by an NLU module until a certain level of confidence is achieved. Additional CV modules were integrated and used for specific conditions. Other tasks included disease progression prediction and AI-assisted drug selection.

3D Face Recognition System

With another ML engineer and a PLD engineer, I built a 3D face-recognition system capable of working in various surroundings, light conditions, and dynamic occlusions. The product worked as intended in terms of precision, but it was not fast enough to run on devices.

• Used video streams from CCTV cameras with overlapping fields of view.
• Applied various OpenCV and custom algorithms, partly based on 3D ConvNets, to single out several promising frames from different cameras, which were then refined and augmented.
• Used NN models to produce several complete 3D face reconstructions (mostly 3DMM and SFS).
• Maintained statistical metrics that were later used to merge these 3D models, which were then matched against an existing database.

Visa Processing Times Software

A complex software product containing an automated, distributed, and scalable parser, processing ETLs, databases, Databricks integrations, back ends, predictors and flexible analytics, monitoring, EC2 services, daily and weekly reports, and visualizations.

Languages

Python, SQL, Bash, C++, Octave, Java, R, JavaScript, Swift

Libraries/APIs

TensorFlow, PyTorch, SpaCy, Natural Language Toolkit (NLTK), Pandas, SciPy, Matplotlib, OpenCV, Scikit-learn, NumPy, XGBoost, Keras, Redis Queue, PySpark

Tools

Rasa.ai, Git, TensorBoard, Amazon Simple Queue Service (SQS), Amazon SageMaker, MATLAB, Gensim, GCC, Google Cloud Console, RabbitMQ, OpenAI Gym

Paradigms

Data Science, ETL

Platforms

Windows, Linux, Docker, Amazon Web Services (AWS), Apache Kafka, Kubernetes, Google Cloud Platform (GCP), Databricks, AWS Lambda, Azure

Industry Expertise

Healthcare

Storage

Data Pipelines, NoSQL, JSON, PostgreSQL, Neo4j, JanusGraph, Redis

Other

Machine Learning, Data Visualization, Computer Vision, Artificial Intelligence (AI), Natural Language Processing (NLP), Deep Learning, Podman, BERT, Data, Image Recognition, Object Detection, Data Engineering, Neural Networks, Chatbots, Statistical Analysis, Data Analytics, Startups, Time Series, Predictive Modeling, Business Models, Data Modeling, Image Processing, Convolutional Neural Networks (CNN), GPT, Parsers, Large Language Models (LLMs), Hugging Face, Generative AI, Distributed Systems, Quantitative Analysis, Transformers, Generative Pre-trained Transformers (GPT), Big Data, Topic Modeling, Architecture, DeepPavlov, Caffe2, Algorithmic Trading, Graphs, TigerGraph, Statistics, Clustering, Sentiment Analysis, Predictive Analytics, Data Mining, Analytics

Frameworks

Qt

Education

2012 - 2013

Completed a Non-degree Program in Bioinformatics

University of British Columbia - Vancouver, BC, Canada

2009 - 2012

Bachelor of Science Degree in Computer Science

Bauman Moscow State Technical University - Moscow, Russia

2003 - 2009

Professional Medical Diploma in Medicine

Russian State Medical University - Moscow, Russia

Certifications

DECEMBER 2016 - PRESENT

Machine Learning

University of Washington

JUNE 2016 - PRESENT

Machine Learning

Stanford (Coursera)
