Viktor Petukhov, Developer in Tbilisi, Georgia

Viktor Petukhov

Verified Expert in Engineering

Data Scientist and AI Developer

Location
Tbilisi, Georgia
Toptal Member Since
September 22, 2022

Viktor holds a PhD in biostatistics. During his doctoral work, he developed eight open-source packages used by thousands of researchers worldwide. He has also worked as a data science team lead and an independent AI consultant, translating business needs into technical terms. Combining scientific and enterprise experience, Viktor can support companies across the full spectrum, from formulating a business problem to building a production-ready AI solution.

Portfolio

Tentakel
Management, Strategy, Fundraising, Artificial Intelligence (AI)...
Self-employed
Strategy, Computer Vision, Machine Learning, Data Scraping, Data Analysis...
CleverBots
Python, Management, Artificial Intelligence (AI), Linux, Jupyter, Git...

Experience

Availability

Part-time

Preferred Environment

Linux, Jupyter, RStudio, Git, Visual Studio Code (VS Code), PyCharm, CLion

The most amazing...

...project I've developed is a pipeline for unsupervised segmentation of imaging-based transcriptomics data, actively used by the top five labs in the field.

Work Experience

CTO

2023 - PRESENT
Tentakel
  • Developed an MVP of a service that converts natural-language queries into SQL and data visualizations.
  • Built a working retrieval-augmented generation (RAG) product that operates over gigabytes of documents.
  • Hired and managed a team of three software developers.
Technologies: Management, Strategy, Fundraising, Artificial Intelligence (AI), Natural Language Processing (NLP)

Data Science Consultant

2021 - PRESENT
Self-employed
  • Developed a novel algorithm for interpretable detection of truck failures for Viaduct.ai.
  • Created a strategy for starting a computer vision department at a drone-producing startup. The strategy helped attract $300,000 in funding, and the department was successfully launched.
  • Analyzed thousands of job ads in the alternative protein industry for an HR agency, significantly improving its business strategy.
Technologies: Strategy, Computer Vision, Machine Learning, Data Scraping, Data Analysis, Business Cases, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Linux, Jupyter, Git, Data Visualization, Artificial Intelligence (AI), Python, Computational Biology, Data Science, Natural Language Toolkit (NLTK), SQL, REST APIs, PyTorch, Deep Learning, Language Models, Data Extraction, Data Manipulation, Data Analytics, Large Data Sets

Data Science Team Lead

2018 - 2019
CleverBots
  • Developed a matchmaking algorithm for networking and event recommendations, deployed at a week-long educational event with over 1,000 participants.
  • Built a market share prediction algorithm that achieved more than 95% accuracy.
  • Managed developers working on a project for churn-rate prediction for an online educational platform.
Technologies: Python, Management, Artificial Intelligence (AI), Linux, Jupyter, Git, Bayesian Statistics, Data Visualization, Data Analysis, Data Science, Natural Language Toolkit (NLTK), PyTorch, Deep Learning, Data Extraction, Data Manipulation, Data Analytics

Algorithm Developer

2015 - 2017
EPAM Systems
  • Implemented an algorithm for detecting DNA structural variations using optical map sequencing.
  • Re-implemented an algorithm for trend analysis of drug effects, porting it to a new experiment management system.
  • Extended a dose-response curve-fitting algorithm with additional curve parameters.
Technologies: Perl, Python, C++, Java, SAS, R, Algorithms, Linear Algebra, Statistics, Linux, Jupyter, RStudio, Git, Bayesian Statistics, Computational Biology, Data Analysis, Bioinformatics, Data Science, SQL, REST APIs, Data Manipulation, Data Analytics

Projects

Bayesian Segmentation of Imaging-based Spatial Transcriptomics Data

https://github.com/kharchenkolab/Baysor
A tool for unsupervised segmentation of imaging-based spatial transcriptomics data. It was developed in response to new data-producing protocols whose output had no good analysis method. Baysor is the only existing tool that can make sense of this spatial data without additional biological experiments. It is used in many labs worldwide, including those that pioneered the field of imaging-based transcriptomics.

dropEst: Pipeline for Low-level Processing of scRNA-seq Data

https://github.com/kharchenkolab/dropEst
A high-performance tool for low-level processing of single-cell RNA-sequencing data. This pipeline consists of two parts:

• The first part performs data extraction and conversion from raw sequencing data into a format suitable for data analysis (gene expression matrices).
• The second part corrects sequencing errors and filters noise in the data using string algorithms, Bayesian statistics, and machine learning techniques.

ggrastr: An R Package for Improved Data Visualization

https://cran.r-project.org/web/packages/ggrastr/index.html
ggrastr is a ggplot2 extension that allows individual plot layers to be rasterized for scientific visualization. First released in 2017, the package is still actively used by the scientific community, with around 5,000 downloads per month. I initially developed and promoted it alone, though after it gained popularity, the package was largely reworked by other community members.

Education

2018 - 2022

PhD in Biostatistics

University of Copenhagen - Copenhagen, Denmark

2015 - 2017

Master's Degree in Informatics and Applied Mathematics

St. Petersburg Polytechnic University - St. Petersburg, Russia

2011 - 2014

Bachelor's Degree in Informatics and Applied Mathematics

South Ural State University - Chelyabinsk, Russia

Libraries/APIs

Natural Language Toolkit (NLTK), REST APIs, PyTorch

Tools

Jupyter, Git, PyCharm, CLion

Languages

R, Python, C++, Julia, C#, Java, Perl, SAS, SQL

Platforms

Linux, RStudio, Visual Studio Code (VS Code), Software Design Patterns

Paradigms

Data Science, Management

Industry Expertise

Bioinformatics

Frameworks

RStudio Shiny

Other

Machine Learning, Computational Biology, Data Analysis, Research, Bayesian Statistics, Statistical Modeling, Network Analysis, Graph Theory, Data Visualization, Artificial Intelligence (AI), Data Manipulation, Data Analytics, Large Data Sets, Statistics, Linear Algebra, Linear Optimization, Life Science, Probabilistic Graphical Models, Probability Theory, Data Scraping, Natural Language Processing (NLP), Algorithms, Open Source, Genomics, Data Extraction, Generative Pre-trained Transformers (GPT), Molecular Biology, Strategy, Computer Vision, Business Cases, Deep Learning, Language Models, Biology, Fundraising
