Henrik Sergoyan, Developer in Munich, Bavaria, Germany

Henrik Sergoyan

Verified Expert in Engineering

Bio

Henrik is a data scientist with over eight years of experience specializing in natural language processing, focusing on GenAI and retrieval-augmented generation (RAG) solutions. He excels in building end-to-end data solutions, including data collection, processing, analytics, reporting, and deployment. Proficient in SQL and NoSQL databases, Henrik combines strong project management skills with innovative problem-solving to deliver exceptional results.

Portfolio

Zoetis
Data Analysis, SQL, Statistics, Agile Data Science, API Integration, Databricks...
Insummary Technologies Inc.
Data Science, Python, Machine Learning, FastAPI, Azure AI Document Intelligence...
Toptal Client
Python, Data Science, MongoDB, Jupyter Notebook, ETL, Data Visualization...

Experience

  • Data Extraction - 6 years
  • Python - 5 years
  • Machine Learning - 5 years
  • Artificial Intelligence (AI) - 5 years
  • ETL - 4 years
  • Large Language Models (LLMs) - 3 years
  • Generative Pre-trained Transformers (GPT) - 3 years
  • Natural Language Processing (NLP) - 3 years

Availability

Part-time

Preferred Environment

Windows, MacOS, Slack, PyCharm, Jupyter Notebook, Visual Studio Code (VS Code), Google Cloud, Large Language Models (LLMs)

The most amazing...

...thing I've developed is an end-to-end data science pipeline for two different platforms used by the Armenian government and United Nations Development Office.

Work Experience

Lead Data Analyst

2022 - PRESENT
Zoetis
  • Fine-tuned LLMs to match competitor products with Zoetis products. Implemented an end-to-end RAG solution, improving pricing strategy by 20% and comparison accuracy by 35%, ensuring competitive pricing alignment.
  • Built an end-to-end Gross-to-Net (GtN) waterfall dashboard for price management. The dashboard tracks GtN impacts using DDP data to map the List to Net classifications for both Sales In and Sales Out, supporting profitable growth.
  • Designed a centralized data pipeline with SQL and Azure Data Lake, boosting pricing analysis efficiency. Developed rebate allocation models that improved accuracy by 60% and strengthened compliance monitoring, resulting in a 25% increase in actionable insights.
  • Enhanced sales forecasting accuracy by 70% for main products. Developed a Streamlit application to visualize results, providing stakeholders with clear insights and enabling better decision-making through more accurate sales predictions.
  • Developed a web scraping tool with Python and Beautiful Soup to monitor competitor pricing data from PDFs. Automated 85% of the data collection process, significantly reducing manual effort and improving data accuracy.
Technologies: Data Analysis, SQL, Statistics, Agile Data Science, API Integration, Databricks, Azure, Azure SQL Databases, Azure Databricks, Python, Spark, Azure SQL, Azure SQL Data Warehouse, Web Scraping, PDF Scraping, Sales Forecasting, Trend Forecasting, Business Analysis, Business Analytics, Microsoft Power BI, Deployment, Data Processing, Data Management, Databases, Natural Language Processing (NLP), Streamlit, Revenue Modeling, Revenue Analysis, Retrieval-augmented Generation (RAG), Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Llama 3, Data Extraction, Document Parsing, Pattern Recognition, AI Agents, Entity Extraction

LLM Architect

2024 - 2024
Insummary Technologies Inc.
  • Engineered a fine-tuning pipeline using GPT-4o/4o-mini, enhancing extraction precision and model adaptability.
  • Integrated advanced RAG techniques into the pipeline, significantly boosting extraction accuracy and precision.
  • Built a data labeling pipeline, automating annotation workflows to enhance model training accuracy.
Technologies: Data Science, Python, Machine Learning, FastAPI, Azure AI Document Intelligence, Retrieval-augmented Generation (RAG), Light LLMs, Large Language Models (LLMs), Fine-tuning, OpenAI, OpenAI GPT-4 API, Pinecone, Embedding Models

Senior Data Science Consultant

2022 - 2022
Toptal Client
  • Developed compound aggregation pipelines in MongoDB to process a large number of nested documents in a given collection.
  • Created a system that identifies bugs in the data processing stage, where structured information was derived from the PDF reports of charity organizations. With the help of this system, we could detect and fix all inconsistencies in the database.
  • Created a user-friendly Streamlit dashboard (MVP) that serves as a user's charity navigator. Developed an interactive visualization (a Sankey diagram) for each charity that shows the flow of money (from revenue to expenses) across the year.
Technologies: Python, Data Science, MongoDB, Jupyter Notebook, ETL, Data Visualization, Streamlit, Data, Data Analysis, API Integration, Analytics, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, Document Parsing, Pattern Recognition, AI Agents, Entity Extraction

Machine Learning Expert

2021 - 2022
Station Casinos LLC - Main
  • Developed a system that identifies customers who are about to leave the building (within a 15-minute interval), taking into account 42 variables that describe the client's past and current behavior.
  • Developed complex SQL queries that pulled real-time data from SQL databases.
  • Deployed the chance-of-leaving model in production using RapidMiner.
Technologies: Machine Learning, SQL, Python, Linux, RapidMiner, Windows, Gradient Boosted Trees, Deep Learning, Deep Neural Networks (DNNs), Predictive Learning, Data, Data Analysis, API Integration, Data Science, Analytics, NoSQL, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, Document Parsing, Pattern Recognition, AI Agents

Senior Data Scientist

2021 - 2021
Fozzy Group
  • Created and implemented sales forecasting models for promotional products.
  • Deployed a promotional forecasting model and implemented a monitoring system for that model.
  • Assisted in improving the recommender system of Ukraine's biggest grocery stores, including feature engineering and modeling.
  • Created a Power BI dashboard for sales forecasting models to analyze errors.
  • Led the model deployment by communicating with relevant stakeholders to identify business needs, create the system architecture, and assist the back-end team in deploying our model in the most optimized way.
Technologies: Python, PyCharm, MySQL, Time Series Analysis, Machine Learning Operations (MLOps), Recommendation Systems, Microsoft Power BI, LightGBM, CatBoost, XGBoost, Graylog, RabbitMQ, Flask, REST, Windows, Slack, Jupyter Notebook, Data Mining, Data Engineering, SQL, ETL, Machine Learning, Artificial Intelligence (AI), Data Science Product Manager, Azure SQL, Ensemble Methods, BERT, TensorFlow, Data Science, Deep Learning, Keras, Statistics, PySpark, Amazon Web Services (AWS), Dashboards, RStudio Shiny, Tableau, Predictive Learning, Gradient Boosted Trees, Reporting, Data Analytics, Data Analysis, Data Reporting, Web Scraping, Time Series, BigQuery, Statistical Analysis, Model Development, Pandas, PyTorch, Software Engineering, Mathematics, Data Visualization, Source Code Review, Task Analysis, Interviewing, Data, API Integration, Predictive Analytics, Analytics, NoSQL, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, NumPy, Pattern Recognition, AI Agents, Entity Extraction

Senior Data Science Consultant

2019 - 2021
Armenia National SDG Innovation Lab | UNDP Office
  • Developed travelinsights.ai, a first-of-its-kind AI-powered real-time data analytics tool that collects, analyzes, and visualizes tourist reviews about Armenia from Tripadvisor, Facebook, and Booking.com.
  • Created Edu2Work, a real-time platform that scrapes over 60,000 online job postings, extracts and standardizes relevant information from the unstructured job descriptions, and presents the analysis in a dashboard.
  • Developed the data science portion of a monitoring platform (sdglab.am/en/projects) that tracks the Armenian Sustainable Development Goals (SDGs). This is a user-friendly, AI-powered, open-access interactive online tool for data analytics.
  • Built a citizen request classification model to increase the Armenian government's operational efficiency, assigning requests made by Armenian citizens to the corresponding ministries and departments.
  • Managed a data science team. Participated in project planning from the initial stage, developed a work breakdown structure (WBS) for each task, and managed communication between the data science team and the lab executives.
Technologies: Python, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), TensorFlow, Google Cloud, BERT, Transformers, Zero-shot Learning (ZSL), Few-shot Learning, Word2Vec, Clustering, GRAPH, FbProphet, CATS Forecasting, Ensemble Methods, Data Scraping, ETL, MongoDB, Selenium, Social Media APIs, Project Design, Design Thinking, Agile Project Management, Windows, MacOS, Slack, PyCharm, Jupyter Notebook, Data Mining, Unsupervised Learning, Data Engineering, Machine Learning Operations (MLOps), Machine Learning, Artificial Intelligence (AI), Data Science Product Manager, Data Science, Named-entity Recognition (NER), Deep Learning, Keras, Scikit-learn, Dashboards, RStudio Shiny, Linux, Predictive Learning, Gradient Boosted Trees, Deep Neural Networks (DNNs), Reporting, Data Analytics, Google Cloud Platform (GCP), Data Analysis, Data Reporting, Web Scraping, Time Series, Statistical Analysis, Model Development, Pandas, PyTorch, Software Engineering, Mathematics, Data Visualization, Technical Hiring, Code Review, Source Code Review, Task Analysis, Interviewing, Team Management, Data, API Integration, Predictive Analytics, Office 365, Analytics, NoSQL, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, NumPy, AI Agents, Entity Extraction

Teaching Associate

2019 - 2020
American University of Armenia
  • Supervised a team of senior students for their Capstone project focusing on real estate market analytics in Armenia. Developed models for data extraction, interior design classification, distance calculation, and optimal price estimation.
  • Conducted weekly problem-solving sessions with 20 BSc and MSc students for the Statistics course. Explained solutions for a unique set of problems based on discussed topics.
  • Assisted in creating the syllabus and agenda for the Natural Language Processing and Statistics courses.
  • Supervised students for their Capstone projects, some related to the real estate market and news analytics.
Technologies: Statistics, Bayesian Statistics, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), University Teaching, Supervisor, Real Estate, Web Scraping, Data Collection, BigQuery, Statistical Analysis, PyTorch, Mathematics, Technical Hiring, Code Review, Task Analysis, Interviewing, Data, GIS, RStudio, Predictive Analytics, Office 365, Sports, Data Science, Analytics, NoSQL, Database Management, Statistical Programming, Sentiment Analysis, NumPy

Data Scientist

2018 - 2019
Ameriabank
  • Created and deployed an AI-based virtual assistant for the bank's employees. Increased the operational efficiency of the bank's internal communications by 120%.
  • Developed forecasting algorithms for financial market indicators, commodities, prices, and sales.
  • Performed customer segmentation analysis based on their transactions and activity.
Technologies: Python, SQL, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Windows, Slack, PyCharm, Jupyter Notebook, Data Mining, Data Scraping, Unsupervised Learning, Data Engineering, ETL, Machine Learning, Artificial Intelligence (AI), Ensemble Methods, Zero-shot Learning (ZSL), BERT, TensorFlow, Google Cloud ML, Data Science, Named-entity Recognition (NER), Statistics, Bayesian Statistics, Scikit-learn, Dashboards, RStudio Shiny, Linux, Predictive Learning, Gradient Boosted Trees, Reporting, Data Analytics, Google Cloud Platform (GCP), Sports, Data Analysis, Data Reporting, Web Scraping, Data Collection, Time Series, Statistical Analysis, Model Development, Pandas, Mathematics, Data Visualization, Code Review, Source Code Review, Task Analysis, Team Management, RStudio, Predictive Analytics, Office 365, Analytics, NoSQL, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, NumPy

Data Scientist | Statistician

2017 - 2018
ClinChoice
  • Recognized inconsistencies in datasets while preparing SAS programs before a database lock.
  • Developed SAS programs to produce tables, listings, and graphs according to the specifications indicated in the statistical analysis plan (SAP).
  • Created, validated, and documented the SAS programs in accordance with good clinical programming practices, applicable guidelines, and the client's standard operating procedures.
Technologies: SAS, SAS SQL, Windows, Slack, Data Mining, ETL, Ensemble Methods, BERT, Bayesian Statistics, R, Predictive Learning, Reporting, Data Analytics, Data Analysis, Data Reporting, Web Scraping, Data Collection, Statistical Analysis, Pandas, RStudio, Predictive Analytics, Office 365, NoSQL, Database Management, Statistical Programming, Statistical Modeling, Sentiment Analysis, NumPy

AI-driven Project Document Analysis and Clustering

https://document-analyzer.streamlit.app/
Collaborated with the United Nations Industrial Development Organization (UNIDO) on an AI-driven project to analyze and cluster extensive PDF files of project documents. Utilizing advanced NLP models like GPT-4 and Flan-T5, relevant information was extracted, and clusters based on shared missions and themes were created. The project involved data scraping with Selenium and Beautiful Soup, data management with MongoDB, and interactive report creation using Streamlit. The final deliverables included an AI model tailored to the MoIP methodology, analysis of 180+ project documents, identification of innovative areas, and deployment of a Streamlit dashboard and Hugging Face API for seamless integration.

Labor Market Intelligence Platform | Edu2Work

https://edu2work.am/
The Edu2Work platform was developed in response to the dynamic nature of the labor market and the ongoing mismatch between the demand and supply of talent in Armenia. The platform employs cutting-edge natural language processing (NLP) models to gather and analyze thousands of online job postings from a range of commercial websites. By doing so, it provides comprehensive, up-to-date data on the Armenian labor market, empowering individuals to make informed career decisions.
The development of Edu2Work involved the design and implementation of an end-to-end data science pipeline, encompassing efficient and flexible data ingestion, information extraction and standardization, and data visualization. Core NLP tasks performed during the project included job title standardization according to European standards, industry classification, skill extraction and classification (soft/hard), and degree extraction (BSc, MSc, PhD, None). These tasks were instrumental in enabling the platform to provide high-quality labor market data in a user-friendly and accessible format.

Promotional Forecasting

In this project, I developed an end-to-end pipeline for forecasting the sales of promotional products in the largest retail stores in Ukraine. The model considers over 30 features to accurately predict the sales of products planned for promotion. Since being deployed internally, the model has increased the operational efficiency of the commerce team, which decides on the type and amount of promotion, and of the logistics team, which allocates sufficient resources to each branch.

Tourism Analytics Platform

https://www.travelinsights.ai/
I developed an AI-powered real-time data analytics tool for the tourism sector in Armenia. The online tool uses travel storytelling and artificial intelligence to collect, analyze, and visualize tourist reviews about Armenia from Tripadvisor, Facebook, and Booking.com. Through real-time analysis and visualization of the tourist reviews, the tool reveals actual travel preferences and on-the-ground issues in Armenia. With one scroll, policymakers, businesses, or tourists can explore insights from all over the world about different regions and locations of Armenia.

Education

2020 - 2022

Master's Degree in Mathematics in Data Science

Technical University of Munich - Munich, Germany

2019 - 2021

Master's Degree in Statistics

Yerevan State University - Yerevan, Armenia

2015 - 2019

Bachelor's Degree in Computer Science

American University of Armenia - Yerevan, Armenia

Libraries/APIs

CatBoost, XGBoost, Pandas, NumPy, TensorFlow, Keras, Scikit-learn, PyTorch, Social Media APIs, PySpark

Tools

Slack, PyCharm, Named-entity Recognition (NER), Visual Studio, Tableau, BigQuery, GIS, Microsoft Power BI, Graylog, RabbitMQ, Supervisor, AutoML, ChatGPT

Languages

Python, R, SQL, SAS, Rust, JavaScript

Frameworks

LightGBM, Selenium, RStudio Shiny, Flask, Streamlit, Spark

Paradigms

ETL, Design Thinking, Agile Project Management, REST, Automation

Platforms

MacOS, Jupyter Notebook, RStudio, Windows, Linux, Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), Visual Studio Code (VS Code), RapidMiner, Databricks, Azure SQL Data Warehouse

Storage

Database Management, MongoDB, MySQL, Google Cloud, SAS SQL, NoSQL, Azure SQL, Azure SQL Databases, Databases

Other

Data Mining, Data Scraping, Natural Language Processing (NLP), Word2Vec, FbProphet, Ensemble Methods, Machine Learning, Artificial Intelligence (AI), Data Science, Deep Learning, Statistics, Dashboards, Gradient Boosted Trees, Reporting, Data Analytics, Fantasy Sports, Data Analysis, Data Reporting, Web Scraping, Data Collection, Time Series, Statistical Analysis, Model Development, Mathematics, Data Visualization, Task Analysis, Interviewing, Data, Predictive Analytics, Sports, Football, Analytics, Statistical Programming, Statistical Modeling, Sentiment Analysis, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Data Extraction, Document Parsing, Pattern Recognition, Llama, Entity Extraction, Unsupervised Learning, Data Engineering, Computational Statistics, Machine Learning Operations (MLOps), Dash, Time Series Analysis, BERT, Transformers, Zero-shot Learning (ZSL), Few-shot Learning, Project Design, Data Science Product Manager, Bayesian Statistics, Predictive Learning, Deep Neural Networks (DNNs), University Teaching, Real Estate, Technical Hiring, Code Review, Source Code Review, Team Management, API Integration, Office 365, Retrieval-augmented Generation (RAG), Large Language Models (LLMs), Llama 3, Gemini, AI Agents, Google Cloud ML, Recommendation Systems, CATS Forecasting, Software Engineering, Agile Data Science, Graphs, Clustering, GRAPH, AppFolio, Linear Algebra, Matrix Algebra, Natural Language Generation (NLG), Probabilistic Information Retrieval, Information Retrieval, Raft Consensus Algorithm, Raft, Prompt Engineering, OpenAI GPT-3 API, OpenAI, Flan-T5, Azure Databricks, PDF Scraping, Sales Forecasting, Trend Forecasting, Business Analysis, Business Analytics, Deployment, Data Processing, Data Management, Revenue Modeling, Revenue Analysis, Generative Artificial Intelligence (GenAI), Large Language Model Operations (LLMOps), Visualization, FastAPI, Azure AI Document Intelligence, Light LLMs, Fine-tuning, Pinecone, Embedding Models
