Henrik Sergoyan
Verified Expert in Engineering
Data Scientist and Machine Learning Developer
Munich, Bavaria, Germany
Toptal member since November 5, 2021
Henrik is a data scientist with over eight years of experience specializing in natural language processing, focusing on GenAI and retrieval-augmented generation (RAG) solutions. He excels in building end-to-end data solutions, including data collection, processing, analytics, reporting, and deployment. Proficient in SQL and NoSQL databases, Henrik combines strong project management skills with innovative problem-solving to deliver exceptional results.
Experience
- Data Extraction - 6 years
- Python - 5 years
- Machine Learning - 5 years
- Artificial Intelligence (AI) - 5 years
- ETL - 4 years
- Large Language Models (LLMs) - 3 years
- Generative Pre-trained Transformers (GPT) - 3 years
- Natural Language Processing (NLP) - 3 years
Preferred Environment
Windows, macOS, Slack, PyCharm, Jupyter Notebook, Visual Studio Code (VS Code), Google Cloud, Large Language Models (LLMs)
The most amazing...
...thing I've developed is an end-to-end data science pipeline for two different platforms used by the Armenian government and United Nations Development Office.
Work Experience
Lead Data Analyst
Zoetis
- Fine-tuned LLMs to match competitor products with Zoetis products. Implemented an end-to-end RAG solution, improving pricing strategy by 20% and comparison accuracy by 35%, ensuring competitive pricing alignment.
- Built an end-to-end Gross-to-Net (GtN) waterfall dashboard for price management. The dashboard tracks GtN impacts using DDP data to map the List to Net classifications for both Sales In and Sales Out, supporting profitable growth.
- Designed a centralized data pipeline with SQL and Azure Data Lake, boosting pricing analysis efficiency. Developed rebate allocation models that improved accuracy by 60% and strengthened compliance monitoring, resulting in a 25% increase in actionable insights.
- Enhanced sales forecasting accuracy by 70% for main products. Developed a Streamlit application to visualize results, providing stakeholders with clear insights and enabling better decision-making through more accurate sales predictions.
- Developed a web scraping tool with Python and Beautiful Soup to monitor competitor pricing data from PDFs. Automated 85% of the data collection process, significantly reducing manual effort and improving data accuracy.
LLM Architect
Insummary Technologies Inc.
- Engineered a fine-tuning pipeline using GPT-4o/4o-mini, enhancing extraction precision and model adaptability.
- Integrated advanced RAG techniques into the pipeline, significantly boosting extraction accuracy and precision.
- Built a data labeling pipeline, automating annotation workflows to enhance model training accuracy.
Senior Data Science Consultant
Toptal Client
- Developed compound aggregation pipelines in MongoDB to process large numbers of nested documents in a collection.
- Created a system that identifies bugs in the data processing stage, where structured information was derived from the PDF reports of charity organizations. The system made it possible to detect and fix all inconsistencies in the database.
- Created a user-friendly Streamlit dashboard (MVP) that serves as a charity navigator for users. Developed an interactive visualization (Sankey diagram) for each charity that shows the flow of money (from revenue to expenses) across the year.
Machine Learning Expert
Station Casinos LLC - Main
- Developed a system that identifies customers who are about to leave the building (within a 15-minute interval), taking into account 42 variables that describe the past and current behavior of the client.
- Developed complex SQL queries that pulled real-time data from SQL databases.
- Deployed the chance-of-leaving model in production using RapidMiner.
Senior Data Scientist
Fozzy Group
- Created and implemented sales forecasting models for promotional products.
- Deployed a promotional forecasting model and implemented a monitoring system for that model.
- Assisted in improving the recommender system of Ukraine's biggest grocery stores, including feature engineering and modeling.
- Created a Power BI dashboard for sales forecasting models to analyze errors.
- Led the model deployment by communicating with relevant stakeholders to identify business needs, creating the system architecture, and assisting the back-end team in deploying the model in the most efficient way.
Senior Data Science Consultant
Armenia National SDG Innovation Lab | UNDP Office
- Developed travelinsights.ai, a first-of-its-kind real-time data analytics tool that uses artificial intelligence to collect, analyze, and visualize tourist reviews about Armenia from Tripadvisor, Facebook, and Booking.com.
- Created Edu2Work, a real-time platform that scrapes over 60,000 online job postings, extracts and standardizes relevant information from the unstructured job descriptions, and presents the analysis in a dashboard.
- Developed the data science portion of a monitoring platform (sdglab.am/en/projects) that tracks the Armenian Sustainable Development Goals (SDGs). This is a user-friendly, AI-powered, open-access interactive online tool for data analytics.
- Built a citizen request classification model to increase the Armenian government's operational efficiency, assigning requests made by Armenian citizens to the corresponding ministries and departments.
- Managed a data science team. Participated in project planning from an initial stage, developed a work breakdown structure (WBS) for each task, and managed communication between the data science team and the lab executives.
Teaching Associate
American University of Armenia
- Supervised a team of senior students on their Capstone project, which focused on real estate market analytics in Armenia. Developed models for data extraction, interior design classification, distance calculation, and optimal price estimation.
- Conducted weekly problem-solving sessions with 20 BSc and MSc students for the Statistics course. Explained solutions for a unique set of problems based on discussed topics.
- Assisted in creating the syllabus and agenda for the Natural Language Processing and Statistics courses.
- Supervised students for their Capstone projects, some related to the real estate market and news analytics.
Data Scientist
Ameriabank
- Created and deployed an AI-based virtual assistant for the bank's employees. Improved the operational efficiency of the bank's internal communications by 120%.
- Developed forecasting algorithms for financial market indicators, commodities, prices, and sales.
- Performed customer segmentation analysis based on their transactions and activity.
Data Scientist | Statistician
ClinChoice
- Recognized inconsistencies in datasets while preparing SAS programs before a database lock.
- Developed the SAS programs to produce tables, listings, and graphs according to the specifications indicated in the statistical analysis plan (SAP).
- Created, validated, and documented the SAS programs in accordance with good clinical programming practices, applicable guidelines, and the client's standard operating procedures.
Experience
AI-driven Project Document Analysis and Clustering
https://document-analyzer.streamlit.app/
Labor Market Intelligence Platform | Edu2Work
https://edu2work.am/
The development of Edu2Work involved the design and implementation of an end-to-end data science pipeline, encompassing efficient and flexible data ingestion, information extraction and standardization, and data visualization. Core NLP tasks performed during the project included job title standardization according to European standards, industry classification, skill extraction and classification (soft/hard), and degree extraction (BSc, MSc, PhD, None). These tasks were instrumental in enabling the platform to provide high-quality labor market data in a user-friendly and accessible format.
Promotional Forecasting
Tourism Analytics Platform
https://www.travelinsights.ai/
Education
Master's Degree in Mathematics in Data Science
Technical University of Munich - Munich, Germany
Master's Degree in Statistics
Yerevan State University - Yerevan, Armenia
Bachelor's Degree in Computer Science
American University of Armenia - Yerevan, Armenia
Skills
Libraries/APIs
CatBoost, XGBoost, Pandas, NumPy, TensorFlow, Keras, Scikit-learn, PyTorch, Social Media APIs, PySpark
Tools
Slack, PyCharm, Named-entity Recognition (NER), Visual Studio, Tableau, BigQuery, GIS, Microsoft Power BI, Graylog, RabbitMQ, Supervisor, AutoML, ChatGPT
Languages
Python, R, SQL, SAS, Rust, JavaScript
Frameworks
LightGBM, Selenium, RStudio Shiny, Flask, Streamlit, Spark
Paradigms
ETL, Design Thinking, Agile Project Management, REST, Automation
Platforms
macOS, Jupyter Notebook, RStudio, Windows, Linux, Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), Visual Studio Code (VS Code), RapidMiner, Databricks, Azure SQL Data Warehouse
Storage
Database Management, MongoDB, MySQL, Google Cloud, SAS SQL, NoSQL, Azure SQL, Azure SQL Databases, Databases
Other
Data Mining, Data Scraping, Natural Language Processing (NLP), Word2Vec, FbProphet, Ensemble Methods, Machine Learning, Artificial Intelligence (AI), Data Science, Deep Learning, Statistics, Dashboards, Gradient Boosted Trees, Reporting, Data Analytics, Fantasy Sports, Data Analysis, Data Reporting, Web Scraping, Data Collection, Time Series, Statistical Analysis, Model Development, Mathematics, Data Visualization, Task Analysis, Interviewing, Data, Predictive Analytics, Sports, Football, Analytics, Statistical Programming, Statistical Modeling, Sentiment Analysis, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Data Extraction, Document Parsing, Pattern Recognition, Llama, Entity Extraction, Unsupervised Learning, Data Engineering, Computational Statistics, Machine Learning Operations (MLOps), Dash, Time Series Analysis, BERT, Transformers, Zero-shot Learning (ZSL), Few-shot Learning, Project Design, Data Science Product Manager, Bayesian Statistics, Predictive Learning, Deep Neural Networks (DNNs), University Teaching, Real Estate, Technical Hiring, Code Review, Source Code Review, Team Management, API Integration, Office 365, Retrieval-augmented Generation (RAG), Large Language Models (LLMs), Llama 3, Gemini, AI Agents, Google Cloud ML, Recommendation Systems, CATS Forecasting, Software Engineering, Agile Data Science, Graphs, Clustering, GRAPH, AppFolio, Linear Algebra, Matrix Algebra, Natural Language Generation (NLG), Probabilistic Information Retrieval, Information Retrieval, Raft Consensus Algorithm, Raft, Prompt Engineering, OpenAI GPT-3 API, OpenAI, Flan-T5, Azure Databricks, PDF Scraping, Sales Forecasting, Trend Forecasting, Business Analysis, Business Analytics, Deployment, Data Processing, Data Management, Revenue Modeling, Revenue Analysis, Generative Artificial Intelligence (GenAI), Large Language Model Operations (LLMOps), Visualization, FastAPI, Azure AI Document Intelligence, Light LLMs, Fine-tuning, Pinecone, Embedding Models