Vahan Martirosyan, Developer in Abu Dhabi, United Arab Emirates

Vahan Martirosyan

Verified Expert in Engineering

Bio

Vahan is a data scientist with several years of experience building end-to-end ETL pipelines. He is adept at leveraging cutting-edge tools in NLP, time series analysis, computer vision, geospatial data analysis, network analysis, and tabular data analysis to meet project needs. Vahan takes a holistic approach to data science and AI consulting and enjoys deep-diving into the business context behind his data science projects.

Portfolio

United Nations Industrial Development Organization
Artificial Intelligence (AI), Natural Language Processing (NLP)...
Kiwi Data
Data Science, Artificial Intelligence (AI), Machine Learning, Python...
Intrinsica, Inc.
Artificial Intelligence (AI), Data Science, Knowledge Graphs, Ontologies...

Experience

  • Natural Language Processing (NLP) - 5 years
  • Computer Vision - 5 years
  • Time Series Analysis - 5 years
  • Consulting - 5 years
  • Generative Pre-trained Transformers (GPT) - 5 years
  • ETL - 5 years
  • Machine Learning - 5 years
  • Geospatial Data - 4 years

Availability

Part-time

Preferred Environment

Ubuntu, Visual Studio Code (VS Code), Jupyter, MongoDB, Python, ChatGPT, Stable Diffusion, Real Estate, OpenAI GPT-4 API

The most amazing...

...project I've developed uses various data sources and modeling modalities, including NLP, CV, and networks, to deliver social, political, and economic insights.

Work Experience

Senior NLP Engineer

2024 - 2025
United Nations Industrial Development Organization
  • Developed a CV-based contextual document parsing and chunking pipeline.
  • Built a robust pipeline for custom information extraction, standardization, and clustering.
  • Developed and deployed a user-friendly dashboard to visualize complex informational patterns using network visualizations, dynamic flowcharts, and other techniques.
Technologies: Artificial Intelligence (AI), Natural Language Processing (NLP), Large Language Models (LLMs), Google Cloud, Front-end, Data Visualization, Document Parsing, Computer Vision, Retrieval-augmented Generation (RAG)

Data Scientist

2023 - 2025
Kiwi Data
  • Used cutting-edge LLMs and CV frameworks to build intelligent PDF parsing and chunking pipelines.
  • Created datasets and trained information extraction models that extract structured insights from complex legal documents.
  • Built question answering, semantic search, and other complex frameworks for inference on legal documents and commercial agreements.
  • Advised the team on NLP and data science tasks with a long-term impact on business decision-making.
Technologies: Data Science, Artificial Intelligence (AI), Machine Learning, Python, Data Analysis, Statistics, SQL, Azure

Senior Knowledge Graph Advisor

2024 - 2024
Intrinsica, Inc.
  • Developed a modeling framework for information extraction from public financial listings.
  • Created complex visualizations, including entity network visualizations, to illustrate connections between companies based on structured information extracted from unstructured text.
  • Advised the founders on technical aspects of business models, with long-term impacts on decision-making.
Technologies: Artificial Intelligence (AI), Data Science, Knowledge Graphs, Ontologies, Python, Data Analysis, Big Data, Data Collection, Modeling, Natural Language Processing (NLP), Sentiment Analysis, Vectorization, Semantic Search

Machine Learning Specialist

2023 - 2023
Ekwithree GmbH
  • Built a key-phrase extraction model that extracts business-relevant terms from unstructured company text.
  • Created a custom semantic search pipeline for company search.
  • Consulted with the team about technical development and business planning.
Technologies: Python, Data Science, Machine Learning, Natural Language Processing (NLP), Text Analytics, Artificial Intelligence (AI), Generative Pre-trained Transformers (GPT)

NLP Data Scientist

2022 - 2022
Grata
  • Built an NLP pipeline with components that include synthetic dataset augmentation using GPT-3, few-shot topic classification using contrastive learning and transformer fine-tuning, and a suite of linguistic heuristics.
  • Created a keyword extraction pipeline that uses morphological and dependency parsing, synthetic data augmentation using GPT-3, few-shot classification using contrastive learning, and transformers to extract dyadic networks from company descriptions.
  • Developed and deployed interactive dashboards to demonstrate data extraction tools using Streamlit and GCP.
Technologies: Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Science, Python, PyTorch, Machine Learning, Statistics, Artificial Intelligence (AI), Generative Pre-trained Transformer 3 (GPT-3), Deep Learning, APIs, Language Models, Data Processing Automation

Data Scientist

2022 - 2022
Hxr Eq LLC
  • Researched the models and techniques used by major eCommerce websites for search ranking.
  • Consulted on the business implications of eCommerce search ranking models and techniques for eCommerce retailers.
  • Advised on future work and development in eCommerce search ranking strategies.
Technologies: Data Science, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Rankings, Generative Pre-trained Transformer 3 (GPT-3), Python, Deep Learning, Machine Learning, Artificial Intelligence (AI), APIs, Language Models, Data Processing Automation

ML and OpenAI Developer

2022 - 2022
HODL Media Inc.
  • Developed an algorithm to filter cryptocurrency-related news search results.
  • Deployed a pipeline that leverages several data retrieval APIs, transformer-based architectures, and the GPT-3 API in GCP.
  • Consulted on the future deployment of NLP-driven solutions for information retrieval.
Technologies: OpenAI, Machine Learning, OpenAI Gym, Generative Pre-trained Transformer 3 (GPT-3), Python, Deep Learning, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation

NLP Engineer

2022 - 2022
Sky Dust Intelligence B.V.
  • Developed an AI framework that leverages GPT-3 and other transformer-based neural network architectures to automate email summarization, replies, and question answering.
  • Developed a cloud-based Office Outlook add-in that leverages an AI framework for email automation.
  • Advised the team on product development and natural language processing.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Knowledge Graphs, Deep Learning, Generative Pre-trained Transformer 3 (GPT-3), Python, Machine Learning, APIs, Language Models, Data Processing Automation, MVP Design

Co-researcher

2021 - 2022
American University of Armenia
  • Developed a transformers-driven NLP toolkit to analyze multi-language news and social media text data.
  • Built a pipeline for real-time monitoring, analysis, and visualization of strategic information and psychological operations (PSYOPS).
  • Consulted the government of Armenia on strategic information operations.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Time Series Analysis, Research, Data Scraping, MongoDB, Transformers, Social Network Analysis, Consulting, Hugging Face, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Scikit-learn, BigQuery, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, NoSQL, Charts, Databases, Microsoft Excel, Graphs, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Python, Deep Learning, Machine Learning, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

International Consultant on Social Media Data Quality Assessment

2021 - 2022
United Nations Statistics Division
  • Developed a hybrid NLP-driven methodology to monitor social media data quality.
  • Built an end-to-end ETL pipeline that gathers social media data using advanced automation bots and leverages state-of-the-art transformer-based architectures for text and image classification.
  • Conceived and facilitated training seminars on a range of topics in data science and NLP.
  • Contributed to the social media data strategy of the National Administrative Department of Colombia (DANE).
  • Participated in international forums to present and discuss results and prospects of undertaken tasks.
Technologies: Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Scraping, Computer Vision, Statistics, Data Visualization, Python, Deep Learning, Machine Learning, Consulting, Hugging Face, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, NoSQL, Charts, Databases, Microsoft Excel, Graphs, Microsoft Power BI, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

Data Science Team Lead

2019 - 2022
UNDP Armenia National SDG Innovation Lab
  • Developed supervised and unsupervised language models for Armenian, Russian, and English in various use cases.
  • Designed, implemented, and managed end-to-end data science projects for various sectors, including tourism, labor, and social services.
  • Oversaw and applied novel methods of unconventional data analysis to track Sustainable Development Goals (SDG) implementation in Armenia and other countries.
  • Represented Armenia in international forums on data science for international development.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Time Series Analysis, Computer Vision, ETL, EDA, Deep Learning, Machine Learning, Data Scraping, Geospatial Data, MongoDB, Data Visualization, Dashboards, Market Research & Analysis, Hugging Face, XGBoost, CatBoost, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, SQL, NoSQL, Charts, Databases, Microsoft Excel, Graphs, Microsoft Power BI, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Python, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

Entrepreneur and Researcher

2019 - 2020
Impact Hub
  • Researched and modeled diversified revenue-sharing approaches for smallholder aggregation to reduce smallholder farmers' supply chain risk in agricultural production.
  • Communicated with stakeholders in agriculture, finance, and international development to research, develop, and promote the concept.
  • Developed a novel approach for risk management in smallholder agricultural production.
Technologies: Risk Models, Supply Chain, International Trade, Entrepreneurship, Time Series Analysis, Research, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Databases, Microsoft Excel, Business Intelligence (BI), Microsoft Power BI, Data Modeling, Time Series, Azure, Data Processing Automation, MVP Design

Machine Learning Analyst

2018 - 2020
Ameriabank
  • Built natural language processing models for a virtual call center assistant (chatbot).
  • Developed recurrent neural networks and convolutional neural networks to forecast commodity prices, financial market indicators, and product sales.
  • Created novel Product2Vec and Customer2Vec models to predict customer churn.
Technologies: Time Series Analysis, Forecasting, Machine Learning, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Chatbots, TensorFlow, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, SQL, Charts, Databases, Microsoft Excel, Business Intelligence (BI), Microsoft Power BI, Keras, Data Modeling, Time Series, Azure, Python, Deep Learning, Artificial Intelligence (AI), APIs, Data Processing Automation, MVP Design

Serviceman

2015 - 2017
Ministry of Defense of the Republic of Armenia
  • Developed code to analyze and visualize tactical, strategic, and administrative data.
  • Conducted tasks related to artillery reconnaissance, collaboration with foreign delegations, research, and speechwriting.
  • Coordinated research by experts from MIT, Harvard University, the University of Oxford, and the University of Cambridge.
Technologies: Teamwork, Leadership, Python, Time Series Analysis, Project Management, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Databases, Microsoft Excel, Data Modeling

Project Experience

AI4Mulberry

https://www.sdglab.am/en/projects
This project aimed to automate the classification of communications between citizens and government agencies to increase the operational efficiency and quality of the services provided by the government of Armenia. Citizens' written communications are classified hierarchically: first by ministry, then by department within that ministry, and finally by branch within that department.

The primary challenge in the project was working with low-resource languages and very small datasets for supervised learning. The framework I designed to overcome this challenge combined dataset augmentation (machine translation and paraphrase generation with generative autoregressive language models), zero-shot classification, and fine-tuning of pre-trained transformers such as XLM-RoBERTa.

Travelinsights

Travelinsights.ai is the first-ever real-time data analytics tool in the tourism sector in Armenia. The tool combines travel storytelling and natural language processing to collect, analyze, and visualize sentiments and topics concerning tourism in Armenia based on Tripadvisor.com, Facebook.com, and Booking.com reviews.

I contributed to designing the tool to provide public policymakers in the tourism sector with real-time actionable intelligence and historical trend data to render decision-making more data-driven and evidence-based.

Edu2Work

https://edu2work.am/
I contributed to the development of Edu2Work, a platform that continuously collects online job announcements from various commercial websites. It then uses supervised machine learning to clean and standardize the job titles and skill requirements in the incoming data, and visualizes the results in an interactive online dashboard.

National Administrative Department of Colombia

The goal of this project was to develop tools to efficiently collect, assess the quality of, and analyze unconventional data from social media in order to obtain granular insights into feelings of discrimination among the population of Colombia.

My responsibilities in this project involved developing a hybrid NLP-driven methodology to monitor social media data quality and building an end-to-end ETL pipeline that gathers social media data using advanced automation bots and leverages transformer-based architectures for text and image classification.

The project and the insights gathered from it contributed to the social media data strategy of the National Administrative Department of Colombia (DANE).

Education

2023 - 2025

Master's Degree in Natural Language Processing

Mohamed bin Zayed University of Artificial Intelligence - Abu Dhabi, UAE

2013 - 2018

Bachelor of Science Degree in Mathematics with Economics

University College London | UCL - London, United Kingdom

Skills

Libraries/APIs

NumPy, Pandas, Scikit-learn, XGBoost, CatBoost, TensorFlow, PyTorch, Natural Language Toolkit (NLTK), Keras

Tools

Microsoft Excel, Jupyter, Microsoft Power BI, ChatGPT, BigQuery, OpenAI Gym

Languages

Python, SQL

Paradigms

ETL, Business Intelligence (BI), Asynchronous Programming

Platforms

Jupyter Notebook, Ubuntu, Visual Studio Code (VS Code), Google Cloud Platform (GCP), Azure

Storage

Databases, MongoDB, NoSQL, Google Cloud

Industry Expertise

Project Management, Insurance

Frameworks

Flask

Other

Natural Language Processing (NLP), Data Scraping, Deep Learning, Machine Learning, EDA, Transformers, Artificial Intelligence (AI), Text Classification, Text Mining, Web Scraping, Dashboards, Data Science, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Data Modeling, APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design, Generative Pre-trained Transformers (GPT), Mathematics, Linear Algebra, Graph Theory, Mathematical Analysis, Microeconomics, Macroeconomics, Probability Theory, Statistics, Computer Vision, Data Visualization, Consulting, Time Series Analysis, Geospatial Data, Forecasting, Chatbots, Research, Social Network Analysis, Risk Models, Teamwork, Leadership, Generative Pre-trained Transformer 3 (GPT-3), IT Project Management, Networks, Geospatial Analytics, Hugging Face, Graphs, OpenAI, Time Series, Stable Diffusion, Real Estate, Environment, Economics, Financial Mathematics, Quantitative Risk Modeling, Game Theory, Measure Theory, Supply Chain, International Trade, Entrepreneurship, Market Research & Analysis, History, Physics, English, Languages, Biology, Environmental Science, Art, Knowledge Graphs, Rankings, OpenAI GPT-4 API, Ontologies, Big Data, Modeling, Sentiment Analysis, Vectorization, Semantic Search, Text Analytics, Large Language Models (LLMs), Agentic AI, Front-end, Document Parsing, Retrieval-augmented Generation (RAG)
