Vahan Martirosyan, Developer in Abu Dhabi, United Arab Emirates

Vahan Martirosyan

Verified Expert in Engineering

Data Scientist and Developer

Location
Abu Dhabi, United Arab Emirates
Toptal Member Since
March 16, 2022

Vahan is a data scientist with over five years of experience building end-to-end ETL pipelines that integrate data from multiple sources. He is adept at leveraging cutting-edge tools in NLP, time series analysis, computer vision, geospatial data analysis, network analysis, and tabular data analysis to meet project needs. Vahan takes a holistic approach to data science consulting and enjoys diving deep into the business context underlying his data science projects.

Availability

Part-time

Preferred Environment

Ubuntu, Visual Studio Code (VS Code), Jupyter, MongoDB, Python, ChatGPT, Stable Diffusion, Real Estate, OpenAI GPT-4 API

The most amazing...

...project I've developed uses various data sources and modeling modalities, including NLP, CV, and networks, to deliver social, political, and economic insights.

Work Experience

NLP Data Scientist

2022 - PRESENT
Grata Inc
  • Built an NLP pipeline with components that include synthetic dataset augmentation using GPT-3, few-shot topic classification using contrastive learning and transformer fine-tuning, and a suite of linguistic heuristics.
  • Built a keyword extraction pipeline that uses morphological and dependency parsing, synthetic data augmentation using GPT-3, and few-shot classification using contrastive learning and transformers to extract dyadic networks from company descriptions.
  • Built and deployed interactive dashboards to demonstrate data extraction tools using Streamlit and GCP.
Technologies: GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Data Science, Python, PyTorch, Machine Learning, Statistics, Artificial Intelligence (AI), Generative Pre-trained Transformer 3 (GPT-3), Deep Learning, APIs, Language Models, Data Processing Automation

Data Scientist

2022 - 2022
Hxr Eq LLC
  • Researched the models and techniques that major eCommerce websites use for search ranking.
  • Advised eCommerce retailers on the business implications of the models and techniques used in eCommerce search ranking.
  • Advised on future work and development in eCommerce search ranking strategies.
Technologies: Data Science, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Rankings, Generative Pre-trained Transformer 3 (GPT-3), Python, Deep Learning, Machine Learning, Artificial Intelligence (AI), APIs, Language Models, Data Processing Automation

ML and OpenAI Developer

2022 - 2022
HODL Media Inc.
  • Developed an algorithm to filter cryptocurrency-related news search results.
  • Deployed a pipeline that leverages several data retrieval APIs, transformer-based architectures, and the GPT-3 API in GCP.
  • Advised on the future deployment of NLP-driven solutions for information retrieval.
Technologies: OpenAI, Machine Learning, OpenAI Gym, Generative Pre-trained Transformer 3 (GPT-3), Python, Deep Learning, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation

NLP Engineer

2022 - 2022
Sky Dust Intelligence B.V.
  • Developed an AI framework that leverages GPT-3 and other transformer-based neural network architectures to automate email summarization, replies, and question answering.
  • Developed a cloud-based Office Outlook add-in that leverages an AI framework for email automation.
  • Advised the team on product development and natural language processing.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Artificial Intelligence (AI), Knowledge Graphs, Deep Learning, Generative Pre-trained Transformer 3 (GPT-3), Python, Machine Learning, APIs, Language Models, Data Processing Automation, MVP Design

Co-researcher

2021 - 2022
American University of Armenia
  • Developed a transformers-driven NLP toolkit to analyze multi-language news and social media text data.
  • Built a pipeline for real-time monitoring, analysis, and visualization of strategic information and psychological operations (PSYOPS).
  • Advised the government of Armenia on strategic information operations.
Technologies: GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Time Series Analysis, Research, Data Scraping, MongoDB, Transformers, Social Network Analysis, Consulting, Hugging Face, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Scikit-learn, BigQuery, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, NoSQL, Charts, Databases, Microsoft Excel, Graphs, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Python, Deep Learning, Machine Learning, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

International Consultant on Social Media Data Quality Assessment

2021 - 2022
United Nations Statistics Division
  • Developed a hybrid NLP-driven methodology to monitor social media data quality.
  • Built an end-to-end ETL pipeline that gathers social media data using advanced automation bots and leverages state-of-the-art transformer-based architectures for text and image classification.
  • Conceived and facilitated training seminars on a range of topics in data science and NLP.
  • Contributed to the National Administrative Department of Colombia's (DANE) social media data strategy.
  • Participated in international forums to present and discuss results and prospects of undertaken tasks.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Data Scraping, Computer Vision, Statistics, Data Visualization, Python, Deep Learning, Machine Learning, Consulting, Hugging Face, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, NoSQL, Charts, Databases, Microsoft Excel, Graphs, Microsoft Power BI, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

Data Science Team Lead

2019 - 2022
UNDP Armenia National SDG Innovation Lab
  • Developed supervised and unsupervised language models for Armenian, Russian, and English in various use cases.
  • Designed, implemented, and managed end-to-end data science projects for various sectors, including tourism, labor, social services, etc.
  • Oversaw and applied novel methods for unconventional data analysis of the sustainable development goals (SDG) implementation in Armenia and other countries.
  • Represented Armenia in international forums on data science for international development.
Technologies: Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, Time Series Analysis, Computer Vision, ETL, EDA, Deep Learning, Machine Learning, Data Scraping, Geospatial Data, MongoDB, Data Visualization, Dashboards, Market Research & Analysis, Hugging Face, XGBoost, CatBoost, TensorFlow, PyTorch, NumPy, Pandas, Google Cloud Platform (GCP), Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, SQL, NoSQL, Charts, Databases, Microsoft Excel, Graphs, Microsoft Power BI, OpenAI Gym, OpenAI, Keras, Data Modeling, Time Series, Azure, Python, Artificial Intelligence (AI), APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design

Entrepreneur and Researcher

2019 - 2020
Impact Hub
  • Researched and modeled diversified revenue-sharing approaches for smallholder aggregation to reduce smallholder farmers' supply chain risk in agricultural production.
  • Communicated with stakeholders in agriculture, finance, and international development to research, develop, and promote the concept.
  • Developed a novel approach for risk management in smallholder agricultural production.
Technologies: Risk Models, Supply Chain, International Trade, Entrepreneurship, Time Series Analysis, Research, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Databases, Microsoft Excel, Business Intelligence (BI), Microsoft Power BI, Data Modeling, Time Series, Azure, Data Processing Automation, MVP Design

Machine Learning Analyst

2018 - 2020
Ameriabank
  • Built natural language processing models for a virtual call center assistant (chatbot).
  • Developed recurrent neural networks and convolutional neural networks to forecast commodity prices, financial market indicators, and product sales.
  • Created the novel Product2Vec and Customer2Vec models to predict customer churn.
Technologies: Time Series Analysis, Forecasting, Machine Learning, GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Chatbots, TensorFlow, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, SQL, Charts, Databases, Microsoft Excel, Business Intelligence (BI), Microsoft Power BI, Keras, Data Modeling, Time Series, Azure, Python, Deep Learning, Artificial Intelligence (AI), APIs, Data Processing Automation, MVP Design

Serviceman

2015 - 2017
Ministry of Defense of Republic of Armenia
  • Developed code to analyze and visualize tactical, strategic, and administrative data.
  • Conducted various tasks related to artillery reconnaissance, collaboration with foreign delegations, research, and speechwriting.
  • Coordinated research by experts from MIT and from Harvard, Oxford, and Cambridge universities.
Technologies: Teamwork, Leadership, Python, Time Series Analysis, Project Management, NumPy, Pandas, Data Science, Jupyter Notebook, Natural Language Toolkit (NLTK), Scikit-learn, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Databases, Microsoft Excel, Data Modeling

AI4Mulberry

https://www.sdglab.am/en/projects
This project aimed to automate the classification of communications between citizens and government agencies to increase the operational efficiency and quality of the service provided by the government of Armenia. Citizens' written communications are hierarchically classified, first concerning the ministries, then the departments of a given ministry, and finally the branches within a given department.

The primary challenge in the project was working with low-resource languages and tiny datasets for supervised learning. The framework I designed to overcome this challenge combined dataset augmentation, using machine translation and generative autoregressive language models for paraphrase generation, with zero-shot classification and fine-tuning of pre-trained transformers such as XLM-RoBERTa.
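
As a rough illustration of the hierarchical routing described above (ministry, then department, then branch), here is a minimal Python sketch. It is not the project's code: the `Node` tree, the example ministries, and the keyword-overlap `score` function are all hypothetical, and the keyword scorer is only a stand-in for the fine-tuned transformer classifiers the project relied on.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of a hypothetical routing tree (ministry, department, or branch)."""
    name: str
    keywords: set[str] = field(default_factory=set)
    children: list[Node] = field(default_factory=list)


def score(text: str, node: Node) -> int:
    """Toy scorer: number of keyword hits. Stand-in for a trained classifier."""
    tokens = set(text.lower().split())
    return len(tokens & node.keywords)


def route(text: str, root: Node) -> list[str]:
    """Walk the tree top-down, picking the best-scoring child at each level."""
    path = []
    node = root
    while node.children:
        node = max(node.children, key=lambda child: score(text, child))
        path.append(node.name)
    return path


# Hypothetical three-level hierarchy, invented for illustration only.
root = Node("government", children=[
    Node("Education", {"school", "exam", "teacher"}, [
        Node("Secondary Education", {"school", "exam"}, [
            Node("Examinations", {"exam"}),
            Node("Enrollment", {"enroll", "admission"}),
        ]),
        Node("Higher Education", {"university", "degree"}, [
            Node("Accreditation", {"degree"}),
        ]),
    ]),
    Node("Health", {"hospital", "clinic", "doctor"}, [
        Node("Hospitals", {"hospital"}, [
            Node("Licensing", {"license"}),
        ]),
    ]),
])

print(route("Question about the school exam schedule", root))
```

Routing one level at a time means each classifier only has to separate a handful of sibling classes, which can help when labeled data is scarce.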

Travelinsights

https://www.travelinsights.ai/
Travelinsights.ai is the first-ever real-time data analytics tool in the tourism sector in Armenia. The tool combines travel storytelling and natural language processing to collect, analyze, and visualize sentiments and topics concerning tourism in Armenia based on Tripadvisor.com, Facebook.com, and Booking.com reviews.

I contributed to designing the tool to provide public policymakers in the tourism sector with real-time actionable intelligence and historical trend data to render decision-making more data-driven and evidence-based.

Edu2Work

https://edu2work.am/
I contributed to the development of Edu2Work, a platform that continuously collects online job announcements from various commercial websites. It then cleans and standardizes the incoming data concerning job titles and skill requirements through supervised machine learning and visualizes the data in an interactive online dashboard.
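
To give a sense of what standardizing job titles from labeled examples can look like, here is a small Python sketch. It is an assumption-laden toy, not Edu2Work's implementation: the `TRAINING` pairs are invented, and a 1-nearest-neighbor lookup over token overlap (Jaccard similarity) stands in for whatever supervised model the platform actually uses.

```python
import re

# Hypothetical labeled examples: (raw title from a posting, standardized title).
TRAINING = [
    ("Sr. Python Developer", "Software Developer"),
    ("Junior Java Engineer", "Software Developer"),
    ("Data Analyst (SQL)", "Data Analyst"),
    ("Business Data Analyst", "Data Analyst"),
    ("Chief Accountant", "Accountant"),
    ("Junior Accountant", "Accountant"),
]


def tokenize(title):
    """Lowercase the title and keep alphabetic tokens only."""
    return set(re.findall(r"[a-z]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def standardize(raw_title, training=TRAINING):
    """Return the label of the most similar labeled example (1-nearest neighbor)."""
    tokens = tokenize(raw_title)
    _, label = max(training, key=lambda pair: jaccard(tokens, tokenize(pair[0])))
    return label


print(standardize("Python developer, remote"))
print(standardize("Senior Accountant"))
```

A production system would likely need a confidence threshold so that titles unlike anything in the labeled set are flagged for review rather than forced into the nearest category.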

National Administrative Department of Colombia

The goal of this project was to develop tools to efficiently collect, assess the quality of, and analyze unconventional data from social media in order to obtain granular insights concerning feelings of discrimination among the population in Colombia.

My responsibilities in this project involved developing a hybrid NLP-driven methodology to monitor social media data quality and building an end-to-end ETL pipeline that gathers social media data using advanced automation bots and leverages transformer-based architectures for text and image classification.

The project and the insights gathered from it contributed to the social media data strategy of the National Administrative Department of Colombia (DANE).

Languages

Python, SQL

Libraries/APIs

NumPy, Pandas, Scikit-learn, XGBoost, CatBoost, TensorFlow, PyTorch, Natural Language Toolkit (NLTK), Keras

Tools

Microsoft Excel, Jupyter, Microsoft Power BI, BigQuery, OpenAI Gym

Paradigms

ETL, Data Science, Business Intelligence (BI), Asynchronous Programming

Platforms

Jupyter Notebook, Ubuntu, Visual Studio Code (VS Code), Google Cloud Platform (GCP), Azure

Storage

Databases, MongoDB, NoSQL

Other

Natural Language Processing (NLP), Data Scraping, Deep Learning, Machine Learning, EDA, Transformers, Artificial Intelligence (AI), Text Classification, Text Mining, Web Scraping, Dashboards, Data Analytics, Predictive Modeling, Data Collection, Data Analysis, Charts, Data Modeling, APIs, Web Crawlers, Scraping, Language Models, Data Processing Automation, MVP Design, GPT, Generative Pre-trained Transformers (GPT), Mathematics, Linear Algebra, Graph Theory, Mathematical Analysis, Microeconomics, Macroeconomics, Probability Theory, Statistics, Computer Vision, Data Visualization, Consulting, Time Series Analysis, Geospatial Data, Forecasting, Chatbots, Research, Social Network Analysis, Risk Models, Teamwork, Leadership, Generative Pre-trained Transformer 3 (GPT-3), IT Project Management, Networks, Geospatial Analytics, Hugging Face, Graphs, OpenAI, Time Series, ChatGPT, Stable Diffusion, Real Estate, Environment, Economics, Financial Mathematics, Quantitative Risk Modeling, Game Theory, Measure Theory, Supply Chain, International Trade, Entrepreneurship, Market Research & Analysis, History, Physics, English, Languages, Biology, Environmental Science, Art, Knowledge Graphs, Rankings, OpenAI GPT-4 API

Industry Expertise

Project Management, Insurance

Frameworks

Flask

2013 - 2018

Bachelor of Science Degree in Mathematics with Economics

University College London | UCL - London, United Kingdom

2008 - 2012

High School Diploma in Secondary Education

John F. Kennedy Schule - Berlin, Germany
