Arsine is available for hire

Arsine Sarikyan

Verified Expert in Engineering

Data Scientist and Developer

Location

Yerevan, Armenia

Toptal Member Since

November 4, 2022

Arsine is a creative and scientifically rigorous data scientist with over three years of experience. Her experience includes designing experimentation processes (data collection, robustness checks, and explainability), developing ML and AI strategies, and conducting NLP research. She specializes in causal analytics to answer the "why" question and find hidden patterns and relationships. Arsine contributes to the research community by writing and publishing academic papers.

Portfolio

American University of Armenia

Python, SQL, A/B Testing, Product Analytics, Marketing Analytics...

Metric

Python, Microsoft Power BI, Predictive Modeling, Predictive Analytics...

Metric

Python, Data Science, Natural Language Processing (NLP)...

Experience

Data Research - 5 years
Exploratory Data Analysis - 5 years
Time Series - 5 years
Machine Learning - 3 years
Explainable Artificial Intelligence (XAI) - 3 years
Unsupervised Learning - 3 years
Artificial Intelligence (AI) - 3 years
GPT - 1 year

Availability

Full-time

Preferred Environment

Slack, Python, Trello

The most amazing...

...is putting your knowledge and creativity into the work so that you can extract insights and value from raw data.

Work Experience

Adjunct Instructor for Advanced Topics in Data Analytics

2023 - PRESENT

American University of Armenia

Developed a graduate course for advanced data analysis topics, covering customer segmentation, PCA, research techniques, association rule mining, SQL basics, and others.
Adapted real-life company consulting projects into student assignments that put the learned concepts to use.
Designed a unique learning curriculum to fill the knowledge gaps in data science and future professional development.

Technologies: Python, SQL, A/B Testing, Product Analytics, Marketing Analytics, Marketing Research & Analysis, Predictive Modeling, Predictive Learning, Predictive Analytics, Principal Component Analysis (PCA), Cluster, Clustering, Analysis, Analytics, Data Analysis, Data, Data Analytics, Data Quality Analysis, Market Research, Market Research & Analysis

Senior Data Scientist

2020 - PRESENT

Metric

Managed and led 360-degree US retail market research for data acquisition, cleaning, and early trend identification.
Performed extensive data quality checks for 10+ data sources and identified the most informative ones for environmental, demographic, and speed information.
Led the data acquisition and scraping, merged various data sets, and created a fully automated working pipeline for continuous data updates and warehousing.
Conducted research to generate features and develop approaches for early identification of trends and the most promising locations. The acquired insights were evaluated against historical data and expert knowledge.

Technologies: Python, Microsoft Power BI, Predictive Modeling, Predictive Analytics, Unsupervised Learning, Clustering, Patterns, Geolocation, Feature Planning, Data Cleaning, Data Visualization, Data Extraction, Exploratory Data Analysis, Exploratory Testing, Causal Inference, Research, Time Series, Data Science, Data Analytics, Machine Learning, Data Research, Data Reporting, Regression, Rankings, Data Analysis, Statistics, Statistical Analysis, Statistical Methods, GIS, ETL, Data Engineering, Feature Analysis, Feature Prioritization, Predictive Learning, Linear Regression, Regression Testing, Data Auditing, Field Research, Root Cause Analysis, Pandas, NumPy, Plotly, Matplotlib, Seaborn, Scikit-learn, Data, Analytics, APIs, SciPy, Data Preparation, Data Preprocessing, Algorithms, Forecasting, Trend Forecasting, Data Pipelines, Models, AI Programming, Google Colaboratory (Colab), Communication, Data Communication, Market Research, Market Research & Analysis

Senior Data Scientist

2021 - 2022

Metric

Created an end-to-end pipeline for historical and real-time news scraping. The acquired articles were cleaned and preprocessed to select the most relevant news based on location and topic.
Implemented text cleaning, part-of-speech tagging, feature generation, and sentiment score evaluation. The scores were later used to find the relationship between sentiment and location real price index changes.
Developed news text preprocessing and analysis to classify whether an article was a rumor, based on a unique threshold scheme and a forward-looking algorithm. The approach produced real-time insights from data without any labels.

Technologies: Python, Data Science, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Data Analytics, Machine Learning, Sentiment Analysis, Data Scraping, Google News, Geolocation, Causal Inference, Data Reporting, Data Research, Unsupervised Learning, Rankings, Data Analysis, Data Engineering, Data Entry Analysis, Data Quality Analysis, Data Quality Management, Data Quality, Data Validation, Data Visualization, Tokenization, Named-entity Recognition (NER), Feature Planning, Feature Analysis, Feature Prioritization, News, Newsletters, Articles, Regression, Linear Regression, Regex, Regression Modeling, Real Estate, Commercial Real Estate, Residential Real Estate, Pandas, NumPy, Scikit-learn, Plotly, Matplotlib, APIs, Natural Language Toolkit (NLTK), SpaCy, Gensim, Patterns, Data, Google Trends, SciPy, LightGBM, Random Forests, Random Forest Regression, Gradient Boosted Trees, Ensemble Methods, Algorithms, Predictive Modeling, Predictive Learning, Predictive Analytics, Forecasting, Data Pipelines, Trend Forecasting, Trend Analysis, Models, Modeling, AI Programming, Google Colaboratory (Colab), Communication, Data Communication, Market Research, Market Research & Analysis

Quant Trader and Senior Data Scientist

2021 - 2022

DWF-Labs

Designed an automated decentralized finance (DeFi) liquidity provision strategy for Uniswap v3 based on ARIMA and GARCH. The strategy narrowed the liquidity range to maximize returns and reduced the number of repositions to save costs.
Created a full historical data acquisition pipeline based on CoinGecko. Merged information about tokens from CoinMarketCap for filtering categories. Improved a social presence score based on Google news sentiment, search score, and LunarCrush API.
Developed a token lifecycle prediction model based on Prophet for coins planned for launch, with the category as the only information available. The approach worked by aggregating category information from existing coins.

Technologies: Python, SQL, Dashboards, Patterns, Time Series, Predictive Modeling, Data Extraction, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, Data Cleaning, Data Visualization, Regression, Strategy, Decentralized Finance (DeFi), Hierarchical Clustering, Unsupervised Learning, Data Science, Data Analytics, Web Scraping, APIs, Machine Learning, Artificial Intelligence (AI), Pandas, Plotly, Data Reporting, Blockchain, Algorithms, Statistics, Statistical Analysis, ETL, NumPy, Scikit-learn, ARIMA, ARIMA Models, Autoregressive Integrated Moving Average (ARIMA), Predictive Analytics, News, Trends, Google Trends, LightGBM, Random Forest Regression, Ensemble Methods, ARIMAX Models, Data, Analytics, Data Analysis, Analysis, DataViz, Big Data, Dask, Predictive Text, Predictive Learning, Forecasting, Trend Forecasting, Data Pipelines, Models, Modeling, AI Programming, Financial Modeling

Data Scientist

2019 - 2022

State Revenue Committee of Armenia

Built a gradient boosting model for tax fraud detection with a ROC AUC score of 71%, based on extensive feature generation and data cleaning.
Published a paper about the project in a peer-reviewed journal, which can be read at www.tandfonline.com/doi/full/10.1080/08839514.2021.2012002.
Developed a supplier and buyer network as an alternative way of replacing historical fraud and audit features, which resulted in higher generalizability.
Trained and built an auto-updated model ready for integration with the existing risk identification strategies.
Experimented with a number of different techniques and approaches for finding the best technology based on literature reviews, such as clustering for anomaly detection, regression-based ranking, and feature engineering.
Created many features based on the literature review by joining different data sets. The final model's feature elimination and feature importance based on classical thresholds and SHAP values proved the significant role of those features.

Technologies: Python, Predictive Modeling, Integration, Data Cleaning, Explainable Artificial Intelligence (XAI), Feature Analysis, Feature Planning, Data Extraction, Big Data, Causal Inference, Networks, Data Visualization, Exploratory Data Analysis, Data Analytics, Machine Learning, Artificial Intelligence (AI), Ensemble Methods, Data Reporting, Data Science, Unsupervised Learning, Hierarchical Clustering, Regression, Rankings, Data Analysis, Data Engineering, Pandas, NumPy, SciPy, Scikit-learn, Random Forests, Random Forest Regression, Gradient Boosting, Gradient Boosted Trees, XGBoost, Linear Regression, Logistic Regression, LightGBM, Shapely, Feature Prioritization, Analysis, Analytics, Data Quality Analysis, Feature Roadmaps, Models, Modeling, Evaluation, Interpretation, Neural Networks, Research, Data, Predictive Learning, Predictive Analytics, Data Validation, Lift, Principal Component Analysis (PCA), Cluster, Clustering, Dask, Forecasting, Data Pipelines, AI Programming, Google Colaboratory (Colab), Communication, Data Communication

Data Scientist

2019 - 2021

Metric

Led a data collection team of 8-10 people and designed a model building and evaluation process that stayed robust and applicable to a larger population, despite having data from only 10-15 people. Defined a unique business-specific evaluation metric.
Implemented feature engineering and generation based on best practices from the literature. Experimented with unsupervised clustering and with multiclass and binary classification for activity identification.
Created an algorithm for accurate human activity classification and post-prediction filtering based on the gait data from the shoe console. The model resulted in 98% ROC AUC and achieved only around a 3-minute error compared to the real results.
Developed a high-performance customer weight prediction with only a two-kilogram error. The resulting model was highly compatible with the available techniques in the literature and was advantageous because of its simplicity and light computation.
Improved the product structure after model evaluation, analysis, and recursive feature elimination by removing half of the sensors, making it faster, cheaper, and more usable.

Technologies: Python, Data Cleaning, Supervisory Control & Data Acquisition (SCADA), Data Entry Analysis, Data Visualization, Predictive Modeling, Signal Processing, Time Series, Unsupervised Learning, Clustering, Feature Planning, Feature Analysis, Product Development, Exploratory Testing, Exploratory Data Analysis, Data Analytics, Machine Learning, Artificial Intelligence (AI), Data Reporting, Statistical Data Analysis, Data Science, Pandas, NumPy, Plotly, LightGBM, Regression, Data Analysis, Scikit-learn, Matplotlib, Data Engineering, Fourier Series, Data, DataViz, Data Quality, Data Quality Analysis, Data Quality Management, Data Collection, Data Preparation, Data Preprocessing, Feature Prioritization, Predictive Analytics, Predictive Learning, Shapely, Time Series Analysis, Cost Analysis, Optimization, Analysis, Analytics, Cluster, Algorithms, Linear Optimization, Forecasting, Trend Forecasting, Data Pipelines

Experience

Satellite Metrics of Nighttime Lights

I provided evidence supporting the common trend assumption of a difference-in-differences (DID) experiment, using satellite data on average nighttime light per community. The preprocessed data yielded a nighttime light time series used to approximate economic activity.

Human Activity and Weight Classification

https://medium.com/analytics-vidhya/weight-prediction-framework-from-gait-data-a4823895fc81

An elegant and simple solution for human activity and weight classification from gait data, based on carefully designed features and extensive pre- and post-cleaning of the signal for high generalizability.

Product Class Identification

I developed an algorithm for product categorization based on product descriptions, which worked for short descriptions and was robust to high throughput screening (HTS) description changes. Experiments included attribute extraction, topic modeling, question answering, GPT-3 fine-tuning, and prompt engineering.

Fraud Classification

I developed a model with 79% accuracy that was fully transparent and explainable and, thanks to advanced feature engineering, matched the results of complex black-box models. I used SHAP values, explainable artificial intelligence, rule extraction, and domain-based feature engineering.

Skills

Languages

Python, SQL, Regex

Frameworks

LightGBM, Lift

Paradigms

Data Science, ETL

Other

Business Logic, Predictive Modeling, Data Cleaning, Explainable Artificial Intelligence (XAI), Feature Analysis, Feature Planning, Predictive Analytics, Exploratory Data Analysis, Machine Learning, Artificial Intelligence (AI), Ensemble Methods, Data Research, Time Series, Natural Language Processing (NLP), Causal Inference, Unsupervised Learning, GPT, Generative Pre-trained Transformers (GPT), Business, Business Cases, Cost Accounting, Finance, SEO Marketing, Marketing Mix, Research, Psychology, Ethics, Environment, Innovation, Alternative Energy, IT Projects, Financing, Data, Analytics, Quantitative Analysis, Integration, Dashboards, Patterns, Data Extraction, Big Data, Networks, Data Visualization, Regression, Strategy, Decentralized Finance (DeFi), Hierarchical Clustering, Clustering, Geolocation, Exploratory Testing, Supervisory Control & Data Acquisition (SCADA), Data Entry Analysis, Signal Processing, Product Development, Data Analytics, Data Reporting, Statistical Data Analysis, Web Scraping, APIs, Sentiment Analysis, Data Scraping, Google News, Satellite Images, Economics, Statistical Analysis, Classification, Logistic Regression, Random Forests, Interpretation, Principal Component Analysis (PCA), Rankings, Generative Pre-trained Transformer 3 (GPT-3), Topic Modeling, Data Analysis, Algorithms, Statistics, Statistical Methods, Data Engineering, Feature Prioritization, Predictive Learning, Linear Regression, Regression Testing, Data Auditing, Field Research, Root Cause Analysis, Data Quality Analysis, Data Quality Management, Data Quality, Tokenization, News, Newsletters, Articles, Regression Modeling, Real Estate, Commercial Real Estate, Residential Real Estate, ARIMA, ARIMA Models, Autoregressive Integrated Moving Average (ARIMA), Trends, Google Trends, Random Forest Regression, Gradient Boosted Trees, ARIMAX Models, Analysis, BERT, Custom BERT, Training, Tax Preparation, Data Preparation, Text Classification, Data Preprocessing, Fourier Series, Data Collection, Time Series Analysis, Cost Analysis, Optimization, Gradient Boosting, Feature Roadmaps, Models, Modeling, Evaluation, Neural Networks, Linear Optimization, Predictive Text, Data Cleansing, Classification Algorithms, Forecasting, Trend Forecasting, Trend Analysis, AI Programming, Google Colaboratory (Colab), Communication, Data Communication, A/B Testing, Product Analytics, Marketing Analytics, Marketing Research & Analysis, Financial Modeling, Market Research, Market Research & Analysis

Libraries/APIs

Pandas, NumPy, Scikit-learn, Matplotlib, Natural Language Toolkit (NLTK), SpaCy, SciPy, Shapely, XGBoost, Dask

Tools

Slack, Trello, Microsoft Power BI, Plotly, GIS, Seaborn, Named-entity Recognition (NER), Gensim, DataViz, Cluster

Platforms

Blockchain

Storage

Data Validation, Data Pipelines

Education

2018 - 2020

Master of Science Degree in Strategic Management

American University of Armenia - Yerevan, Armenia

2014 - 2018

Bachelor of Arts Degree in Business

American University of Armenia - Yerevan, Armenia
