Arsine Sarikyan
Verified Expert in Engineering
Data Scientist and Developer
Yerevan, Armenia
Toptal member since November 4, 2022
Arsine is a creative and scientifically rigorous data scientist with over three years of experience. Her experience includes designing experimentation processes such as data collection, robustness checks, and explainability, developing ML and AI strategies, and conducting NLP research. She specializes in causal analytics to answer the "why" question and find hidden patterns and relationships. Arsine contributes to the research community by writing and publishing academic papers.
Preferred Environment
Slack, Python, Trello
The most amazing...
...is putting your knowledge and creativity into the work so that you can extract insights and value from raw data.
Work Experience
Adjunct Instructor for Advanced Topics in Data Analytics
American University of Armenia
- Developed a graduate course for advanced data analysis topics, covering customer segmentation, PCA, research techniques, association rule mining, SQL basics, and others.
- Adapted real-life company consulting projects into student assignments designed to put learned concepts into practice.
- Designed a unique learning curriculum to fill knowledge gaps in data science and support future professional development.
Senior Data Scientist
Metric
- Managed and led 360-degree US retail market research for data acquisition, cleaning, and early trend identification.
- Performed extensive data quality checks for 10+ data sources and identified the most informative ones for environmental, demographic, and speed information.
- Led the data acquisition and scraping, merged various data sets, and created a fully automated working pipeline for continuous data updates and warehousing.
- Conducted research to generate features and develop approaches for early identification of trends and the most promising locations. The acquired insights were evaluated against historical data and expert knowledge.
Senior Data Scientist
Metric
- Created an end-to-end pipeline for historical and real-time news scraping. The acquired articles were cleaned and preprocessed to select the most relevant news based on location and topic.
- Implemented text cleaning, part-of-speech tagging, feature generation, and sentiment score evaluation. The scores were later used to find the relationship between sentiment and changes in each location's real price index.
- Developed news text preprocessing and analysis to classify whether an article was a rumor, based on a unique threshold scheme and a forward-looking algorithm. The approach produced real-time insights from unlabeled data.
Quant Trader and Senior Data Scientist
DWF-Labs
- Designed an automated decentralized finance liquidity provision strategy for Uniswap v3 based on ARIMA and GARCH. The strategy narrowed the liquidity range to maximize returns and reduced the number of repositions to save costs.
- Created a full historical data acquisition pipeline based on CoinGecko. Merged information about tokens from CoinMarketCap for filtering categories. Improved a social presence score based on Google News sentiment, search score, and the LunarCrush API.
- Developed a token lifecycle prediction model based on Prophet for coins planned for launch, with the category as the only information available. The approach aggregated category information from existing coins.
Data Scientist
State Revenue Committee of Armenia
- Built a gradient boosting model for tax fraud detection with a ROC AUC score of 71%, based on extensive feature generation and data cleaning.
- Published a paper about the project in a peer-reviewed journal, which can be read at www.tandfonline.com/doi/full/10.1080/08839514.2021.2012002.
- Developed a supplier and buyer network as an alternative to historical fraud and audit features, which resulted in higher generalizability.
- Trained and built an auto-updated model ready for integration with the existing risk identification strategies.
- Experimented with techniques drawn from literature reviews, such as clustering for anomaly detection, regression-based ranking, and feature engineering, to find the best approach.
- Created many features based on the literature review by joining different data sets. Feature elimination and feature importance analysis on the final model, based on classical thresholds and SHAP values, confirmed the significant role of those features.
Data Scientist
Metric
- Led a data collection team of 8-10 people and designed a model building and evaluation process intended to be robust and to generalize to a larger population, despite having data from only 10-15 people. Defined a unique business-specific evaluation metric.
- Implemented feature engineering and generation based on best practices from the literature. Experimented with unsupervised clustering and with multiclass and binary classification for activity identification.
- Created an algorithm for accurate human activity classification and post-prediction filtering based on the gait data from the shoe console. The model resulted in 98% ROC AUC and achieved only around a 3-minute error compared to the real results.
- Developed a high-performance customer weight prediction with only a two-kilogram error. The resulting model was highly compatible with the available techniques in the literature and was advantageous because of its simplicity and light computation.
- Improved the product structure after model evaluation, analysis, and recursive feature elimination by removing half of the sensors, making it faster, cheaper, and more usable.
Experience
Satellite Metrics of Nighttime Lights
Human Activity and Weight Classification
https://medium.com/analytics-vidhya/weight-prediction-framework-from-gait-data-a4823895fc81
Product Class Identification
Fraud Classification
Education
Master of Science Degree in Strategic Management
American University of Armenia - Yerevan, Armenia
Bachelor of Arts Degree in Business
American University of Armenia - Yerevan, Armenia
Skills
Libraries/APIs
Pandas, NumPy, Scikit-learn, Matplotlib, Natural Language Toolkit (NLTK), SpaCy, SciPy, Shapely, XGBoost, Dask
Tools
Slack, Trello, Microsoft Power BI, Plotly, GIS, Seaborn, Named-entity Recognition (NER), Gensim, ARIMA, ARIMAX, DataViz, Cluster
Languages
Python, SQL, Regex
Frameworks
LightGBM, Lift
Paradigms
ETL
Platforms
Blockchain
Storage
Data Validation, Data Pipelines
Other
Business Logic, Predictive Modeling, Data Cleaning, Explainable Artificial Intelligence (XAI), Feature Analysis, Feature Planning, Predictive Analytics, Exploratory Data Analysis, Machine Learning, Artificial Intelligence, Data Science, Ensemble Methods, Data Research, Time Series, Natural Language Processing (NLP), Causal Inference, Unsupervised Learning, Generative Pre-trained Transformers (GPT), Business, Business Cases, Cost Accounting, Finance, SEO Marketing, Marketing Mix, Research, Psychology, Ethics, Environment, Innovation, Alternative Energy, IT Projects, Financing, Data, Analytics, Quantitative Analysis, Integration, Dashboards, Patterns, Data Extraction, Big Data, Networks, Data Visualization, Regression, Strategy, Decentralized Finance (DeFi), Hierarchical Clustering, Clustering, Geolocation, Exploratory Testing, Supervisory Control & Data Acquisition (SCADA), Data Entry Analysis, Signal Processing, Product Development, Data Analytics, Data Reporting, Statistical Data Analysis, Web Scraping, APIs, Sentiment Analysis, Data Scraping, Google News, Satellite Images, Economics, Statistical Analysis, Classification, Logistic Regression, Random Forests, Interpretation, Principal Component Analysis (PCA), Rankings, Generative Pre-trained Transformer 3 (GPT-3), Topic Modeling, Data Analysis, Algorithms, Statistics, Statistical Methods, Data Engineering, Feature Prioritization, Predictive Learning, Linear Regression, Regression Testing, Data Auditing, Field Research, Root Cause Analysis, Data Quality Analysis, Data Quality Management, Data Quality, Tokenization, News, Newsletters, Articles, Regression Modeling, Real Estate, Commercial Real Estate, Residential Real Estate, Trends, Google Trends, Random Forest Regression, Gradient Boosted Trees, Analysis, BERT, Custom BERT, Training, Tax Preparation, Data Preparation, Text Classification, Data Preprocessing, Fourier Series, Data Collection, Time Series Analysis, Cost Analysis, Optimization, Gradient Boosting, 
Feature Roadmaps, Models, Modeling, Evaluation, Neural Networks, Linear Optimization, Predictive Text, Data Cleansing, Classification Algorithms, Forecasting, Trend Forecasting, Trend Analysis, AI Programming, Google Colaboratory (Colab), Communication, Data Communication, A/B Testing, Product Analytics, Marketing Analytics, Marketing Research & Analysis, Financial Modeling, Market Research, Market Research & Analysis