Data Science Consultant
2021 - 2022
ClinicalMind, LLC
- Developed data collection pipelines for various sources, such as PubMed, Twitter, medical forums, Open Payments database, claims database, Google Books, and more. These data were merged and deduplicated using specialized ML algorithms.
- Built an algorithm for surfacing and analyzing lexicon differences across audiences on social media, traditional media, and scientific publications.
- Constructed bespoke metrics for finding Digital Opinion Leaders in social media, scientific publications, books, and trends. A name disambiguation algorithm made it possible to find Opinion Leaders prominent across all areas or in one specific area.
Technologies: Python, Google API, Natural Language Processing (NLP), Data Engineering, Key Performance Metrics, APIs, Data Collection, Data Pipelines, Linear Algebra, Twitter API, PubMed & Mendeley APIs, ELT

Senior Data Scientist
2019 - 2021
IBM Process Mining (formerly myInvenio)
- Designed and implemented a cloud-based platform for the on-demand training and deployment of ML models for process mining and business process analysis, which enabled several novel ML algorithms and substantially shortened the time to market (TTM) for further ML-based projects.
- Co-designed and implemented a novel AI/XAI algorithm for extracting the root causes of business process anomalies; this bleeding-edge algorithm led to the product being advertised as AI-enabled, multiple sales to large customers, and a partnership with IBM.
- Automated manual production steps in drug manufacturing using ML models, leading to a throughput increase of 200% while maintaining labor costs. Proposed cost-effective improvements estimated to reduce manual involvement by 50%.
- Designed and prototyped an NLP-based ML pipeline for unsupervised identification of business threads from screen captures and click-and-key logs of PC user activity; implemented a POC of a PC application to collect the necessary data.
- Created a data quality control pipeline for business process logs and designed a wizard-based UX to guide users in fixing issues with their data; it reduced the load on the helpdesk, as many requests were related to data issues unknown to clients.
- Implemented a simple yet powerful engine for business rule mining by extending a Java library for decision trees; designed the UI/UX for presenting its results to users; the sales department often cited the new feature as a major selling point.
- Created numerous automated CI/CD pipelines and development and testing tools for internal use by the data science team, which streamlined workflows and cut manual testing effort roughly threefold, as estimated by coworkers.
Technologies: Data Reporting, Data Science, Pandas, Java 8, You Only Look Once (YOLO), Conda, DevOps, Azure DevOps, Business Rules, Decision Trees, Logistic Regression, Tokenization, Topic Modeling, WinAPI, Tesseract, OCR, Machine Learning Operations (MLOps), Git, Azure Tables, Azure Table Storage, SQL, Process Discovery, Process Mining, Business Process Analysis, Classification Algorithms, Azure Blob Storage API, Bash Script, Azure Kubernetes Service (AKS), Azure Functions, Explainable Artificial Intelligence (XAI), LSTM Networks, Gradient Boosting, Gradient Boosted Trees, Azure Machine Learning, Azure Blobs, Azure, Python, Computer Vision, Kubernetes, Natural Language Processing (NLP), Random Forests, Artificial Intelligence (AI), REST APIs, RESTful Development, MySQL, PostgreSQL, Root Cause Analysis, Anomaly Detection, MLflow, Apache Airflow, XGBoost, ETL, Amazon Web Services (AWS)

Senior Research Scientist/Senior Data Scientist
2017 - 2019
CAMLIN Group
- Discovered and fixed a results-invalidating bug in the previously used data analysis pipeline; without this fix, wrong results would have been published in a major publication and the filed patent would have been invalidated.
- Designed and implemented a data preprocessing pipeline for multidimensional biometric data and a deep learning model for real-time prediction of user intentions from that data.
- Ported a TensorFlow-based ML model to the Jetson TX2 edge device, which allowed model training, deployment, and real-time prediction in a portable battery-powered form.
- Co-led a multistage project, planning and coordinating work between several parties, including a scientific research lab, industrial R&D, and an engineering team, and communicating with the stakeholders.
- Participated in patenting the discoveries, including an ML model, as an end-to-end approach for Brain-Computer Interface architecture (https://patents.google.com/patent/WO2020211958A1).
- Created and taught an extensive 3-week course on Applied Data Science for interns and junior employees and conducted internal training.
Technologies: Gantt Charts, Project Management, Data Preprocessing, Data Pipelines, Principal Component Analysis (PCA), Unsupervised Learning, Classification Algorithms, Logistic Regression, Convolutional Neural Networks, Deep Learning, TensorFlow, Data Quality Analysis, Complex Data Analysis, Time Series, Time Series Analysis, Accelerometers, Experimental Design, Biometrics, HDF, MATLAB, Python 3, Keras, Artificial Intelligence (AI), Anomaly Detection