Senior Data Scientist
2019 - 2021
myInvenio
- Designed and implemented a cloud-based platform for on-demand training and deployment of ML models for process mining and business process analysis; this enabled several novel ML algorithms and massively shortened time to market (TTM) for further ML-based projects.
- Co-designed and implemented a novel AI/XAI algorithm for extracting root causes of business process anomalies; this bleeding-edge algorithm led to the product being advertised as AI-enabled, multiple sales to large customers, and a partnership with IBM.
- Automated manual production steps in drug manufacturing using ML models, which increased throughput by 200% at the same labor cost; proposed additional cost-effective improvements estimated to reduce manual involvement by 200%.
- Designed and prototyped an NLP-based ML pipeline for unsupervised identification of business threads from screen captures and click-and-key logs of PC user activity; implemented a POC of a PC application to collect the necessary data.
- Implemented a data quality control pipeline for business process logs and designed a wizard-based UX to guide users in fixing issues with their data; it reduced the load on the helpdesk, as many requests were related to data issues unknown to clients.
- Implemented a simple yet powerful engine for business rule mining by extending the Java library for decision trees; designed UI/UX for presenting its results to the users; the sales department often cited the new feature as a major selling point.
- Created numerous automated CI/CD pipelines and development and testing tools for internal use by the data science team, which streamlined workflows and cut manual testing effort to roughly a third, as estimated by co-workers.
Technologies: Data Reporting, Data Science, Pandas, Java 8, You Only Look Once (YOLO), Conda, DevOps, Azure DevOps, Business Rules, Decision Trees, Logistic Regression, Tokenization, Topic Modeling, WinAPI, Tesseract, OCR, Machine Learning Operations (MLOps), Git, Azure Tables, Azure Table Storage, SQL, Process Discovery, Process Mining, Business Process Analysis, Classification Algorithms, Azure Blob Storage API, Bash Script, Azure Kubernetes Service (AKS), Azure Functions, Explainable Artificial Intelligence (XAI), LSTM Networks, Gradient Boosting, Gradient Boosted Trees, Azure Machine Learning, Azure Blobs, Azure, Python, Computer Vision, Kubernetes, Natural Language Processing (NLP), Random Forests, Artificial Intelligence (AI), RESTful Development, RESTful APIs, MySQL, PostgreSQL, Root Cause Analysis, Anomaly Detection, MLflow, Apache Airflow
Senior Research Scientist/Senior Data Scientist
2017 - 2019
CAMLIN Group
- Discovered and fixed a results-invalidating bug in the previously used data analysis pipeline; without my involvement, wrong results would have been published in a major publication and would have invalidated the filed patent.
- Designed and implemented a data preprocessing pipeline for multidimensional biometric data and a deep-learning model for real-time prediction of user intentions from that data.
- Ported a TensorFlow-based ML model to the Jetson TX2 edge device, which allowed model training, deployment, and real-time prediction in a portable, battery-powered form.
- Co-led a multistage project, planning and coordinating work between several parties, including a scientific research lab, industrial R&D, and an engineering team, and communicated with the stakeholders.
- Participated in patenting the discoveries, including an ML model as an end-to-end approach to Brain-Computer Interface architecture (https://patents.google.com/patent/WO2020211958A1).
- Created and taught an extensive 3-week course on Applied Data Science for interns and junior employees and conducted internal training.
Technologies: Gantt Charts, Project Management, Data Preprocessing, Data Pipelines, Principal Component Analysis (PCA), Unsupervised Learning, Classification Algorithms, Logistic Regression, Convolutional Neural Networks, Deep Learning, TensorFlow, Data Quality Analysis, Complex Data Analysis, Time Series, Time Series Analysis, Accelerometers, Experimental Design, Biometrics, HDF, MATLAB, Python 3, Keras, Artificial Intelligence (AI), Anomaly Detection