Data Scientist
2019 - 2021
Octimine
- Conducted research in biomedical named entity recognition (NER) and developed a Python system that extracts and normalizes chemical entities and diseases from legal text.
- Created a monitoring system in Node.js to collect information from staging and production servers. Visualized the results and created monitoring dashboards using Grafana.
- Used Docker to containerize external dependencies and runtimes for various system components, reducing dependency overhead and speeding up development pipelines.
Technologies: Web Development, Data Modeling, JavaScript, Docker Hub, NumPy, Matplotlib, Machine Learning, Visual Studio Code, GitLab, Git, Jupyter Notebook, Natural Language Processing (NLP), Word2Vec, Linux, Java, Python, Data Science, Software Engineering, Neural Networks, Deep Neural Networks, NLTK, Node.js, Docker, Grafana, Pandas, Cheminformatics, NER, Data Visualization, Deep Learning, Transformers, HDF5, Scikit-learn, Seaborn, PyTorch, Elasticsearch, Kibana, Data Engineering, Big Data, XPath, XQuery, Scraping, Data Scraping, Text Classification, Regex, Categorization, Data Pipelines, Data Analytics, Data Analysis, Analysis, Analytics, Scientific Data Analysis, JSON, Redis
Research Software Development Engineer
2018 - 2019
Microsoft
- Developed an automated benchmarking pipeline in Python based on various NLU evaluation metrics. The pipeline runs on a schedule and produces up-to-date evaluation metrics for the system, along with comparisons against competitor systems.
- Worked on back-end servers built with C# and the .NET Framework. Created new API endpoints and optimized existing ones, resulting in a significant drop in response latency.
- Refactored a large legacy system component into an extensible design following established design patterns, making future extensions easier while maintaining backward compatibility.
Technologies: Web Development, Matplotlib, NumPy, .NET, Visual Studio Code, Visual Studio, Git, Natural Language Processing (NLP), Word2Vec, Anaconda, Agile Software Development, Software Engineering, Data Science, Seaborn, Scikit-learn, NLTK, Pandas, Natural Language Understanding (NLU), NER, Data Visualization, ASP.NET, C#, Python, Regex, Text Classification, Classification, Text Categorization, SQL, Data Analysis, Data Analytics, Data Pipelines, Analysis, Analytics, Scientific Data Analysis
Data Scientist
2016 - 2016
Self-employed
- Collaborated with a chemist colleague on chemical data analysis tasks, focusing on finding patterns and relations between chemical compound structures and their usage in drugs related to specific diseases.
- Conducted experiments in natural language understanding and created a pipeline that performs intent classification and named-entity recognition to automate the processing of client receipts.
- Used image recognition and computer vision algorithms to enhance a license plate recognition system so that it identifies non-standard, handwritten, and multilingual characters.
Technologies: C++, Visual Studio Code, GitLab, GitHub, Git, Jupyter Notebook, R, Natural Language Processing (NLP), Word2Vec, Anaconda, Data Science, Software Engineering, Data Visualization, NLTK, Computer Vision, OpenCV, NER, Natural Language Understanding (NLU), Cheminformatics, NumPy, Matplotlib, Seaborn, Scikit-learn, Pandas, Python, Exploratory Data Analysis, Text Categorization, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis
Research Assistant
2015 - 2015
Ulm University
- Conducted research in neuroinformatics, focusing on analyzing biomedical data of patients and identifying patterns that reflect the level of pain a patient is undergoing during a medical operation.
- Created machine learning models that predict a patient's pain intensity from visual data of their facial expressions and biopotential data from sensors recording signals in their nervous system.
- Developed a neural network package in the R language that implements a parameterized multilayer perceptron trained with resilient and classic backpropagation algorithms.
Technologies: Ggplot2, C#, Data Science, Data Visualization, Deep Learning, Neuroinformatics, Neural Networks, Machine Learning, Python, R, Data Analysis, Data Analytics, Analysis, Analytics, Scientific Data Analysis, Clustering, RStudio, RStudio Shiny, Dplyr, Tidyverse