Clustering Algorithms Developer
Laura has a Ph.D. from the Max Planck Institute for Informatics, Germany, in computational biology, where she focused on cancer biomarker detection using statistics and machine learning. She has worked on natural language processing projects such as named entity recognition, sentiment analysis, and fake news detection. Recently, she has applied reinforcement learning to trading financial instruments.
Experience
Machine Learning - 14 years
Clustering Algorithms - 14 years
Random Forests - 14 years
Data Visualization - 14 years
R - 14 years
Generative Pre-trained Transformers (GPT) - 7 years
Sentiment Analysis - 7 years
Natural Language Processing (NLP) - 7 years
The most amazing project I did was to analyze a novel neuroblastoma tumor dataset and search for viral DNA that could be causing cancer in small children.
Data Scientist and Machine Learning Engineer
- Implemented a reinforcement learning framework for algorithmic trading of cryptocurrencies.
- Implemented chatbots from scratch using state-of-the-art NLP methods based on Transformers (BERT).
- Developed chatbots using Google Dialogflow and Google Cloud.
- Implemented a framework for automated relation extraction from technical documents.
- Implemented a module for estimating product repurchase rates for an eCommerce client. In the same context, wrote algorithms for identifying abnormal purchase rates.
- Worked on a machine learning-based solution for pattern detection in trading data (financial domain). Wrote heuristics as part of a semi-automated procedure for producing labeled data.
Lead Scientist | Text Analysis
- Developed ML models for NLP, including methods for domain adaptation, automated feature selection, and F-measure optimization. Applied models such as logistic regression, SVM, and CRF to both classification and sequence tagging.
- Developed a machine learning model in R that classifies tweets as rumor or not rumor.
- Acquired in-depth knowledge of relational databases, ontologies, and linked data. Implemented a classification model in Java that automatically categorizes Wikipedia pages as belonging to the topic "Food and Drink" or not.
- Experimented with LDA topic models to support a recommender system for a large publishing company.
- Built prototypes for training word embeddings and graph embeddings.
- Developed sentiment analysis models for English and Bulgarian in R and Java. The methods were supervised for English and unsupervised for Bulgarian.
- Acquired significant experience with automated and semi-automated integration of RDF resources such as DBpedia and Geonames.
Max-Planck Institute für Informatik
- Gained expertise in cancer genetics, with a focus on copy number aberrations, and acquired in-depth knowledge of related domains such as epigenetics, transcriptomics, and viral genomes.
- Used supervised and unsupervised machine learning methods to model cancer genetic data. Supervised methods included logistic regression, elastic net, SVM, decision trees, and random forests.
- Wrote machine learning models in the statistical language R and acquired in-depth expertise in R visualization techniques.
- Gained solid experience presenting complex AI models to non-experts (medical doctors) by explaining the intuition behind the mathematical models.
- Performed feature selection with various methods: filters based on statistical tests, penalty methods for linear models, and pruning.
- Acquired solid knowledge of computational statistics and statistical learning, including statistical tests, statistical distributions, estimators, and bias-variance decomposition.
- Wrote scientific papers and learned to deliver high-quality presentations at conferences and in front of clients.
- Worked closely with medical doctors in hospitals, communicating across disciplines to maximize the benefit of the machine learning solutions for their patients.
Canadian Heritage Information Network (CHIN) - Data Analysis
https://lauratolosi.shinyapps.io/museums/
I worked with two colleagues on this project. My role was to statistically estimate the proportion of malformed data, focusing on its most important features (e.g., museum, object category, type, name, language). I also estimated what proportion of the errors were systematic and could be addressed by automatic methods (NLP).
The project was successful and exceeded the expectations of the Canadian institution.
Brexit Twitter Analysis
Algorithmic Trading of Cryptocurrencies
Rumor Detection on Social Media (Twitter)
I was involved in many aspects of the PHEME project. As a data scientist, I developed an ML model for predicting rumors on Twitter. As a member of Ontotext's team, I coordinated the integration of pipeline components contributed by all partners. I wrote deliverables, reports, and scientific papers describing our work.
Mining Highly Structured Information (MobiBiz, London)
Chatbot for Dialogue with Book Characters (USC Libraries)
My role in the project was to help my team select a speech recognition system to transcribe users' spoken questions into text, and to implement a question answering model that selects the appropriate answer from a list of possible answers. I used BERT for question answering. The system is deployed as a web service and handles requests in real time through a Flask app.
R, Python 2, Python 3, Python, RDF, Java, SPARQL, SQL
Machine Learning, Data Visualization, Random Forests, Clustering Algorithms, Natural Language Processing (NLP), Sentiment Analysis, Scientific Data Analysis, Research, Statistics, Computational Biology, GPT, Generative Pre-trained Transformers (GPT), BERT, Neural Networks, Convolutional Neural Networks, Deep Neural Networks, Generalized Linear Model (GLM), Information Retrieval, Applied Mathematics, Algorithms, Reinforcement Learning, Deep Reinforcement Learning, Chatbots, Custom BERT, ASR, Ontologies, Deep Learning, Agile Data Science, Natural Language Understanding (NLU), Time Series Analysis
Scikit-learn, TensorFlow, SQLAlchemy
PyCharm, Dialogflow, Git, GitLab
Linux, Jupyter Notebook, RStudio
RStudio Shiny, Flask
JSON, PostgreSQL, Amazon S3 (AWS S3)
PhD in Computational Biology
Max-Planck-Institute for Informatics - Saarbrücken, Germany
Master's Degree in Computational Biology
Max-Planck-Institute for Informatics - Saarbrücken, Germany
Bachelor's Degree in Computer Science
University of Bucharest - Bucharest, Romania
Participation in the EEML Summer School for Deep Learning, organized by Google DeepMind