Laura Tolosi
Verified Expert in Engineering
Machine Learning Developer
Sofia, Bulgaria
Toptal member since February 4, 2019
Laura holds a Ph.D. in computational biology from the Max Planck Institute for Informatics, Germany, where she focused on cancer biomarker detection using statistics and machine learning. She has worked on natural language processing projects such as named entity recognition, sentiment analysis, and fake news detection. Recently, she has applied reinforcement learning to trading financial instruments.
Experience
- Machine Learning - 14 years
- R - 14 years
- Data Visualization - 14 years
- Random Forests - 14 years
- Clustering Algorithms - 14 years
- Natural Language Processing (NLP) - 7 years
- Sentiment Analysis - 7 years
- Generative Pre-trained Transformers (GPT) - 7 years
Preferred Environment
R, Python
The most amazing...
...project I did was to analyze a novel neuroblastoma-tumor dataset and search for viral DNA that could be causing cancer in small children.
Work Experience
Data Scientist and Machine Learning Engineer
Self-employed
- Implemented a reinforcement learning framework for algorithmic trading of cryptocurrencies.
- Implemented chatbots from scratch using state-of-the-art NLP methods based on Transformers (BERT).
- Executed chatbots using Google Dialogflow and Google Cloud.
- Implemented a framework for automated relation extraction from technical documents.
- Implemented a module for estimating product repurchase rates for an eCommerce client. In the same context, wrote algorithms for identifying abnormal purchase rates.
- Worked on a machine learning-based solution for pattern detection in trading data (financial domain). Wrote heuristics as a semi-automated procedure for producing labeled data.
Lead Scientist | Text Analysis
Ontotext AD
- Developed ML models for NLP, including methods for domain adaptation, automated feature selection, and F-measure optimization. Applied models such as logistic regression, SVM, and CRF for both classification and sequence tagging.
- Developed a machine learning model in R for classifying tweets as rumor or not rumor.
- Acquired in-depth knowledge of relational databases, ontologies, and linked data. Implemented a classification model in Java that automatically categorizes Wikipedia pages as belonging to the topic "Food and Drink" or not.
- Experimented with LDA topic models to support a recommender system for a large publishing company.
- Built prototypes for training word-vector embeddings and graph embeddings.
- Developed models for sentiment analysis for English and Bulgarian, in R and Java. The methods were supervised for English and unsupervised for Bulgarian.
- Acquired significant experience with automated and semi-automated integration of various RDF resources such as DBpedia and GeoNames.
PhD
Max Planck Institute for Informatics
- Gained expertise in cancer genetics, with a focus on copy number aberrations, and acquired additional in-depth knowledge in domains such as epigenetics, transcriptomics, and viral genomes.
- Used supervised and unsupervised machine learning methods for modeling cancer genetic data. The supervised methods used were: logistic regression, elastic net, SVM, decision trees, and random forest.
- Wrote machine learning models in the statistical language R and acquired in-depth expertise with visualization techniques in R.
- Acquired solid experience in presenting complex AI models to non-experts (medical doctors) by conveying the intuition behind the mathematical models.
- Performed feature selection with various methods: filters with statistical tests, penalty methods for linear models, and pruning.
- Acquired solid knowledge of computational statistics and statistical learning, including statistical tests, statistical distributions, estimators, and bias-variance decomposition.
- Wrote scientific papers and learned how to deliver high-quality presentations at conferences and in front of clients.
- Worked closely with medical doctors in hospitals, communicating across disciplines to maximize the benefit of the machine learning solutions for their patients.
Experience
Canadian Heritage Information Network (CHIN) - Data Analysis
https://lauratolosi.shinyapps.io/museums/
I worked with two colleagues on this project. My role was to statistically estimate the proportion of malformed data, focusing on its most important features (e.g., museum, object category, type, name, language). I also had to estimate what proportion of the errors is systematic and addressable by automatic (NLP) methods.
The project ultimately succeeded, exceeding the expectations of the Canadian institution.
Brexit Twitter Analysis
Algorithmic Trading of Cryptocurrencies
Rumor Detection on Social Media (Twitter)
I was involved in many aspects of the PHEME project. As a data scientist, I developed an ML model for predicting rumors on Twitter. As a member of Ontotext's team, I coordinated the integration of various pipeline components coming from all partners. I wrote deliverables, reports, and scientific papers describing our work.
Mining Highly Structured Information (MobiBiz, London)
Chatbot for Dialogue with Book Characters (USC Libraries)
My role in the project was to help my team select a speech recognition system for transcribing users’ questions into text, and to implement a question answering model that selects the appropriate answer from a list of possible answers. I used BERT for question answering. The system is deployed as a web service and serves requests in real time through a Flask app.
Education
PhD in Computational Biology
Max Planck Institute for Informatics - Saarbrücken, Germany
Master's Degree in Computational Biology
Max Planck Institute for Informatics - Saarbrücken, Germany
Bachelor's Degree in Computer Science
University of Bucharest - Bucharest, Romania
Certifications
Participation in EEML Summer School for Deep Learning, Organized by Google DeepMind
EEML
Skills
Libraries/APIs
Scikit-learn, TensorFlow, SQLAlchemy
Tools
PyCharm, Dialogflow, Git, GitLab
Languages
R, Python 2, Python 3, Python, RDF, Java, SPARQL, SQL
Platforms
Linux, Jupyter Notebook, RStudio
Frameworks
RStudio Shiny, Flask
Storage
JSON, PostgreSQL, Amazon S3 (AWS S3)
Other
Machine Learning, Data Visualization, Random Forests, Clustering Algorithms, Natural Language Processing (NLP), Sentiment Analysis, Scientific Data Analysis, Research, Statistics, Computational Biology, Generative Pre-trained Transformers (GPT), BERT, Neural Networks, Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Generalized Linear Model (GLM), Information Retrieval, Applied Mathematics, Algorithms, Reinforcement Learning, Deep Reinforcement Learning, Chatbots, Custom BERT, Automatic Speech Recognition (ASR), Mixed-effects Models, Marketing Mix, Meta Robyn, Time Series, Ontologies, Deep Learning, Agile Data Science, Natural Language Understanding (NLU), Time Series Analysis