Matteo Pallini
Verified Expert in Engineering
Data Scientist and Software Developer
Matteo is a data scientist, machine learning engineer, and software developer with a BSc and MSc in economics and statistics. He has been a data analyst, modeler, and developer, mainly in small tech companies. Matteo's capabilities include Python, SQL, Git, Bash, MongoDB, and Docker; he has built regression analyses and tree-based models; and he has used NLP and scraping techniques.
Preferred Environment
Jupyter Notebook, PyCharm, Ubuntu
The most amazing...
...thing I've built was a scraping pipeline that extracted the number of attendees from ~200,000 event websites, using a combination of regex and NLP techniques.
Work Experience
Machine Learning Engineer
Migacore Technologies
- Determined that the process used to flag events relevant for travel demand was time-consuming and possibly biased. Transitioned to an XGBoost model trained on manually labeled events, reducing the time to add features to the pipeline by 70%.
- Extracted event characteristics from the relevant websites, using a combination of XPath, regex, and NLP. The features built (e.g., attendee numbers and presence of sponsored airline offers) had accuracy rates ranging from 80 to 95%.
- Scraped websites for events likely to generate uplifts in flight demand.
Data Scientist/Software Engineer
Iwoca
- Improved the accuracy of credit scorecards through the creation and inclusion of a logistic regression model.
- Automated credit checks and credit application rejections, reducing the frequency of numerous manual interventions by 15 to 40%.
- Created the MySQL marketing database and integrated it with internal and external platforms. The database, storing approximately 3.5 million leads, allowed iwoca to optimize its marketing channels and grow the main one by more than 125%.
- Built tools that allowed the strategy team to monitor and forecast financial metrics and loss statistics. Acquired and applied extensive knowledge of Pandas and Matplotlib during this initiative.
Experience
Scraping Visitor Numbers from Big Events Websites
The final pipeline first extracted each website's text using Scrapy. Regular expressions then selected the paragraphs that contained a number and referred to visitors. From this set, only the cases in which the number referred to the event itself were kept, using a combination of SpaCy named-entity recognition (NER) and some NLTK utilities.
In the end, this process allowed us to extract visitor numbers from websites with a false positive rate below 20%. This was a fairly low rate, considering the broad variety of websites and the fact that it was achieved in approximately three weeks of work.
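The regex filtering step described above might look roughly like the following sketch. It is not the project's actual code: the keyword list, the number pattern, and the function name `candidate_paragraphs` are all assumptions made for illustration, and the later NER step that checks whether the number refers to the event itself is left out.

```python
import re

# Hypothetical keyword list: words suggesting that a number refers to visitors.
VISITOR_WORDS = r"(?:visitors?|attendees?|participants?|delegates?)"

# A number such as 1200, 1,200 or 45.000 (thousands separators allowed).
NUMBER = r"\d{1,3}(?:[.,]\d{3})+|\d+"

# A number followed, within at most three intervening words, by a visitor keyword.
PATTERN = re.compile(
    rf"(?P<number>{NUMBER})\+?\s+(?:\w+\s+){{0,3}}?(?P<kind>{VISITOR_WORDS})\b",
    re.IGNORECASE,
)


def candidate_paragraphs(text):
    """Yield (paragraph, number) pairs for paragraphs that mention a visitor count."""
    # Paragraphs are assumed to be separated by blank lines.
    for paragraph in re.split(r"\n\s*\n", text):
        match = PATTERN.search(paragraph)
        if match:
            # Drop thousands separators before converting to an integer.
            digits = re.sub(r"[.,]", "", match.group("number"))
            yield paragraph.strip(), int(digits)


sample = (
    "Welcome to the expo.\n\n"
    "Last year more than 12,500 trade visitors joined us.\n\n"
    "Founded in 1998 in Milan."
)
results = list(candidate_paragraphs(sample))
```

In this sample, only the second paragraph is kept: the first contains no number, and the year in the third is not followed by a visitor keyword. A filter like this is deliberately permissive, which is why a second pass is needed to discard numbers that do not describe the event.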
Skills
Languages
Python
Libraries/APIs
Pandas, Matplotlib, Scikit-learn, SpaCy
Other
Data Analysis, Web Scraping, Loans & Lending, Statistics, Econometrics, Regression Modeling, Time Series Analysis, Bayesian Statistics, Gradient Boosted Trees, Time Series, Machine Learning, Business Loans, Travel
Frameworks
Scrapy
Tools
Git, PyCharm
Paradigms
Data Science
Platforms
Jupyter Notebook, Ubuntu, Docker
Storage
MySQL, PostgreSQL, MongoDB
Industry Expertise
Marketing
Education
Master of Science Degree in Economics and Statistics
Bocconi University - Milan, Italy
Bachelor of Science Degree in Economics and Statistics
Bocconi University - Milan, Italy