Simone Romano
Verified Expert in Engineering
Machine Learning Developer
Helsinki, Finland
Toptal member since January 3, 2022
Simone is a machine learning scientist and engineer with experience in academia and enterprises, including Microsoft and Huawei. He likes to work at the intersection of deep machine learning, NLP, and information retrieval. Simone also loves to work on exploratory analysis and on building theoretically sound machine learning pipelines ready for production. He especially enjoys building web products.
Preferred Environment
Windows, Linux, Visual Studio Code (VS Code), Jupyter Notebook, Google Colaboratory (Colab), Git, Amazon Web Services (AWS)
The most amazing...
...thing I've worked on is a low-latency, web-scale machine learning system for query suggestions used by hundreds of millions of users worldwide.
Work Experience
NLP and Machine Learning Solution Architect
Toptal
- Defined the machine learning and data roadmap for startups with no in-house machine learning expert.
- Identified the key machine learning solutions and technology for startups with no in-house machine learning expert.
- Implemented actionable machine learning and data pipelines for MVPs and products.
- Iterated on improving models and pipelines on existing ML products.
Indie Scientist, Engineer, and Business Developer
Self-employed
- Brainstormed, researched, developed, and implemented fully functional web apps with machine learning at the core.
- Developed productized services for a machine learning agency and a search engine customization agency.
- Built a variety of products based on machine learning and web scraping. Used machine learning techniques like NLP, computer vision, and information retrieval.
- Contributed to every part of the launch of a new project, including research, engineering, marketing, and business development.
Principal Scientist
Huawei Technologies Co.
- Designed and developed an NLP multilingual auto-moderation system.
- Guided the project engineering and machine learning strategy.
- Performed extensive research, literature review, and comparative analysis.
- Gathered data and supervised the collection of text data performed by third parties.
- Implemented machine learning pipelines to automatically classify text data in multiple languages, achieving over 90% accuracy.
- Interfaced with the development team to deploy models and pipelines in the Huawei Cloud infrastructure.
- Published patents to protect intellectual property.
Scientific Advisor
Sciar
- Provided advice on state-of-the-art augmented reality and machine learning techniques to advance scientific discoveries in laboratories.
- Provided advice on best practices to implement machine learning pipelines.
- Advised on tech hubs and startup venues, both locally and internationally.
Founder
Tywai
- Put together a team of scientists and engineers to develop a deep learning and computer vision platform for photo processing.
- Led the process of creating, operating, and growing a tech startup.
- Developed a deep learning pipeline to process photos with augmented reality techniques.
- Oversaw the design and development of a React Native app to interface with the machine learning back end.
Applied Scientist
Microsoft
- Worked on Bing and Windows query autosuggest, which are currently used by hundreds of millions of users worldwide.
- Optimized the machine learning-based autosuggest system, which delivers very low latency worldwide, corrects spelling, and makes suggestions based on the user's context and background.
- Deployed various machine learning and data pipelines, A/B testing their performance on hundreds of millions of users.
- Worked with machine learning pipelines crunching terabytes of data.
- Interfaced with researchers from Microsoft Research to design new innovative ways to work on text data.
Research Scientist
National Institute of Informatics
- Worked on the analysis of high-dimensional datasets.
- Developed techniques that are more effective on high-dimensional datasets. These techniques are based on intrinsic dimensionality theory.
- Presented results at conferences and in research seminars.
- Applied novel techniques to data from the World Health Organization (WHO) and energy consumption data to increase energy efficiency.
Academic Tutor
University of Melbourne
- Taught the fundamentals of data mining and natural language processing to bachelor's and master's students.
- Designed a Kaggle competition on Twitter geolocation.
- Designed tests and final exams to assess students' skills.
PhD Student
University of Melbourne
- Completed doctoral studies in machine learning and data mining.
- Published work in several top-tier venues, such as The International Conference on Machine Learning (ICML), The Conference on Knowledge Discovery and Data Mining (KDD), and The Journal of Machine Learning Research (JMLR).
- Won the best paper award at the SIAM International Conference on Data Mining for the paper "A Framework to Adjust Dependency Measure Estimates for Chance."
- Worked on several domains and applications, such as web, social media, medicine, biology, traffic data, and sports.
- Focused on improving classification algorithms, feature selection techniques, clustering algorithms, and anomaly detection techniques, among others.
Research Scientist
University of Padova
- Worked on response prediction for the treatment of hepatitis C in collaboration with medical doctors.
- Developed interpretable predictive models based on decision trees.
- Created novel models that showed actions that doctors can take into consideration to improve treatment response.
Data Analyst
Euromonitor International
- Analyzed data from national markets in the clothing industry to predict future trends.
- Wrote business intelligence reports for companies interested in understanding future trends.
- Interviewed major players in the industry to ground prediction models on their background expertise.
Web Developer
Ewebb
- Worked as a web developer and designer, building fully functional company websites and eCommerce websites.
- Designed front ends and back ends working on client-side and server-side scripting.
- Interfaced with clients to understand their user needs.
Experience
Bing and Windows Search
http://www.bing.com
The system currently serves different markets in the US, Europe, Asia, and Australia. It can serve hundreds of thousands of keystrokes per second via a distributed algorithm.
Automatic Multilingual Content Moderation
https://consumer.huawei.com/en/mobileservices/appgallery/
I was a key player in implementing a machine learning pipeline to perform automatic content moderation. This consisted of research and development of state-of-the-art NLP methods, data gathering, model development, and deployment to production.
Emojuju
https://emojuju.com/
It uses a neural network written in JavaScript to process pictures on your device. To ensure privacy, no picture is uploaded to any back end.
Tabslu
Your data is stored in a Google Sheet that you own and can modify. By plugging your data into Tabslu, your users can pay a subscription using Stripe to access it.
I built the whole marketplace from scratch and integrated it with Google APIs to use Google Sheets as the back end.
Tywai
I put together a team of scientists and engineers for this task. I managed the implementation of some end applications, such as changing the color of clothing and adding text and advertising to it in a realistic way.
Business Link
This system allows searching across companies' webpages to see if a particular logo is present. It uses OpenAI CLIP for image search. A scraper runs 24/7 to collect companies' websites.
Learning Parameters of 3D Simulations
https://ailivesim.com
It generates realistic maritime simulations. I implemented computer vision algorithms based on transfer learning to learn the parameters of 3D scenes from the simulation.
For example, I wanted to answer questions such as whether it is possible to predict the time of day from a picture.
Prediction of Hepatitis C Treatment Response
I performed an in-depth statistical analysis of the medical data provided. It involved dealing with missing values, categorical data, and censored data.
This work happened in collaboration with medical doctors.
Prediction of Fungal Infections
A fungal infection is a major cause of mortality for these patients.
I developed an early diagnostic tool to predict whether an infection is starting so that doctors can act quickly.
Web App to Find Relevant HTML Tags on Webpages
Some examples include:
• Identify the date of a published news article.
• Identify partners on a company's landing page.
• Identify logos of technology in use.
Data is continuously scraped from the web. A custom-built preprocessing engine parses the HTML data and builds a suitable feature representation for applying machine learning approaches.
Relevant tags are automatically found using NLP machine learning techniques.
Adjusting and Designing Dependency Measures
Dependency measures are ubiquitously used for feature selection, clustering comparisons and validation, splitting criteria in a random forest, and inferring biological networks.
This is my PhD work, and it proposes a series of contributions to improve the accuracy and scalability of machine learning techniques that extensively employ dependency measures.
Search Engines as a Service
Search can be performed on web pages, documents, or images, using either text or images as queries. Everything is implemented with open source technology.
Education
PhD Degree in Machine Learning
University of Melbourne - Melbourne, Australia
Master's Degree in Computer Engineering
University of Padova - Padova, Italy
Bachelor's Degree in Biomedical Engineering
University of Padova - Padova, Italy
Certifications
AWS Certified Solutions Architect Professional
Amazon Web Services
AWS Certified Database - Specialty
Amazon Web Services
AWS Certified Data Analytics - Specialty
Amazon Web Services
AWS Certified SysOps Administrator Associate
AWS
AWS Certified Solutions Architect Associate
AWS
AWS Certified Machine Learning - Specialty
Amazon Web Services
AWS Certified Developer - Associate
Amazon Web Services
AWS Certified Cloud Practitioner
Amazon Web Services
Artificial Intelligence
Udacity
Machine Learning
Coursera
Skills
Libraries/APIs
PyTorch, TensorFlow, Pandas, Scikit-learn, SciPy, Natural Language Toolkit (NLTK), Stanford NLP, Ggplot2, Matplotlib, XGBoost, NumPy, TensorFlow Deep Learning Library (TFLearn), Stripe, REST APIs, Beautiful Soup, PIL, OpenCV, OpenGL, Fabric, jQuery, Stripe API, Google APIs, Keras, Dask, Vue, Google Location API
Tools
LaTeX, MATLAB, Weka, Stanford NER, Stanford CoreNLP, IPython, IPython Notebook, Microsoft AI, Amazon SageMaker, GitHub, ChatGPT, Open Neural Network Exchange (ONNX), Jupyter, Git, TensorBoard, GitLab, Scikit-image, Ansible, Azure Machine Learning, Jenkins, HoloLens, Canvas, Google Sheets, Asana, GitLab CI/CD, Amazon Elastic Container Service (ECS), Boto, Boto 3, You Only Look Once (YOLO)
Languages
Python, Java, R, SQL, C#, C, Active Server Pages (ASP), CSS, HTML, JavaScript, C++, UML
Frameworks
Django, Streamlit, Flask, React Native, ASP.NET, Bootstrap, OAuth 2, Selenium, JUnit, Spark, Spring Boot, LightGBM, Ruby on Rails (RoR)
Paradigms
Anomaly Detection, Object-oriented Programming (OOP), Web UX Design, Model View Controller (MVC), ETL, Microservices Architecture, Web UI Design, Functional Programming, Parallel Computing, Dynamic Programming, Business Intelligence (BI), Distributed Computing, Unit Testing, Agile Software Development, Continuous Delivery (CD), DevOps, Clean Code, Microservices, Scrum
Platforms
Windows, Visual Studio Code (VS Code), Jupyter Notebook, Web, Amazon EC2, Amazon Web Services (AWS), Linux, Docker, AWS Elastic Beanstalk, Google Cloud Platform (GCP), Android, iOS, X (formerly Twitter), Ubuntu, Magento, Drupal, Microsoft, Jina AI, AWS Lambda, Azure, Blockchain
Storage
Databases, PostgreSQL, MySQL, JSON, Apache Hive, Amazon S3 (AWS S3), Data Pipelines, Google Cloud, SQLite, Amazon DynamoDB, Database Modeling
Other
Google Colaboratory (Colab), Machine Learning, Data Science, Artificial Intelligence (AI), University Teaching, Data Mining, Conference Speaking, Presentations, Information Theory, Clustering, Classification Algorithms, Feature Selection, Linear Algebra, Hugging Face, Cloud, Natural Language Processing (NLP), A/B Testing, Linear Regression, Deep Learning, Correlational Analysis, Intrinsic Dimensionality, Computer Vision, Generative Adversarial Networks (GANs), Web Search, Random Forests, Regression, Technical Writing, Predictive Analytics, Predictive Modeling, Full-stack, Facial Recognition, Transformers, BERT, Scraping, Web Scraping, Convolutional Neural Networks (CNNs), Hypothesis Testing, Decision Trees, Data Reporting, Data Visualization, Big Data, Data Cleaning, Random Forest Regression, Information Retrieval, Search Engines, Data Engineering, Learning Transfer, Neural Networks, Artificial Neural Networks (ANN), Image Processing, English, Entity Extraction, Data Analysis, Data Analytics, Natural Language Understanding (NLU), Image Analysis, Image Analytics, Text Mining, Statistical Data Analysis, Language Models, Competitive Programming, OpenAI, Generative Pre-trained Transformer 2 (GPT-2), Generative Pre-trained Transformer 3 (GPT-3), Text Generation, Text Analytics, Information Extraction, Statistical Analysis, Generative Artificial Intelligence (GenAI), API Integration, Team Leadership, Generative Pre-trained Transformers (GPT), Supervised Machine Learning, GPU Computing, NVIDIA TensorRT, Explainable Artificial Intelligence (XAI), PDF Scraping, OpenAI GPT-4 API, OpenAI GPT-3 API, Computer Vision Algorithms, Benchmarking, LLM, Speech Synthesis, NLU, Deep Neural Networks (DNNs), Diffusion Models, Llama, Image Recognition, R&D, JupyterLab, Open-source LLMs, Biomedical Skills, Time Series Analysis, Algorithms, Distributed Systems, Operations Research, Optimization, Augmented Reality (AR), Association Rule Learning, Analysis, Writing & Editing, Back-end, 
Front-end, Web Development, Segmentation Algorithms, Data Scraping, Web Crawlers, Startups, Lean Startups, Data Architecture, Recurrent Neural Networks (RNNs), Entity Relationships, Object Detection, Object Recognition, Cloud Infrastructure, APIs, DALL-E, Text to Speech (TTS), Analysis of Variance (ANOVA), Text Classification, Operating Systems, Algebra, Robotics, Signal Processing, Digital Electronics, Telemedicine, Biometrics, Microsoft Cognitive Toolkit (CNTK), Linear Optimization, Mixed Reality (MR), Market Research & Analysis, Stakeholder Interviews, Interviews, Medicine, Ajax, PayPal, OAuth, Statistics, Time Series, Geospatial Data, Servers, CI/CD Pipelines, Federated Learning, AWS DevOps, Elastic Load Balancers, Amazon RDS, Experimental Design, IT Project Management, Support Vector Machines (SVM), Google Drive, Feature Engineering, Scientific Data Analysis, 3D, Business Development, Business Design, Typesense, Reinforcement Learning, Amazon Machine Learning, Location Services, DreamBooth, Stable Diffusion, Midjourney, Speech Recognition, Transformer-XL, Multivariate Analysis (MVA)