Thiago De Sousa Silveira
Verified Expert in Engineering
Machine Learning Developer
Thiago currently works as a machine learning and algorithm engineer focused on sentiment analysis, natural language processing, text classification, and recommender systems. He has a master’s degree in computer science from Tsinghua University and over five years of experience developing with Python, specifically in machine learning, data processing, and data scraping.
Preferred Environment
IntelliJ IDEA, PyCharm, Sublime Text, Anaconda, Git, MacOS
The most amazing...
...code I've written was a semisupervised sentiment analysis method for social media texts. The project was eventually published in a journal.
Work Experience
Machine Learning Engineer
Giance Technologies
- Created aspect-based sentiment analysis methods (supervised and unsupervised) for polarity classification of social media posts and news sources.
- Developed a neural network method for text classification focused on news categorization for multiple languages.
- Constructed deep neural methods for aspect extraction. The tools were used together with the developed sentiment analysis method for social media content.
Big Data Analyst
Alpha Lawyer
- Developed a method for calculating document similarity on a large collection of text documents. As the documents were stored in HBase, we created a MapReduce method for calculating cosine similarity between all pairs of documents; duplicate or very similar documents were removed.
- Created a neural network model for topic segmentation. Since the documents followed a general topic-based format, we created a time-distributed bidirectional long short-term memory (LSTM) network to find the sentences at which a document should be split.
- Built an annotation tool called YEEDASeg for tagging topic segments in documents. The annotated documents were used for training the topic segmentation model.
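The pairwise cosine-similarity deduplication described above can be illustrated on a single machine. This is a minimal sketch, not the MapReduce/HBase implementation: the whitespace tokenizer, the raw term-frequency weighting, the `threshold` value, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tf_vector(text):
    """Return a bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def drop_near_duplicates(docs, threshold=0.9):
    """For each pair at or above the (hypothetical) threshold,
    drop the later document and keep the earlier one."""
    vectors = {doc_id: tf_vector(text) for doc_id, text in docs.items()}
    removed = set()
    for id_a, id_b in combinations(docs, 2):
        if id_a in removed or id_b in removed:
            continue
        if cosine_similarity(vectors[id_a], vectors[id_b]) >= threshold:
            removed.add(id_b)
    return [doc_id for doc_id in docs if doc_id not in removed]


# Made-up documents: d2 is a near copy of d1, d3 is unrelated.
docs = {
    "d1": "the court ruled in favor of the plaintiff",
    "d2": "the court ruled in favor of the plaintiff today",
    "d3": "public transport schedules changed this summer",
}
kept = drop_near_duplicates(docs)
```

Comparing all pairs is quadratic in the number of documents, which is one reason a distributed approach such as MapReduce is attractive for large collections.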
Summer Intern
CSIRO
- Built a crawler for gathering the map coordinates of a transportation station once given an address.
- Developed a transportation simulator based on a random walk model for simulating layers of public transportation.
Experience
UNuSUAL | Unified Unexpectedness Evaluation Sentiment Analysis Tool
https://github.com/fhmourao/UNuSUAL • https://dl.acm.org/citation.cfm?id=3019760
Technologies: Python, Java
How Good is Your Recommender System? | A Survey on Evaluations in Recommendation
For offline evaluations, traditional assessment concepts have been explored, such as accuracy, root-mean-square error, and P/N for top-k recommendations.
In recent years, more research has proposed new concepts such as novelty, diversity, and serendipity, which aim to satisfy users’ requirements. We proposed numerous definitions and metrics in previous work.
Due to the absence of a specific summary of recommendation evaluations that combines traditional metrics with recent progress, this paper surveys and organizes the primary research that presents definitions of these concepts and proposes metrics or strategies to evaluate proposals. The survey also establishes the relationships between the concepts, categorizes them according to their objectives, and suggests potential future topics on user satisfaction.
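To make the traditional offline measures mentioned above concrete, here is a minimal sketch of root-mean-square error and, as an assumed example of a top-k measure, precision at k. The function names and toy inputs are illustrative and are not taken from the survey.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and true ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Made-up ratings and recommendation lists for illustration only.
error = rmse([4.0, 3.0, 5.0], [5.0, 3.0, 3.0])
p_at_3 = precision_at_k(["a", "b", "c", "d"], {"a", "c", "x"}, k=3)
```

Lower RMSE means predicted ratings are closer to the observed ones, while precision at k only looks at whether the highest-ranked items are relevant.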
SACI: Sentiment Analysis by Collective Inspection on Social Media Content
We assumed that a better collective analysis could be obtained by exploiting overlaps among distinct posts of the collection, so we proposed SACI (Sentiment Analysis by Collective Inspection), a lexicon-based unsupervised method that extracts collective sentiment without individual classifications. We based SACI on a directed transition graph among the terms of a post set and used a prior classification of these terms regarding their roles in consolidating opinions. Paths in the graph represent subsets of posts, and the collective opinion is defined by traversing all of these paths.
We demonstrated that collective analysis outperforms individual analysis in approximating collection opinions. However, assessments of SACI show that proper individual classifications do not guarantee reliable aggregate analyses and vice versa. Further, SACI simultaneously fulfills the requirements of efficacy, efficiency, and handling of dynamicity posed by highly demanding scenarios. Indeed, the consolidation of a SACI-based web tool for real-time analysis of tweets evinces its usefulness.
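As an illustration of the kind of data structure involved, the sketch below builds a directed transition graph among the terms of a post set. It is not SACI itself: the term classification and path traversal are omitted, and the whitespace tokenizer and the `build_transition_graph` name are assumptions.

```python
from collections import defaultdict


def build_transition_graph(posts):
    """Count directed transitions between consecutive terms in each post."""
    graph = defaultdict(lambda: defaultdict(int))
    for post in posts:
        terms = post.lower().split()
        for source, target in zip(terms, terms[1:]):
            graph[source][target] += 1
    return graph


# Made-up posts for illustration only.
posts = [
    "battery life is great",
    "battery life is short",
    "screen is great",
]
graph = build_transition_graph(posts)
# graph["is"] now holds how often each term follows "is" across the posts.
```

Because posts that share terms share edges, overlapping posts end up on overlapping paths, which is what makes a collective view of the post set possible.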
LEGi: Context-aware Lexicon Consolidation by Graph Inspection
Despite all efforts, the literature still lacks proposals that address both requirements. In this sense, we propose LEGi, a corpus-based method for consolidating context-aware sentiment lexicons. We based it on a semi-supervised strategy for the propagation of lexical-semantic classes on a transition graph of terms.
Empirical analyses on two distinct domains, derived from Twitter, demonstrate that LEGi outperformed four well-established methods for lexicon consolidation. Further, we found that LEGi's lexicons may improve the quality of the sentiment analysis performed by a traditional approach in the literature. Thus, our results point to LEGi as a promising method for consolidating lexicons in highly demanding scenarios, such as social media.
FAiR: A Framework for Analyses and Evaluations on Recommender Systems
For this reason, distinct frameworks have been developed to ease the deployment of recommender systems in research and production environments. In the present work, we performed an extensive study of the most popular evaluation metrics, organizing them into three groups: effectiveness-based, parallel dimensions of quality, and domain profiling. Further, we consolidated a framework named FAiR to help researchers evaluate their recommender systems using these metrics and identify the characteristics of data collections that may intrinsically affect a recommender system's performance. FAiR is compatible with the output format.
Combining Data Mining Techniques to Enhance Cardiac Arrhythmia Detection
Machine learning algorithms have been presented as promising tools to aid cardiac arrhythmia (CA) diagnosis, with emphasis on those related to automatic classification. However, these algorithms suffer from two traditional classification problems: (1) an excessive number of numerical attributes generated from the decomposition of an electrocardiogram (ECG); and (2) a much lower number of patients diagnosed with CAs than of those classified as “normal,” leading to very unbalanced datasets.
In this paper, we combined several data mining techniques in a coordinated way, such as clustering, feature selection, oversampling strategies, and automatic classification algorithms, to create more efficient classification models to identify the disease. In our evaluations, using a traditional dataset provided by the UCI, we significantly improved the effectiveness of the Random Forest classification algorithm, achieving an accuracy of over 88%, a value higher than the best already reported in the literature.
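The class-imbalance problem described above is commonly handled by oversampling the minority class. The sketch below shows plain random oversampling as one simple strategy; the source does not say which oversampling method was used, so the approach, the `random_oversample` name, and the toy data are all assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: four "normal" records and one "arrhythmia" record (made up).
features = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["normal", "normal", "normal", "normal", "arrhythmia"]
balanced_x, balanced_y = random_oversample(features, labels)
counts = Counter(balanced_y)
```

Oversampling should be applied to the training split only, so that duplicated records do not leak into the data used to measure accuracy.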
A Framework for Unexpectedness Evaluation in Recommendations
In this context, this work implements the most referenced metrics, consolidating a framework of unexpectedness assessments in recommendation that allows us to characterize, compare, and combine all those metrics.
Empirical evaluations on real data and different recommender systems demonstrated our framework's usefulness. Besides showing that the existing metrics diverge about which recommender system is best, the framework enabled us to combine all metrics so that we could capture different perspectives.
We aimed to help researchers and professionals working on recommender systems understand the actual impact of distinct metrics concerning unexpectedness, as well as how to select the proper metric to highlight gains or losses.
Using Aspect-based Analysis for Explainable Sentiment Predictions
http://tcci.ccf.org.cn/conference/2019/papers/XAI98.pdf
We argue that aspect-based analysis can help derive a deep interpretation of the sentiment predicted by a document-level analysis, working as a proxy method.
We propose a framework to verify if predictions produced by a trained aspect-based model can be used to explain document-level sentiment classifications, by calculating an agreement metric between the two models.
In our case study with two benchmark datasets, we achieved 90% agreement between the models, thus showing that an aspect-based analysis should be favored for the sake of explainability.
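One simple way to compute an agreement score between a document-level model and an aspect-based model is sketched below. The exact metric used in the paper is not given here, so the majority-vote aggregation, the function names, and the toy predictions are assumptions.

```python
from collections import Counter


def aggregate_aspects(aspect_labels):
    """Majority vote over aspect-level polarities (an assumed aggregation).
    Ties resolve to the label that appears first in the list."""
    if not aspect_labels:
        return None
    return Counter(aspect_labels).most_common(1)[0][0]


def agreement(document_preds, aspect_preds):
    """Share of documents where both models give the same polarity."""
    if len(document_preds) != len(aspect_preds) or not document_preds:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(
        1
        for doc_label, aspects in zip(document_preds, aspect_preds)
        if doc_label == aggregate_aspects(aspects)
    )
    return matches / len(document_preds)


# Made-up predictions: one polarity per document, several per aspect list.
document_preds = ["pos", "neg", "pos", "neg"]
aspect_preds = [
    ["pos", "pos", "neg"],
    ["neg"],
    ["neg", "neg", "pos"],
    ["neg", "pos", "neg"],
]
score = agreement(document_preds, aspect_preds)
```

When the two models agree on a document, the aspect-level polarities can be read as a candidate explanation of the document-level label.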
YEEDASeg | Text Segmentation Annotation Tool
https://github.com/ThiagoSousa/YEEDASeg
Text segmentation, in this case, is concerned with finding chunks (sequential blocks) of text that are semantically close. The annotations can later be used for automatic text segmentation models, such as paragraph segmentation.
Skills
Languages
Python, Java, AspectJ
Libraries/APIs
Scikit-learn, Keras, TensorFlow
Tools
VADER Sentiment Analysis, PyCharm, Sublime Text, Git, IntelliJ IDEA
Platforms
Jupyter Notebook, MacOS, Anaconda, Apache Kafka
Storage
MySQL, MongoDB, NoSQL, HBase, Elasticsearch
Other
Machine Learning, Sentiment Analysis, Classification Algorithms, Clustering Algorithms, Text Classification, Custom BERT, Natural Language Processing (NLP), Recommendation Systems, Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Deep Neural Networks, Annotation Processors
Frameworks
Flask, Hadoop
Paradigms
MapReduce, Data Science
Education
Master's Degree in Computer Science
Tsinghua University - Beijing, China
Bachelor's Degree in Computer Science
Universidade Federal de São João Del Rei - São João Del Rei, Minas Gerais, Brazil
Certifications
IELTS Academic
British Council