Verified Expert in Engineering
Paulo is a data scientist with four years of experience across multiple lines of business. Working mainly in Python, he has applied numerous machine learning algorithms and handled data analysis, visualization, hypothesis testing (such as A/B tests), statistical analysis, and even data engineering work. Paulo has an engineering background, and problem-solving comes naturally to him.
Python, Google Cloud Platform (GCP), Amazon Web Services (AWS), Jupyter Notebook, PyCharm, Visual Studio Code (VS Code)
The most amazing...
...thing I've done is use data science to reduce the number of students who dropped out of college.
Oko Exchange Inc.
- Used OpenAI's large language model (LLM) APIs (GPT-3.5 Turbo and GPT-4) to parse data from unstructured text into a structured format.
- Utilized Azure Document Intelligence (previously Azure Form Recognizer) to extract the text from files and LangChain and vector stores to leverage LLMs on large text files.
- Used AWS Lambda for model serving and Amazon S3 (AWS S3) for storing files.
- Created predictive models to direct debt-recovering agencies on whom to approach.
- Built models to predict the chance of reaching the debtor, making the company's approaches more effective.
- Migrated from Pandas to Databricks to process large volumes of data.
Data Engineer/Analyst for Qlik Sense
CBD Industries, LLC
- Developed an ETL (Extract-Transform-Load) architecture inside of Qlik Sense.
- Integrated multiple third-party APIs into Qlik Sense.
- Utilized AWS services to scale the solution for a big data context.
- Developed a dynamic pricing algorithm for a hotel chain.
- Performed ad-hoc data analysis to help drive the business forward.
- Helped data analysts with their research to find inconsistencies, give feedback, and provide overall technical support.
- Developed a dynamic pricing algorithm for a business to connect buyers and sellers of bottled gas.
- Helped with experiments to roll out new features in a data-driven way.
- Collaborated in the analytics chapter of the company to spread the data-driven culture.
- Helped the company to identify workers abusing their food spending on trips or with clients.
- Assisted the company in finding leaders who were not billing the clients correctly, causing money loss.
- Created a model to help operations know if they had enough computers for the new employees, based on past hiring behavior.
- Developed a lead-scoring model to help privately owned colleges obtain more students.
- Created a model to identify the risk of students abandoning college and provided insight on the necessary steps to avoid it.
- Improved the company's data pipeline, which had been built for small data volumes and had become unfeasible.
- Developed a model to predict if a car was stolen based on tracker data and previously known user behavior.
- Improved the company's data pipeline using Spark since the previous one was no longer feasible for the amount of processed data.
- Analyzed data to determine if some previously developed models were working as expected.
- Created a model to predict a cow's daily milk yield.
- Helped the company find new locations to market in, based on milk producers' public data.
- Developed an IoT device to monitor the milk quality in a tank.
College Dropout Prediction
Students drop out mainly because they face financial hardships, live too far from the campus, can't manage to work and study simultaneously, or even struggle academically and think it's not worth the effort.
Dropping out is a massive problem for the college since it will miss out on years of revenue from those students. Therefore, it makes sense for the college to offer short-term incentives to retain students in the long run.
With that in mind, I developed a machine learning model to identify the risk and the cause of dropping out. Finally, I provided insight on what incentives the college could offer to retain students.
Car-theft Prediction Using Tracking Data
The project revolved around users' tracking data: one machine learning model established each user's typical behavior, and another predicted whether the car was being stolen. The objective was to predict these events even before the user reported them to speed up the process of retrieving the car.
For this project, I used Python as the programming language. For the data processing part, we used Apache Spark on the Databricks platform since there was a lot of data, and processing on a single machine was too slow for the time-sensitive requirements. The historical data was stored in a MongoDB database, and we served the model through an API built with Flask.
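The two-stage idea described above (first learn a user's typical behavior, then flag trips that deviate from it) can be illustrated with a minimal sketch. This is not the project's actual pipeline or models: it swaps in a simple per-feature z-score as a stand-in for both machine learning models, and the feature names (`start_hour`, `distance_km`), the function names, and the threshold of 3.0 are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def build_baseline(trips):
    """Stage 1 (stand-in): summarize a user's past trips as (mean, std) per feature."""
    baseline = {}
    for name in trips[0]:
        values = [trip[name] for trip in trips]
        baseline[name] = (mean(values), stdev(values))
    return baseline


def anomaly_score(trip, baseline):
    """Return the largest absolute z-score of the trip across all features."""
    scores = []
    for name, (mu, sigma) in baseline.items():
        sigma = sigma or 1.0  # guard against a zero standard deviation
        scores.append(abs(trip[name] - mu) / sigma)
    return max(scores)


def looks_stolen(trip, baseline, threshold=3.0):
    """Stage 2 (stand-in): flag a trip that deviates strongly from the baseline."""
    return anomaly_score(trip, baseline) > threshold


# Hypothetical history for one user: short morning commutes.
history = [
    {"start_hour": 8, "distance_km": 12.0},
    {"start_hour": 9, "distance_km": 11.5},
    {"start_hour": 8, "distance_km": 12.5},
    {"start_hour": 9, "distance_km": 12.0},
    {"start_hour": 8, "distance_km": 11.0},
]
baseline = build_baseline(history)

normal_trip = {"start_hour": 8, "distance_km": 12.0}
odd_trip = {"start_hour": 3, "distance_km": 85.0}

print(looks_stolen(normal_trip, baseline))
print(looks_stolen(odd_trip, baseline))
```

In a real system, the baseline and the classifier would presumably be trained models fed by the Spark pipeline rather than summary statistics, but the separation between "what is normal for this user" and "is this trip abnormal" stays the same.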
Dynamic Pricing to Sell Cooking Gas Bottles
However, once the gas runs out while a person is cooking, they want to have a new bottle delivered to their home ASAP since not having it may ruin their meal.
With that in mind, the company's business connected vendors and clients through a mobile app. The issue was that these vendors were not used to fierce competition and were very displeased with us.
To calm the situation, we developed a dynamic pricing algorithm using machine learning to maintain the prices at a sustainable level for the vendors while also being advantageous for the clients.
For this project, I used Python for the programming part, Flask to serve my model, and Docker to containerize the model with the API.
Pandas, REST APIs, XGBoost, TensorFlow
BigQuery, Tableau, GitHub, PyCharm, Git, Postman, Microsoft Power BI, Pytest, Qlik Sense, Azure ML Studio, Apache Airflow
Data Science, ETL, Database Design, Azure DevOps, Business Intelligence (BI)
Jupyter Notebook, Google Cloud Platform (GCP), Visual Studio Code (VS Code), Amazon Web Services (AWS), Docker, Azure, Android, Kubernetes, AWS Lambda, Databricks
Data Pipelines, Databases, SQL Server 2016, MySQL, Redis, Relational Databases, Data Integration, MongoDB, PostgreSQL, Amazon S3 (AWS S3), Data Lakes
Machine Learning, Data Analysis, Data Visualization, Software Development, Statistics, Algorithms, API Integration, Analytics, Data, ETL Tools, Data Reporting, Data Analytics, Big Data, Linear Regression, Clustering, Dashboards, Predictive Modeling, Predictive Analytics, Statistical Analysis, Statistical Data Analysis, Mathematical Analysis, Mathematics, Statistical Methods, Back-end, APIs, Data Engineering, Data Mining, Signal Processing, Hospitality, Google BigQuery, Data Warehousing, Cloud, Artificial Intelligence (AI), Industrial IT, Google Data Studio, Natural Language Processing (NLP), Web Scraping, Azure Data Factory, Dremio, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, OpenAI GPT-3 API
Flask, Apache Spark, Spark, React Native, Swagger
Bachelor's Degree in Control and Automation Engineering
Federal University of Minas Gerais (UFMG) - Belo Horizonte, Minas Gerais, Brazil
Master's Degree in Control Engineering
Lund University - Lund, Skane, Sweden
Natural Language Processing Nanodegree