
Matheus Gaignoux Raiol
Verified Expert in Engineering
Data Engineer and Developer
São Paulo - State of São Paulo, Brazil
Toptal member since November 29, 2022
Matheus is a data engineer who enjoys data modeling, architecting, and developing data pipelines. He is mainly interested in use cases in the finance and retail industries. Matheus aims to design simple, robust solutions with high functionality and low maintenance levels.
Portfolio
Experience
- Data Engineering - 4 years
- Python - 4 years
- PySpark - 4 years
- Databricks - 3 years
- Apache Airflow - 3 years
- SQL - 3 years
- Delta Lake - 2 years
- Azure Data Factory (ADF) - 2 years
Preferred Environment
Databricks, Azure Data Factory (ADF), Apache Airflow, Python, Spark, PySpark, Pandas, Docker, Azure, Data Lakes
The most amazing...
...thing I've implemented is a unified medallion architecture for files with different schemas from several sources.
Work Experience
Senior Data Engineer
Shopee
- Designed and implemented a custom paradigm to feed the fraud analysis team's data mart tables.
- Developed optimized Spark SQL jobs, reducing memory usage and queue congestion.
- Processed large datasets using Spark, developing pipelines with software engineering best practices and a standard pattern to keep them easy to understand and modify.
Senior Data Engineer
Via
- Developed a data pipeline to ingest files from several sources to be transformed into a unified set of tables for chargeback analysis.
- Designed and implemented a process to extract, transform, load, and analyze purchase orders placed on the marketplace web platform, with the analysis step identifying fraudulent orders.
- Created pipelines to feed the tables of a fraud team data mart. Understanding fraud concepts and how transactional business rules were reflected in the available data was essential to guarantee clarity for downstream applications.
Data Engineer
Inmetrics
- Refactored data pipelines to change the modeling of a data warehouse to a star schema, increasing query performance.
- Implemented a machine learning workflow to use in a capacity planning platform.
- Developed a near real-time application to ingest data from a third-party company into a data lake.
Data Engineer
EY
- Created a data warehouse for BI use cases covering several credit products of a major Brazilian bank.
- Developed data pipelines to feed the warehouse tables and optimized queries to handle the bank's large daily data loads.
- Built pipelines for daily feature store updates. This process relied on knowledge of business rules, calculations, and how the ML models were developed and used.
Experience
Fraud Analysis Data Pipeline
Medallion Architecture for Chargeback Analysis
• Staging layer (raw files)
• Bronze layer (same information as staging, but with the file schemas unified)
• Silver layer (latest status for each set of keys)
• Gold layer (combined business information)
The medallion architecture provided the needed flexibility: when a business rule changed, only the appropriate stage had to be updated, without compromising the rest. It also ensured a unified source of truth for every application downstream of the gold layer; rather than having each consumer process raw files in a potentially inconsistent way, the ready-to-use refined business layer enabled a high level of reusability. Lastly, applying programming best practices kept job runtimes short and reduced cluster memory and cloud resource usage.
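The staging-to-bronze and bronze-to-silver steps described above can be sketched in a few lines. This is a minimal pandas illustration, not the production pipeline (which ran on Spark); the source names, column mappings, and unified schema here are invented for the example:

```python
import pandas as pd

# Hypothetical per-source column mappings; in a real pipeline these would
# come from the ingestion configuration for each file source.
SOURCE_SCHEMAS = {
    "acquirer_a": {"txn_id": "order_id", "amt": "amount", "ts": "event_time"},
    "acquirer_b": {"id": "order_id", "value": "amount", "created": "event_time"},
}
UNIFIED_COLUMNS = ["order_id", "amount", "event_time", "source"]

def to_bronze(raw: pd.DataFrame, source: str) -> pd.DataFrame:
    """Map one source's raw schema onto the unified bronze schema."""
    df = raw.rename(columns=SOURCE_SCHEMAS[source])
    df["source"] = source
    # reindex drops extra columns and adds any missing ones as NaN
    return df.reindex(columns=UNIFIED_COLUMNS)

def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    """Keep only the latest record per key (the 'last status' silver layer)."""
    return (bronze.sort_values("event_time")
                  .drop_duplicates("order_id", keep="last"))

# Two staging files with different schemas but equivalent information.
raw_a = pd.DataFrame({"txn_id": [1], "amt": [9.9], "ts": ["2022-01-01"]})
raw_b = pd.DataFrame({"id": [2], "value": [5.0], "created": ["2022-01-02"]})

bronze = pd.concat(
    [to_bronze(raw_a, "acquirer_a"), to_bronze(raw_b, "acquirer_b")],
    ignore_index=True,
)
silver = to_silver(bronze)
```

The key design point is that each stage only knows about the one before it: adding a new source touches only the staging-to-bronze mapping, while a change to the "last status" rule touches only the silver step.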
Data Mart Custom Paradigm
Education
Bachelor's Degree in Physics
Federal University of Pará - Pará, Brazil
Certifications
Certified Associate Developer for Apache Spark 3.0
Databricks
Skills
Libraries/APIs
PySpark, Pandas, Scikit-learn
Tools
Spark SQL, Apache Airflow, Amazon Athena, AWS Glue, Amazon Elastic MapReduce (EMR)
Languages
Python, SQL, Scala
Frameworks
Spark, Adaptive Query Execution (AQE), Apache Spark, Hadoop
Platforms
Databricks, Docker, Apache Kafka, Azure, Amazon Web Services (AWS), AWS Lambda
Storage
PostgreSQL, MySQL, Redshift, Microsoft SQL Server, Apache Hive, Data Pipelines, HDFS, Data Lakes, Amazon S3 (AWS S3)
Paradigms
ETL
Other
UDFs, DataFrames, Azure Data Factory (ADF), Delta Lake, Data Engineering, Data Wrangling, Azure Databricks, Azure Data Lake, APIs, SFTP, Data Warehousing, Data Marts, Applied Mathematics, Physics, Computational Physics, Data Modeling, Query Optimization, Data, ELT, Shell Scripting