
Henrique de Paula Lopes
Verified Expert in Engineering
Data Engineering Developer
Porto Alegre - State of Rio Grande do Sul, Brazil
Toptal member since October 2, 2018
Henrique is a skilled data analyst and engineer with a computer science and mathematics background. Experienced in data engineering, ETL processes, and optimizing systems for business intelligence, he has a proven track record in migrating and scaling data pipelines, integrating 3rd-party systems, and developing tools to improve operations and reduce costs. With expertise in Python, PySpark, SQL, and AWS, Henrique delivers accurate and efficient data solutions.
Experience
- SQL - 7 years
- Python - 6 years
- Data Engineering - 5 years
- ETL - 5 years
- Database Design - 3 years
- Data Lake Design - 2 years
- Data Warehousing - 2 years
- Data Lakes - 2 years
Preferred Environment
GitHub, macOS, Apache Pulsar
The most amazing...
...problem I've solved was the migration of a genomic data ETL pipeline from a shell script-based job to an Airflow-managed Python job.
Work Experience
Data Analyst
PepsiCo
- Conducted in-depth quality checks on data pipelines, identifying and addressing discrepancies before the data was integrated into business systems, enhancing trust and satisfaction across various departments.
- Reverse-engineered a series of interconnected dbt models, documenting and mapping data sources to model fields, which significantly reduced troubleshooting time and enhanced data mismatch detection.
- Managed the seamless integration of 3rd-party logistics providers, e.g., FedEx and UPS, with supply chain and eCommerce platforms, ensuring accurate data flow and resolving any issues through proactive communication with service teams.
Python Developer
Jared See
- Developed a script to automatically extract annotations from a PDF file.
- Exported extracted annotations to Excel spreadsheets with proper formatting.
- Made the annotation extraction script available as a terminal command, installable anywhere through pip.
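A pip-installable terminal command like the one above is typically wired up through a console entry point. The sketch below is a minimal, hypothetical packaging config; the project, module, and function names are illustrative, not taken from the actual project.

```toml
# Hypothetical packaging config; names are illustrative.
[project]
name = "pdf-annotations"
version = "0.1.0"
dependencies = ["openpyxl"]

[project.scripts]
# Exposes `extract-annotations` as a terminal command after `pip install`.
extract-annotations = "pdf_annotations.cli:main"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```

With this in place, `pip install .` (or an install from a package index) puts `extract-annotations` on the user's PATH, pointing at the `main` function of the `pdf_annotations.cli` module.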
Data Engineer
Ilegra
- Migrated dozens of T-SQL procedures and data transformation workflows from Azure to Databricks, leveraging Spark with Python to optimize performance and scalability, reducing costs and processing time.
- Implemented complex business rules using PySpark's parallel processing capabilities, more than doubling data processing throughput and ensuring alignment with organizational objectives.
- Handled communication between management and developers to identify and correctly translate business needs into code.
Senior Software Engineer
Virtasant
- Developed Python-based analysis tools to evaluate cloud computing usage, identifying cost-saving opportunities through strategic service migrations to optimized environments, resulting in up to 80% savings.
- Designed and maintained ETL pipelines to collect, preprocess, and store cloud usage data for analysis tools, leveraging AWS S3 for storage and Amazon Athena for efficient data access.
- Built validations for the data used to identify savings opportunities, creating and delegating remediation tasks to teammates whenever a validation failed.
Data Engineer
Color
- Maintained legacy ETL workflows that ran on Python scripts orchestrated with Celery, accessing data through Django ORM and AWS CDK to deliver customer reports in the form of tables stored in Google BigQuery.
- Replicated the company's primary database to Google BigQuery using Fivetran, speeding up data processing and facilitating the development of ETL jobs.
- Led the migration of legacy ETL jobs to dbt models on Google BigQuery, optimizing performance by accelerating result generation and achieving significant improvements in processing speed and cost efficiency.
- Migrated ETL job orchestration to Airflow, enhancing reliability with retry capabilities, improved failure monitoring, and a consistent scheduling platform.
- Migrated a shell script-based data pipeline processing genomic data to a Python-based ETL workflow in Airflow, improving automation and scalability.
Python Developer and Data Scientist
Bold Metrics
- Contributed to the migration of AWS resources to a CDK-based infrastructure-as-code stack, simplifying resource management for the several AWS services required by the company's products.
- Developed scripts to ingest and parse customer data from human-readable formats like Excel, accelerating the quality assurance process for training the company's recommendation systems.
- Contributed to modifications in the API structure, particularly in log storage, to enable efficient processing by big data tools.
Data Analyst
Warren Corretora de Titulos e Valores Mobiliarios e Cambio Ltda
- Automated key BI metrics using SQL queries and Metabase dashboards, including customer segmentation insights and monthly churn rates.
- Implemented an event-tracking pipeline with Snowplow, leveraging AWS services to enable the company to own and analyze data in multiple ways.
- Built and maintained a data lake on AWS, managing data cataloging and access with services like S3 and AWS Glue, along with Python and PySpark-based ETL jobs for data ingestion.
- Headed collaborative efforts across teams to define critical business metrics and indicators, ensuring alignment and accessibility of data across departments.
- Maintained a data warehouse, transforming data from various sources into meaningful business metrics and supporting Metabase dashboards for reporting.
Full-stack Developer
Bananas Music Branding
- Handled the company's internal web-based management system, built with PHP and React, overseeing everything from deployment tools to small changes in the system's interface.
- Contributed to migrating the old PHP-based system to a new one built using Python and Django.
- Oversaw CI/CD processes related to the migration described in the item above.
Intern
Simbio
- Contributed to a proprietary point-of-sale system that provided small business management tools. The system was built on Odoo, Python, Django, and PostgreSQL.
- Made occasional changes to the company's website, mainly using HTML5 and CSS.
- Handled changes in the system's database schema, which was built using PostgreSQL.
Intern
Sthima (later renamed to Fleye)
- Created a Node.js-based system that reads data generated from a call router (used in call centers) and displays it in a human-friendly way using dashboards built with AngularJS components.
- Contributed to the management system of a telecom company, which provided tools for managing customers, technicians, and active/inactive cable routes using Django's MVC architecture.
- Optimized Django ORM queries used by the systems the company developed.
Web Developer
Federal University of Health Sciences of Porto Alegre
- Maintained a web-based scientific paper submission system with a PHP and MySQL back end and a JavaScript and CSS front end, used periodically by professors and students.
- Created and tested a new system based on C#, which students used to make reservations for study rooms.
- Maintained several other systems used by the university's students.
- Handled some deployment-related tasks using TortoiseSVN, making changes to the code hosted on the university's servers.
- Worked on reports based on queries made on the students' database using MySQL and Oracle.
Experience
Project Euler
https://github.com/henriquedpl/project-euler
According to the project's documentation, I am more than halfway to solving over 115 problems, a feat achieved by only 1% of the platform's users.
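As a flavor of the kind of problem in this portfolio, here is a sketch of a solution to Project Euler's Problem 1 (sum of all multiples of 3 or 5 below 1000) using closed-form arithmetic series rather than brute force; this is an illustrative example, not code taken from the linked repository.

```python
# Project Euler, Problem 1: sum of all multiples of 3 or 5 below 1000.
# Uses the arithmetic-series formula instead of looping over every number.

def sum_of_multiples(k: int, limit: int) -> int:
    """Sum of all positive multiples of k strictly below limit."""
    n = (limit - 1) // k          # how many multiples of k lie below the limit
    return k * n * (n + 1) // 2   # k * (1 + 2 + ... + n)

def solve(limit: int = 1000) -> int:
    # Inclusion-exclusion: multiples of 15 would otherwise be counted twice.
    return (sum_of_multiples(3, limit)
            + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))

print(solve())  # 233168
```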
Education
Bachelor's Degree in Computer Science
Federal University of Rio Grande do Sul - Porto Alegre, Brazil
Exchange Program in Computer Science
Rijksuniversiteit Groningen - Groningen, Netherlands
Certifications
AWS Certified Solutions Architect – Associate
Amazon Web Services Training and Certification
Skills
Libraries/APIs
PySpark, Matplotlib, React, Pandas, Node.js, Standard Template Library (STL), Keras, Scikit-learn, NumPy, TensorFlow, PyTorch, SQLAlchemy
Tools
AWS Glue, Jupyter, Seaborn, Spreadsheets, Amazon Elastic Block Store (EBS), Amazon Elastic MapReduce (EMR), Atom, GitHub, Odoo, PyCharm, Amazon Athena, Snowplow Analytics, Git, Celery, BigQuery, Apache Airflow, Microsoft Excel
Languages
SQL, Python, Python 3, PHP, Snowflake, JavaScript, HTML, CSS, C#, Bash Script, T-SQL (Transact-SQL)
Paradigms
ETL, Database Design, Business Intelligence (BI), REST, Functional Programming, Automation
Storage
MySQL, Amazon S3 (AWS S3), Amazon Aurora, PostgreSQL, Redshift, Data Lakes, Data Lake Design, Data Pipelines, Databases, MongoDB, Data Integration, Database Integration, Amazon DynamoDB
Frameworks
Spark, Django, Ruby on Rails 5, Apache Spark, Django REST Framework, AngularJS, Material UI, Flask
Platforms
Amazon Web Services (AWS), Amazon EC2, Jupyter Notebook, AWS Lambda, Ubuntu, Oracle, OpenERP, Mixpanel, macOS, Kubernetes, Google Cloud Platform (GCP), Databricks, AWS IoT
Other
Data Engineering, APIs, Algorithms, Data Visualization, Dashboards, Dashboard Design, Reinforcement Learning, MVC Frameworks, Metabase, Amazon RDS, Amazon Redshift, Data Analytics, Back-end, Parquet, Data Warehousing, Data Warehouse Design, Fintech, AWS Database Migration Service (DMS), Data Science, ETL Tools, Machine Learning, Image Processing, Optimization, Software Engineering, 3D Image Processing, Fivetran, Data Build Tool (dbt), Data Wrangling, Data Analysis, Data Architecture, Cloud Architecture, AWS Cloud Architecture, Architecture, ETL Development, Google BigQuery, Document Parsing, PDF, Apache Pulsar, Business Requirements, Mathematics, Mathematical Logic, GitHub Actions