Henrique de Paula Lopes, Developer in Porto Alegre - State of Rio Grande do Sul, Brazil

Henrique de Paula Lopes

Verified Expert in Engineering

Data Engineering Developer

Porto Alegre - State of Rio Grande do Sul, Brazil

Toptal member since October 2, 2018

Bio

Henrique is a skilled data analyst and engineer with a background in computer science and mathematics. Experienced in data engineering, ETL processes, and optimizing systems for business intelligence, he has a proven track record of migrating and scaling data pipelines, integrating third-party systems, and developing tools that improve operations and reduce costs. With expertise in Python, PySpark, SQL, and AWS, Henrique delivers accurate and efficient data solutions.

Portfolio

PepsiCo
SQL, Data Analysis, Python, Business Requirements, Data Pipelines...
Jared See
Python, Microsoft Excel, Automation, Document Parsing, PDF, Spreadsheets
Ilegra
Databricks, Spark, PySpark, T-SQL (Transact-SQL), Python, ETL, Data Engineering...

Experience

  • SQL - 7 years
  • Python - 6 years
  • Data Engineering - 5 years
  • ETL - 5 years
  • Database Design - 3 years
  • Data Lake Design - 2 years
  • Data Warehousing - 2 years
  • Data Lakes - 2 years

Availability

Full-time

Preferred Environment

GitHub, macOS, Apache Pulsar

The most amazing...

...problem I've solved was the migration of a genomic data ETL pipeline from a shell script-based job to an Airflow-managed Python job.

Work Experience

Data Analyst

2024 - 2024
PepsiCo
  • Conducted in-depth quality checks on data pipelines, identifying and addressing discrepancies before the data was integrated into business systems, enhancing trust and satisfaction across various departments.
  • Reverse-engineered a series of interconnected dbt models, documenting and mapping data sources to model fields, which significantly reduced troubleshooting time and enhanced data mismatch detection.
  • Managed the seamless integration of third-party logistics providers, e.g., FedEx and UPS, with supply chain and eCommerce platforms, ensuring accurate data flow and resolving issues through proactive communication with service teams.
Technologies: SQL, Data Analysis, Python, Business Requirements, Data Pipelines, Apache Airflow, Snowflake, ETL, Data Engineering, Data Analytics, Spreadsheets

Python Developer

2024 - 2024
Jared See
  • Developed a script to automatically extract annotations from a PDF file.
  • Exported extracted annotations to Excel spreadsheets with proper formatting.
  • Packaged the annotation-extraction script as a terminal command installable anywhere via pip.
Technologies: Python, Microsoft Excel, Automation, Document Parsing, PDF, Spreadsheets
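One common way to expose a script as a pip-installable terminal command, as described above, is a setuptools console-script entry point. A minimal sketch with hypothetical package and module names (the actual project layout is not specified here):

```python
# setup.py -- illustrative packaging sketch; all names are hypothetical
from setuptools import setup

setup(
    name="pdf-annotation-extractor",
    version="0.1.0",
    py_modules=["extract_annotations"],
    entry_points={
        "console_scripts": [
            # After `pip install .`, the `extract-annotations` command
            # runs the main() function in extract_annotations.py.
            "extract-annotations=extract_annotations:main",
        ],
    },
)
```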

Data Engineer

2023 - 2024
Ilegra
  • Migrated dozens of T-SQL procedures and data transformation workflows from Azure to Databricks, leveraging Spark with Python to optimize performance and scalability, reducing costs and processing time.
  • Implemented complex business rules using PySpark's parallel processing capabilities, improving data processing efficiency and accelerating workflows by more than 100%, ensuring alignment with organizational objectives.
  • Handled communication between management and developers to identify and correctly translate business needs into code.
Technologies: Databricks, Spark, PySpark, T-SQL (Transact-SQL), Python, ETL, Data Engineering, Data Analytics

Senior Software Engineer

2023 - 2023
Virtasant
  • Developed Python-based analysis tools to evaluate cloud computing usage, identifying cost-saving opportunities through strategic service migrations to optimized environments, resulting in up to 80% savings.
  • Designed and maintained ETL pipelines to collect, preprocess, and store cloud usage data for analysis tools, leveraging AWS S3 for storage and Amazon Athena for efficient data access.
  • Worked on validations for the data used to find savings opportunities, creating and delegating tasks to teammates whenever a validation failed.
Technologies: Python, SQL, Amazon Athena, PySpark, ETL, APIs, Amazon RDS, AWS Glue, Data Engineering
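The shape of the usage analysis described above can be sketched in a few lines. This toy version aggregates spend per service and flags heavy spenders as migration candidates; the records, the threshold, and the flat 80% savings rate are illustrative assumptions, not the actual model used at Virtasant:

```python
# Toy cloud-cost analysis: aggregate spend per service and estimate
# potential savings for services above a spend threshold.
# Threshold and savings rate are illustrative assumptions.

def savings_opportunities(records, threshold=1000.0, savings_rate=0.8):
    """Return estimated savings per service whose total spend exceeds threshold."""
    spend = {}
    for record in records:
        spend[record["service"]] = spend.get(record["service"], 0.0) + record["cost"]
    return {
        service: round(total * savings_rate, 2)
        for service, total in spend.items()
        if total > threshold
    }

usage = [
    {"service": "ec2", "cost": 1200.0},
    {"service": "ec2", "cost": 300.0},
    {"service": "s3", "cost": 90.0},
]
print(savings_opportunities(usage))  # {'ec2': 1200.0}
```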

Data Engineer

2021 - 2023
Color
  • Maintained legacy ETL workflows that ran on Python scripts orchestrated with Celery, accessing data through Django ORM and AWS CDK to deliver customer reports in the form of tables stored in Google BigQuery.
  • Replicated the company's primary database to Google BigQuery using Fivetran, speeding up data processing and facilitating the development of ETL jobs.
  • Led the migration of legacy ETL jobs to dbt models on Google BigQuery, optimizing performance by accelerating result generation and achieving significant improvements in processing speed and cost efficiency.
  • Migrated ETL job orchestration to Airflow, enhancing reliability with retry capabilities, improved failure monitoring, and a consistent scheduling platform.
  • Migrated a shell script-based data pipeline processing genomic data to a Python-based ETL workflow in Airflow, improving automation and scalability.
Technologies: Django, Celery, BigQuery, Apache Airflow, Python 3, Python, ETL, ETL Tools, Data Warehousing, Fivetran, Data Build Tool (dbt), Kubernetes, Data Wrangling, PostgreSQL, SQL, Data Engineering, Databases, Data Pipelines, Cloud Architecture, AWS Cloud Architecture, Architecture, Data Integration, ETL Development, Google BigQuery, Google Cloud Platform (GCP), APIs, Amazon EC2, Amazon RDS, Amazon Web Services (AWS)

Python Developer and Data Scientist

2020 - 2021
Bold Metrics
  • Contributed to the migration of AWS resources to a CDK-based infrastructure-as-code stack, simplifying resource management for the various AWS services required by the company's products.
  • Developed scripts to ingest and parse customer data from human-readable formats like Excel, accelerating the quality assurance process for training the company's recommendation systems.
  • Contributed to modifications in the API structure, particularly in log storage, to enable efficient processing by big data tools.
Technologies: Python, Data Science, Python 3, NumPy, GitHub, Git, Flask, TensorFlow, PyTorch, Jupyter, Amazon S3 (AWS S3), AWS Lambda, Databases, Data Pipelines, Cloud Architecture, AWS Cloud Architecture, Architecture, APIs, Amazon EC2, Amazon RDS, Amazon Web Services (AWS), Data Analytics, SQLAlchemy, Spreadsheets, Back-end

Data Analyst

2018 - 2020
Warren Corretora de Titulos e Valores Mobiliarios e Cambio Ltda
  • Automated key BI metrics using SQL queries and Metabase dashboards, including customer segmentation insights and monthly churn rates.
  • Implemented an event-tracking pipeline with Snowplow, leveraging AWS services to enable the company to own and analyze data in multiple ways.
  • Built and maintained a data lake on AWS, managing data cataloging and access with services like S3 and AWS Glue, along with Python and PySpark-based ETL jobs for data ingestion.
  • Headed collaborative efforts across teams to define critical business metrics and indicators, ensuring alignment and accessibility of data across departments.
  • Maintained a data warehouse, transforming data from various sources into meaningful business metrics and supporting Metabase dashboards for reporting.
Technologies: Machine Learning, PySpark, Apache Spark, Amazon Web Services (AWS), MongoDB, MySQL, Spark, Python, Redshift, Amazon Athena, AWS Glue, Data Lakes, Data Lake Design, Data Warehousing, Data Warehouse Design, Snowplow Analytics, Database Design, SQL, Data Visualization, Data Analysis, Data Engineering, Data Architecture, Databases, Pandas, Data Pipelines, Cloud Architecture, AWS Cloud Architecture, Architecture, Data Integration, Database Integration, ETL Development, Data Science, Dashboards, Business Intelligence (BI), Dashboard Design, Amazon EC2, Amazon RDS, Amazon Redshift, ETL, Data Analytics, Amazon DynamoDB

Full-stack Developer

2018 - 2018
Bananas Music Branding
  • Maintained the company's internal web-based management system, built with PHP and React, overseeing everything from the deployment tools to small changes in the system's interface.
  • Contributed to migrating the old PHP-based system to a new one built using Python and Django.
  • Oversaw CI/CD processes related to this migration.
Technologies: MySQL, React, PHP, Databases, SQLAlchemy, Back-end

Intern

2017 - 2017
Simbio
  • Contributed to a proprietary point-of-sale system that provided small business management tools. The system was built on Odoo, Python, Django, and PostgreSQL.
  • Made occasional changes to the company's website, mainly using HTML5 and CSS.
  • Handled changes in the system's database schema, which was built using PostgreSQL.
Technologies: OpenERP, Odoo, PostgreSQL, Django, MVC Frameworks, Databases, Back-end

Intern

2017 - 2017
Sthima (later renamed to Fleye)
  • Created a Node.js-based system that reads data generated by a call router (used in call centers) and displays it in a human-friendly way via dashboards built with AngularJS components.
  • Contributed to the management system of a telecom company, which provided tools for managing customers, technicians, and active/inactive cable routes using Django's MVC architecture.
  • Optimized Django ORM queries used by the systems the company developed.
Technologies: AngularJS, Node.js, Django, MVC Frameworks, Databases, Flask, SQLAlchemy, Back-end

Web Developer

2011 - 2014
Federal University of Health Sciences of Porto Alegre
  • Maintained a web-based scientific paper submission system, used periodically by professors and students, with PHP and MySQL on the back end and JavaScript and CSS on the front end.
  • Created and tested a new system based on C#, which students used to make reservations for study rooms.
  • Maintained several other systems used by the university's students.
  • Handled some deployment-related tasks using TortoiseSVN, making changes to the code hosted on the university's servers.
  • Worked on reports based on queries made on the students' database using MySQL and Oracle.
Technologies: C#, Oracle, MySQL, JavaScript, CSS, HTML, PHP, Databases, APIs, Back-end

Experience

Project Euler

https://github.com/henriquedpl/project-euler
Developed solutions to a series of challenging problems from Project Euler at projecteuler.net/archives, which blends mathematics and computer science. I leveraged my background in computer science and ongoing mathematics studies to assess and enhance my problem-solving and analytical skills through this unique intersection of disciplines.

According to the project's statistics, I have solved over 115 problems, a feat achieved by only 1% of the platform's users.
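As a flavor of the archive, here is a solution to its first problem (sum of all multiples of 3 or 5 below 1000); this is a generic sketch, not necessarily the approach taken in the linked repository:

```python
def sum_of_multiples(limit, divisors=(3, 5)):
    """Sum all natural numbers below `limit` divisible by any of `divisors`."""
    return sum(n for n in range(limit) if any(n % d == 0 for d in divisors))

print(sum_of_multiples(1000))  # 233168
```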

Education

2011 - 2019

Bachelor's Degree in Computer Science

Federal University of Rio Grande do Sul - Porto Alegre, Brazil

2014 - 2015

Exchange Program in Computer Science

Rijksuniversiteit Groningen - Groningen, Netherlands

Certifications

MARCH 2024 - MARCH 2027

AWS Certified Solutions Architect – Associate

Amazon Web Services Training and Certification

Skills

Libraries/APIs

PySpark, Matplotlib, React, Pandas, Node.js, Standard Template Library (STL), Keras, Scikit-learn, NumPy, TensorFlow, PyTorch, SQLAlchemy

Tools

AWS Glue, Jupyter, Seaborn, Spreadsheets, Amazon Elastic Block Store (EBS), Amazon Elastic MapReduce (EMR), Atom, GitHub, Odoo, PyCharm, Amazon Athena, Snowplow Analytics, Git, Celery, BigQuery, Apache Airflow, Microsoft Excel

Languages

SQL, Python, Python 3, PHP, Snowflake, JavaScript, HTML, CSS, C#, Bash Script, T-SQL (Transact-SQL)

Paradigms

ETL, Database Design, Business Intelligence (BI), REST, Functional Programming, Automation

Storage

MySQL, Amazon S3 (AWS S3), Amazon Aurora, PostgreSQL, Redshift, Data Lakes, Data Lake Design, Data Pipelines, Databases, MongoDB, Data Integration, Database Integration, Amazon DynamoDB

Frameworks

Spark, Django, Ruby on Rails 5, Apache Spark, Django REST Framework, AngularJS, Material UI, Flask

Platforms

Amazon Web Services (AWS), Amazon EC2, Jupyter Notebook, AWS Lambda, Ubuntu, Oracle, OpenERP, Mixpanel, macOS, Kubernetes, Google Cloud Platform (GCP), Databricks, AWS IoT

Other

Data Engineering, APIs, Algorithms, Data Visualization, Dashboards, Dashboard Design, Reinforcement Learning, MVC Frameworks, Metabase, Amazon RDS, Amazon Redshift, Data Analytics, Back-end, Parquet, Data Warehousing, Data Warehouse Design, Fintech, AWS Database Migration Service (DMS), Data Science, ETL Tools, Machine Learning, Image Processing, Optimization, Software Engineering, 3D Image Processing, Fivetran, Data Build Tool (dbt), Data Wrangling, Data Analysis, Data Architecture, Cloud Architecture, AWS Cloud Architecture, Architecture, ETL Development, Google BigQuery, Document Parsing, PDF, Apache Pulsar, Business Requirements, Mathematics, Mathematical Logic, GitHub Actions
