
Henrique de Paula Lopes
Verified Expert in Engineering
Data Engineering Developer
Porto Alegre - State of Rio Grande do Sul, Brazil
Toptal member since October 2, 2018
Henrique is a skilled data analyst and engineer with a computer science and mathematics background. Experienced in data engineering, ETL processes, and optimizing systems for business intelligence, he has a proven track record in migrating and scaling data pipelines, integrating 3rd-party systems, and developing tools to improve operations and reduce costs. With expertise in Python, PySpark, SQL, and AWS, Henrique delivers accurate and efficient data solutions.
Experience
- SQL - 7 years
- Python - 6 years
- Data Engineering - 5 years
- ETL - 5 years
- Database Design - 3 years
- Data Lake Design - 2 years
- Data Warehousing - 2 years
- Data Lakes - 2 years
Preferred Environment
GitHub, macOS, Apache Pulsar
The most amazing...
...problem I've solved was the migration of a genomic data ETL pipeline from a shell script-based job to an Airflow-managed Python job.
Work Experience
Data Analyst
PepsiCo
- Conducted in-depth quality checks on data pipelines, identifying and addressing discrepancies before the data was integrated into business systems, enhancing trust and satisfaction across various departments.
- Reverse-engineered a series of interconnected dbt models, documenting and mapping data sources to model fields, which significantly reduced troubleshooting time and enhanced data mismatch detection.
- Managed the seamless integration of 3rd-party logistics providers, e.g., FedEx and UPS, with supply chain and eCommerce platforms, ensuring accurate data flow and resolving any issues through proactive communication with service teams.
Python Developer
Jared See
- Developed a script to automatically extract annotations from a PDF file.
- Exported extracted annotations to Excel spreadsheets with proper formatting.
- Made the annotation extraction script available as a terminal command, installable anywhere through pip.
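A pip-installable terminal command like the one above is typically wired up through a console entry point. The sketch below is a minimal, hypothetical packaging config; the project, module, and function names are illustrative, not taken from the actual project.

```toml
# Hypothetical packaging config; names are illustrative.
[project]
name = "pdf-annotations"
version = "0.1.0"
dependencies = ["openpyxl"]

[project.scripts]
# Exposes `extract-annotations` as a terminal command after `pip install`.
extract-annotations = "pdf_annotations.cli:main"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"
```

With this in place, `pip install .` (or an install from a package index) puts `extract-annotations` on the user's PATH, pointing at the `main` function of the `pdf_annotations.cli` module.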
Data Engineer
Ilegra
- Migrated dozens of T-SQL procedures and data transformation workflows from Azure to Databricks, leveraging Spark with Python to optimize performance and scalability, reducing costs and processing time.
- Implemented complex business rules using PySpark's parallel processing capabilities, more than doubling data processing throughput and ensuring alignment with organizational objectives.
- Handled communication between management and developers to identify and correctly translate business needs into code.
Senior Software Engineer
Virtasant
- Developed Python-based analysis tools to evaluate cloud computing usage, identifying cost-saving opportunities through strategic service migrations to optimized environments, resulting in up to 80% savings.
- Designed and maintained ETL pipelines to collect, preprocess, and store cloud usage data for analysis tools, leveraging AWS S3 for storage and Amazon Athena for efficient data access.
- Built validations for the data used to identify savings opportunities, creating and delegating remediation tasks to teammates whenever a validation failed.
Data Engineer
Color
- Maintained legacy ETL workflows that ran on Python scripts orchestrated with Celery, accessing data through Django ORM and AWS CDK to deliver customer reports in the form of tables stored in Google BigQuery.
- Replicated the company's primary database to Google BigQuery using Fivetran, speeding up data processing and facilitating the development of ETL jobs.
- Led the migration of legacy ETL jobs to dbt models on Google BigQuery, optimizing performance by accelerating result generation and achieving significant improvements in processing speed and cost efficiency.
- Migrated ETL job orchestration to Airflow, enhancing reliability with retry capabilities, improved failure monitoring, and a consistent scheduling platform.
- Migrated a shell script-based data pipeline processing genomic data to a Python-based ETL workflow in Airflow, improving automation and scalability.
Python Developer and Data Scientist
Bold Metrics
- Contributed to the migration of AWS resources to a CDK-based infrastructure-as-code stack, simplifying resource management for the several AWS services required by the company's products.
- Developed scripts to ingest and parse customer data from human-readable formats like Excel, accelerating the quality assurance process for training the company's recommendation systems.
- Contributed to modifications in the API structure, particularly in log storage, to enable efficient processing by big data tools.
Data Analyst
Warren Corretora de Titulos e Valores Mobiliarios e Cambio Ltda
- Automated key BI metrics using SQL queries and Metabase dashboards, including customer segmentation insights and monthly churn rates.
- Implemented an event-tracking pipeline with Snowplow, leveraging AWS services to enable the company to own and analyze data in multiple ways.
- Built and maintained a data lake on AWS, managing data cataloging and access with services like S3 and AWS Glue, along with Python and PySpark-based ETL jobs for data ingestion.
- Headed collaborative efforts across teams to define critical business metrics and indicators, ensuring alignment and accessibility of data across departments.
- Maintained a data warehouse, transforming data from various sources into meaningful business metrics and supporting Metabase dashboards for reporting.
Full-stack Developer
Bananas Music Branding
- Handled the company's internal web-based management system, built with PHP and React, overseeing everything from deployment tools to small changes in the system's interface.
- Contributed to migrating the old PHP-based system to a new one built using Python and Django.
- Oversaw CI/CD processes related to the migration described in the item above.
Intern
Simbio
- Contributed to a proprietary point-of-sale system that provided small business management tools. The system was built on Odoo, Python, Django, and PostgreSQL.
- Made occasional changes to the company's website, mainly using HTML5 and CSS.
- Handled changes in the system's database schema, which was built using PostgreSQL.
Intern
Sthima (later renamed to Fleye)
- Created a Node.js-based system that reads data generated from a call router (used in call centers) and displays it in a human-friendly way using dashboards built with AngularJS components.
- Contributed to the management system of a telecom company, which provided tools for managing customers, technicians, and active/inactive cable routes using Django's MVC architecture.
- Optimized Django ORM queries used by the systems the company developed.
Web Developer
Federal University of Health Sciences of Porto Alegre
- Maintained a web-based scientific paper submission system with a PHP and MySQL back end and a JavaScript and CSS front end, used periodically by professors and students.
- Created and tested a new system based on C#, which students used to make reservations for study rooms.
- Maintained several other systems used by the university's students.
- Handled some deployment-related tasks using TortoiseSVN, making changes to the code hosted on the university's servers.
- Worked on reports based on queries made on the students' database using MySQL and Oracle.
Experience
Project Euler
https://github.com/henriquedpl/project-euler
According to the project's documentation, I am more than halfway to solving over 115 problems, a feat achieved by only 1% of the platform's users.
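As a flavor of the kind of problem in this portfolio, here is a sketch of a solution to Project Euler's Problem 1 (sum of all multiples of 3 or 5 below 1000) using closed-form arithmetic series rather than brute force; this is an illustrative example, not code taken from the linked repository.

```python
# Project Euler, Problem 1: sum of all multiples of 3 or 5 below 1000.
# Uses the arithmetic-series formula instead of looping over every number.

def sum_of_multiples(k: int, limit: int) -> int:
    """Sum of all positive multiples of k strictly below limit."""
    n = (limit - 1) // k          # how many multiples of k lie below the limit
    return k * n * (n + 1) // 2   # k * (1 + 2 + ... + n)

def solve(limit: int = 1000) -> int:
    # Inclusion-exclusion: multiples of 15 would otherwise be counted twice.
    return (sum_of_multiples(3, limit)
            + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))

print(solve())  # 233168
```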
Education
Bachelor's Degree in Computer Science
Federal University of Rio Grande do Sul - Porto Alegre, Brazil
Exchange Program in Computer Science
Rijksuniversiteit Groningen - Groningen, Netherlands
Certifications
AWS Certified Solutions Architect – Associate
Amazon Web Services Training and Certification
Skills
Libraries/APIs
PySpark, Matplotlib, React, Pandas, Node.js, Standard Template Library (STL), Keras, Scikit-learn, NumPy, TensorFlow, PyTorch, SQLAlchemy
Tools
AWS Glue, Jupyter, Seaborn, Spreadsheets, Amazon Elastic Block Store (EBS), Amazon Elastic MapReduce (EMR), Atom, GitHub, Odoo, PyCharm, Amazon Athena, Snowplow Analytics, Git, Celery, BigQuery, Apache Airflow, Microsoft Excel
Languages
SQL, Python, Python 3, PHP, Snowflake, JavaScript, HTML, CSS, C#, Bash Script, T-SQL (Transact-SQL)
Paradigms
ETL, Database Design, Business Intelligence (BI), REST, Functional Programming, Automation
Storage
MySQL, Amazon S3 (AWS S3), Amazon Aurora, PostgreSQL, Redshift, Data Lakes, Data Lake Design, Data Pipelines, Databases, MongoDB, Data Integration, Database Integration, Amazon DynamoDB
Frameworks
Spark, Django, Ruby on Rails 5, Apache Spark, Django REST Framework, AngularJS, Material UI, Flask
Platforms
Amazon Web Services (AWS), Amazon EC2, Jupyter Notebook, AWS Lambda, Ubuntu, Oracle, OpenERP, Mixpanel, macOS, Kubernetes, Google Cloud Platform (GCP), Databricks, AWS IoT
Other
Data Engineering, APIs, Algorithms, Data Visualization, Dashboards, Dashboard Design, Reinforcement Learning, MVC Frameworks, Metabase, Amazon RDS, Amazon Redshift, Data Analytics, Back-end, Parquet, Data Warehousing, Data Warehouse Design, Fintech, AWS Database Migration Service (DMS), Data Science, ETL Tools, Machine Learning, Image Processing, Optimization, Software Engineering, 3D Image Processing, Fivetran, Data Build Tool (dbt), Data Wrangling, Data Analysis, Data Architecture, Cloud Architecture, AWS Cloud Architecture, Architecture, ETL Development, Google BigQuery, Document Parsing, PDF, Apache Pulsar, Business Requirements, Mathematics, Mathematical Logic, GitHub Actions