Henrique de Paula Lopes
Verified Expert in Engineering
Data Warehousing Developer
After eight years as a software developer, Henrique developed an interest in all things data-related. He quickly adapted to the field of data analysis and engineering, learning to implement data pipelines and perform data modeling and analysis. Now, with six years of experience as a data analyst and engineer, Henrique is proficient in query optimization and data integration, delivering fast and reliable solutions.
Preferred Environment
GitHub, macOS, PyCharm
The most amazing...
...problem I've solved was the migration of a genomic data ETL pipeline from a shell script-based job to an Airflow-managed Python job.
Work Experience
Senior Software Engineer
Virtasant
- Worked on Python-based analysis tools that processed companies' cloud computing usage, looking for opportunities to migrate services to different environments in order to cut costs.
- Worked on the ETL tools that collected, preprocessed, and stored the data used by the analysis tools described above, storing it in Amazon S3 and querying it through Amazon Athena.
- Worked on validations for the data used to find savings opportunities, creating and delegating tasks to teammates whenever a validation failed.
Data Engineer
Color
- Maintained legacy ETL jobs composed of Python scripts orchestrated by Celery that accessed data through the Django ORM to deliver customer reports as tables in a data warehouse on Google BigQuery.
- Replicated the company's main database to Google BigQuery using Fivetran, speeding up the ingestion steps in the company's data lifecycle.
- Led efforts to convert the failing legacy jobs mentioned above to run as dbt models built on BigQuery, generating the same results in a fraction of the time and freeing resources on strained servers.
- Handled the conversion of remaining jobs to run in Airflow, gaining retry capabilities, additional insight into failures, and a consistent scheduling platform.
- Migrated a shell script-based data pipeline that processed genomic data to an ETL job that used Python in Airflow.
Python Developer and Data Scientist
Bold Metrics
- Contributed to a project that aimed to migrate AWS resources to a CDK-based infrastructure-as-code stack, simplifying resource management for the several AWS services required by the company's products.
- Devised scripts that ingested and parsed customer data from human-readable formats such as Excel, speeding up the quality assurance step of the training of the company's recommendation systems.
- Handled changes to the API's structure, especially how it stored its logs, so that big data tools could process them later.
Data Analyst
Warren Corretora de Titulos e Valores Mobiliarios e Cambio Ltda
- Automated several key BI metrics—from customer segmentation insights to monthly churn rates—using SQL queries and Metabase dashboards.
- Deployed an event-tracking pipeline built with Snowplow on top of several AWS services so the company could own and analyze the data in many different ways.
- Created and maintained a data lake on AWS and managed data cataloging and access using services like S3 and AWS Glue. The ingestion step included several ETL jobs based on Python and PySpark.
- Led efforts to define each team's key business metrics and indicators and how to derive them from the company's data, working with members of every team in the company.
- Maintained a data warehouse where data extracted from various sources would be transformed to extract information representing several business metrics, feeding dashboards on a Metabase instance.
Full-stack Developer
Bananas Music Branding
- Handled the company's internal web-based management system, overseeing everything from the deployment tools to small changes in the system's interface. The system was made using PHP and React.
- Contributed to migrating the old PHP-based system to a new one built using Python and Django.
- Oversaw CI/CD processes related to the migration described in the item above.
Intern
Simbio
- Contributed to a proprietary point-of-sale system that provided small business management tools. The system was built on Odoo, Python, Django, and PostgreSQL.
- Made occasional changes to the company's website, mainly using HTML5 and CSS.
- Handled changes in the system's database schema, which was built using PostgreSQL.
Intern
Sthima (later renamed to Fleye)
- Created a Node.js-based system that read data generated by a call router (used in call centers) and displayed it in a human-friendly way through dashboards built with AngularJS components.
- Contributed to the management system of a telecom company, which provided tools for managing customers, technicians, and active/inactive cable routes using Django's MVC architecture.
- Optimized Django ORM queries used by the systems the company developed.
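A common optimization of this kind is replacing per-row related lookups (the N+1 query problem) with a single joined query, which Django exposes as `select_related`. The sketch below simulates the difference with an in-memory stand-in for the database; the ticket/customer data and helper functions are hypothetical, used only to count round trips.

```python
# Hypothetical data: tickets that each reference a customer by id.
customers = {1: "Alice", 2: "Bob"}
tickets = [{"id": i, "customer_id": 1 + i % 2} for i in range(6)]

queries = 0  # counts round trips to the stand-in "database"

def get_customer(cid):
    # Naive pattern: one lookup per ticket, analogous to iterating
    # Ticket.objects.all() and touching ticket.customer in a loop,
    # which issues N+1 queries in Django.
    global queries
    queries += 1
    return customers[cid]

names = [get_customer(t["customer_id"]) for t in tickets]
n_plus_one = queries  # one lookup per ticket

queries = 0

def get_customers(cids):
    # Batched pattern: a single joined query, analogous to
    # Ticket.objects.select_related("customer").
    global queries
    queries += 1
    return {c: customers[c] for c in set(cids)}

lookup = get_customers(t["customer_id"] for t in tickets)
names = [lookup[t["customer_id"]] for t in tickets]
batched = queries  # a single query regardless of ticket count
```

The same principle drives `prefetch_related` for many-to-many relations: the goal is always to trade many small round trips for one or two larger queries.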
Web Developer
Federal University of Health Sciences of Porto Alegre
- Maintained a web-based scientific paper submission system, based on PHP and MySQL on the back end and JavaScript and CSS on the front end, used periodically by professors and students.
- Created and tested a new system based on C#, which students used to make reservations for study rooms.
- Maintained several other systems used by the university's students.
- Handled some deployment-related tasks using TortoiseSVN, making changes to the code hosted on the university's servers.
- Worked on reports based on queries made on the students' database using MySQL and Oracle.
Experience
Barbell
https://github.com/oprometeumoderno/barbell
There is a framework called Gym (https://gym.openai.com/) that provides a set of these episodic scenarios. Still, as reinforcement learning algorithms evolve, the problems they are applied to need to evolve as well.
Barbell builds on the scenario-creation tools that Gym provides, letting users quickly generate scenarios that use the same physics and game engines Gym uses in its native scenarios.
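Gym environments share a small episodic interface, `reset()` and `step()`, and custom Barbell scenarios plug into the same contract. As a rough illustration, here is a minimal toy environment implementing that interface in plain Python (no gym dependency); the coin-flip scenario itself is invented for this sketch.

```python
import random

class CoinFlipEnv:
    """Toy episodic environment following the Gym-style
    reset()/step() contract: the agent guesses a coin flip each
    step, and an episode ends after max_steps steps."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.steps = 0
        return 0

    def step(self, action):
        # action: 0 or 1, the agent's guess for the next flip.
        flip = self.rng.randint(0, 1)
        reward = 1.0 if action == flip else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps
        return flip, reward, done, {}  # observation, reward, done, info

# A standard rollout loop, identical in shape to one driving a
# scenario obtained through gym.make(...).
env = CoinFlipEnv()
obs = env.reset()
total = 0.0
done = False
while not done:
    # Naive policy for illustration: repeat the last observed flip.
    obs, reward, done, info = env.step(action=obs)
    total += reward
```

Any agent written against this loop works unchanged whether the environment is a Gym built-in or a generated scenario, which is what makes the shared interface valuable.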
Variant Transforms Pipeline from Shell Script to Airflow-managed Script
https://github.com/googlegenomics/gcp-variant-transforms
The company I worked for had a .sh file managed by a cron job on an EC2 machine that ran the GCP Variant Transforms image on some data in S3 and placed the results in BigQuery. During a migration, the EC2 instance responsible for that was disabled. We were migrating our ETL from Celery/Cron to Airflow at the time, so all I needed to do was translate the shell script to Python, right? Wrong.
I first had to reverse-engineer the script to see how to run it with the same parameters. After running into some permission issues, we noticed that the results in BigQuery were now partitioned by chromosome segments, while the table where the old script dumped its results was not.
In short, while reading the documentation to figure out how to change that, we noticed that the person who wrote it had worked for our company before joining Google. Then it struck us: they had changed the code inside the Docker image without documenting it anywhere. We had to build a view combining the old data with the new, but it all worked in the end. This is the short version of the story; it took me months to finish.
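The partitioning mismatch above boils down to the new pipeline emitting per-chromosome partitions while downstream consumers expected one flat table. The sketch below illustrates the reconciliation in plain Python with made-up variant records; in practice this was a BigQuery view unioning the partitioned output with the legacy table, and all field names here are hypothetical.

```python
# Hypothetical variant records as the legacy flat table stored them.
legacy_rows = [
    {"chrom": "chr1", "pos": 1200, "ref": "A", "alt": "G"},
    {"chrom": "chr2", "pos": 340,  "ref": "T", "alt": "C"},
]

# The new pipeline emits per-chromosome partitions instead.
partitioned = {
    "chr1": [{"chrom": "chr1", "pos": 5000, "ref": "G", "alt": "T"}],
    "chrX": [{"chrom": "chrX", "pos": 77,   "ref": "C", "alt": "A"}],
}

def unified_view(legacy, partitions):
    """Flatten the partitioned output and union it with the legacy
    rows, mirroring what the BigQuery view did with UNION ALL."""
    rows = list(legacy)
    for chrom_rows in partitions.values():
        rows.extend(chrom_rows)
    # Deterministic order so downstream consumers see a stable table.
    return sorted(rows, key=lambda r: (r["chrom"], r["pos"]))

combined = unified_view(legacy_rows, partitioned)
```

Exposing the union as a view, rather than rewriting the old table, meant existing queries kept working while new data landed in the partitioned layout.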
My Current Reading List
Currently, I'm reading the following books:
• Fundamentals of Data Engineering by Joe Reis and Matt Housley (https://a.co/d/6VIob2T)
• Database Internals: A Deep Dive into How Distributed Data Systems Work by Alex Petrov (https://a.co/d/gV10DSZ)
Last Updated: May 5, 2023
Skills
Languages
SQL, Python, Python 3, PHP, JavaScript, HTML, CSS, C#, Bash Script
Storage
MySQL, Amazon S3 (AWS S3), Amazon Aurora, PostgreSQL, Data Lakes, Data Lake Design, Data Pipelines, Databases, MongoDB, Redshift, Data Integration, Database Integration
Frameworks
Spark, Django, Ruby on Rails 5, Apache Spark, Django REST Framework, AngularJS, Material UI, Flask
Libraries/APIs
PySpark, Matplotlib, React, Pandas, Node.js, Standard Template Library (STL), Keras, Scikit-learn, NumPy, TensorFlow, PyTorch
Tools
Jupyter, Seaborn, AWS Glue, Amazon EBS, Amazon Elastic MapReduce (EMR), Atom, GitHub, Odoo, PyCharm, Amazon Athena, Snowplow Analytics, Git, Celery, BigQuery, Apache Airflow
Paradigms
Database Design, ETL, REST, Data Science, Functional Programming
Platforms
Amazon Web Services (AWS), Jupyter Notebook, Amazon EC2, Ubuntu, Oracle, OpenERP, Mixpanel, macOS, AWS Lambda, Kubernetes, Google Cloud Platform (GCP)
Other
Dashboards, Dashboard Design, Reinforcement Learning, MVC Frameworks, Metabase, Parquet, Data Warehousing, Data Warehouse Design, Fintech, AWS Database Migration Service (DMS), Data Engineering, Data Visualization, ETL Tools, Machine Learning, Image Processing, Optimization, Software Engineering, 3D Image Processing, Fivetran, Data Build Tool (dbt), Data Wrangling, Data Analysis, Data Architecture, Cloud Architecture, AWS Cloud Architecture, Architecture, ETL Development, APIs, Google BigQuery
Education
Bachelor's Degree in Computer Science
Federal University of Rio Grande do Sul - Porto Alegre, Brazil
Exchange Program in Computer Science
Rijksuniversiteit Groningen - Groningen, Netherlands