
Rodrigo Lazarini Gil

Verified Expert in Engineering

Data Engineer and ETL Developer

São Paulo - State of São Paulo, Brazil

Toptal member since October 21, 2024

Bio

Rodrigo has worked across several data domains, moving from developer and database administrator roles into data engineering. He specializes in microservices architecture, continuous delivery, and scalable processes, with big data back-end work ranging from pandas and Spark ETL to configuring platform tools such as Airflow, Kafka, JupyterHub, and data lakes. He has built big data platforms on AWS, GCP, and Azure.

Portfolio

Toptal
Python, Docker, Apache Airflow, GitHub, GitHub Actions...
ThoughtWorks
Azure, Azure Functions, Docker, Technical Leadership, Pytest, Programming...
Neuralmed
Apache Airflow, Docker, GitHub, GitHub Actions, Google Cloud Platform (GCP)...

Experience

  • Programming - 15 years
  • SQL - 13 years
  • Python - 8 years
  • Apache Airflow - 8 years
  • Linux - 8 years
  • PostgreSQL - 6 years
  • Docker - 6 years
  • PySpark - 5 years

Availability

Part-time

Preferred Environment

Slack, Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Python, Apache Airflow, Docker, Kubernetes, GitHub, GitHub Actions

The most amazing...

...GCP project I've done involved creating a data warehouse from scratch, using a GKE cluster to oversee an Airflow instance along with Pub/Sub and BigQuery.

Work Experience

Senior Data Engineer

2022 - 2024
Toptal
  • Maintained the data platform and enforced company data quality standards.
  • Established and managed pipelines in Luigi and Airflow.
  • Developed and enhanced an Airflow framework with the team to establish standards.
  • Improved CI/CD practices utilizing GitHub Actions.
Technologies: Python, Docker, Apache Airflow, GitHub, GitHub Actions, Google Cloud Platform (GCP), Kubernetes, Data Warehousing, Pytest, PostgreSQL, Programming, Relational Database Services (RDS), Luigi, Scala, ETL, Data Engineering, Data Pipelines, Git, Data Modeling, Pandas, CI/CD Pipelines, Software Engineering, Google BigQuery, APIs, Data Build Tool (dbt), Databases, Google Cloud

Data Engineer | Lead Consultant

2022 - 2022
ThoughtWorks
  • Led a large team composed primarily of data engineers, alongside DevOps engineers, developers, and QA professionals.
  • Worked with Scrum methodologies to refine, organize, and plan stories with the team.
  • Ran desk checks and kick-offs so team members understood how to pick up a story and validate its acceptance criteria.
  • Oversaw a team developing a streaming data platform on Azure, incorporating serverless resources such as Azure Functions and CosmosDB.
Technologies: Azure, Azure Functions, Docker, Technical Leadership, Pytest, Programming, Relational Database Services (RDS), ETL, Data Engineering, Data Pipelines, Git, Data Modeling, CI/CD Pipelines, Software Engineering, Azure Data Factory (ADF), Apache Spark, BigQuery

Data Engineer Specialist

2020 - 2021
Neuralmed
  • Created a data lake from scratch with a machine learning (ML) focus.
  • Created a custom PySpark Docker image to be run by Airflow and integrated with GitSync.
  • Ran pytest integration tests using GitHub Actions.
Technologies: Apache Airflow, Docker, GitHub, GitHub Actions, Google Cloud Platform (GCP), Kubernetes, Pub/Sub, FastAPI, Technical Leadership, Data Warehousing, Pytest, Spark, PostgreSQL, Programming, Relational Database Services (RDS), ETL, Data Engineering, Data Pipelines, PySpark, Git, Data Modeling, Flask, Pandas, Machine Learning Operations (MLOps), CI/CD Pipelines, Software Engineering, Apache Spark, Google BigQuery, BigQuery, APIs, MongoDB, Amazon RDS, Databases, Google Cloud
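A CI setup like the one above, running pytest integration tests on GitHub Actions, might be configured roughly as follows. The workflow name, Python version, and paths are illustrative assumptions, not the actual project config:

```yaml
# Hypothetical workflow file, e.g. .github/workflows/integration-tests.yml
name: integration-tests
on: [pull_request]
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: pytest tests/integration -v
```

Running the suite on every pull request keeps integration regressions out of the main branch before images are built and deployed.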

Senior Data Engineer

2019 - 2020
Grupo ZAP
  • Developed an Airflow solution using Kubernetes and an easy way to add new DAGs using YAML: https://medium.com/@nbrgil/scalable-airflow-with-kubernetes-git-sync-63c34d0edfc3.
  • Created CI/CD with CircleCI/Jenkins to build Docker images and deploy Kubernetes deployment pods.
  • Created a platform tool to help load relational databases to Apache Kafka using Debezium.
Technologies: Apache Airflow, Kubernetes, CircleCI, Jenkins, Debezium, Apache Kafka, PostgreSQL, Data Warehousing, Amazon S3 (AWS S3), Pytest, Spark, Programming, Relational Database Services (RDS), Luigi, ETL, Data Engineering, Data Pipelines, Microsoft SQL Server, Amazon Web Services (AWS), PySpark, Git, Data Modeling, Machine Learning Operations (MLOps), Amazon EC2, CI/CD Pipelines, Software Engineering, Apache Spark, APIs, Amazon RDS, Databases
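The Debezium-based platform tool described above typically works by registering a source connector with Kafka Connect's REST API, after which Debezium streams the database's change log into Kafka topics. A minimal sketch assuming a Postgres source; the hostnames, credentials, and connector name are illustrative, and exact option names vary by Debezium version:

```python
import json
import urllib.request

def build_connector_config(name, host, dbname, user, password, port=5432):
    """Build a Debezium Postgres source-connector payload for Kafka Connect.

    Field names follow Debezium's Postgres connector options; treat this as
    an illustrative sketch, not a drop-in config.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            "database.dbname": dbname,
            "topic.prefix": name,  # "database.server.name" on older Debezium versions
        },
    }

def register_connector(connect_url, payload):
    """POST the connector config to Kafka Connect's REST API (e.g. http://connect:8083)."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)

# Build the payload; registering it would be a one-line call to register_connector.
payload = build_connector_config("orders_db", "db.internal", "orders", "cdc_user", "secret")
```

Once registered, each table in the source database appears as a Kafka topic of change events, which downstream consumers can replay or load into a warehouse.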

Senior Data Engineer

2018 - 2019
Globo
  • Created a Kubernetes cluster on Google Cloud Platform.
  • Developed Airflow pipelines to control flows by spawning scalable Kubernetes pods.
  • Defined CI/CD with GitLab/Kubernetes/Docker integration.
  • Created ETL processes to run on Spark (Google Cloud DataProc).
  • Developed Spark (Scala) jobs to process large-scale data.
Technologies: Apache Airflow, Kubernetes, GitLab, Google Cloud Dataproc, Spark, Scala, Pytest, Programming, Relational Database Services (RDS), ETL, Data Engineering, Data Pipelines, Amazon Web Services (AWS), PySpark, Git, Data Modeling, CI/CD Pipelines, Software Engineering, Apache Spark, Google BigQuery, BigQuery, Databases, Google Cloud

Senior Data Engineer

2018 - 2018
Searchmetrics
  • Created a microservice with an API in Python using Falcon.
  • Handled unit and integration tests with Pytest and AWS LocalStack.
  • Contributed to Docker integration with Travis and AWS ECS for easier deployment.
  • Handled scalable processing with Python Redis Queue and AWS Lambda.
  • Used AWS Athena to aggregate S3 files and display them as an external table.
Technologies: Python, Pytest, LocalStack, Travis CI, ECS, Redis, AWS Lambda, Amazon Athena, Amazon S3 (AWS S3), Programming, Relational Database Services (RDS), Falcon, ETL, Data Engineering, Data Pipelines, Git, Data Modeling, Flask, Amazon EC2, CI/CD Pipelines, Software Engineering, APIs, Amazon RDS, Databases

Senior Data Engineer

2015 - 2017
Geofusion
  • Worked with OKR-based squads (having the autonomy to make decisions inside the team).
  • Served as a squad tech leader focused on managing junior members.
  • Worked in a data warehouse environment to provide enriched data to many applications through API.
  • Developed and tuned data flow/ETL tools using Python, PostgreSQL, shell scripts (make), and Airbnb’s Airflow.
  • Handled test-driven development (TDD) unit, integration, and acceptance tests using Python.
  • Used GoCD/Jenkins with Docker machine deploy for continuous integration.
  • Used Docker containers to create tools that are easier to deploy and maintain.
  • Developed tools to integrate all of the company's databases, both SQL and NoSQL.
  • Developed projects with a continuous delivery concept.
  • Applied advanced SQL and maintained knowledge of all the database schemas.
Technologies: Objectives & Key Results (OKRs), Data Warehousing, Test-driven Development (TDD), Apache Airflow, GoCD, Jenkins, Docker, NoSQL, SQL, Technical Leadership, Pytest, PostgreSQL, Oracle, Programming, Relational Database Services (RDS), Oracle DBA, Luigi, ETL, Data Engineering, Data Pipelines, Git, Data Modeling, Software Engineering, MongoDB, Databases

Experience

Airflow with YAML DAGs

https://medium.com/@nbrgil/airflow-with-yaml-dags-and-kubernetes-operator-ee9594b96714
Developed an Airflow framework driven by YAML files, letting any team orchestrate arbitrary Docker images and Python code regardless of their Python expertise.
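The core idea of such a framework can be sketched as follows. This is a stdlib-only sketch: the parsed YAML spec is shown as its equivalent Python dict (real code would load the file with PyYAML and emit actual Airflow DAG and operator objects), and all field names are illustrative assumptions:

```python
# A declarative spec as it would look after parsing a YAML file.
# Each task names a Docker image to run and its upstream dependencies.
spec = {
    "dag_id": "daily_sales",
    "schedule": "0 3 * * *",
    "tasks": [
        {"name": "extract", "image": "etl/extract:latest", "upstream": []},
        {"name": "transform", "image": "etl/transform:latest", "upstream": ["extract"]},
        {"name": "load", "image": "etl/load:latest", "upstream": ["transform"]},
    ],
}

def build_dag(spec):
    """Turn a declarative spec into task objects wired by upstream names.

    In a real framework each entry would become a KubernetesPodOperator
    inside an Airflow DAG; here tasks are plain dicts so the wiring and
    validation logic stand alone.
    """
    tasks = {
        t["name"]: {"image": t["image"], "upstream": list(t["upstream"])}
        for t in spec["tasks"]
    }
    # Fail fast on dangling references before any DAG is registered.
    for name, task in tasks.items():
        for up in task["upstream"]:
            if up not in tasks:
                raise ValueError(f"{name}: unknown upstream task {up!r}")
    return tasks

dag = build_dag(spec)
```

Because teams only edit the YAML spec, they get orchestration and dependency validation without writing any Airflow Python code themselves.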

Education

2010 - 2011

Master of Business Administration (MBA) in Oracle Database Administration

FIAP - São Paulo, Brazil

2004 - 2007

Bachelor's Degree in Computer Science

Faculty of Industrial Engineering (FEI) - São Bernardo do Campo, Brazil

Skills

Libraries/APIs

PySpark, Luigi, Pandas

Tools

Apache Airflow, Pytest, BigQuery, Slack, GitHub, Git, CircleCI, Jenkins, GitLab, Google Cloud Dataproc, Travis CI, Amazon Athena

Languages

Python, SQL, Java, Pascal, Scala, Falcon

Frameworks

Apache Spark, Spark, Flask

Paradigms

ETL, Objectives & Key Results (OKRs), Test-driven Development (TDD)

Storage

PostgreSQL, Data Pipelines, Databases, Oracle DBA, Amazon S3 (AWS S3), NoSQL, Google Cloud, Redis, Microsoft SQL Server, MongoDB

Platforms

Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Docker, Kubernetes, Oracle, Amazon, Amazon Web Services (AWS), Amazon EC2, Azure, Azure Functions, Debezium, Apache Kafka, LocalStack, AWS Lambda

Other

Programming, Relational Database Services (RDS), Data Engineering, CI/CD Pipelines, Google BigQuery, GitHub Actions, Pub/Sub, Technical Leadership, Data Warehousing, Data Modeling, Software Engineering, APIs, Amazon RDS, FastAPI, ECS, GoCD, Machine Learning Operations (MLOps), Azure Data Factory (ADF), Data Build Tool (dbt)
