
Rodrigo Lazarini Gil
Verified Expert in Engineering
Data Engineer and ETL Developer
São Paulo - State of São Paulo, Brazil
Toptal member since October 21, 2024
Rodrigo has worked across several data domains, moving from developer and database administrator roles into his current position as a data engineer. Specializing in microservices architecture, continuous delivery, and scalable processes, he works on big data back-end development, from pandas and Spark ETL to configuring platform tools such as Airflow, Kafka, JupyterHub, and data lakes. He has built big data platforms on AWS, GCP, and Azure.
Experience
- Programming - 15 years
- SQL - 13 years
- Python - 8 years
- Apache Airflow - 8 years
- Linux - 8 years
- PostgreSQL - 6 years
- Docker - 6 years
- PySpark - 5 years
Preferred Environment
Slack, Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Python, Apache Airflow, Docker, Kubernetes, GitHub, GitHub Actions
The most amazing...
...GCP project I've done involved creating a data warehouse from scratch, using a GKE cluster to run an Airflow instance alongside Pub/Sub and BigQuery.
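For illustration, a minimal sketch of the streaming-ingestion side of such a setup, using the google-cloud-pubsub and google-cloud-bigquery client libraries. The project, subscription, and table names are hypothetical, and the destination table is assumed to already exist with a matching schema:

```python
import json

from google.cloud import bigquery, pubsub_v1

# Hypothetical project, subscription, and table names.
PROJECT = "my-project"
SUBSCRIPTION = "events-sub"
TABLE = "my-project.dw.events"

bq = bigquery.Client(project=PROJECT)
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # Stream each Pub/Sub message into BigQuery as one row.
    row = json.loads(message.data)
    errors = bq.insert_rows_json(TABLE, [row])
    if not errors:
        message.ack()  # ack only after the row is safely in BigQuery

streaming_pull = subscriber.subscribe(sub_path, callback=callback)
streaming_pull.result()  # block and process messages as they arrive
```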
Work Experience
Senior Data Engineer
Toptal
- Played a pivotal role in maintaining the platform and upholding the company's data quality standards.
- Established and managed pipelines in Luigi and Airflow.
- Developed and enhanced an Airflow framework with the team to establish shared standards (a minimal sketch of a standardized DAG follows this list).
- Improved CI/CD practices utilizing GitHub Actions.
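As a hedged illustration of what a framework-enforced standard might look like, here is a minimal DAG built on shared defaults; the DAG ID, owner, and schedule are hypothetical:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical defaults a shared framework might enforce for every team DAG.
DEFAULT_ARGS = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

def extract() -> None:
    """Placeholder task body."""

with DAG(
    dag_id="example_standardized_dag",  # hypothetical ID
    default_args=DEFAULT_ARGS,
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # `schedule` is the Airflow 2.4+ spelling
    catchup=False,
    tags=["standards"],
):
    PythonOperator(task_id="extract", python_callable=extract)
```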
Data Engineer | Lead Consultant
ThoughtWorks
- Headed a sizable team consisting primarily of data engineers, alongside DevOps engineers, developers, and QA professionals.
- Worked with Scrum methodologies to refine, organize, and plan stories with the team.
- Used desk check and kick-off methodologies to help people understand how to play a story and validate acceptance criteria.
- Oversaw a team developing a streaming data platform on Azure, incorporating serverless resources such as Azure Functions and Cosmos DB (a minimal sketch follows this list).
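A minimal sketch of an ingestion function in that style, assuming the Azure Functions v2 Python programming model with an Event Hubs trigger and a Cosmos DB output binding (extension v4 parameter names); the hub, database, container, and connection-setting names are all hypothetical:

```python
import azure.functions as func

app = func.FunctionApp()

# Hypothetical hub/database/container names; the connection values refer to
# app settings, per the Azure Functions binding convention.
@app.event_hub_message_trigger(
    arg_name="event", event_hub_name="telemetry", connection="EVENTHUB_CONN"
)
@app.cosmos_db_output(
    arg_name="doc",
    database_name="platform",
    container_name="events",
    connection="COSMOS_CONN",
)
def ingest(event: func.EventHubEvent, doc: func.Out[func.Document]) -> None:
    # Pass each streamed event straight through to Cosmos DB.
    doc.set(func.Document.from_json(event.get_body().decode("utf-8")))
```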
Data Engineer Specialist
Neuralmed
- Created a data lake from scratch with a machine learning (ML) focus.
- Created a custom PySpark Docker image to be run by Airflow and integrated with GitSync (see the sketch after this list).
- Ran pytest integration tests using GitHub Actions.
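A hedged sketch of how Airflow might launch such an image with `KubernetesPodOperator` (the task would sit inside a DAG as usual); the registry path, namespace, and job arguments are hypothetical, and the import path assumes a recent cncf.kubernetes provider:

```python
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

# Hypothetical image, namespace, and job path; older provider releases expose
# the operator under operators.kubernetes_pod instead of operators.pod.
run_spark_job = KubernetesPodOperator(
    task_id="run_spark_job",
    name="pyspark-etl",
    namespace="airflow",
    image="registry.example.com/custom-pyspark:latest",
    cmds=["spark-submit"],
    arguments=["--master", "local[*]", "/app/jobs/etl_job.py"],
    get_logs=True,
)
```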
Senior Data Engineer
Grupo ZAP
- Developed an Airflow solution on Kubernetes with a simple YAML-based way to add new DAGs (sketched after this list): https://medium.com/@nbrgil/scalable-airflow-with-kubernetes-git-sync-63c34d0edfc3.
- Created CI/CD with CircleCI/Jenkins to build Docker images and deploy Kubernetes deployment pods.
- Created a platform tool to help load relational databases to Apache Kafka using Debezium.
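A minimal sketch of the YAML-driven pattern that bullet (and the linked post) describes: a loader reads a spec file and registers one DAG per entry. The spec path, fields, and commands are hypothetical:

```python
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical spec file, e.g. /opt/airflow/dags/dags.yaml:
#   - dag_id: load_listings
#     schedule: "@hourly"
#     command: python /jobs/load_listings.py

with open("/opt/airflow/dags/dags.yaml") as f:
    specs = yaml.safe_load(f)

for spec in specs:
    with DAG(
        dag_id=spec["dag_id"],
        start_date=datetime(2024, 1, 1),
        schedule=spec["schedule"],
        catchup=False,
    ) as dag:
        BashOperator(task_id="run", bash_command=spec["command"])
    # Expose each generated DAG at module level so the Airflow parser finds it.
    globals()[spec["dag_id"]] = dag
```

The payoff of this design is that adding a pipeline means editing YAML rather than writing Python.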
Senior Data Engineer
Globo
- Created a Kubernetes cluster on Google Cloud Platform.
- Developed Airflow pipelines to orchestrate workflows by spawning scalable Kubernetes pods.
- Defined CI/CD with GitLab/Kubernetes/Docker integration.
- Created ETL processes to run on Spark (Google Cloud Dataproc).
- Developed Spark (Scala) jobs to process large-scale data (a comparable job is sketched in PySpark after this list).
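For illustration, a comparable aggregation job sketched in PySpark rather than Scala, to keep one language across these examples; the bucket paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pageviews-etl").getOrCreate()

# Hypothetical bucket paths; Dataproc's built-in GCS connector resolves gs:// URIs.
events = spark.read.parquet("gs://example-raw/events/")

daily = (
    events
    .withColumn("day", F.to_date("event_ts"))  # hypothetical timestamp column
    .groupBy("day", "page")
    .agg(F.count("*").alias("views"))
)

daily.write.mode("overwrite").partitionBy("day").parquet(
    "gs://example-curated/pageviews/"
)
spark.stop()
```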
Senior Data Engineer
Searchmetrics
- Created a Python microservice exposing an API built with Falcon (a minimal sketch follows this list).
- Wrote unit and integration tests with pytest and LocalStack (local AWS emulation).
- Integrated Docker with Travis CI and AWS ECS to simplify deployments.
- Implemented scalable processing with Python RQ (Redis Queue) and AWS Lambda.
- Used AWS Athena to aggregate S3 files and expose them as an external table.
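A minimal sketch of a Falcon resource of the kind described above; the route and payload are hypothetical:

```python
import falcon

class HealthResource:
    def on_get(self, req: falcon.Request, resp: falcon.Response) -> None:
        # Falcon serializes resp.media to JSON by default.
        resp.media = {"status": "ok"}

app = falcon.App()  # Falcon 3.x entry point; 2.x used falcon.API()
app.add_route("/health", HealthResource())
```

Any WSGI server can serve it, e.g. gunicorn mymodule:app.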
Senior Data Engineer
Geofusion
- Worked with OKR-based squads (having the autonomy to make decisions inside the team).
- Served as a squad tech leader focused on managing junior members.
- Worked in a data warehouse environment to provide enriched data to many applications through APIs.
- Developed and tuned data flow/ETL tools using Python, PostgreSQL, shell scripts (Make), and Apache Airflow.
- Practiced test-driven development (TDD), writing unit, integration, and acceptance tests in Python (an example follows this list).
- Used GoCD/Jenkins with Docker Machine deployments for continuous integration.
- Used Docker containers to create tools that are easier to deploy and maintain.
- Developed tools to integrate all databases, both SQL and NoSQL.
- Delivered projects following continuous delivery principles.
- Applied advanced SQL and maintained working knowledge of all database schemas.
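A hedged example of that TDD style: a hypothetical pandas transform with two pytest cases pinned against it, one checking the computation and one guarding against input mutation:

```python
import pandas as pd

# Hypothetical transform under test: enrich raw sales records with revenue.
def enrich(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["revenue"] = out["price"] * out["quantity"]
    return out

def test_enrich_computes_revenue():
    raw = pd.DataFrame({"price": [10.0, 2.5], "quantity": [3, 4]})
    assert enrich(raw)["revenue"].tolist() == [30.0, 10.0]

def test_enrich_does_not_mutate_input():
    raw = pd.DataFrame({"price": [1.0], "quantity": [1]})
    enrich(raw)
    assert "revenue" not in raw.columns
```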
Experience
Airflow with YAML DAGs
https://medium.com/@nbrgil/airflow-with-yaml-dags-and-kubernetes-operator-ee9594b96714
Education
Master of Business Administration (MBA) in Oracle Database Administration
FIAP - São Paulo, Brazil
Bachelor's Degree in Computer Science
Faculty of Industrial Engineering (FEI) - São Bernardo do Campo, Brazil
Skills
Libraries/APIs
PySpark, Luigi, Pandas
Tools
Apache Airflow, Pytest, BigQuery, Slack, GitHub, Git, CircleCI, Jenkins, GitLab, Google Cloud Dataproc, Travis CI, Amazon Athena
Languages
Python, SQL, Java, Pascal, Scala
Frameworks
Apache Spark, Flask, Falcon
Paradigms
ETL, Objectives & Key Results (OKRs), Test-driven Development (TDD)
Storage
PostgreSQL, Data Pipelines, Databases, Oracle DBA, Amazon S3 (AWS S3), NoSQL, Google Cloud, Redis, Microsoft SQL Server, MongoDB
Platforms
Visual Studio Code (VS Code), Linux, Google Cloud Platform (GCP), Docker, Kubernetes, Oracle, Amazon, Amazon Web Services (AWS), Amazon EC2, Azure, Azure Functions, Debezium, Apache Kafka, LocalStack, AWS Lambda
Other
Programming, Relational Database Services (RDS), Data Engineering, CI/CD Pipelines, Google BigQuery, GitHub Actions, Pub/Sub, Technical Leadership, Data Warehousing, Data Modeling, Software Engineering, APIs, Amazon RDS, FastAPI, ECS, GoCD, Machine Learning Operations (MLOps), Azure Data Factory (ADF), Data Build Tool (dbt)