Eduardo Bartolomeu, Developer in Recife - State of Pernambuco, Brazil

Eduardo Bartolomeu

Senior Data Engineer and Developer

Recife - State of Pernambuco, Brazil

Toptal member since January 26, 2024

Bio

Eduardo is a senior data engineer with over 14 years of experience in the data field. He has worked as an Oracle and SQL Server database administrator and as a PL/SQL and Transact-SQL (T-SQL) developer, with experience in the financial, retail, health, and education industries. As a data engineer, he has worked in AWS, GCP, and Azure environments of varying complexity, creating data warehouses and data lakes from scratch or contributing to their creation.

Portfolio

DataArt
Snowflake, SQL, Azure Data Factory (ADF), Azure Logic Apps, Azure SQL, ADF...
2am.tech
Liquibase, Snowflake, MySQL, PostgreSQL, Microsoft SQL Server...
Truelogic Software
Terraform, AWS Glue, Redshift, Amazon Athena, AWS CodeBuild, AWS CodePipeline...

Experience

  • SQL - 12 years
  • Databases - 12 years
  • ETL - 8 years
  • Amazon Web Services (AWS) - 5 years
  • PySpark - 5 years
  • Data Lake Design - 5 years
  • Python - 5 years
  • AWS Glue - 4 years

Preferred Environment

Amazon Web Services (AWS), SQL, Python, PySpark, Google Cloud Platform (GCP), Apache Airflow, ETL, Data Modeling, Big Data, Data Lake Design

The most amazing...

...feature I've created for data science teams predicts hospitalizations in healthcare plans, helping to save lives.

Work Experience

Senior Data Engineer

2023 - PRESENT
DataArt
  • Created the entire architecture of a data warehouse, from ingesting data from APIs and folders in Blob Storage into Snowflake through to the semantic models consumed by Power BI dashboards.
  • Wrote the project documentation, covering the main sources and how to consume them. Developed Snowpark and SnowSQL code, committed it to Git, and deployed it to different stages.
  • Created ADF pipelines to consume data from APIs and files in Blob Storage and ingest it into Snowflake databases.
  • Developed stored procedures on Snowflake to consolidate data assets consumed by dashboards.
  • Built Azure Data Factory pipelines to extract, transform, and load Excel business files into Snowflake.
  • Created Logic Apps to get email attachments and save them in Blob Storage.
  • Created Azure Functions that run when a file arrives in the storage container, transform it, and load a new CSV to be consumed through a Snowflake stage.
  • Built data warehouse objects such as dimensions and fact tables in Redshift using Matillion.
  • Reverse-engineered legacy databases to build the data model for their migration from AWS to GCP, following governance best practices.
  • Conducted technical interviews for hiring new database administrators and data engineers for DataArt.
Technologies: Snowflake, SQL, Azure Data Factory (ADF), Azure Logic Apps, Azure SQL, Azure Data Lake, Azure SQL Data Warehouse, Azure Functions, Azure Virtual Machines, Python, Blob Storage, Matillion ETL for Redshift, erwin Data Modeler, Pandas, Stakeholder Interviews, Documentation, Data Governance, Google Cloud Platform (GCP), Google Cloud SQL, Amazon RDS, PostgreSQL, Data Build Tool (dbt), Snowpark, CI/CD Pipelines, APIs, Azure DevOps, Git, Microsoft Power BI, Data Architecture, Medallion Architecture, ETL Pipelines, Geolocation

Senior Data Engineer and SQL Developer

2023 - 2024
2am.tech
  • Translated procedures from Transact-SQL (T-SQL) to SnowSQL.
  • Created tables, stages, pipes, streams, procedures, and functions in the Snowflake data lake, taking data from SQL Server, PostgreSQL, and MySQL.
  • Maintained and tested the SQL scripts using Liquibase.
Technologies: Liquibase, Snowflake, MySQL, PostgreSQL, Microsoft SQL Server, Transact-SQL (T-SQL), Amazon Web Services (AWS), Amazon S3 (AWS S3), AWS Database Migration Service (DMS), SnowSQL, SQL, Data Engineering, Data Lakes, Git, ETL Pipelines

Senior Data Engineer

2022 - 2023
Truelogic Software
  • Created AWS Glue jobs using PySpark to transform data between the data lake zones.
  • Performed dimensional modeling for data warehouses stored on Redshift.
  • Documented processes using Confluence linked to Jira tickets.
  • Built data pipelines from scratch from databases to the data lake and Redshift.
Technologies: Terraform, AWS Glue, Redshift, Amazon Athena, AWS CodeBuild, AWS CodePipeline, Amazon RDS, MySQL, PostgreSQL, SQLAlchemy, Python, PySpark, AWS Database Migration Service (DMS), AWS Lambda, Amazon API Gateway, SQL, Data Engineering, Data Lakes, Databases, Git

Senior Data Engineer

2019 - 2022
Neurotech
  • Created data pipelines using Composer, Apache Airflow, and BigQuery, building datasets used by data science teams to predict hospitalizations and identify people with chronic diseases.
  • Imported database files from the Brazilian public healthcare system to our data lake on AWS using EMR clusters and PySpark.
  • Oversaw other data engineers on their tasks, helping them to achieve the company's expectations.
  • Improved the performance of PySpark jobs running on EMR clusters.
Technologies: Amazon Web Services (AWS), Amazon Elastic MapReduce (EMR), PySpark, Python, Amazon Athena, AWS Glue, Metabase, Google Data Studio, Google BigQuery, Apache Airflow, Google Cloud Dataproc, Google Compute Engine (GCE), Amazon EC2, Amazon S3 (AWS S3), Google Cloud Composer, Google Cloud Platform (GCP), Jupyter Notebook, ETL, ELT, Data Pipelines, SQL, SQL Performance, Performance Tuning, Databases, Data Engineering, Data Lakes, Git, ETL Pipelines, Looker Studio, Data Build Tool (dbt)

Senior Database Administrator

2018 - 2019
Nyx Soluções
  • Installed Oracle and SQL Server database environments from scratch.
  • Oversaw environment health statuses using monitoring tools.
  • Improved the query performance for several clients, particularly in the retail industry.
  • Created monthly environment health status reports for clients to monitor KPIs, including disk space, tablespace usage, heavy queries, and processor usage.
Technologies: Linux, Windows Server, Oracle Database, Microsoft SQL Server, PL/SQL, Transact-SQL (T-SQL), SQL, PL/SQL Tuning, Performance Tuning, Oracle Database Tuning

Experience

Snowflake Data Lake

http://www.emsmc.com
A data lake to store data from several company systems in Snowflake. We used ELT methods to land all the data in Snowflake first and then perform the necessary transformations there. The raw, trusted, and refined zones were all stored in Snowflake.

Data Lake and Data Warehouse

https://www.vectorsolutions.com/
An education company acquired smaller ones, prompting the need to establish a centralized data lake to manage data from various company systems. We built the required data lake, as well as data pipelines and data warehouses, from scratch, leveraging the AWS tech stack. We used AWS Database Migration Service (DMS) to migrate data from Amazon Relational Database Service (RDS), MySQL, and PostgreSQL. We also employed AWS Glue jobs, workflows, and crawlers for ETL processes across the different data lake layers. To automate CI/CD, we implemented Lambda functions behind API Gateway endpoints, triggered by Git actions. Terraform was used for infrastructure as code (IaC).

Cancer Identifier

http://portal.sulamericaseguros.com.br
A major healthcare company in Brazil possessed historical data of beneficiaries but lacked access to exam results. Collaborating with data engineers and a business team, including doctors and nurses, we developed an algorithm that assesses whether a beneficiary has a low, medium, or high probability of having cancer. This project enabled the company to undertake proactive measures for these beneficiaries, enhancing their health journey.

Initially, we migrated data sheets to BigQuery, focusing on the most common procedures for beneficiaries with cancer. Subsequently, we developed a Python algorithm to export the results for storage as both CSV files and tables. Everything was orchestrated using Composer and Apache Airflow.

LA County Dashboards

http://www.eso.com
I worked as a data engineer on a project to integrate and analyze accident-related data across Los Angeles County. I unified information from LAPD APIs, XML files, and existing Snowflake tables into a single analytics environment. I built ingestion and transformation pipelines using Azure Data Factory, Snowpark, and SnowSQL, applying business rules and data normalization. All development was version-controlled in Git, and deployments were automated through Azure DevOps CI/CD, including ADF environment management via ARM templates.

I organized data into analytical domains, implemented Slowly Changing Dimensions to preserve history, and enriched records with Service Planning Area (SPA) geographic boundaries for regional insights. I also delivered semantic models for Power BI, enabling structured and governed dashboard consumption. Additionally, I documented data lineage and technical details in Confluence, ensuring clarity, maintainability, and continuity.
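
As an illustration of the Slowly Changing Dimensions technique mentioned above, here is a minimal Python sketch of the common Type 2 variant, in which a changed record is closed out and a new version is appended so that history is preserved. It is not the project's code: the project's actual implementation and SCD type are not described here, and the names DimRow and apply_scd2, the field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical layout)."""
    business_key: str
    attributes: Tuple[str, ...]      # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd2(
    dimension: List[DimRow],
    incoming: Dict[str, Tuple[str, ...]],
    load_date: date,
) -> List[DimRow]:
    """Return a new dimension with Type 2 changes applied.

    Changed current rows are closed with valid_to = load_date, and a new
    current row is appended for every changed or previously unseen key.
    """
    current = {r.business_key: r for r in dimension if r.valid_to is None}
    result: List[DimRow] = []

    # Close out current versions whose tracked attributes changed.
    for row in dimension:
        new_attrs = incoming.get(row.business_key)
        is_changed = (
            row.valid_to is None
            and new_attrs is not None
            and new_attrs != row.attributes
        )
        result.append(replace(row, valid_to=load_date) if is_changed else row)

    # Append a new current version for changed or brand-new keys.
    for key, attrs in incoming.items():
        existing = current.get(key)
        if existing is None or existing.attributes != attrs:
            result.append(DimRow(key, attrs, load_date))

    return result


if __name__ == "__main__":
    dim = [DimRow("A", ("SPA 1",), date(2024, 1, 1))]
    updated = apply_scd2(dim, {"A": ("SPA 2",), "B": ("SPA 3",)}, date(2024, 6, 1))
    for row in updated:
        print(row)
```

In a warehouse such as Snowflake, the same logic is usually expressed as a MERGE or an UPDATE followed by an INSERT; the in-memory version above only shows the row-versioning rule.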

Education

2017 - 2018

Master of Business Administration (MBA) in Business Intelligence

Institute of Management in Information Technology (IGTI) - Belo Horizonte, Brazil

2008 - 2014

Bachelor's Degree in Information Systems

Faculdade Estácio do Recife - Recife, Brazil

Certifications

DECEMBER 2025 - DECEMBER 2027

Databricks Certified Data Engineer Associate

Databricks

OCTOBER 2025 - OCTOBER 2026

Microsoft Certified: Fabric Data Engineer Associate

Microsoft

JULY 2024 - JULY 2025

Microsoft Certified: Azure Data Engineer Associate

Microsoft

AUGUST 2023 - AUGUST 2026

AWS Certified Cloud Practitioner

Amazon Web Services

JULY 2018 - PRESENT

Splunk Core Certified Power User

Splunk

JUNE 2017 - PRESENT

ITIL Foundation Certificate in IT Service Management

Axelos

JUNE 2017 - PRESENT

Oracle Database 11g Administrator Certified Professional

Oracle

Skills

Libraries/APIs

PySpark, Liquibase, SQLAlchemy, Pandas, Snowpark

Tools

AWS Glue, BigQuery, Amazon Elastic MapReduce (EMR), Apache Airflow, Splunk, Terraform, Amazon Athena, AWS CodeBuild, Google Cloud Dataproc, Google Compute Engine (GCE), Google Cloud Composer, Git, SnowSQL, Composer, Azure Logic Apps, Matillion ETL for Redshift, Microsoft Power BI, Confluence

Languages

SQL, Transact-SQL (T-SQL), Python, Snowflake, XML

Storage

Databases, PL/SQL, Data Lake Design, Oracle 11g, Oracle Database Tuning, MySQL, PostgreSQL, Microsoft SQL Server, Amazon S3 (AWS S3), Database Administration (DBA), Redshift, Data Pipelines, Google Cloud Storage, SQL Performance, Data Lakes, Azure SQL, Google Cloud SQL, JSON

Paradigms

ETL, Business Intelligence (BI), ITIL, Azure DevOps

Platforms

AWS Lambda, Jupyter Notebook, Oracle Database, Amazon Web Services (AWS), Google Cloud Platform (GCP), Amazon EC2, Linux, Windows Server, Azure SQL Data Warehouse, Azure Functions, Azure, Azure Synapse, Azure Event Hubs, Microsoft Fabric, Databricks

Frameworks

ADF

Other

Google BigQuery, Data Modeling, Big Data, Oracle Performance Tuning, Amazon RDS, ELT, Software Development, IT Project Management, Product Management, Data Warehousing, AWS CodePipeline, AWS Database Migration Service (DMS), Amazon API Gateway, Metabase, Google Data Studio, Relational Database Services (RDS), CSV Import, CSV Export, Information Systems, Document Management Systems (DMS), Lambda Functions, API Gateways, PL/SQL Tuning, Performance Tuning, Data Engineering, Azure Data Factory (ADF), Azure Data Lake, Azure Virtual Machines, Blob Storage, Azure Databricks, Azure Stream Analytics, erwin Data Modeler, Stakeholder Interviews, Documentation, Data Governance, Data Build Tool (dbt), Delta Lake, Delta Tables, Parquet, CI/CD Pipelines, APIs, Data Architecture, Medallion Architecture, ETL Pipelines, Looker Studio, Geolocation, Regular Expressions, Semantic Models, Unity Catalog
