Alexandre França de Magalhães, Developer in Salvador - State of Bahia, Brazil

Alexandre França de Magalhães

Verified Expert in Engineering

Data Warehousing Developer

Location
Salvador - State of Bahia, Brazil
Toptal Member Since
February 4, 2022

Alexandre is a senior data engineer with over six years of professional experience. He specializes in designing and building data lakes and warehouses and in processing data with tools such as Spark, SQL, and Pandas. Alexandre is most familiar with the Azure and AWS stacks but is open to working with other clouds.

Portfolio

PepsiCo
Data Engineering, Python, Advertising, Media, Azure, Over-the-top Content (OTT)...
BCG - Gamma
SQL, PySpark, Automation, Azure, Amazon Web Services (AWS), Microsoft Azure...
Via Varejo
Azure, Databricks, Azure Data Factory, Azure Data Lake, Azure Synapse...

Experience

Availability

Part-time

Preferred Environment

Spark SQL, Spark, SQL, Azure, Databricks, Python, Amazon Web Services (AWS), Apache Airflow, Azure Data Factory, Amazon Elastic MapReduce (EMR)

The most amazing...

...project I've developed was a data lake architecture from scratch with cloud, on-premise, and API data sources.

Work Experience

Data Engineer

2022 - PRESENT
PepsiCo
  • Developed PepsiCo's global media data warehouse from scratch, with the goal of concentrating all media measurement data in a centralized corporate environment. Sources included APIs, ODBC/JDBC connections, and cloud storage services.
  • Worked on PySpark code optimization to enhance performance and standardization.
  • Developed simple machine learning models on Databricks with MLflow to track performance, metrics, and artifacts.
Technologies: Data Engineering, Python, Advertising, Media, Azure, Over-the-top Content (OTT), Databricks, Samba, Roku, YouTube, Dynamic Data, Snowflake, APIs, Azure Data Factory, Data Warehousing, Data Lakes, Azure Data Lake, Amazon S3 (AWS S3), Google Cloud Storage, Azure Blobs, PySpark, Spark, MLflow, Machine Learning
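A warehouse fed by ODBC/JDBC and cloud sources typically pulls data incrementally rather than reloading whole tables. The sketch below illustrates the watermark pattern with an in-memory SQLite database; the table and column names are invented for illustration, not PepsiCo's actual schema.

```python
import sqlite3

# Hypothetical media-spend table standing in for a JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media_spend (id INTEGER, channel TEXT, loaded_at TEXT)")
conn.executemany(
    "INSERT INTO media_spend VALUES (?, ?, ?)",
    [(1, "roku", "2022-01-01"), (2, "youtube", "2022-01-02"), (3, "samba", "2022-01-03")],
)

def extract_incremental(conn, last_watermark):
    """Return only rows loaded after the previous run's watermark,
    plus the new watermark to persist for the next run."""
    rows = conn.execute(
        "SELECT id, channel, loaded_at FROM media_spend "
        "WHERE loaded_at > ? ORDER BY loaded_at",
        (last_watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else last_watermark
    return rows, new_watermark

rows, wm = extract_incremental(conn, "2022-01-01")
# rows holds only the two newer records; wm == "2022-01-03"
```

Persisting the returned watermark between runs keeps each extraction limited to new data, which is what makes daily loads from many sources affordable.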

Lead Data Engineer

2022 - 2022
BCG - Gamma
  • Developed data pipelines with data scientists to productionize experiments, data extractions, data modeling, data cleaning, and quality checking on multiple cloud environments.
  • Worked on large datasets, using Spark as a processing tool.
  • Developed SQL queries to analyze and manipulate data across many platforms, such as Spark, Hive, and relational data sources.
Technologies: SQL, PySpark, Automation, Azure, Amazon Web Services (AWS), Microsoft Azure, Apache Airflow, Pandas, Docker, Amazon S3 (AWS S3), Spark, Python, Parquet, CSV, JSON, Delta Lake, CI/CD Pipelines, Azure Blobs, Data Extraction, Data Cleaning, Hue, Apache Hive, Amazon Elastic MapReduce (EMR)
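Quality checking, mentioned in the pipeline work above, usually means validating a dataset before promoting it downstream. This is a minimal pure-Python sketch of such checks; the dataset and rules are hypothetical, and in the actual work the equivalent logic ran on Spark.

```python
# Toy records standing in for a batch awaiting promotion.
records = [
    {"order_id": 1, "amount": 10.5},
    {"order_id": 2, "amount": None},
    {"order_id": 2, "amount": 7.0},
]

def quality_report(records, key="order_id", required=("amount",)):
    """Summarize duplicate-key and null-value violations in a batch."""
    keys = [r[key] for r in records]
    return {
        "row_count": len(records),
        "duplicate_keys": len(keys) - len(set(keys)),
        "null_violations": sum(
            1 for r in records for col in required if r.get(col) is None
        ),
    }

report = quality_report(records)
# {'row_count': 3, 'duplicate_keys': 1, 'null_violations': 1}
```

A pipeline would gate on this report, failing the load or quarantining rows when violations are nonzero.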

Senior Data Engineer

2021 - 2022
Via Varejo
  • Refactored the fraud analysis pipeline for performance improvements to be ready for Black Friday in 2021, achieving constant execution times on increased batch data loads.
  • Worked on developments at the company's fraud data marts.
  • Developed various pipelines to solve ingestions and data processing necessities.
Technologies: Azure, Databricks, Azure Data Factory, Azure Data Lake, Azure Synapse, Azure Event Hubs, Data Engineering, Data Warehousing, Data Modeling, ETL, ETL Tools, SQL, Data Management

Senior Data Engineer

2021 - 2021
Radix
  • Developed generic ingestion pipelines for relational data sources, accelerating the onboarding of new ingestions through simple configuration files.
  • Developed a Delta Lake architecture from scratch for secure and efficient data processing.
  • Worked on developing and maintaining a corporate data warehouse on the Azure Synapse platform.
Technologies: Azure, Azure Data Factory, Databricks, Azure Data Lake, Azure Synapse, Google Cloud Storage, Data Engineering, Data Modeling, Data Warehousing, ETL, ETL Tools, SQL, PySpark, Synapse, Apache Kafka, Microsoft Azure, Amazon Web Services (AWS)
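Configuration-driven ingestion, as described above, means new tables are onboarded by adding a config entry rather than writing a new pipeline. The sketch below shows the idea with a JSON config; the table names, modes, and query template are illustrative assumptions, not the project's actual format.

```python
import json

# Hypothetical config: one entry per source table. In practice a file
# like this would parameterize Data Factory or Databricks jobs.
config_json = """
[
  {"source": "erp.orders",    "mode": "incremental", "watermark": "updated_at"},
  {"source": "erp.customers", "mode": "full"}
]
"""

def build_ingestion_plan(config_json):
    """Turn a declarative config into concrete ingestion tasks."""
    plan = []
    for entry in json.loads(config_json):
        if entry["mode"] == "incremental":
            query = (f"SELECT * FROM {entry['source']} "
                     f"WHERE {entry['watermark']} > :last_run")
        else:
            query = f"SELECT * FROM {entry['source']}"
        plan.append({"source": entry["source"], "query": query})
    return plan

plan = build_ingestion_plan(config_json)
# plan[0]['query'] ends with "WHERE updated_at > :last_run"
```

The design choice is that ingestion logic lives once, in the generic pipeline, while everything source-specific lives in data, not code.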

Senior Data Engineer

2020 - 2021
Bridgestone
  • Supported and enhanced corporate data lakes built on Azure Cloud Services with on-premise data sources, such as SQL Server, Oracle, and Kafka streams for sensor data.
  • Developed SSIS packages for data pipelines with SQL, PL/SQL, and T-SQL.
  • Managed the third-party team in charge of on-site software and data support demands.
Technologies: Azure, Azure Data Factory, Databricks, Oracle, Pandas, SQL Server 2016, Data Engineering, Data Warehousing, ETL, ETL Tools, Data Modeling, SQL, PySpark, Oracle PL/SQL

Software Developer

2019 - 2020
Chemtech
  • Developed data extractions and features to help data science teams train and validate machine learning models built on top of Python, Pandas, and Scikit-learn technologies for various projects in the company.
  • Developed data pipelines for multiple client companies to serve data lake and warehouse architectures.
  • Tracked and developed user stories using Jira as a reporting tool.
Technologies: Oracle, SQL Server 2016, Python, Pandas, ETL Tools, ETL, Data Engineering, Data Warehousing, Data Modeling, Oracle PL/SQL, T-SQL (Transact-SQL), Data Pipelines, Microsoft Azure, Amazon Web Services (AWS), HDFS, Spark
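Preparing extractions for model training, as in the first bullet above, typically means aggregating raw event rows into one feature row per entity. A minimal Pandas sketch, with an invented dataset and feature names:

```python
import pandas as pd

# Hypothetical raw events; in the actual work these came from
# Oracle / SQL Server extractions.
events = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount": [100.0, 50.0, 30.0],
})

# Aggregate raw rows into one feature row per customer -- the
# tabular shape a scikit-learn model expects for training.
features = (
    events.groupby("customer_id")["amount"]
    .agg(total="sum", orders="count")
    .reset_index()
)
# customer 1 -> total 150.0, orders 2; customer 2 -> total 30.0, orders 1
```

From here, `features` (minus the ID column) can be handed to a scikit-learn estimator as its training matrix.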

Software Developer

2017 - 2019
Braskem
  • Developed SQL scripts for data ETL in a corporate data warehouse.
  • Created complex queries for production reports serving business analyst needs.
  • Developed C# back-end applications for manufacturing execution systems (MES).
Technologies: Python, SQL, Oracle, SQL Server 2016, MongoDB, Data Warehousing, ETL Tools, ETL, Data Engineering, Data Modeling, Oracle PL/SQL, Amazon Web Services (AWS), Microsoft Azure, Data Pipelines, Pandas, C#

Data Lakehouse For an Educational Company

I developed an Azure Data Lake and Data Warehouse with dynamically configurable ingestion pipelines for various data sources such as on-premise Oracle and SQL Server databases, Google Cloud Platform (GCP) storage, and external API providers. The cloud infrastructure was previously nonexistent, so I established the hierarchical storage patterns and partitioning for each data source, processing most of the data on the Databricks platform with a data lakehouse configuration.
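Establishing hierarchical storage patterns and partitioning, as described above, comes down to a consistent path convention per zone, source, and table. A small sketch of such a convention; the zone and source names are illustrative, not the project's actual layout.

```python
from datetime import date

def lake_path(zone, source, table, run_date):
    """Build a hierarchical, date-partitioned storage path
    (zone/source/table/partition) for a data lake."""
    return (f"{zone}/{source}/{table}/"
            f"year={run_date.year}/month={run_date.month:02d}/day={run_date.day:02d}")

p = lake_path("raw", "oracle_erp", "orders", date(2021, 11, 26))
# 'raw/oracle_erp/orders/year=2021/month=11/day=26'
```

The `key=value` partition segments let engines like Spark prune directories at query time, so a job reading one day's data never touches the rest.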

Data Lake for Rubber and Tire Industry

I evolved and migrated a corporate data lake from on-premises Hadoop to Azure Data Lake Storage Gen2 (ADLS2). I developed pipelines for data ingestion and started the data modeling for a data warehousing solution, with processing performed on Spark via Databricks. The data warehouse was hosted on an Azure Synapse dedicated SQL pool.

Refactoring of Fraud Detection Pipeline for Retail Company

I refactored an existing fraud detection pipeline that analyzes credit card purchases for abnormal behavior with an ML model. The main performance problem was that many independent calculations ran sequentially when they could, and should, be executed in parallel. Parallelizing them also required rethinking how the derived values were computed. The change reduced execution time from two hours to 20 minutes, and the runtime stayed constant even under last year's Black Friday load.
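The core idea of the refactoring, running independent calculations concurrently instead of one after another, can be sketched with the standard library. The feature functions below are toy stand-ins; the real pipeline computed its aggregations with Spark.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-ins for independent per-card fraud features
# (the names and logic are invented for illustration).
def avg_ticket(purchases):
    return sum(purchases) / len(purchases)

def max_ticket(purchases):
    return max(purchases)

def purchase_count(purchases):
    return len(purchases)

def compute_features_parallel(purchases):
    """Run independent aggregations concurrently rather than
    sequentially -- the core idea behind the refactoring."""
    features = [avg_ticket, max_ticket, purchase_count]
    with ThreadPoolExecutor(max_workers=len(features)) as pool:
        futures = [pool.submit(f, purchases) for f in features]
    return {f.__name__: fut.result() for f, fut in zip(features, futures)}

feats = compute_features_parallel([10.0, 250.0, 40.0])
# {'avg_ticket': 100.0, 'max_ticket': 250.0, 'purchase_count': 3}
```

When the calculations have no dependencies on each other, total runtime approaches that of the slowest one instead of the sum of all of them, which is where the two-hours-to-20-minutes gain comes from.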

Languages

SQL, Python, T-SQL (Transact-SQL), Batch, C#, Snowflake

Frameworks

Spark, Hadoop, Data Lakehouse

Libraries/APIs

PySpark, Pandas, REST APIs

Tools

Spark SQL, Apache Airflow, Synapse, Hue, Amazon Elastic MapReduce (EMR)

Paradigms

ETL, Automation, Samba

Platforms

Azure, Databricks, Oracle, Azure Synapse, Azure Event Hubs, Apache Kafka, Amazon Web Services (AWS), Docker, YouTube

Storage

Data Pipelines, SQL Server 2016, Oracle PL/SQL, MongoDB, Data Lake Design, Data Lakes, HDFS, Amazon S3 (AWS S3), JSON, Azure Blobs, PostgreSQL, Apache Hive, Google Cloud Storage

Other

Azure Data Factory, Azure Data Lake, Data Engineering, Data Modeling, ETL Tools, Data Warehousing, Data Management, Data Cleaning, Microsoft Azure, Streaming, Parquet, CSV, Delta Lake, CI/CD Pipelines, Data Extraction, Advertising, Media, Over-the-top Content (OTT), Roku, Dynamic Data, APIs, MLflow, Machine Learning

2013 - 2018

Bachelor's Degree in Engineering

Federal University of Bahia - Salvador, Brazil

SEPTEMBER 2022 - PRESENT

Certified Data Engineer Associate

Databricks
