
Ericles Monfrê

Verified Expert in Engineering

Data Engineer and Developer

Location
São Paulo - State of São Paulo, Brazil
Toptal Member Since
January 22, 2024

Ericles is a senior data engineer with a strong passion for data. With over seven years of comprehensive experience, he has excelled as a full-stack developer, machine learning engineer, and data engineer. Ericles has navigated diverse market segments across startup and corporate environments, including telecommunication, retail, IoT, electric vehicle chargers (EVCs), and banking. Committed to continual personal and professional growth, he consistently seeks challenges to enhance his skills.

Portfolio

Santander Brasil
Python 3, SQL, Spark, Kubernetes, ETL, Azure Databricks, Delta Lake...
Powerdot
Python 3, Spark, Amazon Web Services (AWS), ETL, MySQL, CI/CD Pipelines...
Dextra
Apache Airflow, Spark, Python 3, CI/CD Pipelines, Google Cloud Platform (GCP)...

Experience

Availability

Full-time

Preferred Environment

Python 3, SQL, Amazon Web Services (AWS), Azure, Apache Kafka, Delta Lake, ETL, APIs, Databricks, Spark

The most amazing...

...thing I've developed is a Python data ingestor used to migrate over 3,000 data jobs to the cloud from a 15-petabyte data lake.

Work Experience

Data Engineer

2021 - PRESENT
Santander Brasil
  • Built data pipelines using Jenkins and GitLab while adhering to DataOps best practices and incorporating CI/CD methodologies.
  • Led the rewriting of a Scala ingestion library in Python. The new solution ingested data in a variety of file formats and handled volumes ranging from gigabytes to terabytes (see the sketch after this role's summary).
  • Contributed to the extract, transform, and load (ETL) process and the data ingestion process into a data lake using the Delta library within the Azure Databricks cluster for efficient data management and processing.
  • Developed administration environment packages for Cloudera's CDP and CDH systems, covering the Spark and Hadoop ecosystem, HDFS, Hive, and Apache HBase. Managed routines using on-premises OpenShift and Azure Kubernetes Service (AKS) in the cloud.
  • Collaborated on implementing frameworks that enabled continuous data consumption for business teams, data scientists, data analysts, and other stakeholders.
  • Created data ingestion processes that delivered processed data to a sandbox and data warehouse layer, facilitating integration with visualization and analysis tools.
Technologies: Python 3, SQL, Spark, Kubernetes, ETL, Azure Databricks, Delta Lake, CI/CD Pipelines, Apache Kafka, Agile Sprints, Data Architecture, GitHub, Data Engineering, PySpark
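
The multi-format ingestion work above can be illustrated with a minimal PySpark sketch. The paths, format options, and session setup here are hypothetical, and a Delta-enabled Spark environment such as Azure Databricks is assumed; this is not the actual internal library.

    from pyspark.sql import SparkSession

    # Hypothetical session; on Azure Databricks a configured session already exists.
    spark = SparkSession.builder.appName("multi-format-ingestor").getOrCreate()

    # Per-format reader options; extend this map as new source formats appear.
    READER_OPTIONS = {"csv": {"header": "true"}, "json": {}, "parquet": {}}

    def ingest(source_path: str, fmt: str, target_path: str) -> None:
        """Read a source file in the given format and append it to a Delta table."""
        df = (spark.read.options(**READER_OPTIONS.get(fmt, {}))
              .format(fmt)
              .load(source_path))
        (df.write.format("delta")
           .mode("append")
           .option("mergeSchema", "true")  # tolerate schema drift across loads
           .save(target_path))

    ingest("/mnt/raw/customers.csv", "csv", "/mnt/bronze/customers")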

Data Engineer

2022 - 2023
Powerdot
  • Led data pipeline development, adhering to DataOps best practices and CI/CD principles, and implemented the pipelines in GitLab CI/CD for an effective, streamlined development process.
  • Headed the construction of a data lake on the cloud, specifically on AWS, utilizing a Kubernetes pod orchestration environment through Amazon EKS.
  • Developed and assisted in implementing best practices in data mesh, modeling, governance, and monitoring.
  • Utilized Meltano and Airbyte to extract and load data to integrate it with various sources, including HubSpot, Stripe, and Google Sheets.
  • Used Airflow to extract and load data from Powerdot's internal MySQL databases, focusing on electric vehicle charger (EVC) data (see the sketch after this role's summary).
  • Constructed ETL processes to transform data from the data lake layers, employing AWS Glue with Delta Lake. Implemented an AWS Glue crawler to read from the raw layer and move the data to the gold layer.
  • Led the design of data architecture based on the modern data stack (MDS) recommended by the data community.
Technologies: Python 3, Spark, Amazon Web Services (AWS), ETL, MySQL, CI/CD Pipelines, Apache Airflow, Docker, Kubernetes, Delta Lake, Data Architecture, GitHub, Data Engineering, PySpark
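
A minimal sketch of the Airflow extraction described above, using the TaskFlow API (Airflow 2.4+): it pulls one day of charging-session rows from MySQL into a raw S3 layer. The connection string, table, and bucket names are hypothetical, and writing Parquet to S3 assumes s3fs is installed.

    from datetime import datetime

    import pandas as pd
    import sqlalchemy
    from airflow.decorators import dag, task

    @dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
    def evc_mysql_to_lake():
        @task
        def extract_charging_sessions(ds=None):
            # Hypothetical connection and table; 'ds' is Airflow's execution date.
            engine = sqlalchemy.create_engine("mysql+pymysql://user:pass@host/powerdot")
            df = pd.read_sql(
                "SELECT * FROM charging_sessions WHERE DATE(created_at) = %(ds)s",
                engine,
                params={"ds": ds},
            )
            # Land the day's extract in the raw layer of the lake.
            df.to_parquet(f"s3://datalake-raw/evc/charging_sessions/{ds}.parquet")

        extract_charging_sessions()

    evc_mysql_to_lake()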

Data Engineer

2020 - 2021
Dextra
  • Developed a web scraping tool for data discovery, employed to populate client databases. This tool facilitated cross-referencing of data with additional information, enabling the extraction of valuable insights.
  • Conducted analysis to provide updates to stakeholders. Built new KPIs using PySpark in the Azure Databricks environment to be available in the client's Power BI.
  • Cleaned and transformed data sourced from various file types and platforms using ETL tools, including Airflow, Spark, Databricks, and Delta Lake.
  • Developed ETL jobs for ingesting customer source data into the different tiers of the data lake, including raw, bronze, silver, and gold (a sketch follows this role's summary).
Technologies: Apache Airflow, Spark, Python 3, CI/CD Pipelines, Google Cloud Platform (GCP), ETL, Delta Lake, Databricks, Agile Sprints, APIs, Data Architecture, GitHub, Data Engineering, PySpark
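
A minimal sketch of one of the tiered ETL jobs mentioned above: promoting a bronze Delta table to silver by deduplicating records and normalizing types. The table names, columns, and paths are hypothetical, and a Delta-enabled Spark session is assumed.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

    # Hypothetical source: raw customer events already landed in the bronze tier.
    bronze = spark.read.format("delta").load("/lake/bronze/customer_events")

    silver = (bronze
              .dropDuplicates(["event_id"])                 # remove replayed events
              .withColumn("event_date", F.to_date("event_ts"))
              .filter(F.col("customer_id").isNotNull()))    # drop unusable rows

    (silver.write.format("delta")
           .mode("overwrite")
           .save("/lake/silver/customer_events"))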

Machine Learning Engineer

2020 - 2020
Mutant
  • Headed the MLOps project in collaboration with Microsoft, leveraging cloud services such as machine learning as a service (MLaaS), storage accounts, container registries, Azure Kubernetes Service (AKS), and Azure Container Instances (ACI).
  • Applied best practices for model versioning, model training, supervised learning (SL), classification, and natural language processing (NLP).
  • Integrated virtual service agents (VSAs) and chatbots that used NLP techniques and cross-validation scoring, with a consistent focus on improving model accuracy and optimizing the model lifecycle (a toy example follows this role's summary).
Technologies: Spark ML, CI/CD Pipelines, Agile Sprints, Azure Machine Learning, MySQL, TensorFlow, Natural Language Processing (NLP), Azure Kubernetes Service (AKS), Machine Learning Operations (MLOps), Python 3, GitHub, PySpark
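
A toy sketch of the cross-validation scoring used to compare chatbot intent classifiers. The data and the scikit-learn model here are illustrative stand-ins, not the production NLP stack.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import Pipeline

    # Hypothetical intent-classification data for a customer-service chatbot.
    texts = ["reset my password", "what is my balance", "talk to an agent",
             "forgot password", "account balance please", "human support"]
    intents = ["password", "balance", "agent", "password", "balance", "agent"]

    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])

    # Cross-validation scores guide model selection before a version is registered.
    scores = cross_val_score(model, texts, intents, cv=2)
    print(f"mean accuracy: {scores.mean():.2f}")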

Full-stack Developer

2019 - 2020
Mutant
  • Built a microservices application using Azure Functions to support the machine learning platform used by the data science team (see the sketch after this role's summary).
  • Developed a web platform for managing the knowledge base of the machine learning training model using Python for the API and React for the front end.
  • Designed and implemented a new relational database model using MySQL to store machine learning models created by the data scientist team.
  • Optimized the team's workload with the new platform, integrating it with the cloud services used to train machine learning models and with the databases storing the training datasets.
Technologies: Docker, Python 3, Agile Sprints, Git, GitLab CI/CD, APIs, Amazon Web Services (AWS), NoSQL, Azure, React, MySQL, GitHub
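
A minimal sketch of one such Azure Functions microservice, written against the Python v1 programming model: an HTTP endpoint that returns metadata for a registered model. The route, payload shape, and registry lookup are hypothetical.

    import json

    import azure.functions as func

    def main(req: func.HttpRequest) -> func.HttpResponse:
        model_name = req.params.get("model")
        if not model_name:
            return func.HttpResponse("missing 'model' parameter", status_code=400)

        # In the real service this would query the MySQL model registry.
        payload = {"model": model_name, "version": 1, "status": "registered"}
        return func.HttpResponse(
            json.dumps(payload), mimetype="application/json", status_code=200
        )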

Full-stack Developer

2017 - 2019
FX Retail Analytics
  • Built a microservices application using AWS Lambda and the Serverless Framework, reducing API response times (a sketch follows this role's summary).
  • Collaborated with project stakeholders and assisted them in gathering requirements for the development of new features on both the Python back end and React front end.
  • Served as a database administrator, overseeing both relational and non-relational databases. Automated routine processes using Python and shell scripts to enhance operational efficiency.
  • Integrated an API interface between the FX platform and its clients, addressing and resolving bugs and potential issues to enhance the overall functionality and integration efficiency.
  • Utilized pandas, JupyterHub, and Matplotlib to generate statistical data for marketing purposes by collecting information from IoT devices. Facilitated the monitoring of AWS infrastructure services.
Technologies: Docker, Python 3, APIs, Amazon Web Services (AWS), NoSQL, GitLab CI/CD, Git, MySQL, Agile Sprints, React, GitHub
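
A minimal sketch of one AWS Lambda microservice of the kind deployed with the Serverless Framework: an API Gateway handler returning visitor-flow counts for a store. The route, field names, and data source are hypothetical.

    import json

    def handler(event, context):
        # API Gateway puts URL path variables under 'pathParameters'.
        store_id = (event.get("pathParameters") or {}).get("storeId")
        if store_id is None:
            return {"statusCode": 400,
                    "body": json.dumps({"error": "storeId required"})}

        # The real service would fetch the store's visitor-flow counts here.
        body = {"storeId": store_id, "visitorsToday": 0}
        return {"statusCode": 200, "body": json.dumps(body)}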

Junior System Analyst

2016 - 2017
FX Retail Analytics
  • Delivered customer service to end customers by addressing their inquiries and concerns, ensuring satisfaction through effective communication and resolution.
  • Conducted audits of flow counting equipment using computer vision technology.
  • Analyzed statistical data for retail flow counts, extracting insights and patterns to inform decision-making and optimize operations.
Technologies: Python 3, Agile Sprints, Docker, Amazon Web Services (AWS), GitHub

New Infrastructure

https://www.fxdata.com.br/solucoes/fx-go-analytics/
A project focused on improving the legacy infrastructure. The primary objective of this new topology was to optimize cloud resources, reducing costs while addressing issues identified in the legacy infrastructure. I actively contributed to rewriting applications and APIs initially developed in Java and PHP. The migration involved adopting Python, Node.js, and React, with automated deployment facilitated by Docker and CI/CD processes.

Move to Cloud

A project to decommission the Cloudera clusters, driven by high costs and database performance loss on the bank's on-premises servers. Solving this required migrating the entire data lake and its legacy applications to Azure. The challenge included legacy bases fed by over 3,000 daily jobs from sources such as the mainframe, SQL Server, Oracle Exadata Database Machine, SaaS platforms, MySQL, and PostgreSQL.

My team and I built capabilities to serve as a data platform. This platform was designed to support the bank's teams in migrating their data and applications (jobs) to the cloud environment. The architecture devised for this migration facilitated the work of the bank's teams, leveraging a cloud-native stack from Azure, including Azure Data Lake Storage Gen2, Data Factory, Databricks, Key Vault, and service principal names (SPNs).

We developed a Python library (PySpark 3) for ETL processes, ensuring governance, quality, performance, access control, and compliance with LGPD, Brazil's General Data Protection Law. We have maintained this library since 2021 and have migrated 80% of the data lake's data and 40% of its applications; the migration is expected to reduce costs by about 30% over the coming years (a sketch of one such governance step follows).
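
A minimal sketch of one governance step such a library might enforce: hashing personal identifiers on ingestion so downstream layers comply with LGPD by default. The column list, paths, and hashing policy are hypothetical, not the library's actual API.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("lgpd-safe-ingest").getOrCreate()

    # Hypothetical list of personal-data columns to protect on ingestion.
    PII_COLUMNS = ["cpf", "email", "phone"]

    def ingest_with_masking(source_path: str, target_path: str) -> None:
        """Load raw data, hash PII columns, and append the result to Delta."""
        df = spark.read.parquet(source_path)
        for col in PII_COLUMNS:
            if col in df.columns:
                df = df.withColumn(col, F.sha2(F.col(col).cast("string"), 256))
        df.write.format("delta").mode("append").save(target_path)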

Languages

Python 3, SQL

Tools

GitLab CI/CD, Git, GitHub, Spark SQL, Azure Machine Learning, Azure Kubernetes Service (AKS), Apache Airflow

Platforms

Docker, Amazon Web Services (AWS), Azure, Databricks, Kubernetes, Apache Kafka, Google Cloud Platform (GCP), AWS IoT

Storage

Databases, NoSQL, MySQL

Other

APIs, Programming, Operating Systems, Agile Sprints, Delta Lake, Projects, Front-end, Back-end, Engineering Software, Big Data Architecture, CI/CD Pipelines, Natural Language Processing (NLP), Machine Learning Operations (MLOps), Azure Databricks, Data Architecture, Data Engineering, SAP ERP, IT Security

Frameworks

Spark, Hadoop, YARN

Libraries/APIs

React, Spark ML, TensorFlow, PySpark, Node.js

Paradigms

ETL

Education

2020 - 2021

Master of Business Administration (MBA) in Data Engineering

FIAP - São Paulo, Brazil

2014 - 2018

Bachelor's Degree in Information Systems

Uninove - São Paulo, Brazil

Certifications

NOVEMBER 2023 - PRESENT

Microsoft Certified: Azure Fundamentals

Microsoft

JULY 2021 - PRESENT

Apache Spark (TM) SQL for Data Analysts

Coursera
