
Renato Pedroso Neto
Verified Expert in Engineering
Data Engineer and Developer
São Paulo - State of São Paulo, Brazil
Toptal member since April 14, 2022
Renato has 13+ years of experience in big data projects. He has worked for Tier 1 tech companies, consulting firms, and financial institutions. Renato has migrated petabytes of data to on-premises and cloud data lake environments, architected entire lakehouses, implemented machine learning models that provided intelligent suggestions to clients, and managed multicultural data teams that delivered data projects to top-tier banks in Brazil. He holds a master's degree in big data.
Experience
- Python - 8 years
- Big Data - 8 years
- SQL - 8 years
- Spark - 8 years
- Data Engineering - 8 years
- Data Lakes - 8 years
- Data Science - 6 years
- Databricks - 1 year
Preferred Environment
Spark, Databricks, Python, Amazon Web Services (AWS), Google Cloud Platform (GCP), Machine Learning, Big Data, Amazon Elastic MapReduce (EMR), SQL, Amazon RDS
The most amazing...
...project was a Brazilian open banking data ingestion platform that used machine learning to guarantee quality and provide a reliable data source for financial institutions.
Work Experience
Solutions Architect
Tier 1 Big Tech Company
- Doubled customer usage by analyzing their data and suggesting improvements.
- Performed stress tests on a Spark environment, generating and hashing 1 trillion lines in 28 minutes (sketched after this list).
- Acquired AWS Solutions Architect and Spark Developer certifications.
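A minimal PySpark sketch of this style of stress test, assuming synthetic rows are generated with spark.range and hashed with SHA-256; the row count, partition count, and column names are illustrative, not the original benchmark code.

```python
# Sketch of a Spark stress test: generate synthetic rows and hash them.
# Row count, partition count, and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("hash-stress-test").getOrCreate()

num_rows = 1_000_000_000_000  # 1 trillion rows
num_partitions = 200_000       # tune to the cluster's executor count

df = (
    spark.range(0, num_rows, numPartitions=num_partitions)
    # Derive a synthetic payload column from the sequential id.
    .withColumn("payload", F.concat(F.lit("row-"), F.col("id").cast("string")))
    # Hash every payload; SHA-256 keeps the CPU busy across all executors.
    .withColumn("digest", F.sha2(F.col("payload"), 256))
)

# Aggregating over the digest forces every row to be generated and hashed.
df.agg(F.min("digest"), F.max("digest")).show()
```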
Data Engineer
Comniscient Technologies LLC dba Comlinkdata
- Developed new metrics for a telecom market data and insights platform, using Spark to help the client understand customer behavior.
- Helped build and evolve a product that benchmarks network operators' competitiveness within a country.
- Implemented new Airflow DAGs that transform telecom data using Spark (sketched after this list).
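A minimal sketch of an Airflow DAG that submits a Spark transform job, assuming the SparkSubmitOperator from the Apache Spark provider; the DAG ID, script path, and connection ID are hypothetical placeholders, not the production DAGs.

```python
# Sketch of an Airflow DAG that submits a daily Spark transform job.
# DAG id, file path, and connection id are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="telecom_daily_transform",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Submits a PySpark script that reshapes the raw telecom feed into curated tables.
    transform = SparkSubmitOperator(
        task_id="transform_telecom_data",
        application="/opt/jobs/transform_telecom.py",
        conn_id="spark_default",
        application_args=["--run-date", "{{ ds }}"],
    )
```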
Data Engineer
An Online Freelance Agency
- Worked with a client to architect, build, and support data pipelines from an on-premises to a cloud environment (sketched after this list).
- Rearchitected the client's data pipeline in the cloud, reducing the total cost of ownership (TCO) by 40%.
- Provided consulting on Python code, including general guidance and best practices.
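A minimal sketch of one on-premises-to-cloud pipeline step under assumed details: a table is read in parallel over JDBC and landed as partitioned Parquet in S3. The JDBC URL, table, columns, and bucket are hypothetical.

```python
# Sketch of a lift-and-shift pipeline step: read a table from an on-premises
# database over JDBC and land it as partitioned Parquet in S3.
# URL, table, columns, and bucket are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("onprem-to-s3").getOrCreate()

source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://onprem-db:5432/sales")
    .option("dbtable", "public.orders")
    .option("user", "etl_user")
    .option("password", "***")            # in practice, pulled from a secrets manager
    .option("numPartitions", 16)          # parallel JDBC reads
    .option("partitionColumn", "order_id")
    .option("lowerBound", 1)
    .option("upperBound", 100_000_000)
    .load()
)

# Partitioned Parquet in S3 keeps downstream queries cheap, part of the TCO reduction.
source.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://client-data-lake/raw/orders/"
)
```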
Lead Data Engineer | Architect | Scientist
Capco
- Standardized the data practice and shipped it as an official Capco product.
- Owned all data projects for Capco's consultancy and Innovation Labs.
- Led the development of open banking data ingestion and standardization, delivered directly to financial institutions (sketched after this list).
- Created and fine-tuned a natural language model for financial institutions.
- Developed a market data pipeline for Capco's client prospecting.
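A minimal sketch of an open banking ingestion and standardization step, assuming raw JSON payloads are conformed to a fixed schema and filtered by simple quality rules; field names, paths, and thresholds are hypothetical, and the actual project also used a machine learning model to score data quality.

```python
# Sketch of an open banking ingestion step: enforce a standard schema on raw
# JSON payloads and keep only records that pass basic quality checks.
# Field names, paths, and thresholds are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("open-banking-ingestion").getOrCreate()

schema = StructType([
    StructField("institution_id", StringType(), nullable=False),
    StructField("product_type", StringType(), nullable=True),
    StructField("interest_rate", DoubleType(), nullable=True),
    StructField("reference_date", StringType(), nullable=True),
])

raw = spark.read.schema(schema).json("s3://open-banking-raw/daily/")

# Rule-based quality gate; in the original project an ML model also scored
# record quality before publication to financial institutions.
clean = raw.filter(
    F.col("institution_id").isNotNull()
    & F.col("interest_rate").between(0.0, 1.0)
)

clean.write.mode("append").partitionBy("reference_date").parquet(
    "s3://open-banking-curated/products/"
)
```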
Big Data Systems Engineer
Banco Itaú
- Migrated 10 PB of data from a mainframe to a Hadoop environment, creating reliable data pipelines (sketched after this list).
- Delivered 99.99% data availability in a Hadoop distributed file system (HDFS) environment.
- Created a central hub of information for the whole bank.
- Institutionalized parallel processing using Spark, delivering fast results to business areas.
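A minimal sketch of one mainframe-offload step, assuming fixed-width extract files are parsed into typed columns with Spark and stored as a Hive table in the Hadoop environment; field offsets, names, and the target table are hypothetical.

```python
# Sketch of a mainframe-offload step: parse a fixed-width extract file into
# typed columns and store it as a Hive table in the Hadoop environment.
# Field offsets, names, and the target table are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder.appName("mainframe-offload")
    .enableHiveSupport()
    .getOrCreate()
)

lines = spark.read.text("hdfs:///landing/mainframe/accounts/")

accounts = lines.select(
    F.trim(F.substring("value", 1, 10)).alias("account_id"),
    F.trim(F.substring("value", 11, 40)).alias("customer_name"),
    # Amount assumed to be stored as integer cents in the fixed-width layout.
    (F.substring("value", 51, 13).cast("long") / 100).alias("balance"),
)

# Writing as a managed Hive table makes the data a shared hub for the bank.
accounts.write.mode("overwrite").format("parquet").saveAsTable("core.accounts")
```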
Experience
Open Banking Data Ingestion
Financial Data Web Scraping
Beacon Data Analysis
Monolith Decomposition
Sentiment Analysis for Financial Institutions
Mainframe to Big Data Environment Engineering
Education
Specialization in Data Science
Johns Hopkins University | via Coursera - Sao Paulo, Brazil
Master's Degree in Big Data
Faculdade de Informática e Administração Paulista (FIAP) - Sao Paulo, Brazil
Bachelor's Degree in Computer Science
Mackenzie University - Sao Paulo, Brazil
Certifications
Databricks Certified Machine Learning Professional
Databricks
Databricks Certified Data Engineer Professional
Databricks
Databricks Certified Associate Developer for Apache Spark 3.0
Databricks
AWS Certified Solutions Architect Associate
AWS
Machine Learning Engineer
Udacity
Data Science Specialization
Coursera
Getting and Cleaning Data
Coursera
Dell EMC Data Science Associate (EMCDSA)
Dell EMC
Linux Professional Institute 101 (LPIC-1)
Linux Professional Institute
Skills
Libraries/APIs
Spark Streaming, PySpark, Pandas, Scikit-learn, NumPy, Beautiful Soup, Selenium WebDriver
Tools
Git, Apache Airflow, Amazon Elastic MapReduce (EMR), Redash, BigQuery, Amazon Simple Queue Service (SQS), Amazon Transcribe, Amazon QuickSight, Amazon Athena, AWS Glue, Apache Maven
Languages
Python, SQL, COBOL, XPath, Scala, Snowflake
Frameworks
Spark, Apache Spark, Hadoop, Flask, Selenium, Scrapy
Paradigms
ETL, Business Intelligence (BI), Logic Programming
Platforms
Databricks, Amazon Web Services (AWS), Linux, Amazon EC2, Google Cloud Platform (GCP), Apache Kafka
Storage
Databases, Apache Hive, Data Pipelines, Redshift, Data Lakes, Amazon S3 (AWS S3), NoSQL, MySQL, Google Cloud Datastore, PostgreSQL, MongoDB, Redis
Other
Machine Learning, Big Data, Data Engineering, Data Science, Data Warehousing, Data, Data Analysis, Data Analytics, ELT, Systems Analysis, Cloud, Stream Processing, Scraping, Data Scraping, Web Scraping, Predictive Modeling, Amazon RDS, Operating Systems, IT Systems Architecture, Neural Networks, Statistics, Deep Learning, Data Modeling, Mainframe, Data Architecture, Prototyping, People Management, Client Relationship Management, Delta Lake, Google Cloud Functions, Pub/Sub, Vertex, Apache Superset, Clustering, Reporting, Natural Language Processing (NLP), APIs, Message Queues, Generative Pre-trained Transformers (GPT), Processing & Threading