Renato Pedroso Neto, Developer in São Paulo - State of São Paulo, Brazil

Renato Pedroso Neto

Verified Expert in Engineering

Data Engineer and Developer

São Paulo - State of São Paulo, Brazil
Toptal Member Since
April 14, 2022

Renato has 13+ years of experience in big data projects. He has worked for Databricks, Capco, and financial institutions. Renato has migrated petabytes of data to on-premises and cloud data lake environments, architected entire lakehouses, implemented machine learning models that provided intelligent suggestions to clients, and managed multicultural data teams that delivered data projects to top-notch banks in Brazil. He holds a master's degree in big data.






Preferred Environment

Spark, Databricks, Python, Amazon Web Services (AWS), Google Cloud Platform (GCP), Machine Learning, Big Data, Amazon Elastic MapReduce (EMR), SQL, Amazon RDS

The most amazing...

...project was the ingestion of Brazilian open banking data, using machine learning to guarantee quality and provide a reliable data source for financial institutions.

Work Experience

Delivery Solutions Architect

2021 - PRESENT
Databricks
  • Increased customer usage by 2x by analyzing their data and suggesting improvements.
  • Performed stress tests on a Spark environment generating and hashing 1 trillion lines in 28 minutes.
  • Acquired AWS Solutions Architect and Spark Developer certifications.
Technologies: Spark, Databricks, Big Data, Client Relationship Management, Redash, Delta Lake, Python, Amazon Web Services (AWS), PySpark, ETL, Data Lakes, Apache Spark, Data, Data Analysis, Data Analytics, Business Intelligence (BI), Amazon S3 (AWS S3), Amazon EC2, Snowflake, AWS Glue, ELT, Databases
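As a rough illustration of the generate-and-hash stress test mentioned above, here is a scaled-down, single-core sketch in pure Python. The row format and function name are invented for illustration; the actual test ran as a distributed Spark job (e.g., generating rows with a range source and hashing them with a built-in hash expression), which is how a trillion-line figure becomes reachable.

```python
import hashlib
import time

def generate_and_hash(n_lines: int) -> float:
    """Generate n_lines synthetic rows, hash each one, and
    return the measured throughput in lines per second."""
    start = time.perf_counter()
    for i in range(n_lines):
        # Synthetic row; stands in for a generated Spark DataFrame row.
        line = f"row-{i},some,synthetic,payload"
        hashlib.sha256(line.encode("utf-8")).hexdigest()
    elapsed = time.perf_counter() - start
    return n_lines / elapsed

throughput = generate_and_hash(100_000)
print(f"{throughput:,.0f} lines/second on a single core")
```

Multiplying single-core throughput by executor count gives a first-order estimate of how long a cluster-scale run should take, which is useful for sizing the environment before the real test.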

Data Engineer

2022 - 2023
Comniscient Technologies LLC dba Comlinkdata
  • Developed new metrics for a telecom market data and insights platform, using Spark to help the client understand subscriber behavior.
  • Helped build and evolve a product for assessing network operators' competitiveness within a country.
  • Implemented new Airflow DAGs to transform telecom data using Spark.
Technologies: Data Engineering, Data Pipelines, Python, Scala, Spark, Amazon Athena, Amazon Web Services (AWS), Data Analytics, Business Intelligence (BI), Redshift, Data Warehousing, Amazon S3 (AWS S3), Amazon EC2, AWS Glue, ELT, Databases
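The Airflow work above boils down to expressing transforms as a dependency graph. A minimal standard-library sketch of that DAG-ordering idea (task names are hypothetical, and `graphlib` stands in for Airflow's scheduler):

```python
from graphlib import TopologicalSorter

# Hypothetical task names standing in for the real telecom pipeline steps.
# Each task maps to the set of tasks it depends on, Airflow-style:
# extract must finish before transform; transform before metrics/load.
dag = {
    "extract_telecom_data": set(),
    "transform_with_spark": {"extract_telecom_data"},
    "compute_metrics": {"transform_with_spark"},
    "load_to_warehouse": {"transform_with_spark"},
}

# TopologicalSorter yields tasks in a dependency-respecting order,
# which is the core guarantee an Airflow scheduler provides.
execution_order = list(TopologicalSorter(dag).static_order())
print(execution_order)
```

In Airflow itself, each task would be an operator and the edges would be declared with `>>`; the topological ordering shown here is what the scheduler enforces at run time.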

Data Engineer

2022 - 2022
An Online Freelance Agency
  • Worked with a client to architect, construct, and support data pipelines from an on-premises environment to the cloud.
  • Rearchitected the client's data pipeline in the cloud, reducing the total cost of ownership (TCO) by 40%.
  • Provided consulting on Python code, including general guidance and best practices.
Technologies: Spark, Python, Apache Kafka, Cloud, Amazon Web Services (AWS), PySpark, ETL, Data Lakes, Apache Spark, Data, Data Analysis, Data Analytics, Business Intelligence (BI), Redshift, Data Warehousing, Amazon S3 (AWS S3), Amazon EC2, PostgreSQL, ELT, Databases

Lead Data Engineer | Architect | Scientist

2016 - 2021
Capco
  • Standardized the data practice and shipped it as an official Capco product.
  • Owned all data projects for Capco's consultancy and Innovation Labs.
  • Led the development of open banking data ingestion and standardization to deliver directly to financial institutions.
  • Created and fine-tuned a natural language model for financial institutions.
  • Developed a market data pipeline for Capco's client prospecting.
Technologies: Google Cloud Platform (GCP), Python, Machine Learning, Data Engineering, Data Architecture, Big Data, Prototyping, Spark, People Management, Amazon Web Services (AWS), PySpark, ETL, Redshift, Data Lakes, Message Queues, Stream Processing, Apache Spark, Data, Data Analysis, Data Analytics, Business Intelligence (BI), Amazon S3 (AWS S3), Amazon EC2, PostgreSQL, MongoDB, AWS Glue, Predictive Modeling, ELT, Redis, Databases

Big Data Systems Engineer

2014 - 2016
Banco Itaú
  • Migrated 10PB of data from a mainframe to a Hadoop environment, creating reliable data pipelines.
  • Delivered 99.99% data availability in a Hadoop distributed file system (HDFS) environment.
  • Created a central hub of information for the whole bank.
  • Institutionalized parallel processing using Spark, delivering fast results to business areas.
Technologies: Spark, Hadoop, Apache Hive, MySQL, Mainframe, PySpark, ETL, Data Lakes, Data Warehousing, Apache Spark, Data, Data Analysis, Data Analytics, Business Intelligence (BI), ELT, Databases

Projects

Open Banking Data Ingestion

Open banking data ingestion, cleaning, and standardization. The project aimed to capture all open banking data in Brazil and sell subscription access to financial institutions. The whole project was developed on GCP using a serverless architecture and parallel processing principles. Data quality was guaranteed using machine learning models.
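The project used machine learning models for quality assurance; as a simpler stand-in in the same spirit, here is a statistical quality gate that flags records deviating sharply from the rest of a batch. The field name and values are invented for illustration.

```python
from statistics import mean, stdev

def flag_outliers(values, z_threshold=3.0):
    """Flag records whose value is more than z_threshold standard
    deviations from the batch mean -- a simple stand-in for the
    model-based quality checks described above."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > z_threshold]

# Hypothetical interest-rate field ingested from open banking APIs:
# one record is three orders of magnitude off and should be flagged.
rates = [1.9, 2.1, 2.0, 1.8, 2.2, 2050.0, 2.1, 1.9, 2.0, 2.1]
print(flag_outliers(rates, z_threshold=2.0))
```

A real pipeline would quarantine flagged records for review rather than dropping them, so that upstream API regressions surface instead of silently shrinking the dataset.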

Financial Data Web Scraping

Constructed web scraping software to capture data from a Brazilian broker. The data was ingested into a MySQL database for further analysis to support investment backtesting. The project used threading and parallel processing techniques with native Python objects.
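The threading-plus-parsing pattern described above can be sketched with only the standard library. In the real project each worker would fetch a page over HTTP before parsing; in-memory pages and the `price` cell class are invented here to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

class PriceParser(HTMLParser):
    """Collect the text of every <td class="price"> cell."""
    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        if tag == "td" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())

def scrape(page_html: str) -> list[str]:
    parser = PriceParser()
    parser.feed(page_html)
    return parser.prices

pages = [
    '<table><tr><td class="price">10.50</td></tr></table>',
    '<table><tr><td class="price">11.25</td></tr></table>',
]

# Threads suit I/O-bound scraping: workers overlap network waits,
# so throughput scales with concurrent connections, not CPU cores.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(scrape, pages))
print(results)  # one list of prices per page
```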

Beacon Data Analysis

Beacon (IoT) data analysis to predict customer behavior and theft at ATMs. The prototype analyzed near-real-time (NRT) data from several IoT devices to track paths inside a bank branch, survey ATMs, and alert the security team to abnormalities.

Monolith Decomposition

COBOL code analysis and machine learning model implementations to help financial institutions determine the best way to break a COBOL monolith into meaningful microservices for platform modernization.

Sentiment Analysis for Financial Institutions

Sentiment analysis model as a service for financial institutions. The idea was to train a model that banks could use to better extract sentiment from text and voice, increasing client retention and satisfaction.

Mainframe to Big Data Environment Engineering

A large data transfer pipeline from mainframe environments to an on-premises Hadoop cluster provided by Cloudera. The project covered the data quality layer, ingestion, and delivery to business areas.

Education

2015 - 2016

Specialization in Data Science

Johns Hopkins University | via Coursera - São Paulo, Brazil

2013 - 2015

Master's Degree in Big Data

Faculdade de Informática e Administração Paulista (FIAP) - São Paulo, Brazil

2007 - 2011

Bachelor's Degree in Computer Science

Mackenzie University - São Paulo, Brazil


Certifications

Databricks Certified Machine Learning Professional



Databricks Certified Data Engineer Professional



Databricks Certified Associate Developer for Apache Spark 3.0



AWS Certified Solutions Architect Associate



Machine Learning Engineer



Data Science Specialization



Getting and Cleaning Data



Dell EMC Data Science Associate (EMCDSA)

Dell EMC


Linux Professional Institute 101 (LPIC-1)

Linux Professional Institute


Libraries/APIs

Spark Streaming, PySpark, Pandas, Scikit-learn, NumPy, Beautiful Soup, Selenium WebDriver


Tools

Git, Apache Airflow, Amazon Elastic MapReduce (EMR), Redash, BigQuery, Amazon Simple Queue Service (SQS), Amazon Transcribe, Amazon QuickSight, Amazon Athena, AWS Glue, Apache Maven


Frameworks

Spark, Apache Spark, Hadoop, Flask, Selenium, Scrapy


Paradigms

Data Science, ETL, Business Intelligence (BI), Logic Programming


Languages

Python, SQL, COBOL, XPath, Scala, Snowflake


Platforms

Databricks, Amazon Web Services (AWS), Linux, Amazon EC2, Google Cloud Platform (GCP), Apache Kafka


Storage

Databases, Apache Hive, Data Pipelines, Redshift, Data Lakes, Amazon S3 (AWS S3), NoSQL, MySQL, Google Cloud Datastore, PostgreSQL, MongoDB, Redis


Other

Machine Learning, Big Data, Data Engineering, Data Warehousing, Data, Data Analysis, Data Analytics, ELT, Systems Analysis, Cloud, Stream Processing, Scraping, Data Scraping, Web Scraping, Predictive Modeling, Amazon RDS, Operating Systems, IT Systems Architecture, Neural Networks, Statistics, Deep Learning, Data Modeling, Mainframe, Data Architecture, Prototyping, People Management, Client Relationship Management, Delta Lake, Google Cloud Functions, Pub/Sub, Vertex, Apache Superset, Clustering, Reporting, Natural Language Processing (NLP), APIs, Message Queues, Generative Pre-trained Transformers (GPT), Processing & Threading
