Bruno Ruaro de Souza, Developer in Porto Alegre - State of Rio Grande do Sul, Brazil

Bruno Ruaro de Souza

Verified Expert in Engineering

Data Engineer and Developer

Porto Alegre - State of Rio Grande do Sul, Brazil

Toptal member since August 23, 2024

Bio

Bruno has over 15 years of experience with data, building robust, reliable, and scalable data pipelines for large industrial companies and IT consulting firms. He specializes in SQL, Python, PySpark, Airflow, and cloud environments such as GCP, Azure, and AWS. Bruno is a motivated engineer excited to take on his next challenge.

Portfolio

Hvar
Python, SQL, PySpark, Google Cloud Platform (GCP), Amazon Web Services (AWS)...
Raizen
Python, SQL, Apache Airflow, Data Modeling, Microsoft Azure, Kubernetes...
CMPC
Python, Google Cloud Platform (GCP), PySpark, Data Modeling, Apache Beam, Spark...

Experience

  • SQL - 15 years
  • Google Cloud Platform (GCP) - 7 years
  • Python - 7 years
  • PySpark - 7 years
  • Data Engineering - 7 years
  • Data Modeling - 7 years
  • Data Pipelines - 7 years
  • Apache Airflow - 3 years

Availability

Full-time

Preferred Environment

Python, Google Cloud Platform (GCP), SQL

The most amazing...

...achievement was cutting the processing time of a data pipeline that ranked logistics routes on a graph for oil derivatives distribution from 90 minutes to five seconds.

Work Experience

Tech Lead

2023 - PRESENT
Hvar
  • Led a project to adapt all the analytics data pipelines of a financial company that provides credit to customers of a world-class car manufacturer in Brazil.
  • Spearheaded the migration of a world-class car manufacturer's eCommerce application from AWS to GCP in Brazil.
  • Coordinated a data migration for a large CRM company from Microsoft SQL Server on AWS to PostgreSQL on GCP, moving the data from the Middle East to Brazil on schedule and under critical circumstances.
  • Assessed and planned the structure of a new analytics department, using GCP services and an event-driven architecture to deliver near-real-time data insights (a minimal sketch follows below).
Technologies: Python, SQL, PySpark, Google Cloud Platform (GCP), Amazon Web Services (AWS), Leadership, Data Engineering, Data Architecture, Data Modeling, Technical Leadership, Data, Spark, Looker, Data Pipelines, ETL, Software, ETL Tools, Apache, Data Cleaning, Data Cleansing, Apache Kafka, Pub/Sub, Google Pub/Sub, Big Data Architecture, Event-driven Architecture, AWS Lambda, AWS Glue, Amazon S3 (AWS S3), Apache Airflow, Big Data, BigQuery, Cloud Firestore, Database Modeling, DataViz, Data Visualization, Data Lakes, Data Warehousing, Data Warehouse Design, Data Catalog Implementation, Data Quality, APIs, RESTful Services, Microservices, Web Services, Microservices Architecture
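
For illustration, a minimal Python sketch of the event-driven pattern mentioned in the last bullet, assuming the google-cloud-pubsub client library; the project, topic, and event fields are hypothetical placeholders, not details from the actual engagement.

```python
import json

from google.cloud import pubsub_v1

PROJECT_ID = "analytics-project"  # hypothetical GCP project
TOPIC_ID = "business-events"      # hypothetical Pub/Sub topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

def publish_event(event: dict) -> str:
    """Publish one business event; downstream subscribers feed the analytics stack."""
    data = json.dumps(event).encode("utf-8")
    future = publisher.publish(topic_path, data)
    return future.result()  # blocks until the broker acknowledges the message

publish_event({"order_id": 123, "status": "approved"})
```

In this pattern, producers only publish facts; BigQuery loaders, alerting, and dashboards subscribe independently, which is what enables near-real-time insights without coupling teams.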

Staff Data Engineer

2021 - 2023
Raizen
  • Reduced the processing time for generating oil derivative transport route graphs from 90 minutes to just five seconds while significantly cutting the computational resources used (a minimal sketch follows below).
  • Served as the technical reference on a project to optimize the two-month planning of oil derivatives trading and mentored one mid-level data engineer and one intern.
  • Served as the technical reference on a project to optimize the one-week planning of oil derivatives trading and mentored one senior, two mid-level, and one intern data engineer.
  • Maintained a data quality service that uses the Great Expectations framework, an application for manual data uploads built with Node.js and Vue, and a data pipeline for ingesting the uploaded data.
Technologies: Python, SQL, Apache Airflow, Data Modeling, Microsoft Azure, Kubernetes, Node.js, Technical Leadership, Data, PySpark, Spark, Big Data, Leadership, Data Engineering, Database Modeling, Data Pipelines, ETL, Excel 2016, Microsoft Excel, Apache, ETL Tools, Data Cleaning, Data Cleansing, Apache Kafka, RESTful Microservices, RESTful Services, APIs, Web Services, Microservices, Microservices Architecture, Event-driven Architecture, Data Warehousing, Azure Data Lake, Azure Synapse, Data Lakehouse, Data Lakes, Data Lake Design, Azure Data Lake Storage, Azure Storage, Azure
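
For illustration, a minimal sketch of ranking transport routes on a graph with networkx. The profile does not detail how the 90-minute-to-five-second speedup was achieved; precomputing shortest-path costs once per graph, as below, is just one plausible ingredient, and the nodes and costs are hypothetical.

```python
import networkx as nx

g = nx.DiGraph()
g.add_weighted_edges_from([
    # (origin, destination, transport cost) -- hypothetical figures
    ("refinery", "terminal_a", 120.0),
    ("refinery", "terminal_b", 90.0),
    ("terminal_a", "depot", 60.0),
    ("terminal_b", "depot", 110.0),
])

# Compute the cost from the refinery to every reachable node once (Dijkstra),
# then rank destinations by cost instead of re-solving per request.
costs = nx.single_source_dijkstra_path_length(g, "refinery")
for node, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(node, cost)
```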

Senior Data Engineer

2021 - 2021
CMPC
  • Designed a data model and a data ingestion pipeline for more than 250 industrial process variables, running in near real time and inserting data into BigQuery and Firestore every minute.
  • Developed data pipelines using Google Cloud Dataflow, Apache Beam, and other GCP services (a minimal Beam sketch follows below).
  • Created ETL processes to transfer data from the OSIsoft PI System to GCP.
  • Created a cost forecast spreadsheet estimating cloud expenditures for storage, CPU, memory, and network.
Technologies: Python, Google Cloud Platform (GCP), PySpark, Data Modeling, Apache Beam, Spark, SQL, Data, Big Data, Data Engineering, Database Modeling, Data Pipelines, Data Orchestration, ETL, Excel 2016, Microsoft Excel, Apache, ETL Tools, Data Cleaning, Data Cleansing, Data Quality, Data Warehousing, Data Warehouse Design, Data Lakes, Data Lake Design, Microservices, REST, RESTful Services, RESTful Microservices, Web Services
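
For illustration, a minimal Apache Beam sketch of the per-minute ingestion described above, written as a small batch pipeline; the table, schema, and variable names are hypothetical, and the production pipeline ran on Google Cloud Dataflow rather than locally.

```python
import apache_beam as beam

def add_zscore(row, mean=50.0, std=5.0):
    # Fixed statistics are a stand-in; the real pipeline derived them from history.
    row["zscore"] = (row["value"] - mean) / std
    return row

with beam.Pipeline() as p:  # DirectRunner by default
    (
        p
        | "ReadVariables" >> beam.Create([{"variable": "pulp_temp", "value": 61.0}])
        | "AddZScore" >> beam.Map(add_zscore)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:process.variables",  # hypothetical table spec
            schema="variable:STRING,value:FLOAT,zscore:FLOAT",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```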

Data Engineer

2020 - 2021
HVAR Consulting
  • Created deployment scripts for all the components of a product that extracts KPIs from audio recorded in call centers, supporting strategies and business decisions that improve profit.
  • Developed data pipelines for an audio analytics product using GCP services (a minimal query sketch follows below).
  • Improved the orchestration of the data pipelines and services of Taka, an audio processing product used to generate business metrics.
Technologies: Python, Docker, BigQuery, Google Cloud Platform (GCP), SQL, Looker, Data Pipelines, ETL, Data Engineering, Data, Big Data, Apache, Apache Kafka, Pub/Sub, Google Pub/Sub, Data Cleaning, Data Cleansing, Data Quality, REST, RESTful Services, RESTful Microservices, Microservices, Microservices Architecture, Event-driven Architecture, Data Warehousing, Data Warehouse Design, Data Lakes, Google Compute Engine (GCE), Google Cloud Storage
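
For illustration, a minimal google-cloud-bigquery sketch of reading call-center KPIs like those described above; the dataset, table, and column names are hypothetical placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses the environment's default GCP credentials

query = """
    SELECT agent_id, AVG(sentiment_score) AS avg_sentiment
    FROM `my-project.call_center.audio_kpis`
    GROUP BY agent_id
    ORDER BY avg_sentiment DESC
"""

# Rank agents by the average sentiment extracted from their recorded calls.
for row in client.query(query).result():
    print(row.agent_id, row.avg_sentiment)
```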

Middle Business Analyst

2011 - 2014
Senai - PR
  • Gathered information about research groups and patents in nanotechnology for a computer manufacturer, generating tests and business opportunities that improved its products.
  • Interviewed leaders of pioneering Brazilian high-tech and applied research laboratories, companies, and organizations, gathering market research for Senai's planned innovation institutes.
  • Created spreadsheets and criteria to prioritize matches between innovative laboratories, companies, and organizations and Senai's planned innovation institutes.
Technologies: Microsoft Excel, Microsoft PowerPoint, Office 2010

Engineering Intern

2010 - 2010
Enercons Renewable Energy Consulting
  • Created a spreadsheet that transforms the azimuths and distances of land boundaries into coordinates, automating the calculation and drawing of property maps for wind energy prospecting (a minimal sketch of the calculation follows below).
  • Optimized turbine configurations for hydroelectric plants using the Excel Solver add-in.
  • Carried out comparative studies of different types of dam gates, evaluating their technical and environmental viability.
Technologies: AutoCAD, Excel 2010, Excel VBA, Microsoft Excel, Excel Add-ins
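
For illustration, a minimal Python version of the azimuth-and-distance-to-coordinates calculation described above (a plane traverse); the original was built in Excel, and the boundary legs below are hypothetical.

```python
import math

def traverse(start_x, start_y, legs):
    """Walk boundary legs of (azimuth in degrees, distance) into x/y points."""
    points = [(start_x, start_y)]
    x, y = start_x, start_y
    for azimuth_deg, distance in legs:
        az = math.radians(azimuth_deg)
        x += distance * math.sin(az)  # east component (azimuth measured from north)
        y += distance * math.cos(az)  # north component
        points.append((x, y))
    return points

# Three hypothetical boundary legs starting from a local origin.
for px, py in traverse(0.0, 0.0, [(45.0, 100.0), (135.0, 80.0), (270.0, 50.0)]):
    print(f"{px:.2f}, {py:.2f}")
```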

Project Experience

Leis Anotadas

An app that facilitates the study of Brazilian laws by letting users comment on each paragraph of a legal document. I designed, implemented, and deployed the entire app myself. It comprises a back end, a responsive front end, a MongoDB database, and a crawler.
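
For illustration, a minimal pymongo sketch of the per-paragraph comment storage the app implies; the database, collection, and field names are hypothetical, not taken from the actual codebase.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # hypothetical local instance
comments = client["leis_anotadas"]["comments"]

# Each comment is anchored to one paragraph of one legal document.
comments.insert_one({
    "law_id": "lei-8078-1990",
    "paragraph": 12,
    "user": "alice",
    "text": "This paragraph defines liability for defective products.",
})

# Fetch every comment on a given paragraph, e.g., to render it on the page.
for doc in comments.find({"law_id": "lei-8078-1990", "paragraph": 12}):
    print(doc["user"], ":", doc["text"])
```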

Carta Farol (Lighthouse)

An app that monitors industrial process variables in near real time, identifying the probability of each one going out of its configured range in the next few minutes. I contributed to modeling, designing, and implementing the app's entire data engine, composed of the extraction from the OSIsoft PI System to GCP, the calculation of each variable's z-score, and the insertion of the data into Cloud Firestore.
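
For illustration, a minimal pandas sketch of the z-score step described above; the variable name, the rolling window, and the sample values are hypothetical.

```python
import pandas as pd

# One reading per minute for a single process variable.
readings = pd.DataFrame({"pulp_temp": [60.1, 60.4, 59.8, 61.2, 64.9]})

# z-score of each reading against a rolling one-hour baseline.
window = readings["pulp_temp"].rolling(window=60, min_periods=3)
readings["zscore"] = (readings["pulp_temp"] - window.mean()) / window.std()

# A large |z| suggests the variable is drifting toward its configured limits.
print(readings)
```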

The client was CMPC, a pulp and paper company, and the users were its continuous improvement team, which monitors the app on a big screen in a factory control room and acts when the probability of a variable going out of range is significant. The data implementation was completed in only two months.

This application significantly reduced the amount of industrial maintenance and, consequently, significantly reduced manufacturing costs.

Data Integration for SOS-RS

Worked on SOS-RS, an app that matched shelter needs with donations during the 2024 floods in Rio Grande do Sul, Brazil. I created data pipelines to export data from the SOS-RS API to Airtable so the back-office team could work with the tables more easily (a minimal export sketch follows below). I also built a map of the shelters with information about each one.

The users were people affected by the floods, shelter seekers, volunteers, donors, and the government. The app greatly helped coordinate resources during an unexpected natural disaster.
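
For illustration, a minimal sketch of the API-to-Airtable export using only the requests library; the SOS-RS endpoint path, the Airtable base and table IDs, and the field names are hypothetical placeholders for the real pipeline.

```python
import os

import requests

# 1. Pull shelters from the (hypothetical) SOS-RS API endpoint.
shelters = requests.get("https://api.sos-rs.com/shelters", timeout=30).json()

# 2. Push them to Airtable via its REST API, 10 records per request
#    (Airtable's limit for record-creation calls).
airtable_url = "https://api.airtable.com/v0/appXXXXXXXX/Shelters"
headers = {"Authorization": f"Bearer {os.environ['AIRTABLE_TOKEN']}"}

records = [{"fields": {"Name": s["name"], "City": s["city"]}} for s in shelters]
for i in range(0, len(records), 10):
    resp = requests.post(
        airtable_url, json={"records": records[i:i + 10]}, headers=headers, timeout=30
    )
    resp.raise_for_status()
```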

Education

2005 - 2010

Bachelor's Degree in Electrical Engineering

Federal University of Paraná - Curitiba, Paraná, Brazil

Certifications

JANUARY 2022 - PRESENT

Astronomer Certification for Apache Airflow Fundamentals

Astronomer

Skills

Libraries/APIs

PySpark, Node.js, Google Maps, Google Sheets API

Tools

BigQuery, Apache Airflow, Apache Beam, NGINX, Google Compute Engine (GCE), Google Sheets, Looker, Microsoft Excel, Microsoft PowerPoint, Excel 2016, AutoCAD, Excel 2010, Apache, AWS Glue, DataViz, Office 2010

Languages

Python, SQL, JavaScript, HTML, CSS, Excel VBA

Paradigms

ETL, Event-driven Architecture, Microservices, Microservices Architecture, REST

Storage

Data Pipelines, API Databases, MongoDB, Cloud Firestore, Database Modeling, Amazon S3 (AWS S3), Data Lakes, Data Lake Design, Azure Storage, Google Cloud Storage

Platforms

Google Cloud Platform (GCP), Docker, Amazon Web Services (AWS), Kubernetes, Databricks, Apache Kafka, AWS Lambda, Azure Synapse, Azure Data Lake Storage, Azure

Frameworks

Spark, Data Lakehouse

Other

Data Engineering, Data Modeling, Data Quality, Data Management, Distributed Systems, Leadership, Data Architecture, Microsoft Azure, Directed Acyclic Graphs (DAGs), Orchestration, Scheduling, Programming, Technical Leadership, Data, Big Data, Data Orchestration, Excel Add-ins, Software, Medical Equipment, Digital Signal Processing, Data Processing, Neural Networks, Statistics, Machine Learning, ETL Tools, Data Cleaning, Data Cleansing, Pub/Sub, Google Pub/Sub, Big Data Architecture, Data Visualization, Data Warehousing, Data Warehouse Design, Data Catalog Implementation, Content Management Systems (CMS), Web Scraping, APIs, RESTful Services, Web Services, RESTful Microservices, Azure Data Lake
