Edson Cavalcanti Neto, Developer in Fortaleza - State of Ceará, Brazil

Edson Cavalcanti Neto

Verified Expert in Engineering

Bio

Edson is a data specialist with 12+ years of experience building and managing complex data pipelines and architectures. He's an expert in integrating diverse technologies, including Python, AWS Glue, Apache Airflow, Data Build Tool (dbt), Apache Kafka, Spark, and Docker for scalable data solutions. Edson has a strong track record of delivering cost-effective and timely solutions, enhancing data-driven decision-making, and leading teams to innovate and optimize operations.

Portfolio

Cavalcanti Inovações Tecnológicas
Python, Pandas, Redis, Data Build Tool (dbt), Azure, AWS Glue, AWS IAM...
Mobit Brasil Ltda
Python, Pandas, Redis, Cassandra, Apache Kafka, Spark, Apache Airflow, OpenCV...
Instituto Atlântico
Machine Learning, Data Engineering, Hadoop, PySpark, AutoML, Python...

Experience

  • Python - 12 years
  • Data Science - 10 years
  • Git - 10 years
  • SQL - 8 years
  • ETL - 6 years
  • Data Engineering - 5 years
  • AWS Glue - 3 years
  • PySpark - 2 years

Availability

Part-time

Preferred Environment

Python, PySpark, AWS Glue, SQL, Apache Airflow, ETL, Data Engineering, Git, Amazon Web Services (AWS), Amazon S3 (AWS S3)

The most amazing...

...system I've developed is a monitor for equipment in Brazilian states that decreases the average incident detection time from five days to three hours.

Work Experience

Data Engineer

2021 - PRESENT
Cavalcanti Inovações Tecnológicas
  • Created a jobs pipeline for data ingestion and data cleaning.
  • Developed jobs to access sources, including APIs, SharePoint, Amazon S3 (AWS S3), SFTP, and Google Drive, to ingest data to the data lake.
  • Leveraged AWS Glue and Apache Airflow to run scripts to export the data from Amazon S3 to Snowflake.
  • Performed data transformation and load to Snowflake using dbt.
  • Built a CI/CD pipeline for automatic dashboard deployment and update.
  • Implemented Redshift tables and views for the data analysis team.
  • Actively collaborated with software engineers and data analysts.
  • Provided Docker infrastructure to run data pipelines, including ETL solutions, data transformation, cleaning, and model training.
Technologies: Python, Pandas, Redis, Data Build Tool (dbt), Azure, AWS Glue, AWS IAM, Amazon S3 (AWS S3), PySpark, Pytest, Apache Airflow, Flask, SQL, Snowflake, SFTP, APIs, REST, CI/CD Pipelines, GitLab, SharePoint, Shopify API, Google Analytics 4, Google Analytics, Data Engineering, Data Analysis, Microsoft Power BI, Redshift, Data Pipelines, Business Requirements, Jira, GitHub, Requirements Analysis, Amazon EMR Studio, Data Modeling, ETL, Cloud, Tableau, Amazon Web Services (AWS), Business Intelligence (BI), Data Analytics, Data Reporting, Databases, Azure Data Lake, Azure Synapse, Microsoft Fabric, Azure Data Factory (ADF), Data Lakes, Data Warehousing, Databricks, Data Management, Google Sheets, Business Logic, Microsoft Excel, Warehouses, ELT, NoSQL, Docker, Fivetran, Generative Pre-trained Transformers (GPT), BI Reporting, Financial Planning & Analysis (FP&A), Alteryx, Azure Data Lake Analytics, Modeling, Analytics, Data Manipulation, Amazon Aurora, Amazon CloudWatch, Web Scraping, Orchestration, Data Visualization, Data, A/B Testing, Machine Learning Operations (MLOps), Artificial Intelligence (AI), Amazon Textract, Optical Character Recognition (OCR), Beautiful Soup, Selenium, Data Architecture, Database Design, Synapse, Architecture, Data Lineage, dbt Cloud, Reports, Database Architecture, Google Sheets API, AI Consulting, Consulting, PyTorch, Django

Data Architect

2020 - 2021
Mobit Brasil Ltda
  • Created big data infrastructure to process and store IoT data.
  • Designed data lake structures on-premise using Hadoop with HDFS for data storage and PySpark for data processing.
  • Managed teams in product development with a focus on innovation and operational cost reduction. I also coordinated the data science team, focusing on innovation and improvement of internal processes.
  • Deployed a distributed application with Cassandra, Redis, PostgreSQL, and load balancers. I also managed 20 workers processing information from 1,000 pieces of equipment, sending telemetry each minute.
  • Developed algorithms to analyze and manage contract loss through data.
  • Built a computer vision MLOps process with the Computer Vision Annotation Tool, MLflow, and Jupyter Notebook for analysis.
  • Generated summarized information using MapReduce to help managers during decision-making phases.
Technologies: Python, Pandas, Redis, Cassandra, Apache Kafka, Spark, Apache Airflow, OpenCV, Flask, Pytest, CI/CD Pipelines, Jira, Scikit-learn, PostgreSQL, Hadoop, HDFS, PySpark, SQL, Data Pipelines, Business Requirements, GitHub, Requirements Analysis, Data Build Tool (dbt), Data Modeling, ETL, Cloud, Amazon Web Services (AWS), Microsoft Power BI, Automation, Business Intelligence (BI), Data Analytics, Data Reporting, Databases, APIs, Azure, Azure Data Lake, Azure Synapse, Microsoft Fabric, Azure Data Factory (ADF), Data Lakes, Data Warehousing, Databricks, Data Management, Google Sheets, Business Logic, Microsoft Excel, Warehouses, ELT, Dask, NoSQL, Docker, Fivetran, Generative Pre-trained Transformers (GPT), BI Reporting, Azure Data Lake Analytics, Modeling, Analytics, Data Manipulation, Big Data Architecture, Web Scraping, Orchestration, Leadership, Data, A/B Testing, Blockchain, Kubeflow, Machine Learning Operations (MLOps), Artificial Intelligence (AI), Amazon Textract, Optical Character Recognition (OCR), Beautiful Soup, Selenium, Data Architecture, Database Design, Microsoft SQL Server, Architecture, Data Lineage, dbt Cloud, Reports, Database Architecture, AI Consulting, Edge AI, PyTorch, Internet of Things (IoT), Django

Senior Data Scientist

2019 - 2020
Instituto Atlântico
  • Created ML architecture for security systems and developed related end-to-end software applications.
  • Developed an AutoML structure to find the best model for the dataset.
  • Leveraged various technologies and frameworks, including Python, scikit-learn, Pandas, Git, Flask, NumPy, MySQL, PostgreSQL, Splunk, Sentry, and RabbitMQ.
  • Built an AutoML pipeline to split data, train classifiers, and optimize hyperparameters.
  • Designed applications with Python, using technologies such as Flask, Pandas, scikit-learn, and other data science libraries.
  • Deployed solutions through CI/CD pipelines in Jenkins, working with cloud and other technologies.
  • Decreased data processing time by 50% using PySpark from the Hadoop ecosystem.
  • Implemented the Agile methodology and Scrum, converting ideas into strategic plans, delegating tasks to the team, mentoring, and providing feedback.
Technologies: Machine Learning, Data Engineering, Hadoop, PySpark, AutoML, Python, Scikit-learn, Pandas, Git, Flask, NumPy, MySQL, PostgreSQL, Sentry, RabbitMQ, CI/CD Pipelines, Jenkins, Platform as a Service (PaaS), SaaS, SQLite, Agile, Scrum, SQL, Business Requirements, Jira, GitHub, Requirements Analysis, Data Modeling, ETL, Cloud, Tableau, Apache Airflow, Amazon Web Services (AWS), Automation, Business Intelligence (BI), Data Analytics, Data Reporting, Databases, APIs, Azure Databricks, Databricks, Google Sheets, Business Logic, Microsoft Excel, Warehouses, ELT, NoSQL, Apache Kafka, Docker, BI Reporting, Modeling, Analytics, Data Manipulation, Big Data Architecture, Web Scraping, Orchestration, Data Visualization, Leadership, Data, Machine Learning Operations (MLOps), Artificial Intelligence (AI), Optical Character Recognition (OCR), Beautiful Soup, Selenium, Data Architecture, Database Design, Microsoft SQL Server, Architecture, Reports, Database Architecture, AI Consulting, Edge AI, PyTorch, Internet of Things (IoT), Django

Embedded Systems Designer

2017 - 2019
Mobit Brasil Ltda
  • Designed software architecture for computer vision solutions applied to intelligent transportation systems.
  • Created Python libraries for vehicle tracking using computer vision.
  • Optimized a deep learning net for character recognition and plate detection.
  • Deployed a solution on embedded systems with x86 processors running on Unix.
  • Leveraged various technologies and frameworks, including Python, scikit-learn, Git, NumPy, Linux, OpenCV, Vagrant, Keras, and TensorFlow.
  • Developed a distributed data processing application to gather information and images from intelligent transportation systems through ETL.
  • Implemented application tests for systems based on x86 processors running on Unix.
Technologies: Python, Pandas, AWS IoT, Redshift, Snowflake, Spark, OpenCV, Amazon S3 (AWS S3), PostgreSQL, Jupyter, Linux, NumPy, Git, C++, Deep Learning, Machine Learning, Vagrant, Keras, TensorFlow, ETL, SQL, Business Requirements, Jira, GitHub, Requirements Analysis, Cloud, Amazon Web Services (AWS), Automation, Data Analytics, Data Reporting, Databases, Azure, Azure Data Lake, Azure Synapse, Microsoft Fabric, Azure Data Factory (ADF), Google Sheets, Business Logic, Microsoft Excel, Kubeflow, Docker, Azure Data Lake Analytics, Analytics, Data Manipulation, Excel VBA, Stitch Data, Orchestration, Leadership, Data, Machine Learning Operations (MLOps), Artificial Intelligence (AI), Optical Character Recognition (OCR), AI Consulting, Edge AI, PyTorch, Internet of Things (IoT)

R&D Engineer

2014 - 2016
LESC – Computer Systems Engineering Laboratory
  • Created algorithms in Python for quality control in the factory line.
  • Developed a test plan to validate algorithms and build automated reports.
  • Leveraged various technologies and frameworks, including C, Git, OpenCV, Android, and MATLAB.
  • Implemented computer vision and ML algorithms for quality control and provided technical documentation in English and Portuguese.
  • Integrated C++ with a Java Android application using JNI and NDK.
  • Built a C application leveraging OpenCV and Python.
  • Ran application tests for systems based on ARM and x86 processors.
Technologies: C, Computer Vision, HDR Photography, Android, Python, Java, Git, OpenCV, MATLAB, C#, C++, ARM, Assembler x86, Linux, JNI, NDK, Business Requirements, PostgreSQL, Jira, GitHub, Requirements Analysis, Data Modeling, Automation, Data Analytics, Databases, Business Logic, Microsoft Excel, Data Manipulation, Excel VBA, Stitch Data, Data, Artificial Intelligence (AI), Optical Character Recognition (OCR), AI Consulting, Edge AI

R&D Researcher

2012 - 2013
Innovation Technology Lab
  • Developed algorithms for optical character recognition. I also created algorithms in Python, C, and C++ for embedded systems to run on an ARM processor.
  • Built a Linux interface to manage image acquisition.
  • Integrated C++ with a Java Android application using JNI and NDK.
  • Designed navigation systems for a mobile robot using mobile devices and gesture recognition. I leveraged computational intelligence and digital image processing.
  • Worked on an image processing project to recognize license plates using C, C++, and computational intelligence.
  • Handled modeling, design, and development tasks for an autonomous submersible vehicle that collects data on reservoir lake ecosystem characteristics.
Technologies: ARM, Electronics, PCB Layout, C, Git, OpenCV, MATLAB, Python, PostgreSQL, Jira, C#, GitHub, Automation, Data Analytics, Data Manipulation, Excel VBA, Data, Artificial Intelligence (AI), Optical Character Recognition (OCR), Edge AI

Experience

Computer Vision Algorithm for COVID-19 Detection

https://pypi.org/project/covid-vision/
During the COVID-19 pandemic in May 2020, I led a team of colleagues developing computer vision algorithms to detect the virus in computed tomography scans and X-ray images.

3D AUTOCUT: A 3D Segmentation Algorithm Based on Cellular Automata

https://autocut3d.readthedocs.io/en/latest/
An improved 3D cellular automaton segmentation algorithm is proposed. The purpose is to create a tri-directional automaton to segment 3D volumes. The improvement is a force applied in the RGB space to refine the result of the 3D algorithm. Experiments on several synthetic volumes have shown that the proposed method achieves better segmentation results and efficiency than the other methods.
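
As a rough illustration of cellular automaton segmentation on a 3D volume, the sketch below uses a GrowCut-style update rule, which is an assumption chosen for illustration and not the published 3D AUTOCUT algorithm. It also simplifies the RGB-space force to a single scalar intensity in [0, 1]. The function name `grow_cut_3d` and the seed format are hypothetical.

```python
from itertools import product

# 6-connected neighbourhood: one step in each direction along the three axes.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def grow_cut_3d(volume, seeds, max_iters=100):
    """Label a 3D volume by letting seed labels spread to similar neighbours.

    volume: nested lists indexed as volume[z][y][x], intensities assumed in [0, 1].
    seeds:  dict mapping (z, y, x) to a positive integer label.
    Returns nested lists of labels with the same shape (0 means unlabelled).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    strength = [[[0.0] * width for _ in range(height)] for _ in range(depth)]

    # Seed cells start with full strength, so they can never be overwritten.
    for (z, y, x), label in seeds.items():
        labels[z][y][x] = label
        strength[z][y][x] = 1.0

    for _ in range(max_iters):
        changed = False
        # Synchronous update: read the old state, write into copies.
        new_labels = [[row[:] for row in plane] for plane in labels]
        new_strength = [[row[:] for row in plane] for plane in strength]

        for z, y, x in product(range(depth), range(height), range(width)):
            for dz, dy, dx in NEIGHBOURS:
                nz, ny, nx = z + dz, y + dy, x + dx
                if not (0 <= nz < depth and 0 <= ny < height and 0 <= nx < width):
                    continue
                # Similar intensities give a similarity close to 1.
                similarity = 1.0 - abs(volume[z][y][x] - volume[nz][ny][nx])
                attack = similarity * strength[nz][ny][nx]
                # The strongest attacking neighbour takes over the cell.
                if attack > new_strength[z][y][x]:
                    new_labels[z][y][x] = labels[nz][ny][nx]
                    new_strength[z][y][x] = attack
                    changed = True

        labels, strength = new_labels, new_strength
        if not changed:
            break  # The automaton has converged.

    return labels
```

Calling `grow_cut_3d(volume, {(0, 0, 0): 1, (3, 3, 3): 2})` on a small volume with two regions of different intensity lets each seed claim the region it sits in, because label strength decays when crossing an intensity edge.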

Brazilian Vehicle Identification Using a New Embedded Plate Recognition System

Expert parking lot access control systems are developed in vehicle management through tracking and number recognition. These systems use cameras to identify a vehicle through its license plates based on intelligent and optical character recognition techniques.

This paper presents a new system to detect and recognize Brazilian vehicle license plates and grant registered users permission to enter the location. Digital image processing techniques, such as the Hough transform, morphology, thresholding, and the Canny edge detector, were used to extract characters. Least squares, least mean squares, an extreme learning machine, and a multilayer perceptron neural network were used to identify the numbers and letters.

The system was tested with 700 videos in AVI format at a resolution of 640×480 pixels, granting access only when the plate was registered and achieving a 98.5% success rate on the tested cases. A movement detection step is linked to the system, making it faster and more accurate in real time. Thus, it can be concluded that the proposed system is a promising tool with high potential that can be applied commercially.

Education

2015 - 2018

PhD in Data Science

Federal University of Ceara - Fortaleza, Brazil

2014 - 2015

Master's Degree in Data Science

Federal University of Ceara - Fortaleza, Brazil

2008 - 2013

Bachelor's Degree in Mechatronics Engineering

Federal Institute of Ceara - Fortaleza, Brazil

Certifications

JULY 2023 - PRESENT

Hands-on Essentials – Data Warehouse

Snowflake

JULY 2018 - PRESENT

Deep Learning Specialization

Coursera

APRIL 2018 - PRESENT

MTA: Introduction to Programming Using Python

Microsoft

MARCH 2018 - PRESENT

Data Science Foundations – Level 2 (V2)

IBM

MARCH 2018 - PRESENT

Data Science Foundations – Level 2

IBM

MARCH 2018 - PRESENT

Data Science Foundations – Level 1

IBM

MARCH 2018 - PRESENT

Python for Data Science

IBM

Skills

Libraries/APIs

PySpark, Beautiful Soup, PyTorch, Pandas, OpenCV, Scikit-learn, NumPy, Shopify API, Keras, TensorFlow, NDK, Dask, Google Sheets API

Tools

Jira, Microsoft Power BI, GitHub, Google Sheets, Microsoft Excel, dbt Cloud, Apache Airflow, Tableau, Amazon CloudWatch, Stitch Data, Amazon Textract, AWS Glue, Git, MATLAB, Pytest, AutoML, Sentry, RabbitMQ, Jenkins, AWS IAM, GitLab, Google Analytics, Jupyter, Vagrant, PCB Layout, Synapse

Languages

Python, SQL, Snowflake, C#, Processing, C++, C, Java, Assembler x86, Scala, R, Embedded C++, Excel VBA

Paradigms

ETL, Automation, Business Intelligence (BI), Requirements Analysis, Database Design, Agile, Scrum, REST, Database Development

Platforms

Amazon Web Services (AWS), Databricks, Docker, Azure, Azure Synapse, Microsoft Fabric, Kubeflow, Apache Kafka, SharePoint, AWS IoT, Linux, Android, Alteryx, Blockchain

Storage

PostgreSQL, Data Pipelines, Databases, Amazon S3 (AWS S3), Redshift, Data Lakes, NoSQL, Amazon Aurora, Microsoft SQL Server, Database Architecture, Redis, Cassandra, MySQL, SQLite, JSON, HDFS

Frameworks

Selenium, Django, Spark, Flask, Hadoop, JNI

Other

Data Engineering, Machine Learning, Data Science, Data Analysis, Data Build Tool (dbt), Data Warehousing, Business Requirements, Data Analytics, Azure Databricks, Business Logic, Warehouses, ELT, Fivetran, Analytics, Data Manipulation, Web Scraping, Orchestration, Data, Artificial Intelligence (AI), Optical Character Recognition (OCR), Reports, Consulting, Edge AI, APIs, Amazon EMR Studio, Data Modeling, Cloud, Data Reporting, Azure Data Lake, Azure Data Factory (ADF), Data Management, Generative Pre-trained Transformers (GPT), BI Reporting, Azure Data Lake Analytics, Modeling, Big Data Architecture, Data Visualization, Leadership, A/B Testing, Machine Learning Operations (MLOps), Data Architecture, Architecture, Data Lineage, AI Consulting, Internet of Things (IoT), Computer Vision, Engineering, Signals, CI/CD Pipelines, Platform as a Service (PaaS), SaaS, SFTP, Google Analytics 4, Deep Learning, HDR Photography, ARM, Electronics, Programming, Workbench, Statistics, Neural Networks, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Sequence Models, Recurrent Neural Networks (RNNs), Mechatronics, Financial Planning & Analysis (FP&A)
