
Edson Cavalcanti Neto
Verified Expert in Engineering
Data Specialist and Developer
Fortaleza - State of Ceará, Brazil
Toptal member since January 15, 2024
Edson is a data specialist with 12+ years of experience building and managing complex data pipelines and architectures. He's an expert in integrating diverse technologies, including Python, AWS Glue, Apache Airflow, Data Build Tool (dbt), Apache Kafka, Spark, and Docker for scalable data solutions. Edson has a strong track record of delivering cost-effective and timely solutions, enhancing data-driven decision-making, and leading teams to innovate and optimize operations.
Portfolio
Experience
- Python - 12 years
- Data Science - 10 years
- Git - 10 years
- SQL - 8 years
- ETL - 6 years
- Data Engineering - 5 years
- AWS Glue - 3 years
- PySpark - 2 years
Availability
Preferred Environment
Python, PySpark, AWS Glue, SQL, Apache Airflow, ETL, Data Engineering, Git, Amazon Web Services (AWS), Amazon S3 (AWS S3)
The most amazing...
...system I've developed is a monitor for equipment in Brazilian states that decreases the average incident detection time from five days to three hours.
Work Experience
Data Engineer
Cavalcanti Inovações Tecnológicas
- Created a job pipeline for data ingestion and cleaning.
- Developed jobs to access sources, including APIs, SharePoint, Amazon S3 (AWS S3), SFTP, and Google Drive, to ingest data to the data lake.
- Leveraged AWS Glue and Apache Airflow to run scripts to export the data from Amazon S3 to Snowflake.
- Transformed data and loaded it into Snowflake using dbt.
- Built a CI/CD pipeline for automatic dashboard deployment and update.
- Implemented Redshift tables and views for the data analysis team.
- Actively collaborated with software engineers and data analysts.
- Provided Docker infrastructure to run data pipelines, including ETL solutions, data transformation, cleaning, and model training.
Data Architect
Mobit Brasil Ltda
- Created big data infrastructure to process and store IoT data.
- Designed data lake structures on-premise using Hadoop with HDFS for data storage and PySpark for data processing.
- Managed teams in product development with a focus on innovation and operational cost reduction. I also coordinated the data science team, focusing on innovation and improvement of internal processes.
- Deployed a distributed application with Cassandra, Redis, PostgreSQL, and load balancers. I also managed 20 workers processing telemetry sent every minute by 1,000 pieces of equipment.
- Developed algorithms to analyze and manage contract loss through data.
- Built a computer vision MLOps process with the Computer Vision Annotation Tool, MLflow, and Jupyter Notebook for analysis.
- Generated summarized information using MapReduce to help managers during decision-making phases.
Senior Data Scientist
Instituto Atlântico
- Created ML architecture for security systems and developed related end-to-end software applications.
- Developed an AutoML structure to find the best model for the dataset.
- Leveraged various technologies and frameworks, including Python, scikit-learn, Pandas, Git, Flask, NumPy, MySQL, PostgreSQL, Splunk, Sentry, and RabbitMQ.
- Built an AutoML pipeline to split data, train classifiers, and optimize hyperparameters.
- Designed applications with Python, using technologies such as Flask, Pandas, scikit-learn, and other data science libraries.
- Used CI/CD with Jenkins for deployments, working with cloud and other technologies.
- Decreased data processing time by 50% using PySpark from the Hadoop ecosystem.
- Implemented the Agile methodology and Scrum, converting ideas into strategic plans, delegating tasks to the team, mentoring, and providing feedback.
Embedded Systems Designer
Mobit Brasil Ltda
- Designed software architecture for computer vision solutions applied to intelligent transportation systems.
- Created Python libraries for vehicle tracking using computer vision.
- Optimized a deep learning network for character recognition and plate detection.
- Deployed a solution on embedded systems with x86 processors running on Unix.
- Leveraged various technologies and frameworks, including Python, scikit-learn, Git, NumPy, Linux, OpenCV, Vagrant, Keras, and TensorFlow.
- Developed a distributed data processing application to gather information and images from intelligent transportation systems through ETL.
- Implemented application tests for systems based on x86 processors running on Unix.
R&D Engineer
LESC – Computer Systems Engineering Laboratory
- Created algorithms in Python for quality control in the factory line.
- Developed a test plan to validate algorithms and build automated reports.
- Leveraged various technologies and frameworks, including C, Git, OpenCV, Android, and MATLAB.
- Implemented computer vision and ML algorithms for quality control and provided technical documentation in English and Portuguese.
- Integrated C++ with a Java Android application using JNI and NDK.
- Built a C application leveraging OpenCV and Python.
- Ran application tests for systems based on ARM and x86 processors.
R&D Researcher
Innovation Technology Lab
- Developed algorithms for optical character recognition. I also created algorithms in Python, C, and C++ for embedded systems to run on an ARM processor.
- Built a Linux interface to manage image acquisition.
- Integrated C++ with a Java Android application using JNI and NDK.
- Designed navigation systems for a mobile robot using mobile devices and gesture recognition. I leveraged computational intelligence and digital image processing.
- Worked on an image processing project to recognize license plates using C, C++, and computational intelligence.
- Handled modeling, design, and development tasks for an autonomous submersible vehicle that collects data on reservoir lake ecosystem characteristics.
Experience
Computer Vision Algorithm for COVID-19 Detection
https://pypi.org/project/covid-vision/
3D AUTOCUT: A 3D Segmentation Algorithm Based on Cellular Automata
https://autocut3d.readthedocs.io/en/latest/
Brazilian Vehicle Identification Using a New Embedded Plate Recognition System
This paper presents a new system that detects and recognizes Brazilian vehicle license plates and grants entry to a location only to registered users. Digital image processing techniques, such as the Hough transform, morphological operations, thresholding, and the Canny edge detector, were used to extract characters. Least squares, least mean squares, an extreme learning machine, and a multilayer perceptron neural network were used to identify the numbers and letters.
The system was tested with 700 videos in AVI format at a resolution of 640×480 pixels. It granted access only when the plate was registered and achieved a 98.5% success rate on the tested cases. A movement detection step is linked to the system, making it faster and more accurate in real time. Thus, it can be concluded that the proposed system is a promising tool with high potential for commercial application.
Education
PhD in Data Science
Federal University of Ceará - Fortaleza, Brazil
Master's Degree in Data Science
Federal University of Ceará - Fortaleza, Brazil
Bachelor's Degree in Mechatronics Engineering
Federal Institute of Ceará - Fortaleza, Brazil
Certifications
Hands-on Essentials – Data Warehouse
Snowflake
Deep Learning Specialization
Coursera
MTA: Introduction to Programming Using Python
Microsoft
Data Science Foundations – Level 2 (V2)
IBM
Data Science Foundations – Level 2
IBM
Data Science Foundations – Level 1
IBM
Python for Data Science
IBM
Skills
Libraries/APIs
PySpark, Beautiful Soup, PyTorch, Pandas, OpenCV, Scikit-learn, NumPy, Shopify API, Keras, TensorFlow, NDK, Dask, Google Sheets API
Tools
Jira, Microsoft Power BI, GitHub, Google Sheets, Microsoft Excel, dbt Cloud, Apache Airflow, Tableau, Amazon CloudWatch, Stitch Data, Amazon Textract, AWS Glue, Git, MATLAB, Pytest, AutoML, Sentry, RabbitMQ, Jenkins, AWS IAM, GitLab, Google Analytics, Jupyter, Vagrant, PCB Layout, Synapse
Languages
Python, SQL, Snowflake, C#, Processing, C++, C, Java, Assembler x86, Scala, R, Embedded C++, Excel VBA
Paradigms
ETL, Automation, Business Intelligence (BI), Requirements Analysis, Database Design, Agile, Scrum, REST, Database Development
Platforms
Amazon Web Services (AWS), Databricks, Docker, Azure, Azure Synapse, Microsoft Fabric, Kubeflow, Apache Kafka, SharePoint, AWS IoT, Linux, Android, Alteryx, Blockchain
Storage
PostgreSQL, Data Pipelines, Databases, Amazon S3 (AWS S3), Redshift, Data Lakes, NoSQL, Amazon Aurora, Microsoft SQL Server, Database Architecture, Redis, Cassandra, MySQL, SQLite, JSON, HDFS
Frameworks
Selenium, Django, Spark, Flask, Hadoop, JNI
Other
Data Engineering, Machine Learning, Data Science, Data Analysis, Data Build Tool (dbt), Data Warehousing, Business Requirements, Data Analytics, Azure Databricks, Business Logic, Warehouses, ELT, Fivetran, Analytics, Data Manipulation, Web Scraping, Orchestration, Data, Artificial Intelligence (AI), Optical Character Recognition (OCR), Reports, Consulting, Edge AI, APIs, Amazon EMR Studio, Data Modeling, Cloud, Data Reporting, Azure Data Lake, Azure Data Factory (ADF), Data Management, Generative Pre-trained Transformers (GPT), BI Reporting, Azure Data Lake Analytics, Modeling, Big Data Architecture, Data Visualization, Leadership, A/B Testing, Machine Learning Operations (MLOps), Data Architecture, Architecture, Data Lineage, AI Consulting, Internet of Things (IoT), Computer Vision, Engineering, Signals, CI/CD Pipelines, Platform as a Service (PaaS), SaaS, SFTP, Google Analytics 4, Deep Learning, HDR Photography, ARM, Electronics, Programming, Workbench, Statistics, Neural Networks, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Sequence Models, Recurrent Neural Networks (RNNs), Mechatronics, Financial Planning & Analysis (FP&A)