Aldo Orozco
Verified Expert in Engineering
Data Engineer and Software Developer
Aldo has over ten years of experience as a software engineer, five of which he has spent focused on data solutions. His broad expertise in embedded Linux systems, cloud infrastructure, site reliability engineering (SRE), and high-performance data architectures equips him to handle complex systems. Throughout his career, Aldo has held multiple roles, including developer, consultant, architect, and lead.
Preferred Environment
Apache Airflow, Spark, Python, Terraform, Kubernetes, Google Cloud Platform (GCP), Amazon Web Services (AWS), Helm, Data Warehousing, Big Data Architecture
The most amazing...
...things I've done include a large Apache Airflow re-platforming on Kubernetes and Spark pipeline optimizations that reduced execution time by 90%.
Work Experience
Senior Data Engineer II
Etsy
- Led an Airflow migration from 1.10 on VMs to Airflow 2 on Kubernetes while upgrading and validating thousands of legacy directed acyclic graphs (DAGs). Reduced deployment times from hours to about a minute with zero downtime.
- Rearchitected a SQL parsing service and introduced multiprocessing, reducing processing time from 30 minutes to five.
- Implemented skew inference and input/output collection services on Spark jobs executed across the company, allowing users to optimize their pipelines.
- Implemented several RESTful microservices in Python to manage ad hoc testing of the Airflow environment.
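For context, the multiprocessing approach mentioned above can be sketched as follows. This is a minimal illustration, not the actual service: `parse_sql` is a hypothetical stand-in for the real parser.

```python
import multiprocessing as mp

def parse_sql(statement: str) -> dict:
    # Hypothetical stand-in for a real SQL parser: extract the target
    # table name from a simple INSERT statement.
    tokens = statement.split()
    table = tokens[2] if len(tokens) > 2 and tokens[0].upper() == "INSERT" else None
    return {"statement": statement, "table": table}

def parse_all(statements: list[str], workers: int = 4) -> list[dict]:
    # Fan the statements out across a pool of worker processes; for a
    # CPU-bound parser this sidesteps the GIL and cuts wall-clock time
    # roughly in proportion to the number of cores.
    with mp.Pool(processes=workers) as pool:
        return pool.map(parse_sql, statements)

if __name__ == "__main__":
    stmts = [f"INSERT INTO table_{i} VALUES (1)" for i in range(8)]
    results = parse_all(stmts, workers=2)
    print([r["table"] for r in results])
```

The key design choice is process-based (rather than thread-based) parallelism, since Python threads do not parallelize CPU-bound work.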
Staff Data Engineer
Wizeline
- Drove an internal program to transition software engineers into data engineering. Eighteen graduates were successfully assigned to long-term projects, increasing profit by making the engineers billable and filling project gaps.
- Advised solution architects during new customer engagements on performant data architectures, helping win new deals and improve resource utilization by setting realistic architectures and expectations.
- Architected an IoT cloud ingestion and monitoring solution for several million devices in collaboration with the SRE team. The project successfully kicked off, and a dozen engineers were assigned to it.
- Collaborated on a change data capture (CDC) pipeline with Delta Lake, Kafka, and Spark, which ingested financial data from other teams and third-party platforms and aggregated information in a data lake ultimately consumed by data scientists.
- Led a data community of over 20 members for more than a year. The community served as an educational platform where members presented relevant topics to improve tooling efficiency, troubleshoot real-world errors, and explore data architecture trends.
- Coordinated several mentorship programs aimed at training software engineers in data engineering, helping meet the demand for the discipline across the Americas.
AWS Big Data Architect
Triolabs
- Implemented a service to ingest and prune brain imaging data from private laboratories in near real time using Apache Kafka and Python on AWS. The resulting data was consumed by a machine learning model in R to extract insights from the scans.
- Rearchitected a data warehouse data model to handle terabyte-scale queries faster and speed up drug research analysis.
- Coordinated a team of two to create aggregation pipelines on Apache Spark on the brain imaging results so that the researchers could fine-tune drug research.
Big Data Engineer
Apex Systems
- Developed over ten Spark pipelines to aggregate and store terabytes of data in Hive, Elasticsearch, and MongoDB for the marketing team. These aggregations were exposed via APIs, enabling the analytics team to generate campaigns to attract new users.
- Created a library for Spark jobs to efficiently enrich data sets with location data from Google Maps APIs, thus allowing for better-targeted marketing campaigns.
- Optimized the Spark settings of several production workloads, reducing the cost and execution time of overnight runs to a third and minimizing report delivery delays to senior management.
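As an illustration of the kind of tuning involved, the usual levers are executor sizing, shuffle parallelism, and adaptive execution. The flags below are standard Spark configuration knobs, but the values and the `nightly_report.py` job name are purely illustrative, not the actual production settings:

```shell
# Illustrative spark-submit tuning flags for an overnight batch job.
spark-submit \
  --conf spark.executor.instances=20 \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=8g \
  --conf spark.sql.shuffle.partitions=400 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.sql.adaptive.enabled=true \
  nightly_report.py
```

Right-sizing executors avoids paying for idle memory, while adaptive query execution lets Spark coalesce shuffle partitions at runtime instead of relying on a fixed, oversized partition count.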
Embedded Software Engineer
Continental Automotive Systems
- Developed a data pipeline in Hadoop MapReduce to aggregate historical cellular network data, helping to reduce errors in a middleware service by half.
- Architected and led the development of a service to automatically reconfigure a car's connectivity while driving, seamlessly reconnecting between cellular stations.
- Assisted in containerizing the team's development environment, reducing setup friction and providing a stable, reproducible environment.
- Gave a series of training sessions for 30 developers on unit testing and coverage using a proprietary tool, helping remove dozens of unreachable code snippets.
Experience
Adaptive Big Data Pipelines
https://github.com/aldoorozco/adaptive_data_pipelines
Marketing Recommendation System
https://www.vrbo.com/es-mx/
Brain Imaging Prediction
https://neumoratx.com/
Education
Master's Degree in Computer Science
ITESO, Jesuit University of Guadalajara - Guadalajara, Mexico
Bachelor's Degree in Mechatronics Engineering
Centro de Enseñanza Tecnica Industrial - Guadalajara, Mexico
Certifications
GCP Professional Data Engineer
Google Cloud
Skills
Languages
Python, Bash, SQL, C++, Java, C, Scala, Snowflake, R, PHP
Frameworks
Apache Spark, Hadoop, Spring, Flask
Libraries/APIs
PySpark, Google Maps API (GeoJSON)
Tools
Apache Airflow, Git, BigQuery, Terraform, Google Compute Engine (GCE), Amazon Athena, AWS Glue, Helm, Qubole, Docker Compose, Jenkins, Grafana, AWS Batch
Paradigms
ETL, Microservices
Platforms
Linux, Docker, Kubernetes, Google Cloud Platform (GCP), Amazon Web Services (AWS), Apache Kafka, Databricks, Amazon EC2, Buildkite, Azure, AWS Lambda
Storage
Data Pipelines, Amazon S3 (AWS S3), Databases, Data Lakes, PostgreSQL, Google Cloud Storage, Google Cloud SQL, Apache Hive, MongoDB, Elasticsearch, Google Bigtable, Redshift
Other
Data Warehousing, Big Data Architecture, Software, Pipelines, Data Engineering, Google BigQuery, Data Architecture, Scaling, Big Data, Complex Problem Solving, Teamwork, Data Warehouse Design, Google Cloud Build, ELT, Amazon RDS, Data Analysis, Machine Learning, CI/CD Pipelines, Streaming Data, Prometheus, StatsD, Google Cloud Functions, Amazon Kinesis, CDC, Amazon API Gateway, Data Quality, Azure Data Factory