Pablo Estrada
Verified Expert in Engineering
Software Engineer and Developer
Seattle, WA, United States
Toptal member since June 2, 2021
Pablo is a software engineer experienced in many big data technologies, primarily Apache Beam, but also BigQuery, Kafka, Flink, Spark, Pub/Sub, and Kinesis. He worked as an open-source developer on Apache Beam and knows it in depth. He has helped many Beam community members onboard and has written various integrations. Pablo believes good software starts with good testing and good observability, and that systems built on that foundation produce more reliable results.
Preferred Environment
Linux, Ubuntu, IntelliJ IDEA, GitHub
The most amazing...
...project I've been a part of is Apache Beam. I started working on it in its early days, as it was growing and capturing an audience.
Work Experience
Software Engineer
- Developed batch and streaming IO connectors in Java and Python for systems such as BigQuery, distributed file systems, Debezium, and JDBC. The work included ensuring exactly-once guarantees and scalability, as well as debugging, profiling, and improving performance.
- Developed the metrics collection system for the Python SDK, including runtime, data size, and custom metrics.
- Built template pipelines for general use cases, such as database migration, replication, and CDC.
- Worked on a local runner for streaming pipelines that can manage multiple language runtimes and speed up local development.
- Educated customers and partners in online and in-person meetings, helping debug pipelines and providing guidance on implementation.
Software Developer
Oracle
- Inherited and stabilized a codebase in time for the release of the new Oracle version.
- Added new job types for the Oracle Scheduler using shell scripts and PL/SQL.
- Helped four team members onboard onto the project.
Experience
Apache Beam
https://github.com/apache/beam
All my code changes can be found at the following link: https://github.com/apache/beam/pulls?q=is%3Apr+author%3Apabloem+sort%3Acreated-desc.
My favorite code changes:
- https://github.com/apache/beam/pull/7655
- https://github.com/apache/beam/pull/7677
- https://github.com/apache/beam/pull/4387
- https://github.com/apache/beam/pull/8394
A Solution for Continuous CDC to BigQuery
https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/master/v2/cdc-parent
Framework to Assert for Airflow DAG Invariants
https://pypi.org/project/dagcheck/
Dagcheck was created so that Airflow users could write tests for their DAGs with these characteristics:
• They are easy to read through and understand;
• They do not orchestrate real infrastructure changes;
• They run on a local development environment;
• They run quickly as part of a developer's flow;
• They can be run in CI/CD and catch issues in the future.
Education
Master's Degree in Computer Science
Seoul National University - Seoul, South Korea
Bachelor's Degree in Computer Engineering
National University of Mexico (UNAM) - Mexico City, Mexico
Skills
Libraries/APIs
REST APIs
Tools
BigQuery, Cloud Dataflow, Apache Beam, Apache Airflow, IntelliJ IDEA, GitHub
Languages
Python, Java, SQL, C, JavaScript
Frameworks
Ray, Hadoop, Apache Spark
Paradigms
ETL, Parallel Programming
Platforms
Google Cloud Platform (GCP), Apache Flink, Apache Kafka, Linux, Ubuntu, Debezium
Storage
Databases, JSON, NoSQL, MongoDB
Other
Data Engineering, Data, CSV, Google BigQuery, Data Visualization, Data Analysis, Google Data Studio