Pablo Estrada, Developer in Seattle, WA, United States

Pablo Estrada

Verified Expert in Engineering

Software Engineer and Developer

Location
Seattle, WA, United States
Toptal Member Since
June 2, 2021

Pablo is a software engineer experienced in many big data technologies, primarily Apache Beam, but also BigQuery, Kafka, Flink, Spark, Pub/Sub, and Kinesis. He worked as an open-source developer on Apache Beam and knows it deeply. He has helped many Beam community members onboard and has written various integrations. Pablo believes good software starts with good testing and good observability; systems built on that foundation produce more reliable results.

Availability

Part-time

Preferred Environment

Linux, Ubuntu, IntelliJ IDEA, GitHub

The most amazing...

...project I've been a part of is Apache Beam. I started working on it in its early days, as it was growing and capturing an audience.

Work Experience

Software Engineer

2016 - PRESENT
Google
  • Developed batch and streaming IO connectors in Java and Python for systems such as BigQuery, distributed file systems, Debezium, and JDBC. The work included ensuring exactly-once guarantees and scalability, as well as debugging, profiling, and improving performance.
  • Developed the metrics collection system for the Python SDK, including runtime, data size, and custom metrics.
  • Built template pipelines for general use cases, such as database migration, replication, and CDC.
  • Worked on a local runner for streaming pipelines that can manage multiple language runtimes and speed up local development.
  • Educated customers and partners in online and in-person meetings, helping debug pipelines and providing guidance on implementation.
Technologies: Python, Java, SQL, BigQuery, Cloud Dataflow, Apache Beam, Apache Flink, Apache Kafka, Spark, NoSQL, Google Data Studio, Google Cloud Platform (GCP), Apache Spark, MongoDB

Software Developer

2011 - 2013
Oracle
  • Inherited and stabilized a codebase in time for the release of the new Oracle version.
  • Added new job types for the Oracle Scheduler using shell scripts and PL/SQL.
  • Helped four team members onboard onto the project.
Technologies: C, Java, Python

Apache Beam

https://github.com/apache/beam
As a core member of the Apache Beam community, I worked all over the stack, including IO connectors for different systems, internal and external monitoring, and usability improvements.

All my code changes can be found at the following link: https://github.com/apache/beam/pulls?q=is%3Apr+author%3Apabloem+sort%3Acreated-desc.

My favorite code changes:
- https://github.com/apache/beam/pull/7655
- https://github.com/apache/beam/pull/7677
- https://github.com/apache/beam/pull/4387
- https://github.com/apache/beam/pull/8394

A Solution for Continuous CDC to BigQuery

https://github.com/GoogleCloudPlatform/DataflowTemplates/tree/master/v2/cdc-parent
Developed a sample solution that lets you ingest a stream of changed data from any MySQL database on version 5.6 and above (self-managed and on-prem) and sync it to a dataset in BigQuery with low latency.

Framework to Assert Airflow DAG Invariants

https://pypi.org/project/dagcheck/
Dagcheck is a framework to assert DAG invariants. Users of dagcheck can define DAG invariants to test via assertions, and dagcheck will generate DAG-run scenarios that verify these invariants.

Dagcheck was created so that Airflow users could write tests for their DAGs with these characteristics:

• They are easy to read through and understand;
• They do not orchestrate real infrastructure changes;
• They run on a local development environment;
• They run quickly as part of a developer's flow;
• They can be run in CI/CD and catch issues in the future.
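
The idea of asserting invariants over generated DAG-run scenarios can be illustrated with a minimal, plain-Python sketch. This is not dagcheck's actual API: the three-task DAG, the function names, and the simplified "a task runs only if all of its upstream tasks succeeded" rule are all assumptions made for illustration, and no Airflow installation is involved.

```python
from itertools import product

# Hypothetical three-task DAG: task name -> list of upstream task names.
DAG = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
}


def topological_order(dag):
    """Return the tasks so that each one appears after its upstreams."""
    ordered, seen = [], set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for upstream in dag[task]:
            visit(upstream)
        ordered.append(task)

    for task in dag:
        visit(task)
    return ordered


def simulate(dag, outcomes):
    """Compute the final task states for one scenario.

    `outcomes` maps each task to True (it would succeed if run) or
    False (it would fail if run). Assumed rule: a task runs only when
    all of its upstream tasks succeeded.
    """
    states = {}
    for task in topological_order(dag):
        if all(states[up] == "success" for up in dag[task]):
            states[task] = "success" if outcomes[task] else "failed"
        else:
            states[task] = "upstream_failed"
    return states


def scenarios(dag):
    """Yield the resulting states for every success/failure combination."""
    tasks = list(dag)
    for combo in product([True, False], repeat=len(tasks)):
        yield simulate(dag, dict(zip(tasks, combo)))


def check_invariant(dag, invariant):
    """Return every generated scenario that violates the invariant."""
    return [states for states in scenarios(dag) if not invariant(states)]


def load_needs_extract(states):
    """Invariant: 'load' may succeed only if 'extract' succeeded."""
    return states["load"] != "success" or states["extract"] == "success"


# The invariant holds in all generated scenarios, so nothing is reported.
assert check_invariant(DAG, load_needs_extract) == []
```

Because the scenarios are simulated rather than executed, a check like this touches no real infrastructure and runs quickly on a local machine or in CI/CD, which matches the characteristics listed above.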

Languages

Python, Java, SQL, C, JavaScript

Tools

BigQuery, Cloud Dataflow, Apache Beam, Apache Airflow, IntelliJ IDEA, GitHub

Paradigms

ETL, Parallel Programming

Platforms

Google Cloud Platform (GCP), Apache Flink, Apache Kafka, Linux, Ubuntu

Storage

Databases, JSON, NoSQL, MongoDB

Other

Data Engineering, Data, CSV, Google BigQuery, Ray, Data Visualization, Data Analysis, Google Data Studio, Debezium

Frameworks

Spark, Hadoop, Apache Spark

Libraries/APIs

REST APIs

Education

2013 - 2016

Master's Degree in Computer Science

Seoul National University - Seoul, South Korea

2006 - 2010

Bachelor's Degree in Computer Engineering

National University of Mexico (UNAM) - Mexico City, Mexico
