Pablo Estrada, Software Developer in Seattle, WA, United States

Member since June 2, 2021
Pablo is a software engineer experienced in many big data technologies, primarily Apache Beam, but also BigQuery, Kafka, Flink, Spark, Pub/Sub, and Kinesis. He worked as an open-source developer on Apache Beam and knows it deeply. He has helped many Beam community members onboard and has written various integrations. Pablo believes good software starts with good testing and good observability. Systems built on that foundation give more reliable results.


  • Google
    Python, Java, SQL, BigQuery, Cloud Dataflow, Apache Beam, Apache Flink...
  • Oracle
    C, Java, Python






Preferred Environment

Linux, Ubuntu, IntelliJ, GitHub

The most amazing project I've been a part of is Apache Beam. I started working on it in its early days, as it was growing and capturing an audience.


  • Software Engineer, Google

    2016 - PRESENT
    • Developed batch and streaming IO connectors in Java and Python for systems such as BigQuery, distributed file systems, Debezium, and JDBC. The work covered exactly-once guarantees, scalability, debugging, profiling, and performance tuning.
    • Developed the metrics collection system for the Python SDK, including runtime, data size, and custom metrics.
    • Built template pipelines for general use cases, such as database migration, replication, and CDC.
    • Worked on a local runner for streaming pipelines that can manage multiple language runtimes and speed up local development.
    • Educated customers and partners in online and in-person meetings, helping debug pipelines and providing guidance on implementation.
    Technologies: Python, Java, SQL, BigQuery, Cloud Dataflow, Apache Beam, Apache Flink, Apache Kafka, Apache Spark, NoSQL, Google Data Studio, Google Cloud Platform (GCP), MongoDB
  • Software Developer, Oracle

    2011 - 2013
    • Inherited and stabilized a codebase in time for the release of the new Oracle version.
    • Added new job types for the Oracle Scheduler using shell scripts and PL/SQL.
    • Helped four team members onboard onto the project.
    Technologies: C, Java, Python


  • Apache Beam

    As a core member of the Apache Beam community, I've worked all over the stack, including IO connectors for different systems, internal and external monitoring, and usability improvements.

  • A Solution for Continuous CDC to BigQuery

    Developed a sample solution that ingests a stream of change data from any MySQL database running version 5.6 or later (self-managed and on-prem) and syncs it to a BigQuery dataset with low latency.


  • Languages

    Python, Java, SQL, C, JavaScript
  • Tools

    BigQuery, Cloud Dataflow, Apache Beam, IntelliJ, GitHub
  • Paradigms

    ETL, Parallel Programming
  • Platforms

    Google Cloud Platform (GCP), Apache Flink, Apache Kafka, Linux, Ubuntu
  • Storage

    Databases, JSON, NoSQL, MongoDB
  • Other

    Data Engineering, Data, CSV, Google BigQuery, Ray, Data Visualization, Data Analyst, Google Data Studio, Debezium
  • Frameworks

    Hadoop, Apache Spark

  • Master's Degree in Computer Science
    2013 - 2016
    Seoul National University - Seoul, South Korea
  • Bachelor's Degree in Computer Engineering
    2006 - 2010
    National University of Mexico (UNAM) - Mexico City, Mexico
