Brenda Kao, Developer in Palo Alto, CA, United States

Brenda Kao

Verified Expert in Engineering

Bio

Brenda is a leader in data engineering with experience working closely with data analytics, data science, and marketing teams on big data. She offers expertise in assessing and creating the best solutions and is passionate about managing, analyzing, and extracting value from data. She architected a new accounting data platform for DoorDash to facilitate immutable datasets. Brenda is eager for opportunities to work closely with business teams on turning data into actionable insights.

Portfolio

Houzz
AWS IoT, Hadoop, Apache Kafka, Flink, Apache Airflow, Luigi, Python, SQL...
DoorDash
AWS IoT, Snowflake, Python, Apache Hive, Chartio, Sigma.js, Tableau...
Ipsy
AWS IoT, Databricks, Apache Hive, Python, PostgreSQL, Amazon Athena, Redash...

Experience

  • SQL - 20 years
  • Hadoop - 13 years
  • Apache Hive - 10 years
  • Apache Kafka - 7 years
  • PySpark - 7 years
  • AWS IoT - 6 years
  • Databricks - 4 years
  • Snowflake - 2 years

Availability

Part-time

Preferred Environment

Databricks, Hadoop, Snowflake, Apache Hive, SQL, Apache Kafka, Python, Spark, ETL Tools, Amazon Web Services (AWS)

The most amazing...

...project I've contributed to is leading data engineering teams to build an enterprise data platform to serve consumable data to users.

Work Experience

Senior Data Engineering Manager

2022 - 2023
Houzz
  • Designed, built, and supported pipelines for ingesting clickstream, event, log, and experiment data from a custom-built streaming system and curating datasets for downstream usage, including BI reporting, Blueshift, and product applications.
  • Designed, built, and supported the Salesforce data pipeline to ingest and curate data for data users. Created a framework to upload processed data back to Salesforce and Cassandra for product use.
  • Constructed revenue and finance pipelines to ingest NetSuite, Stripe, Intercom, and Zuora data and curated datasets for data users to consume.
  • Led the marketplace pipelines revamp project by designing and implementing the pipelines. Reduced the pipelines from 170+ datasets to fewer than 20 while delivering the same target datasets, cutting the run time from over five hours to two.
  • Managed the core pipelines revamp project by designing and implementing the pipelines to allow incremental loads and using indexes in the custom-built Sqoop framework.
  • Introduced a lookup table in the data platform by extracting the hard-coded values and descriptions from the product applications. Collaborated with the product engineering teams to co-own the lookup table in support of the data ecosystem.
  • Led the proof of concept (POC) and stand-up of the data observability tool Monte Carlo. Migrated existing monitors and notifications to Monte Carlo, integrating it with Airflow and Spark for full lineage and task information.
  • Managed the POC project for AWS data platform infrastructure evaluation.
  • Spearheaded projects to migrate jobs from Luigi and the in-house built scheduler to Airflow. Created a handshake mechanism among all scheduling systems.
  • Defined and executed the data archive and housekeeping policy to conduct AWS cost savings.
Technologies: AWS IoT, Hadoop, Apache Kafka, Flink, Apache Airflow, Luigi, Python, SQL, PySpark, Apache Hive, Redash, Tableau, Jupyter Notebook, Monte Carlo, PagerDuty, GitHub, Jira, Slack, Amplitude, Data Engineering, API Integration, ETL, Data Analysis, Data Analytics, Software Development, Microsoft Excel, Data Visualization, Data Modeling, Data Warehousing, Data Pipelines, AWS CLI, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, Apache Spark, Amazon Web Services (AWS), ELT, NoSQL, Kubernetes, Business Intelligence (BI)

Senior Manager of Data Engineering and Business Intelligence

2021 - 2022
DoorDash
  • Interfaced with the controller and chief accounting officer to build out scalable data pipelines and automate back-office operations.
  • Supported the accounting and tax teams for all data and technical needs, including month-end close process, automated and manual journal entries, account reconciliations, variance analysis, financial bug and fraud analysis, compliance, and audit.
  • Developed and maintained tools to create automated journal entries and continuously integrate changes for accounting impacts from new product releases and financial bug fixes.
  • Supported audit and compliance with the Sarbanes-Oxley Act (SOX), California Consumer Privacy Act (CCPA), and General Data Protection Regulation (GDPR). Developed a tool to provide audit data to external auditors, cutting the lead time on new requests by two days.
  • Architected a new accounting data platform to create and facilitate immutable data sets to support accounting month-end close, reconciliation, flux analysis, anomaly monitoring, financial bug tracking, compliance, and audit.
  • Led a credit sub-ledger project to build a credit data mart with customers' credit issuance, redemption, and usage information as the source of truth of the credit sub-ledger.
  • Spearheaded the revenue sub-ledger project to build a revenue sub-ledger as the immutable source of truth of revenue activities to be used by the accounting book close.
  • Managed the tax data mart project to design and build a tax data mart as the source of truth for tax, with item-level and jurisdiction-level details for orders, Dasher income, merchant income, and sales tax.
  • Guided the LTC (local time) to UTC (Coordinated Universal Time) conversion for 70+ Chartio accounting dashboards in support of the accounting book close and reporting, allowing for consistent financial reporting based on a standardized timeframe.
  • Steered the Chartio to Sigma migration for 100+ accounting dashboards and worked with internal and external auditors on certifying the audit process of the migration and Sigma go-live.
Technologies: AWS IoT, Snowflake, Python, Apache Hive, Chartio, Sigma.js, Tableau, Apache Airflow, GitHub, Confluence, Slack, Data Engineering, API Integration, ETL, Data Build Tool (dbt), Data Analysis, Data Analytics, Software Development, Alteryx, Microsoft Excel, Data Visualization, Data Modeling, Data Warehousing, Data Pipelines, AWS CLI, SQL, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, Apache Spark, Amazon Web Services (AWS), ELT, NoSQL, Business Intelligence (BI)

Senior Data Engineering Manager

2019 - 2020
Ipsy
  • Architected and refactored data infrastructure and ETL pipelines.
  • Spearheaded the CCPA (California Consumer Privacy Act) project to support new compliance regulations in effect on January 1, 2020.
  • Managed the data engineering team to deliver a seamless solution to support application architecture refactoring by adopting microservices. Built new pipelines to support legacy and new data models.
  • Led the offshore machine learning platform team in Argentina. Designed and built the initial MLOps infrastructure, automated machine learning training and execution processes, created a dashboard to manage models, and implemented an API for model serving.
  • Carried out continuous development, support, and maintenance of the custom-built events system, user attributes system, experiments system, and customer beauty quiz system to support business operations and new product features.
  • Guided the CRM team to integrate the Iterable marketing platform and customer data platform (CDP) with the in-house events system, machine learning platform, and data lake.
  • Explored AWS and Databricks services cost savings opportunities and made adjustments accordingly.
  • Created data engineering roadmaps, led scrum teams, managed projects, and provided technical direction.
  • Conducted talent acquisition and employee development.
Technologies: AWS IoT, Databricks, Apache Hive, Python, PostgreSQL, Amazon Athena, Redash, Amazon Kinesis, Amazon Kinesis Data Firehose, Tableau, Segment, Data Engineering, API Integration, ETL, Data Analysis, Data Analytics, Software Development, Microsoft Excel, Data Visualization, Data Modeling, Data Warehousing, Data Pipelines, AWS CLI, SQL, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, Apache Spark, Amazon Web Services (AWS), ELT, NoSQL, Business Intelligence (BI), AWS Lambda

Data Engineering Manager

2014 - 2019
Auction.com
  • Designed, implemented, supported, and maintained ETL pipelines and data products to support commercial and residential business platform releases, sourcing from internal and external data. Built frameworks for configurable reuse.
  • Supported and maintained data product pipelines, which generated email recommendations and uploaded the data to the marketing automation engagement platform Marketo.
  • Designed, implemented, supported, and maintained ETL pipelines and dashboards for clickstream data such as Omniture (Adobe Analytics) and Google Analytics to ingest data to data platforms. Transformed and curated consumable datasets for data users.
  • Oversaw AWS migration project to migrate existing on-prem Hadoop and Linux applications, processes, and data to AWS cloud. Analyzed environment differences, created migration plans, and led the data engineering team through execution.
  • Led the data engineering team to address the change impacts of a two-year microservices adoption project in application architecture design. Reduced the impacts by switching data sources behind the scenes, keeping the change transparent to users.
Technologies: Hadoop, AWS Glue, Amazon S3 (AWS S3), Amazon Athena, Apache Hive, Python, PySpark, Apache Kafka, HBase, SQL, ETL, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, Apache Spark, Amazon Web Services (AWS), ELT, NoSQL, Docker, Business Intelligence (BI), AWS Lambda, AWS Step Functions

Senior Development Manager - Data Engineering and Analytics

2010 - 2014
Intuit
  • Led TurboTax marketing analytics data mart project to ingest and derive TurboTax e-filing data for analytics, including architecting, designing, and implementing ETL framework and data warehouse schemas.
  • Oversaw Intuit Analytic Cloud project to ingest data from sources owned by business units using Netezza, Hadoop, Hive, Pig, Python, and Informatica Big Data edition.
  • Led the metadata management system project to document Intuit data and enable access. Partnered with business units to analyze data streams, load metadata, and create data lineage. Teamed with legal and data governance to govern data usage for security.
  • Refactored ETL frameworks to metadata-driven architecture. Implemented daily process monitors and alerts, created audit checks, and generated daily process reports.
Technologies: Python, Shell Scripting, SQL, Netezza, Hadoop, Apache Hive, Apache Pig, Informatica, ETL, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, ELT, Business Intelligence (BI), Intuit TurboTax

VP of Software Engineering

2009 - 2010
Acellent Technology, Inc.
  • Led the implementation of a family of structural health monitoring (SHM) software, which processed the collected sensor signals and used the company's patented algorithms to calculate and monitor structural health.
  • Designed and developed a listener app for a passive detection system.
  • Led software implementation for the Lockheed Reasoner project. Developed a structural monitoring system in C++, integrated a distributed MATLAB program with the C++-based system, and tested with the hardware system and a sensor-mounted coupon.
  • Set up a QA environment for software, hardware, and sensor integration tests. Defined QA procedures and set up a bug-tracking system.
  • Led ISO 9000, ITIL, and SDLC introduction, implementation, and certifications.
Technologies: Python, Java, C++, Shell Scripting, Data Migration

Consultant

2008 - 2009
Lockheed Martin
  • Worked on various projects to develop ETL interfaces using Informatica to load data into data warehouse fact and dimension tables. Worked with sources and targets including Oracle, Microsoft SQL Server, flat files, Hyperion, and SAP.
  • Designed architecture and conducted data analysis, profiling, and quality assurance. Designed and implemented reusable modules to share with the team. Partnered with users to create interface control documents and generated test cases/system test documents.
  • Migrated ETL interfaces from PowerCenter v7 to v8. Conducted code reviews and created extensive documentation of standards, best practices, and ETL procedures.
  • Developed a pmcmd command wrapper to ensure flexibility in supporting two domains on one machine. Defined Control-M run books for job scheduling; planned and performed production deployment.
  • Worked on continuous improvement by evaluating and merging similar scripts into one generic and dynamic script in each category.
Technologies: Shell Scripting, Informatica, Oracle, Microsoft SQL Server, Hyperion, SAP, SQL, ETL, ETL Development, ETL Implementation & Design, ETL Testing, Data Migration, ELT, Business Intelligence (BI)

AVP, Data Management

2004 - 2007
NOMURA
  • Led architecture design and built enterprise Informatica infrastructure in five environments; provided user/group/folder maintenance, code migration, repository backup and restore, upgrades, patches, performance tuning, and disaster recovery.
  • Developed UNIX and Perl scripts to automate repository backup, file purge, server monitors, lookup cache directory monitor, and user account manipulation in order to generate SOX-compliant logs.
  • Designed and developed ETL interfaces to load data from PeopleSoft Trade Settlement systems to data warehouses.
  • Designed and developed applications/workflows for the help desk, change management, approval process, custom templates, survey, and configuration management database. Leveraged ITIL concepts for users to streamline IT management and service desks.
  • Integrated Remedy ITSM with BMC Patrol, AlarmPoint, LDAP, Exchange Server, eTrust, and HR applications based on web services, filter API, Java, and Python scripting. Developed BI reports using Remedy reporting tools and Business Objects.
Technologies: Informatica, BMC Remedy, Shell Scripting, Python, APIs, Java, Perl, SQL, ETL, ETL Development, ETL Implementation & Design, ETL Testing, T-SQL (Transact-SQL), Data Migration, ELT, Business Intelligence (BI)

Technical Lead

2000 - 2004
Brown Brothers Harriman
  • Delivered a web application called Stochastic Asset Liability Strategy Analysis (SALSA), which was developed to meet the investment demands of insurance companies and other institutional clients.
  • Architected, designed, and implemented a web application called Know Your Client (KYC) to automate electronic approvals of new accounts in order to comply with PATRIOT Act certifications.
  • Implemented an application to generate the KYC document in PDF from the web application. Created a batch process to post KYC PDF files onto Mobius, which was a write-once permanent document retention system on the mainframe.
  • Designed and developed a process to automate trade settlements by extracting data from the Portia investment operations platform and joining it with clearing information. Upon approval from the web client, it was sent to the custody/bids system.
  • Developed scripts to monitor scheduled processes and to detect FTP errors with email alert capability. Designed and developed a Java program to control users' access to a web application hosted by a vendor.
Technologies: SQL, Java, Shell Scripting, JavaScript, HTML, SAS, Oracle, T-SQL (Transact-SQL), Data Migration, Business Intelligence (BI)

Projects

Migrate Data Platform from On-prem to AWS

Led an AWS migration project to migrate existing applications, processes, and data to AWS cloud. Analyzed environment differences, created migration plans, and led the data engineering team through execution.

Intelligent Support Platform for SHM System

A POC on a prognostic system that integrates data from a structural health monitoring (SHM) diagnostic system, maintenance manuals, and service center operations.

I evaluated the knowledge base of AI models and architected an intelligent support platform for the SHM system. I also contributed to improving the performance of the knowledge base.

Libraries/APIs

PySpark, Luigi, Sigma.js

Tools

Apache Airflow, Microsoft Excel, AWS CLI, AWS Glue, Flink, Redash, Tableau, GitHub, Jira, Slack, Chartio, Confluence, Amazon Athena, Amazon Kinesis Data Firehose, Apache Sqoop, Rundeck, Bitbucket, Hyperion, BMC Remedy, AWS Step Functions, Intuit TurboTax, AI Prompts

Languages

SQL, Snowflake, Python, T-SQL (Transact-SQL), Java, C++, Perl, JavaScript, HTML, SAS

Frameworks

Hadoop, Apache Spark

Paradigms

ETL, ETL Implementation & Design, Business Intelligence (BI)

Storage

Apache Hive, Data Pipelines, NoSQL, Amazon S3 (AWS S3), PostgreSQL, HBase, MySQL, Netezza, Microsoft SQL Server

Platforms

AWS IoT, Databricks, Apache Kafka, Amazon Web Services (AWS), Jupyter Notebook, PagerDuty, Apache Pig, Alteryx, Oracle, Kubernetes, Docker, AWS Lambda, Linux

Other

Data Engineering, Real Estate, Data Analysis, Software Development, Data Modeling, Data Warehousing, ETL Development, ETL Testing, Data Migration, ELT, API Integration, Data Analytics, Data Visualization, Monte Carlo, Amplitude, Amazon Kinesis, Segment, Data Build Tool (dbt), ETL Tools, Shell Scripting, Informatica, SAP, APIs, AI Model Training, AI Model Integration, Command Prompt (CMD)
