Brenda Kao
Verified Expert in Engineering
Data Engineer and Developer
Palo Alto, CA, United States
Toptal member since November 21, 2023
Brenda is a leader in data engineering with experience working closely with data analytics, data science, and marketing teams on big data. She offers expertise in assessing needs and building effective solutions and is passionate about managing, analyzing, and extracting value from data. She architected a new accounting data platform for DoorDash to facilitate immutable datasets. Brenda is eager for opportunities to work closely with business teams to turn data into actionable insights.
Portfolio
Experience
- SQL - 20 years
- Hadoop - 13 years
- Apache Hive - 10 years
- Apache Kafka - 7 years
- PySpark - 7 years
- AWS IoT - 6 years
- Databricks - 4 years
- Snowflake - 2 years
Preferred Environment
Databricks, Hadoop, Snowflake, Apache Hive, SQL, Apache Kafka, Python, Spark, ETL Tools, Amazon Web Services (AWS)
The most amazing...
...project I've contributed to is leading data engineering teams to build an enterprise data platform to serve consumable data to users.
Work Experience
Senior Data Engineering Manager
Houzz
- Designed, built, and supported pipelines for ingesting clickstream, event, log, and experiment data from a custom-built streaming system and curating datasets for downstream usage, including BI reporting, Blueshift, and product applications.
- Designed, built, and supported the Salesforce data pipeline to ingest and curate data for data users. Created a framework to upload processed data back to Salesforce and Cassandra for product use.
- Constructed revenue and finance pipelines to ingest NetSuite, Stripe, Intercom, and Zuora data and curated datasets for data users to consume.
- Led the marketplace pipelines revamp project, designing and implementing the new pipelines. Consolidated 170+ intermediate datasets down to fewer than 20 while delivering the same target datasets, cutting the run time from over five hours to two.
- Managed the core pipelines revamp project, designing and implementing pipelines to allow incremental loads and use indexes in the custom-built Sqoop framework.
- Introduced a lookup table in the data platform by extracting hard-coded values and descriptions from the product applications. Collaborated with the product engineering teams to share the lookup table and support the data ecosystem.
- Led the proof of concept (POC) and stand-up of the data observability tool Monte Carlo. Migrated existing monitors and notifications to Monte Carlo, integrating with Airflow and Spark for full lineage and task information.
- Managed the POC project for AWS data platform infrastructure evaluation.
- Spearheaded projects to migrate jobs from Luigi and the in-house built scheduler to Airflow. Created a handshake mechanism among all scheduling systems.
- Defined and executed the data archive and housekeeping policy to drive AWS cost savings.
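The handshake mechanism among scheduling systems mentioned above can be sketched as a shared completion marker that the upstream scheduler publishes and the downstream scheduler polls before starting its job. This is a minimal, hedged illustration, not Houzz's actual implementation; the marker location, job names, and polling parameters are all hypothetical:

```python
import tempfile
import time
from pathlib import Path

# Hypothetical shared location both schedulers can reach (e.g., S3 or NFS in practice).
MARKER_DIR = Path(tempfile.gettempdir()) / "scheduler_handshake"

def publish_completion(job_name: str, run_date: str) -> Path:
    """Called by the upstream scheduler (e.g., Luigi) when a job finishes."""
    MARKER_DIR.mkdir(parents=True, exist_ok=True)
    marker = MARKER_DIR / f"{job_name}.{run_date}.done"
    marker.touch()  # an empty file is enough to signal completion
    return marker

def wait_for_completion(job_name: str, run_date: str,
                        timeout_s: int = 3600, poll_s: int = 5) -> bool:
    """Called by the downstream scheduler; blocks until the marker appears or times out."""
    marker = MARKER_DIR / f"{job_name}.{run_date}.done"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if marker.exists():
            return True
        time.sleep(poll_s)
    return False
```

In an Airflow migration of this kind, the polling side maps naturally onto a sensor task, while the publishing side stays scheduler-agnostic, which lets jobs move between systems one at a time.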
Senior Manager of Data Engineering and Business Intelligence
DoorDash
- Interfaced with the controller and chief accounting officer to build out scalable data pipelines and automate our back-office operations.
- Supported the accounting and tax teams for all data and technical needs, including month-end close process, automated and manual journal entries, account reconciliations, variance analysis, financial bug and fraud analysis, compliance, and audit.
- Developed and maintained tools to create automated journal entries and continuously integrate changes for accounting impacts from new product releases and financial bug fixes.
- Supported audit and compliance with the Sarbanes-Oxley Act (SOX), California Consumer Privacy Act (CCPA), and General Data Protection Regulation (GDPR). Developed a tool to deliver audit data to external auditors, cutting new-request lead time by two days.
- Architected a new accounting data platform to create and facilitate immutable data sets to support accounting month-end close, reconciliation, flux analysis, anomaly monitoring, financial bug tracking, compliance, and audit.
- Led a credit sub-ledger project to build a credit data mart with customers' credit issuance, redemption, and usage information as the source of truth of the credit sub-ledger.
- Spearheaded the revenue sub-ledger project to build a revenue sub-ledger as the immutable source of truth of revenue activities to be used by the accounting book close.
- Managed the tax data mart project to design and build a tax data mart as the source of truth for tax, with item-level and jurisdiction-level details for orders, Dashers' income, merchants' income, and sales tax.
- Guided the LTC (local time) to UTC (Coordinated Universal Time) conversion for 70+ Chartio accounting dashboards in support of the accounting book close and reporting, allowing for consistency in financial reporting based on a standardized timeframe.
- Steered the Chartio to Sigma migration for 100+ accounting dashboards and worked with internal and external auditors on certifying the audit process of the migration and Sigma go-live.
Senior Data Engineering Manager
Ipsy
- Architected and refactored data infrastructure and ETL pipelines.
- Spearheaded the CCPA (California Consumer Privacy Act) project to support new compliance regulations in effect on January 1, 2020.
- Managed the data engineering team to deliver a seamless solution to support application architecture refactoring by adopting microservices. Built new pipelines to support legacy and new data models.
- Led the offshore machine learning platform team in Argentina. Designed and built initial MLOps infrastructure, automated machine learning training and execution processes, created a dashboard to manage models, and implemented API for model serving.
- Carried out continuous development, support, and maintenance of the custom-built events system, user attributes system, experiments system, and customer beauty quiz system to support business operations and new product features.
- Guided the CRM team to integrate the Iterable marketing platform and customer data platform (CDP) with the in-house events system, machine learning platform, and data lake.
- Explored AWS and Databricks services cost savings opportunities and made adjustments accordingly.
- Created data engineering roadmaps, led scrum teams, managed projects, and provided technical direction.
- Conducted talent acquisition and employee development.
Data Engineering Manager
Auction.com
- Designed, implemented, supported, and maintained ETL pipelines and data products to support commercial and residential business platform releases, sourcing from internal and external data. Built frameworks for configurable reuse.
- Supported and maintained data product pipelines, which generated email recommendations and uploaded the data to the marketing automation engagement platform Marketo.
- Designed, implemented, supported, and maintained ETL pipelines and dashboards for click-stream data such as Omniture (Adobe Analytics) and Google Analytics to ingest data to data platforms. Transformed and curated consumable data sets for data users.
- Oversaw AWS migration project to migrate existing on-prem Hadoop and Linux applications, processes, and data to AWS cloud. Analyzed environment differences, created migration plans, and led the data engineering team through execution.
- Led data engineering team to address the change impacts from a two-year microservices adoption in application architecture design project. Reduced the impacts by switching data sources behind the scenes and making it transparent to the users.
Senior Development Manager - Data Engineering and Analytics
Intuit
- Led TurboTax marketing analytics data mart project to ingest and derive TurboTax e-filing data for analytics, including architecting, designing, and implementing ETL framework and data warehouse schemas.
- Oversaw Intuit Analytic Cloud project to ingest data from sources owned by business units using Netezza, Hadoop, Hive, Pig, Python, and Informatica Big Data edition.
- Led the metadata management system project to document Intuit data and enable access. Partnered with business units to analyze data streams, load metadata, and create data lineages. Teamed with legal and data governance to manage usage for data security.
- Refactored ETL frameworks to metadata-driven architecture. Implemented daily process monitors and alerts, created audit checks, and generated daily process reports.
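A metadata-driven ETL architecture like the one described above can be sketched as a table of job definitions that a single generic runner iterates over, producing an audit trail as it goes. This is a simplified illustration under assumed names; the `JobSpec` fields and transform registry are hypothetical, not Intuit's actual schema:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class JobSpec:
    """One row of job metadata: where to read, what to do, where to write."""
    name: str
    source: str
    target: str
    transform: str  # key into the registry of reusable transforms

# Registry of reusable transforms; a real framework would load these dynamically.
TRANSFORMS: Dict[str, Callable[[list], list]] = {
    "identity": lambda rows: rows,
    "dedupe": lambda rows: list(dict.fromkeys(rows)),
}

def run_jobs(specs: List[JobSpec], extract: Callable, load: Callable) -> List[str]:
    """Drive every job from metadata (extract -> transform -> load) and log an audit line each."""
    audit = []
    for spec in specs:
        rows = extract(spec.source)
        rows = TRANSFORMS[spec.transform](rows)
        load(spec.target, rows)
        audit.append(f"{spec.name}: {len(rows)} rows {spec.source} -> {spec.target}")
    return audit
```

The appeal of this pattern is that adding a new pipeline means adding a metadata row rather than writing new code, and the shared runner is the single place where monitoring, audit checks, and daily reports hook in.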
VP of Software Engineering
Acellent Technology, Inc.
- Led the implementation of a family of structural health monitoring (SHM) software. The software processed collected sensor signals and used the company's patented algorithms to calculate and monitor structural health.
- Designed and developed a listener app for a passive detection system.
- Led software implementation for the Lockheed Reasoner project. Developed a structural monitoring system in C++. Integrated a distributed MATLAB program with the C++-based system. Tested with the hardware system and a sensor-mounted coupon.
- Set up a QA environment for software, hardware, and sensor integration tests. Defined QA procedures and set up a bug-tracking system.
- Led ISO 9000, ITIL, and SDLC introduction, implementation, and certifications.
Consultant
Lockheed Martin
- Worked on various projects to develop ETL interfaces using Informatica to load data to data warehouse fact tables and dimension tables. Applied sources and targets, including Oracle, Microsoft SQL Server, flat file, Hyperion, and SAP.
- Designed architecture and conducted data analysis, profiling, and quality assurance. Designed and implemented reusable modules to share with the team. Partnered with users to create interface control documents and generated test cases/system test documents.
- Migrated ETL interfaces from PowerCenter v7 to v8. Conducted code reviews and created extensive documentation of standards, best practices, and ETL procedures.
- Developed pmcmd command wrapper to ensure flexibility in supporting two domains on one machine. Defined Control-M run books for job scheduling; planned and performed production deployment.
- Worked on continuous improvement by evaluating and merging similar scripts into one generic and dynamic script in each category.
AVP, Data Management
NOMURA
- Led architecture design and built enterprise Informatica infrastructure in five environments; provided user/group/folder maintenance, code migration, repository backup and restore, upgrades, patches, performance tuning, and disaster recovery.
- Developed UNIX and Perl scripts to automate repository backup, file purge, server monitors, lookup cache directory monitor, and user account manipulation in order to generate SOX-compliant logs.
- Designed and developed ETL interfaces to load data from PeopleSoft Trade Settlement systems to data warehouses.
- Designed and developed applications/workflows for the help desk, change management, approval process, custom templates, survey, and configuration management database. Leveraged ITIL concepts for users to streamline IT management and service desks.
- Integrated Remedy ITSM with BMC Patrol, AlarmPoint, LDAP, Exchange Server, eTrust, and HR applications based on web services, filter API, Java, and Python scripting. Developed BI reports using Remedy reporting tools and Business Objects.
Technical Lead
Brown Brothers Harriman
- Delivered a web application called stochastic asset liability strategy analysis (SALSA), which was developed to meet the investment demands of insurance companies and other institutional clients.
- Architected, designed, and implemented a web application called Know Your Client (KYC) to automate electronic approvals on new accounts in order to comply with PATRIOT Act certifications.
- Implemented an application to generate the KYC document in PDF from the web application. Created a batch process to post KYC PDF files onto Mobius, which was a write-once permanent document retention system on the mainframe.
- Designed and developed a process to automate trade settlements by extracting data from the Portia investment operations platform and joining it with clearing information. Upon approval from the web client, it was sent to the custody/bids system.
- Developed scripts to monitor scheduled processes and to detect FTP errors with email alert capability. Designed and developed a Java program to control users' access to a web application hosted by a vendor.
Experience
Migrate Data Platform from On-prem to AWS
Intelligent Support Platform for SHM System
I evaluated the knowledge base of AI models and architected an intelligent support platform for the SHM system. I also contributed to performance improvements of the knowledge base.
Skills
Libraries/APIs
PySpark, Luigi, Sigma.js
Tools
Apache Airflow, Microsoft Excel, AWS CLI, AWS Glue, Flink, Redash, Tableau, GitHub, Jira, Slack, Chartio, Confluence, Amazon Athena, Amazon Kinesis Data Firehose, Apache Sqoop, Rundeck, Bitbucket, Hyperion, BMC Remedy, AWS Step Functions, Intuit TurboTax, AI Prompts
Languages
SQL, Snowflake, Python, T-SQL (Transact-SQL), Java, C++, Perl, JavaScript, HTML, SAS
Frameworks
Hadoop, Apache Spark
Paradigms
ETL, ETL Implementation & Design, Business Intelligence (BI)
Storage
Apache Hive, Data Pipelines, NoSQL, Amazon S3 (AWS S3), PostgreSQL, HBase, MySQL, Netezza, Microsoft SQL Server
Platforms
AWS IoT, Databricks, Apache Kafka, Amazon Web Services (AWS), Jupyter Notebook, PagerDuty, Apache Pig, Alteryx, Oracle, Kubernetes, Docker, AWS Lambda, Linux
Other
Data Engineering, Real Estate, Data Analysis, Software Development, Data Modeling, Data Warehousing, ETL Development, ETL Testing, Data Migration, ELT, API Integration, Data Analytics, Data Visualization, Monte Carlo, Amplitude, Amazon Kinesis, Segment, Data Build Tool (dbt), ETL Tools, Shell Scripting, Informatica, SAP, APIs, AI Model Training, AI Model Integration, Command Prompt (CMD)