Hang Guo, Developer in Irvine, CA, United States

Hang Guo

Verified Expert in Engineering

Bio

Hang is a skilled data and database engineer with a proven track record of building data applications and extracting valuable insights from data. He excels in back-end database development, ETL, data pipeline creation, and business intelligence solutions. Proficient in AWS technologies such as Lambda, Step Functions, EMR, DynamoDB, SES, and API Gateway, Hang is adept at working with on-premises databases (SQL Server, PostgreSQL), cloud databases (AWS Redshift), and big data technologies (Spark).

Portfolio

Amazon.com
Python, SQL, PySpark, Redshift, Amazon S3 (AWS S3), AWS Glue, AWS Lambda...
Honda
SQL, Python, SQL Server Integration Services (SSIS), JSON, APIs, C#...
Honda
SQL, Python, C#, SQL Server Integration Services (SSIS), Data Modeling...

Experience

Availability

Part-time

Preferred Environment

Linux, AWS Cloud Development Kit (CDK), Python, AWS Step Functions, AWS Lambda, Amazon S3 (AWS S3), EMR, Scala, SQL, Spark

The most amazing...

...achievement was being recognized as a high performer at Amazon in 2023, earning an "Exceeds High Bar" rating for exceptional contributions and impact.

Work Experience

Data Engineer II

2022 - PRESENT
Amazon.com
  • Achieved a 90% reduction in delivery latency. Led the design and development of the data pipelines behind a marketing measurement ML model.
  • Saved millions through optimization. Tuned slow-running Spark jobs, substantially cutting compute costs and product delivery latency (an illustrative tuning sketch follows this role's technology list).
  • Enhanced team capabilities and efficiency. Mentored junior engineers and interns, provided critical data engineering support for data scientists and economists, and designed automation solutions for ETL workflows.
  • Demonstrated advanced data warehousing skills by developing and maintaining large-scale distributed data warehouses, ensuring robust data management and accessibility.
Technologies: Python, SQL, PySpark, Redshift, Amazon S3 (AWS S3), AWS Glue, AWS Lambda, Amazon Web Services (AWS), Data Engineering, Amazon Elastic Container Service (ECS), Docker, Spark, AWS Step Functions, AWS Cloud Development Kit (CDK), EMR, Scala, React, JavaScript, ETL, NoSQL, Apache Airflow, Data Modeling, pgAdmin, Jupyter Notebook, Linux, PostgreSQL, Apache Spark, Data Pipelines, Data Migration, T-SQL (Transact-SQL), Apache Kafka, Hadoop, Apache Flink, Amazon Aurora, Amazon RDS, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics
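
The Spark tuning mentioned above can take many forms. As one illustration, the sketch below shows a common pattern: broadcasting a small dimension table to avoid shuffling a large fact table. Every path, table, and column name here is a hypothetical placeholder rather than a detail of the actual Amazon pipelines.

    # Illustrative PySpark tuning sketch; all names and paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("join-tuning-sketch").getOrCreate()

    transactions = spark.read.parquet("s3://example-bucket/transactions/")  # large fact table
    campaigns = spark.read.parquet("s3://example-bucket/campaigns/")        # small dimension table

    # Broadcasting the small side avoids shuffling the large table across the
    # cluster, which is often the dominant cost in a slow join.
    joined = transactions.join(broadcast(campaigns), on="campaign_id", how="left")

    # Coalesce before writing to avoid producing thousands of tiny output files.
    joined.coalesce(64).write.mode("overwrite").parquet("s3://example-bucket/joined/")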

Senior Database and Data Warehouse Engineer

2019 - 2022
Honda
  • Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Achieved a 99.9% resolution rate for production issues, effectively addressing process failures from both application and analytics teams.
  • Completed a sensitive data migration project involving PII and non-PII data. Managed secure data feeds to the cloud by decrypting sensitive information and processing the reverse feed from the cloud, ensuring data integrity and security.
  • Developed a validation process to identify and load missing data into the data warehouse. This addressed data inaccuracies that had caused concerns for the business and analytics teams, restoring data integrity and trust.
  • Identified and resolved the root cause of performance issues in a large back-end stored procedure returning hierarchical data for API usage. Eliminated the timeout issue and unblocked the downstream application workflow.
  • Migrated a database and data warehouse to the cloud, thoroughly testing and ensuring all database and ETL programs functioned correctly post-migration, maintaining data integrity and operational efficiency.
  • Developed helper programs in Python and C# to automate repeatable tasks, including database server interactions, FTP/SFTP, file system and security, cloud operations, and Spark processes, greatly reducing manual effort and increasing efficiency (a minimal SFTP helper sketch follows this role's technology list).
Technologies: SQL, Python, SQL Server Integration Services (SSIS), JSON, APIs, C#, Amazon Web Services (AWS), Data Engineering, Docker, Spark, SQL Server Management Studio (SSMS), Jupyter Notebook, Apache Spark, Databricks, Data Pipelines, Data Migration, T-SQL (Transact-SQL), MySQL, MySQL DBA, Microsoft SQL Server, SQL Server 2014, SQL Server 2017, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics
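
As an illustration of the helper programs mentioned above, below is a minimal SFTP download helper in Python built on the paramiko library; the host, credentials, and paths are hypothetical placeholders, not values from the Honda project.

    # Minimal SFTP helper sketch using paramiko; all connection details are
    # hypothetical placeholders.
    import paramiko

    def sftp_download(host, username, password, remote_path, local_path, port=22):
        """Download one file over SFTP, then close the connection."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())  # pin host keys in production
        client.connect(host, port=port, username=username, password=password)
        try:
            sftp = client.open_sftp()
            try:
                sftp.get(remote_path, local_path)
            finally:
                sftp.close()
        finally:
            client.close()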

Database Engineer

2019 - 2019
Honda
  • Completed and deployed to production, within two weeks, an ETL project that loads large text files into a relational database; the previous contractor had left it unfinished due to its complexity.
  • Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
  • Worked with the project team on database design, database programs, and development.
Technologies: SQL, Python, C#, SQL Server Integration Services (SSIS), Data Modeling, Dimensional Modeling, SQL Server Management Studio (SSMS), Jupyter Notebook, Data Migration, T-SQL (Transact-SQL), MySQL, MySQL DBA, Microsoft SQL Server, SQL Server 2014, SQL Server 2017, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics

Database Engineer

2018 - 2019
Helm360
  • Assisted a senior database engineer with database development.
  • Designed a relational data model to meet project needs.
  • Contributed to the development of ETL pipelines and business intelligence reports.
  • Wrote SQL functions and stored procedures for use by the API and ETL data pipelines.
Technologies: SQL, SQL Server Integration Services (SSIS), C#, Python, Data Modeling, Dimensional Modeling, Microsoft Power BI, Data Migration, T-SQL (Transact-SQL), Microsoft SQL Server, SQL Server 2014, SQL Server 2017, Data Warehouse Design, Data Warehousing

Projects

Support and Development of a Powersports Website Database and ETL

TASKS ACCOMPLISHED
• Supported and enhanced a back-end database and ETL process for the main web application API.
• Supported cross-environment data publishing and data validation.
• Troubleshot database performance issues caused by large database programs.

Inventory Data Flow and Database

TASKS ACCOMPLISHED
• Built and supported an auto inventory and data management database (OLTP).
• Created an ETL program to manage the flow of inventory and offer data across different environments.
• Developed database programs for the use of the Web API.

Cloud Sensitive Data Migration

The project involved migrating sensitive PII from an on-premises database to the cloud. As the main developer, I created a custom ETL package; a simplified sketch follows the process list below.

CUSTOM ETL PACKAGE PROCESS
• Extracted and decrypted the PII information.
• Transformed the data, applied the cloud business rules, and sent it to the destination.
• Retrieved data from the cloud and built an automation process to populate cloud data back into the on-premises database.
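
A simplified sketch of that flow, assuming a SQL Server source; the table, the columns, and the three helper functions are hypothetical stand-ins for the project's internal components, not its actual code.

    # Simplified extract-decrypt-transform-load sketch; every name below is a
    # hypothetical stand-in, including the three helper functions.
    import pyodbc

    def decrypt_pii(ciphertext):
        raise NotImplementedError  # stand-in for the internal decryption library

    def apply_cloud_rules(record):
        raise NotImplementedError  # stand-in for the cloud business-rule transform

    def send_to_cloud(records):
        raise NotImplementedError  # stand-in for the loader to the cloud destination

    def migrate_batches(conn_str, batch_size=5000):
        with pyodbc.connect(conn_str) as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT customer_id, encrypted_ssn, email FROM dbo.Customers")
            while True:
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                send_to_cloud([
                    apply_cloud_rules({
                        "customer_id": row.customer_id,
                        "ssn": decrypt_pii(row.encrypted_ssn),
                        "email": row.email,
                    })
                    for row in rows
                ])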

Weblog Data ETL Pipeline

TASKS ACCOMPLISHED
• Created an ETL data pipeline that loads JSON weblogs from AWS S3 using PySpark and writes the results back to S3 as Parquet files (sketched below).
• Developed a program that quickly reads and analyzes these Parquet files using PySpark.

The company's consumer application team uses this weblog ETL and analysis process.
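
A minimal PySpark sketch of that pipeline, assuming line-delimited JSON logs; the bucket paths and the event_date column are placeholders, and the real schema will differ.

    # Minimal weblog ETL sketch; paths and the event_date column are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("weblog-etl-sketch").getOrCreate()

    # Read raw JSON weblogs from S3; Spark infers the schema from the records.
    logs = spark.read.json("s3://example-bucket/weblogs/raw/")

    # Keep well-formed records and write back to S3 as Parquet, partitioned so
    # downstream reads can prune by date.
    (logs.filter(logs["event_date"].isNotNull())
         .write.mode("overwrite")
         .partitionBy("event_date")
         .parquet("s3://example-bucket/weblogs/parquet/"))

    # The analysis side reads the columnar output directly.
    parquet_logs = spark.read.parquet("s3://example-bucket/weblogs/parquet/")
    parquet_logs.groupBy("event_date").count().show()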

Jira System's Data ETL and Generation of a Campaign Calendar

PROJECT PROCESS
• Extracted the data from the old ticketing system and loaded it into the Jira ticketing system.
• Extracted the data from the Jira ticketing system daily and loaded it into a data warehouse (extraction sketched below).
• Transformed the data into a format required by the business.
• Generated a marketing campaign calendar based on the data set and distributed it by email.
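
The daily extraction step might look like the sketch below, which pages through Jira's REST search endpoint; the instance URL, JQL, and credentials are placeholders.

    # Sketch of a daily Jira pull via the REST search API; the URL, JQL, and
    # credentials are placeholders.
    import requests

    JIRA_URL = "https://example.atlassian.net"  # placeholder instance

    def fetch_recent_issues(auth, jql="updated >= -1d"):
        """Page through Jira search results for issues updated in the last day."""
        issues, start_at = [], 0
        while True:
            resp = requests.get(
                f"{JIRA_URL}/rest/api/2/search",
                params={"jql": jql, "startAt": start_at, "maxResults": 100},
                auth=auth,
                timeout=30,
            )
            resp.raise_for_status()
            page = resp.json()
            issues.extend(page["issues"])
            start_at += len(page["issues"])
            if not page["issues"] or start_at >= page["total"]:
                break
        return issues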

Enhancement of a Data Warehouse Lead System

This process loads feeds sent by third-party vendors into a data warehouse. The old implementation used fuzzy logic to map lead information between the OLTP and OLAP systems before sending the results to the analytics team, which caused data inaccuracies in rare cases.

We enhanced the logic by introducing a unique identifier shared between the two systems, which required redesigning a few data warehouse tables and modifying the ETL process.
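
To make the idea concrete, the sketch below shows how an exact match on a shared surrogate key can replace fuzzy matching during the load; the table and column names are hypothetical, not the warehouse's actual schema.

    # Illustrative only: reconciling vendor leads into the warehouse by exact
    # match on a shared key instead of fuzzy logic. All names are hypothetical.
    import pyodbc

    MERGE_SQL = """
    MERGE dw.dim_lead AS target
    USING staging.leads AS source
        ON target.lead_uid = source.lead_uid      -- exact match on the shared key
    WHEN MATCHED THEN
        UPDATE SET target.vendor = source.vendor,
                   target.updated_at = source.updated_at
    WHEN NOT MATCHED THEN
        INSERT (lead_uid, vendor, updated_at)
        VALUES (source.lead_uid, source.vendor, source.updated_at);
    """

    with pyodbc.connect("DSN=warehouse") as conn:  # placeholder DSN
        conn.execute(MERGE_SQL)
        conn.commit()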

Data Pipeline of ML Model

TASKS ACCOMPLISHED
• Set up an RDS database and ETL workflow (S3 and Lambda) to store ML model metadata.
• Built an automation workflow, driven by AWS Step Functions and Lambda, to ingest and merge transaction data with third-party data in an EMR environment (see the Lambda sketch below).
• Encrypted PII using an internal library.
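
A minimal sketch of the EMR automation, assuming the Step Functions state machine invokes a Lambda like this one to submit a Spark step to the running cluster; the event fields and script path are placeholders.

    # Lambda sketch that submits a spark-submit step to EMR via boto3; the
    # cluster ID field and script path are placeholders.
    import boto3

    emr = boto3.client("emr")

    def handler(event, context):
        """Submit a Spark step; Step Functions passes the cluster ID in the event."""
        response = emr.add_job_flow_steps(
            JobFlowId=event["cluster_id"],  # placeholder event field
            Steps=[{
                "Name": "merge-transactions",
                "ActionOnFailure": "CONTINUE",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "spark-submit",
                        "s3://example-bucket/jobs/merge_transactions.py",  # placeholder
                    ],
                },
            }],
        )
        return {"step_id": response["StepIds"][0]}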

Validation API Gateway

Developed an API gateway for a web app to pull validation results, authenticated with AWS Cognito and backed by AWS Lambda. I built the infrastructure using the AWS CDK.
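
A minimal AWS CDK (v2, Python) sketch of that setup; the construct names, runtime, and asset path are placeholders rather than the project's actual values.

    # CDK v2 sketch: REST API backed by Lambda with a Cognito authorizer.
    from aws_cdk import Stack, aws_apigateway as apigw, aws_cognito as cognito, aws_lambda as _lambda
    from constructs import Construct

    class ValidationApiStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            handler = _lambda.Function(
                self, "ValidationHandler",
                runtime=_lambda.Runtime.PYTHON_3_11,
                handler="app.handler",
                code=_lambda.Code.from_asset("lambda"),  # placeholder asset path
            )
            user_pool = cognito.UserPool(self, "UserPool")
            authorizer = apigw.CognitoUserPoolsAuthorizer(
                self, "Authorizer", cognito_user_pools=[user_pool]
            )

            # Only authenticated Cognito users can call GET /results.
            api = apigw.LambdaRestApi(self, "ValidationApi", handler=handler, proxy=False)
            results = api.root.add_resource("results")
            results.add_method(
                "GET",
                authorizer=authorizer,
                authorization_type=apigw.AuthorizationType.COGNITO,
            )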

Education

2013 - 2015

Master's Degree in Information Technology

University of La Verne - La Verne, CA, United States

2006 - 2010

Bachelor's Degree in Business Administration

Guangdong University of Technology - Guangzhou, China

Certifications

NOVEMBER 2023 - PRESENT

React Fundamentals

Udacity

AUGUST 2021 - PRESENT

Machine Learning Engineer

Udacity

MAY 2021 - PRESENT

Full-stack Web Developer

Udacity

MARCH 2021 - PRESENT

Data Engineering

Udacity

Libraries/APIs

PySpark, Flask-RESTful, React

Tools

pgAdmin, AWS Cloud Development Kit (CDK), AWS Step Functions, Apache Airflow, Microsoft Power BI, PyCharm, AWS Glue, Amazon Elastic Container Service (ECS), Amazon Elastic MapReduce (EMR), Amazon Cognito

Languages

Python, SQL, T-SQL (Transact-SQL), C#, C#.NET, Scala, JavaScript, TypeScript, Java

Frameworks

Spark, Apache Spark, Hadoop, Flask

Paradigms

ETL, Dimensional Modeling

Platforms

Jupyter Notebook, AWS Lambda, Amazon Web Services (AWS), Linux, Apache Kafka, Amazon EC2, Docker, Databricks, Apache Flink

Storage

Amazon S3 (AWS S3), SQL Server Management Studio (SSMS), PostgreSQL, Amazon Aurora, Microsoft SQL Server, SQL Server 2014, SQL Server 2017, SQL Server Integration Services (SSIS), JSON, MySQL, Redshift, NoSQL, Cassandra, Data Pipelines

Other

Data Modeling, Data Engineering, EMR, Amazon RDS, Data Migration, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics, Big Data, MySQL DBA, APIs, Business Administration, Data Analysis, Machine Learning, Message Queues, API Gateways, Healthcare Management Systems
