Hang Guo
Verified Expert in Engineering
Database Engineer and Developer
Irvine, CA, United States
Toptal member since May 20, 2021
Hang is a skilled data and database engineer with a proven track record of writing programs and extracting valuable insights from data. He excels in back-end database development, ETL, data pipeline creation, and business intelligence solutions. Proficient in AWS technologies such as Lambda, Step Functions, EMR, DynamoDB, SES, and API Gateway, Hang is adept at working with on-premises databases (SQL Server, PostgreSQL), cloud databases (AWS Redshift), and big data technologies (Spark).
Preferred Environment
Linux, AWS Cloud Development Kit (CDK), Python, AWS Step Functions, AWS Lambda, Amazon S3 (AWS S3), EMR, Scala, SQL, Spark
The most amazing achievement was being recognized as a high performer at Amazon in 2023, earning an "Exceeds High Bar" rating for exceptional contributions and impact.
Work Experience
Data Engineer II
Amazon.com
- Achieved a 90% reduction in delivery latency. Led the design and development of data pipelines for a marketing measurement ML model, significantly improving efficiency and performance.
- Saved millions through optimization. Optimized slow-running Spark jobs, resulting in substantial cost savings and drastically reducing software product delivery latency.
- Enhanced team capabilities and efficiency. Mentored junior engineers and interns, provided critical data engineering support for data scientists and economists, and designed automation solutions for ETL workflows.
- Demonstrated advanced data warehousing skills by developing and maintaining large-scale distributed data warehouses, ensuring robust data management and accessibility.
Senior Database and Data Warehouse Engineer
Honda
- Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Achieved a 99.9% resolution rate for production issues, effectively addressing process failures from both application and analytics teams.
- Completed a sensitive data migration project involving PII and non-PII data. Managed secure data feeds to the cloud by decrypting sensitive information and processing the reverse feed from the cloud, ensuring data integrity and security.
- Developed a validation process to identify and load missing data into the data warehouse. This addressed data inaccuracies that had caused concerns for the business and analytics teams, restoring data integrity and trust.
- Identified and resolved the root cause of performance issues in a large back-end stored procedure that returns hierarchical data for API use, eliminating timeouts and unblocking the downstream application workflow.
- Migrated a database and data warehouse to the cloud, thoroughly testing and ensuring all database and ETL programs functioned correctly post-migration, maintaining data integrity and operational efficiency.
- Developed helper programs in Python and C# to automate repeatable tasks, including database server interactions, FTP/SFTP, file system and security, cloud operations, and Spark processes, greatly reducing manual effort and increasing efficiency.
Database Engineer
Honda
- Completed, in two weeks, an ETL project that loads large text files into a relational database and deployed it to production; the previous contractor had left it unfinished due to its complexity.
- Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
- Worked with the project team on database design, database programs, and development.
Database Engineer
Helm360
- Assisted a senior database engineer with database development.
- Designed a relational data model to meet project needs.
- Contributed to the development of ETL pipelines and business intelligence reports.
- Wrote SQL functions and stored procedures used by the API and ETL data pipelines.
Experience
Support and Development of a Powersports Website Database and ETL
• Supported and enhanced a back-end database and ETL process for the main web application API.
• Supported cross-environment data publishing and data validation.
• Troubleshot database performance issues caused by large database programs.
Inventory Data Flow and Database
• Built and supported an auto inventory and data management database (OLTP).
• Created an ETL program to manage inventory and offer data flow across different environments.
• Developed database programs for the use of the Web API.
Cloud Sensitive Data Migration
Custom ETL Package Process
• Extracted and decrypted the PII information.
• Transformed the data, applied the cloud business rules, and sent it to the destination.
• Retrieved data from the cloud and built an automation process to populate cloud data back into the on-premises database.
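The decrypt-transform-send flow above can be sketched as follows. This is a minimal illustration only: the real project used an internal decryption library, so base64 decoding stands in for the actual decryption step, and the business rules (trimming empty fields, normalizing names) are hypothetical examples.

```python
import base64
import json


def decrypt_pii(value: str) -> str:
    # Stand-in for the real decryption step: the actual project used an
    # internal library; base64 decoding here only illustrates the call site.
    return base64.b64decode(value.encode()).decode()


def apply_cloud_rules(record: dict) -> dict:
    # Hypothetical business rules: drop empty fields and normalize names.
    cleaned = {k: v for k, v in record.items() if v not in (None, "")}
    if "name" in cleaned:
        cleaned["name"] = cleaned["name"].strip().title()
    return cleaned


def prepare_for_cloud(record: dict, pii_fields: set) -> str:
    # Decrypt only the PII columns, then apply the transformation rules
    # before handing the record to the cloud feed.
    decrypted = {
        k: decrypt_pii(v) if k in pii_fields else v
        for k, v in record.items()
    }
    return json.dumps(apply_cloud_rules(decrypted), sort_keys=True)
```

Keeping the decryption step isolated behind one function makes it easy to swap in the internal library without touching the transformation logic.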
Weblog Data ETL Pipeline
• Created an ETL pipeline that loads JSON weblogs from AWS S3 using PySpark and writes the results back to S3 as Parquet files.
• Developed a program that quickly reads and analyzes these Parquet files using PySpark.
The company's consumer application team uses this weblog ETL and analysis process.
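The per-record transformation behind such a pipeline can be sketched in plain Python. At scale the project used PySpark (`spark.read.json` followed by `df.write.parquet`); the field names below are illustrative assumptions, not the actual weblog schema.

```python
import json


def flatten_weblog(line: str) -> dict:
    """Flatten one JSON weblog line into the columnar shape that PySpark
    would write out as Parquet. Field names here are assumptions for
    illustration, not the production schema."""
    event = json.loads(line)
    return {
        "ts": event.get("timestamp"),
        "path": event.get("request", {}).get("path"),
        "status": event.get("response", {}).get("status"),
        "user_agent": event.get("headers", {}).get("user-agent"),
    }
```

Flattening nested JSON into a fixed set of columns is what makes the Parquet output compact and fast to scan for the downstream analysis jobs.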
Jira System's Data ETL and Generation of a Campaign Calendar
• Extracted the data from the old ticketing system and loaded it into the Jira ticketing system.
• Extracted the data from the Jira ticketing system daily and loaded it into a data warehouse.
• Transformed the data into a format required by the business.
• Generated a marketing campaign calendar from the data set and emailed it.
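The calendar-generation step above could look something like this: a roll-up of campaign records by ISO week, ready to be formatted and emailed. The input field names (`name`, `start`) are hypothetical, as the source does not describe the actual Jira data model.

```python
from collections import defaultdict
from datetime import date


def build_campaign_calendar(campaigns: list) -> dict:
    """Group campaigns by the ISO week of their start date, producing the
    kind of weekly roll-up that was emailed as a campaign calendar.
    Field names ('name', 'start') are illustrative assumptions."""
    calendar = defaultdict(list)
    for c in sorted(campaigns, key=lambda c: c["start"]):
        year, week, _ = c["start"].isocalendar()
        calendar[f"{year}-W{week:02d}"].append(c["name"])
    return dict(calendar)
```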
Enhancement of a Data Warehouse Lead System
Enhanced the matching logic by introducing a unique identifier shared between the two systems. This involved redesigning several data warehouse tables and modifying the ETL process.
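One common way to implement a shared identifier across two systems is a deterministic surrogate key derived from the natural keys, so both sides compute the same value independently. This is a sketch of that idea under assumed key fields, not the scheme the project actually used.

```python
import hashlib


def lead_key(source_system: str, natural_id: str) -> str:
    """Deterministic surrogate key: hashing the normalized source system
    name plus its native lead ID yields the same identifier wherever the
    record appears. A sketch of the technique, not the production scheme."""
    raw = f"{source_system.lower()}|{natural_id.strip()}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]
```

Because the key is a pure function of the inputs, the data warehouse and the source system never need to exchange ID mappings to stay in sync.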
Data Pipeline of ML Model
• Set up an RDS database and ETL workflow (S3 and Lambda) to store ML model metadata.
• Built an automation workflow to ingest and merge transaction data with third-party data in an EMR environment, driven by AWS Step Functions and Lambda.
• Encrypted PII using an internal library.
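A Step Functions-driven ingest like the one above typically has a Lambda that translates the state machine's input into an EMR step definition. Below is a minimal handler sketch in that shape; the S3 paths, bucket name, and spark-submit arguments are hypothetical placeholders, not the production configuration.

```python
def handler(event: dict, context=None) -> dict:
    """Lambda handler sketch: turn the Step Functions input into an EMR
    step that merges transaction data with third-party data via
    spark-submit. All paths and argument names are illustrative."""
    run_date = event["run_date"]
    return {
        "Name": f"merge-transactions-{run_date}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "s3://example-bucket/jobs/merge_job.py",  # hypothetical path
                "--transactions", f"s3://example-bucket/tx/{run_date}/",
                "--third-party", f"s3://example-bucket/vendor/{run_date}/",
            ],
        },
    }
```

The returned dictionary matches the step structure that EMR's AddJobFlowSteps API expects, so the state machine can pass it straight to the cluster.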
Validation API Gateway
Education
Master's Degree in Information Technology
University of La Verne - La Verne, CA, United States
Bachelor's Degree in Business Administration
Guang Dong University of Technology - Guang Zhou, China
Certifications
React Fundamentals
Udacity
Machine Learning Engineer
Udacity
Full-stack Web Developer
Udacity
Data Engineering
Udacity
Skills
Libraries/APIs
PySpark, Flask-RESTful, React
Tools
pgAdmin, AWS Cloud Development Kit (CDK), AWS Step Functions, Apache Airflow, Microsoft Power BI, PyCharm, AWS Glue, Amazon Elastic Container Service (ECS), Amazon Elastic MapReduce (EMR), Amazon Cognito
Languages
Python, SQL, T-SQL (Transact-SQL), C#, C#.NET, Scala, JavaScript, TypeScript, Java
Frameworks
Apache Spark, Hadoop, Flask
Paradigms
ETL, Dimensional Modeling
Platforms
Jupyter Notebook, AWS Lambda, Amazon Web Services (AWS), Linux, Apache Kafka, Amazon EC2, Docker, Databricks, Apache Flink
Storage
Amazon S3 (AWS S3), SQL Server Management Studio (SSMS), PostgreSQL, Amazon Aurora, Microsoft SQL Server, SQL Server 2014, SQL Server 2017, SQL Server Integration Services (SSIS), JSON, MySQL, Redshift, NoSQL, Cassandra, Data Pipelines
Other
Data Modeling, Data Engineering, EMR, Amazon RDS, Data Migration, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics, Big Data, MySQL DBA, APIs, Business Administration, Data Analysis, Machine Learning, Message Queues, API Gateways, Healthcare Management Systems