Hang Guo
Verified Expert in Engineering
Database Engineer and Developer
Irvine, CA, United States
Toptal member since May 20, 2021
Hang is a skilled data and database engineer with a proven track record of writing programs and extracting valuable insights from data. He excels in back-end database development, ETL, data pipeline creation, and business intelligence solutions. Proficient in AWS technologies such as Lambda, Step Functions, EMR, DynamoDB, SES, and API Gateway, Hang is adept at working with on-premises databases (SQL Server, PostgreSQL), cloud databases (AWS Redshift), and big data technologies (Spark).
Preferred Environment
Linux, AWS Cloud Development Kit (CDK), Python, AWS Step Functions, AWS Lambda, Amazon S3 (AWS S3), EMR, Scala, SQL, Spark
The most amazing achievement was being recognized as a high performer at Amazon in 2023, earning an "Exceeds High Bar" rating for exceptional contributions and impact.
Work Experience
Data Engineer II
Amazon.com
- Achieved a 90% reduction in delivery latency. Led the design and development of data pipelines for a marketing measurement ML model, significantly improving efficiency and performance.
- Saved millions through optimization. Optimized slow-running Spark jobs, resulting in substantial cost savings and drastically reducing software product delivery latency.
- Enhanced team capabilities and efficiency. Mentored junior engineers and interns, provided critical data engineering support for data scientists and economists, and designed automation solutions for ETL workflows.
- Demonstrated advanced data warehousing skills by developing and maintaining large-scale distributed data warehouses, ensuring robust data management and accessibility.
Senior Database and Data Warehouse Engineer
Honda
- Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Achieved a 99.9% resolution rate for production issues, effectively addressing process failures from both application and analytics teams.
- Completed a sensitive data migration project involving PII and non-PII data. Managed secure data feeds to the cloud by decrypting sensitive information and processing the reverse feed from the cloud, ensuring data integrity and security.
- Developed a validation process to identify and load missing data into the data warehouse. This addressed data inaccuracies that had caused concerns for the business and analytics teams, restoring data integrity and trust.
- Identified and resolved the root cause of performance issues in a large back-end stored procedure that returns hierarchical data for API use, eliminating timeouts and unblocking the downstream application workflow.
- Migrated a database and data warehouse to the cloud, thoroughly testing and ensuring all database and ETL programs functioned correctly post-migration, maintaining data integrity and operational efficiency.
- Developed helper programs in Python and C# to automate repeatable tasks, including database server interactions, FTP/SFTP, file system and security, cloud operations, and Spark processes, greatly reducing manual effort and increasing efficiency.
Database Engineer
Honda
- Completed, in two weeks, an ETL project that loads large text files into a relational database and deployed it to production; the previous contractor had left it unfinished due to its complexity.
- Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
- Worked with the project team on database design, database programs, and development.
Database Engineer
Helm360
- Assisted a senior database engineer with database development.
- Designed a relational data model to meet project needs.
- Contributed to the development of ETL pipelines and business intelligence reports.
- Wrote SQL functions and stored procedures used by the API and ETL data pipelines.
Experience
Support and Development of a Powersports Website Database and ETL
• Supported and enhanced a back-end database and ETL process for the main web application API.
• Supported cross-environment data publishing and data validation.
• Troubleshot database performance issues caused by large database programs.
Inventory Data Flow and Database
• Built and supported an auto inventory and data management database (OLTP).
• Created an ETL program to manage inventory and offer data flow across different environments.
• Developed database programs for the use of the Web API.
Cloud Sensitive Data Migration
Custom ETL Package Process
• Extracted and decrypted the PII information.
• Transformed the data, applied the cloud business rules, and sent it to the destination.
• Retrieved data from the cloud and built an automation process to populate cloud data back into the on-premises database.
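The decrypt-transform-send flow above can be sketched as follows. This is a minimal illustration only: the real project used an internal decryption library, so base64 decoding stands in for the actual decryption step, and the business rules (trimming empty fields, normalizing names) are hypothetical examples.

```python
import base64
import json


def decrypt_pii(value: str) -> str:
    # Stand-in for the real decryption step: the actual project used an
    # internal library; base64 decoding here only illustrates the call site.
    return base64.b64decode(value.encode()).decode()


def apply_cloud_rules(record: dict) -> dict:
    # Hypothetical business rules: drop empty fields and normalize names.
    cleaned = {k: v for k, v in record.items() if v not in (None, "")}
    if "name" in cleaned:
        cleaned["name"] = cleaned["name"].strip().title()
    return cleaned


def prepare_for_cloud(record: dict, pii_fields: set) -> str:
    # Decrypt only the PII columns, then apply the transformation rules
    # before handing the record to the cloud feed.
    decrypted = {
        k: decrypt_pii(v) if k in pii_fields else v
        for k, v in record.items()
    }
    return json.dumps(apply_cloud_rules(decrypted), sort_keys=True)
```

Keeping the decryption step isolated behind one function makes it easy to swap in the internal library without touching the transformation logic.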
Weblog Data ETL Pipeline
• Created an ETL pipeline that loads JSON weblogs from AWS S3 using PySpark and writes the results back to S3 as Parquet files.
• Developed a program that quickly reads and analyzes these Parquet files using PySpark.
The company's consumer application team uses this weblog ETL and analysis process.
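The per-record transformation behind such a pipeline can be sketched in plain Python. At scale the project used PySpark (`spark.read.json` followed by `df.write.parquet`); the field names below are illustrative assumptions, not the actual weblog schema.

```python
import json


def flatten_weblog(line: str) -> dict:
    """Flatten one JSON weblog line into the columnar shape that PySpark
    would write out as Parquet. Field names here are assumptions for
    illustration, not the production schema."""
    event = json.loads(line)
    return {
        "ts": event.get("timestamp"),
        "path": event.get("request", {}).get("path"),
        "status": event.get("response", {}).get("status"),
        "user_agent": event.get("headers", {}).get("user-agent"),
    }
```

Flattening nested JSON into a fixed set of columns is what makes the Parquet output compact and fast to scan for the downstream analysis jobs.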
Jira System's Data ETL and Generation of a Campaign Calendar
• Extracted the data from the old ticketing system and loaded it into the Jira ticketing system.
• Extracted the data from the Jira ticketing system daily and loaded it into a data warehouse.
• Transformed the data into a format required by the business.
• Generated a marketing campaign calendar from the data set and emailed it.
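The calendar-generation step above could look something like this: a roll-up of campaign records by ISO week, ready to be formatted and emailed. The input field names (`name`, `start`) are hypothetical, as the source does not describe the actual Jira data model.

```python
from collections import defaultdict
from datetime import date


def build_campaign_calendar(campaigns: list) -> dict:
    """Group campaigns by the ISO week of their start date, producing the
    kind of weekly roll-up that was emailed as a campaign calendar.
    Field names ('name', 'start') are illustrative assumptions."""
    calendar = defaultdict(list)
    for c in sorted(campaigns, key=lambda c: c["start"]):
        year, week, _ = c["start"].isocalendar()
        calendar[f"{year}-W{week:02d}"].append(c["name"])
    return dict(calendar)
```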
Enhancement of a Data Warehouse Lead System
Enhanced the matching logic by introducing a unique identifier shared between the two systems. This involved redesigning several data warehouse tables and modifying the ETL process.
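One common way to implement a shared identifier across two systems is a deterministic surrogate key derived from the natural keys, so both sides compute the same value independently. This is a sketch of that idea under assumed key fields, not the scheme the project actually used.

```python
import hashlib


def lead_key(source_system: str, natural_id: str) -> str:
    """Deterministic surrogate key: hashing the normalized source system
    name plus its native lead ID yields the same identifier wherever the
    record appears. A sketch of the technique, not the production scheme."""
    raw = f"{source_system.lower()}|{natural_id.strip()}"
    return hashlib.sha256(raw.encode()).hexdigest()[:16]
```

Because the key is a pure function of the inputs, the data warehouse and the source system never need to exchange ID mappings to stay in sync.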
Data Pipeline of ML Model
• Set up an RDS database and ETL workflow (S3 and Lambda) to store ML model metadata.
• Built an automation workflow to ingest and merge transaction data with third-party data in an EMR environment, driven by AWS Step Functions and Lambda.
• Encrypted PII using an internal library.
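A Step Functions-driven ingest like the one above typically has a Lambda that translates the state machine's input into an EMR step definition. Below is a minimal handler sketch in that shape; the S3 paths, bucket name, and spark-submit arguments are hypothetical placeholders, not the production configuration.

```python
def handler(event: dict, context=None) -> dict:
    """Lambda handler sketch: turn the Step Functions input into an EMR
    step that merges transaction data with third-party data via
    spark-submit. All paths and argument names are illustrative."""
    run_date = event["run_date"]
    return {
        "Name": f"merge-transactions-{run_date}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "s3://example-bucket/jobs/merge_job.py",  # hypothetical path
                "--transactions", f"s3://example-bucket/tx/{run_date}/",
                "--third-party", f"s3://example-bucket/vendor/{run_date}/",
            ],
        },
    }
```

The returned dictionary matches the step structure that EMR's AddJobFlowSteps API expects, so the state machine can pass it straight to the cluster.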
Validation API Gateway
Education
Master's Degree in Information Technology
University of La Verne - La Verne, CA, United States
Bachelor's Degree in Business Administration
Guang Dong University of Technology - Guang Zhou, China
Certifications
React Fundamentals
Udacity
Machine Learning Engineer
Udacity
Full-stack Web Developer
Udacity
Data Engineering
Udacity
Skills
Libraries/APIs
PySpark, Flask-RESTful, React
Tools
pgAdmin, AWS Cloud Development Kit (CDK), AWS Step Functions, Apache Airflow, Microsoft Power BI, PyCharm, AWS Glue, Amazon Elastic Container Service (ECS), Amazon Elastic MapReduce (EMR), Amazon Cognito
Languages
Python, SQL, T-SQL (Transact-SQL), C#, C#.NET, Scala, JavaScript, TypeScript, Java
Frameworks
Apache Spark, Hadoop, Flask
Paradigms
ETL, Dimensional Modeling
Platforms
Jupyter Notebook, AWS Lambda, Amazon Web Services (AWS), Linux, Apache Kafka, Amazon EC2, Docker, Databricks, Apache Flink
Storage
Amazon S3 (AWS S3), SQL Server Management Studio (SSMS), PostgreSQL, Amazon Aurora, Microsoft SQL Server, SQL Server 2014, SQL Server 2017, SQL Server Integration Services (SSIS), JSON, MySQL, Redshift, NoSQL, Cassandra, Data Pipelines
Other
Data Modeling, Data Engineering, EMR, Amazon RDS, Data Migration, Data Science, Data Warehouse Design, Data Warehousing, Data Analytics, Big Data, MySQL DBA, APIs, Business Administration, Data Analysis, Machine Learning, Message Queues, API Gateways, Healthcare Management Systems