Hang Guo, Database Engineer and Developer in Irvine, CA, United States
Hang Guo

Database Engineer and Developer in Irvine, CA, United States

Member since April 10, 2021
Hang is a database and data engineer who excels at writing programs and extracting data insights. He has worked with a variety of data sources and destinations, including on-premises databases (such as SQL Server and PostgreSQL), cloud databases (such as AWS Redshift), and big data technologies (such as Spark). Hang has also worked with application teams on back-end database development, ETL, and data pipelines, and with the analytics team on business intelligence solutions.

Portfolio

  • Honda
    SQL, Python, SQL Server Integration Services (SSIS), JSON, XML, APIs, C#
  • Honda
    SQL, Python, C#, SQL Server Integration Services (SSIS), Data Modeling...
  • Helm360
    SQL, SQL Server Integration Services (SSIS), C#, Python, Data Modeling...

Experience

Location

Irvine, CA, United States

Availability

Part-time

Preferred Environment

PyCharm, SQL Server Management Studio, pgAdmin, Jupyter Notebook

The most amazing...

...project was supporting and developing the back-end databases, data warehouse, and ETL pipeline for the app and analytics teams at a Fortune 500 company.

Employment

  • Senior Database and Data Warehouse Engineer

    2019 - PRESENT
    Honda
    • Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Resolved 99.9% of production issues, including app team and analytics team complaints about process failures and data questions.
    • Completed a sensitive data migration project covering both PII and non-PII data. The project involved decrypting sensitive information and sending a data feed to the cloud, then processing the reverse feed from the cloud into an on-premises database.
    • Developed an ad hoc process to detect missing data, load it back, and pass validation; the data warehouse had dropped some records, causing inaccuracies that the business and analytics teams questioned.
    • Created a keyword-search tool for programs and scripts, solving the problem of poorly documented legacy database ETL processes; before I took over, many processes were recalled from human memory alone.
    • Found the root cause of a performance issue in a large back-end stored procedure returning hierarchical JSON data for API usage (which caused the API to time out) and resolved the timeout.
    • Migrated a database and data warehouse to the cloud, then tested and verified that all database and ETL programs worked correctly after the migration.
    • Guided and coached junior developers on the overall workflow and tasks; also reviewed essential programs developed and/or modified by junior developers.
    • Enhanced the weblog ETL process by using Spark to read JSON logs and create partition files in AWS S3.
    • Built a high-quality payment estimator data workflow, including an ETL process and stored procedures returning data for web API usage.
    • Created helper programs to reduce repetitive manual work, including (but not limited to) Python and C# classes for working with the database server, FTP, SFTP, the file system, file security, the cloud, and Spark.
    Technologies: SQL, Python, SQL Server Integration Services (SSIS), JSON, XML, APIs, C#
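
    One bullet above mentions a stored procedure returning hierarchical JSON. The nesting idea can be sketched in plain Python; the row shape and names here are hypothetical, not the actual procedure:

    ```python
    import json

    # Hypothetical flat rows, as a query over a self-referencing table might
    # return them: (id, parent_id, name); parent_id None marks a root node.
    rows = [
        (1, None, "Powersports"),
        (2, 1, "Motorcycles"),
        (3, 1, "ATVs"),
        (4, 2, "Sport"),
    ]

    def build_tree(rows):
        """Nest flat (id, parent_id, name) rows into a hierarchy of dicts."""
        nodes = {rid: {"id": rid, "name": name, "children": []} for rid, _, name in rows}
        roots = []
        for rid, parent_id, _ in rows:
            if parent_id is None:
                roots.append(nodes[rid])
            else:
                nodes[parent_id]["children"].append(nodes[rid])
        return roots

    print(json.dumps(build_tree(rows), indent=2))
    ```

    In SQL Server this nesting is typically pushed into the procedure itself (for example with FOR JSON), which is where deep hierarchies can become a performance concern, as the timeout bullet describes.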
  • Database Engineer

    2019 - 2019
    Honda
    • Completed an ETL project in two weeks and deployed it to production; the process loads large text files into a relational database and had been left unfinished by the previous contractor due to its complexity.
    • Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
    • Worked with the project team on database design, database programs, and development.
    Technologies: SQL, Python, C#, SQL Server Integration Services (SSIS), Data Modeling, Dimensional Modeling
  • Database Engineer

    2018 - 2019
    Helm360
    • Assisted a senior database engineer with database development.
    • Designed a relational data model to meet project needs.
    • Contributed to the development of ETL pipelines and business intelligence reports.
    • Wrote SQL functions and stored procedures for use by the API and ETL data pipelines.
    Technologies: SQL, SQL Server Integration Services (SSIS), C#, Python, Data Modeling, Dimensional Modeling, Azure, Microsoft Power BI

Experience

  • Support and Development of a Powersports Website Database and ETL

    Tasks Accomplished:
    • Supported and enhanced a back-end database and ETL process for the main web application API.
    • Supported cross-environment data publishing and data validation.
    • Troubleshot database performance issues caused by large database programs.

  • Inventory Data Flow and Database

    Tasks Accomplished:
    • Built and supported an auto inventory and data management database (OLTP).
    • Created an ETL program to manage inventory and offer data flow across different environments.
    • Developed database programs for the use of the Web API.

  • Cloud Sensitive Data Migration

    The project involved migrating sensitive PII from an on-premises database to the cloud; I was the main developer and created a custom ETL package.

    Custom ETL Package Process:
    • Extracted and decrypted the PII information.
    • Transformed and applied the cloud business rule to the data and sent it to the destination.
    • Retrieved data from the cloud and built an automated process to populate cloud data back into the on-premises database.

  • Weblog Data ETL Pipeline

    Tasks Accomplished:
    • Created an ETL data pipeline that loads JSON weblogs from AWS S3 using PySpark and then writes the Parquet files back to AWS S3.
    • Developed a program that quickly reads and analyzes these Parquet files using PySpark.

    The company's consumer application team uses this weblog ETL and analysis process.
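
    The actual pipeline used PySpark and Parquet on S3; the core idea of partitioning weblog records by date can be sketched with the standard library alone (local JSON-lines files stand in for S3 and Parquet, and the "ts" field name is an assumption):

    ```python
    import json
    import os

    def partition_weblogs(log_lines, out_dir):
        """Group JSON weblog records by event date and write one Hive-style
        partition directory (date=YYYY-MM-DD) per day, mirroring what the
        PySpark job does with df.write.partitionBy("date") on S3."""
        by_date = {}
        for line in log_lines:
            record = json.loads(line)
            # Assumes each record carries an ISO-8601 timestamp field "ts".
            day = record["ts"][:10]
            by_date.setdefault(day, []).append(record)

        written = []
        for day, records in sorted(by_date.items()):
            part_dir = os.path.join(out_dir, f"date={day}")
            os.makedirs(part_dir, exist_ok=True)
            path = os.path.join(part_dir, "part-0000.json")
            with open(path, "w") as f:
                for record in records:
                    f.write(json.dumps(record) + "\n")
            written.append(path)
        return written
    ```

    Partitioning by date keeps downstream reads cheap: an analysis job can scan only the `date=` directories it needs instead of every log file.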

  • Jira System's Data ETL and Generation of a Campaign Calendar

    Project Process:
    • Extracted the data from the old ticketing system and loaded it into the Jira ticketing system.
    • Extracted the data from the Jira ticketing system daily and loaded it into a data warehouse.
    • Transformed the data into a format required by the business.
    • Generated a marketing campaign calendar from the data set, which was then emailed out.
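
    The transform step above can be sketched as follows; the ticket fields ("campaign", "launch") are hypothetical stand-ins for the real warehouse schema, which the profile does not show:

    ```python
    from collections import defaultdict
    from datetime import date

    # Hypothetical ticket rows as they might land in the warehouse after the
    # daily Jira extract.
    tickets = [
        {"key": "MKT-101", "campaign": "Spring Sale", "launch": date(2021, 4, 5)},
        {"key": "MKT-102", "campaign": "Email Blast", "launch": date(2021, 4, 5)},
        {"key": "MKT-103", "campaign": "Summer Teaser", "launch": date(2021, 6, 1)},
    ]

    def build_calendar(tickets):
        """Pivot flat ticket rows into a launch-date -> campaigns mapping,
        the shape a calendar email could be rendered from."""
        calendar = defaultdict(list)
        for t in tickets:
            calendar[t["launch"].isoformat()].append(t["campaign"])
        return dict(sorted(calendar.items()))
    ```

    The resulting mapping is easy to render into an HTML table and hand to whatever process sends the calendar email.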

  • Enhancement of a Data Warehouse Lead System

    This process loads feeds sent by third-party vendors into a data warehouse. The old process used fuzzy logic to map lead information between the OLTP and OLAP systems and sent the result to the analytics team, which caused data inaccuracies in rare cases.

    We decided to enhance the logic by introducing a unique identifier shared between the two systems. The work involved redesigning a few data warehouse tables and modifying the ETL process.

Skills

  • Languages

    Python, SQL, C#, C#.NET
  • Paradigms

    ETL, Dimensional Modeling
  • Storage

    SQL Server Integration Services (SSIS), JSON, AWS S3, Redshift, NoSQL, Cassandra, SQL Server Management Studio, PostgreSQL
  • Other

    Data Modeling, Big Data, APIs, Business Administration, Data Analysis
  • Frameworks

    Flask
  • Libraries/APIs

    PySpark, SQLAlchemy, Flask-RESTful
  • Tools

    Apache Airflow, Microsoft Power BI, PyCharm, pgAdmin
  • Platforms

    AWS EC2, Jupyter Notebook, Linux

Education

  • Master's Degree in Information Technology
    2013 - 2015
    University of La Verne - La Verne, CA, United States
  • Bachelor's Degree in Business Administration
    2006 - 2010
    Guang Dong University of Technology - Guang Zhou, China

Certifications

  • Full-stack Web Developer
    MAY 2021 - PRESENT
    Udacity
  • Data Engineering
    MARCH 2021 - PRESENT
    Udacity
