Verified Expert in Engineering
Database Engineer and Developer
Hang is a data and database engineer who excels at writing programs and obtaining data insights. He's worked with a variety of data sources and destinations, including on-premise databases (such as SQL Server and PostgreSQL), cloud databases (such as AWS Redshift), and big data technologies (such as Spark). Hang also has experience working with application teams for back-end database development, ETL, and data pipelines and in creating business intelligence solutions with the analytics team.
PyCharm, SQL Server Management Studio, pgAdmin, Jupyter Notebook, Amazon Web Services (AWS)
The most amazing...
...project was supporting and developing the back-end databases, data warehouse, and ETL pipeline for the app team and analytic teams at a Fortune 500 company.
Data Engineer II
- Supported and developed a big data pipeline for machine learning models. Managed ETL services for a large-scale data warehouse for consumer transaction data.
- Automated the manual steps for maintaining metadata in the big data pipeline, cutting the client's processing time by 99%.
- Provided automation solutions for workflows maintained by data scientists and data analysts.
Senior Database and Data Warehouse Engineer
- Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Resolved 99.9% of production issues, including complaints from the app and analytics teams regarding process failure or data questions.
- Completed a sensitive data migration project covering both PII and non-PII data. The project included decrypting sensitive information and sending a data feed to the cloud, then processing the reverse feed sent from the cloud into an on-premise database.
- Developed an ad hoc process to identify and reload data that the data warehouse had missed; the gaps had caused inaccuracies questioned by the business and analytics teams, and the reloaded data passed validation.
- Created a program that quickly finds programs and scripts by keyword, addressing the poor documentation of many legacy database ETL processes; before I took over, many processes were tracked only by human memory.
- Found the root cause of performance issues in a large back-end database stored procedure that returns hierarchical JSON data for API usage (which caused the API to time out) and resolved the timeout issue.
- Migrated a database and data warehouse to the cloud, then tested and confirmed that all database and ETL programs worked correctly after the migration.
- Guided and coached junior developers on the overall workflow and tasks; also reviewed essential programs developed and/or modified by junior developers.
- Enhanced the weblog ETL process by using Spark to read JSON logs and create partition files in AWS S3.
- Built a high-quality payment estimator data workflow, including an ETL process and stored procedures returning data for web API usage.
- Created helper programs to reduce repetitive manual work, including Python and C# classes for working with the database server, FTP, SFTP, the file system, file security, the cloud, and Spark.
- Completed an ETL project in two weeks and deployed it to production; the process loads large text files into a relational database and had been left unfinished by the previous contractor due to its complexity.
- Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
- Worked with the project team on database design, database programs, and development.
- Assisted a senior database engineer with database development.
- Designed a relational data model to meet project needs.
- Contributed to the development of ETL pipelines and business intelligence reports.
- Wrote SQL functions and stored procedures for use by the API and ETL data pipelines.
Support and Development of a Powersports Website Database and ETL
• Supported and enhanced a back-end database and ETL process for the main web application API.
• Supported cross-environment data publishing and data validation.
• Troubleshot database performance issues caused by large database programs.
Inventory Data Flow and Database
• Built and supported an auto inventory and data management database (OLTP).
• Created an ETL program to manage inventory and offer data flow across different environments.
• Developed database programs for the use of the Web API.
Cloud Sensitive Data Migration
Custom ETL Package Process:
• Extracted and decrypted the PII information.
• Transformed and applied the cloud business rule to the data and sent it to the destination.
• Retrieved data from the cloud and built an automation process to populate cloud data back to the on-premise database.
Weblog Data ETL Pipeline
• Created an ETL data pipeline that loads JSON weblogs from AWS S3 using PySpark and writes Parquet files back to AWS S3.
• Developed a program that quickly reads and analyzes these Parquet files using PySpark.
The company's consumer application team uses the weblog ETL and analysis process.
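As a rough illustration of the date-partitioned layout such a weblog pipeline produces, here is a minimal plain-Python sketch. It is not the PySpark code used in the project: it uses only the standard library, writes JSON Lines rather than Parquet, and the function name `partition_weblogs`, the `timestamp` field, and the `date=YYYY-MM-DD` directory naming are all assumptions made for the example.

```python
import json
from collections import defaultdict
from pathlib import Path


def partition_weblogs(lines, out_dir, date_field="timestamp"):
    """Group JSON log lines by calendar date and write one file per date.

    Output layout (hypothetical): out_dir/date=YYYY-MM-DD/part-0000.jsonl
    Returns a dict mapping each date string to the number of records written.
    """
    buckets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: timestamps are ISO 8601 strings, so the first
        # 10 characters are the calendar date.
        day = str(record[date_field])[:10]
        buckets[day].append(record)

    counts = {}
    for day, records in buckets.items():
        part_dir = Path(out_dir) / f"date={day}"
        part_dir.mkdir(parents=True, exist_ok=True)
        with open(part_dir / "part-0000.jsonl", "w", encoding="utf-8") as fh:
            for record in records:
                fh.write(json.dumps(record) + "\n")
        counts[day] = len(records)
    return counts
```

Partitioning by date keeps each day's records in its own directory, so a later analysis step can read only the dates it needs instead of scanning every log.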
Jira System's Data ETL and Generation of a Campaign Calendar
• Extracted the data from the old ticketing system and loaded it into the Jira ticketing system.
• Extracted the data from the Jira ticketing system daily and loaded it into a data warehouse.
• Transformed the data into a format required by the business.
• Generated a marketing campaign calendar from the data set and emailed it.
Enhancement of a Data Warehouse Lead System
We decided to enhance the logic by introducing a unique identifier shared between the two systems. The process involved redesigning a few data warehouse tables and modifying the ETL process.
Python, SQL, C#, C#.NET
ETL, Dimensional Modeling
Amazon Web Services (AWS), Amazon EC2, Jupyter Notebook, Linux, AWS Lambda
SQL Server Integration Services (SSIS), JSON, Amazon S3 (AWS S3), Redshift, NoSQL, Cassandra, SQL Server Management Studio, PostgreSQL
Data Modeling, Message Queues, Big Data, APIs, Business Administration, Data Analysis, Machine Learning
PySpark, SQLAlchemy, Flask-RESTful
Apache Airflow, Microsoft Power BI, PyCharm, pgAdmin, Amazon Elastic MapReduce (EMR), AWS Glue
Master's Degree in Information Technology
University of La Verne - La Verne, CA, United States
Bachelor's Degree in Business Administration
Guangdong University of Technology - Guangzhou, China