Data Engineer | 2022 - Present | Amazon.com
Technologies: Python, SQL, PySpark, AWS EMR, Redshift, AWS S3, AWS Glue, AWS Lambda
- Supported and developed a big data pipeline for machine learning models. Managed ETL services for a large-scale data warehouse for consumer transaction data.
- Automated existing manual steps for maintaining big data pipeline metadata, cutting the client's processing time by 99%.
- Provided automation solutions for workflows maintained by data scientists and data analysts.
Senior Database and Data Warehouse Engineer | 2019 - 2022 | Honda
Technologies: SQL, Python, SQL Server Integration Services (SSIS), JSON, XML, APIs, C#
- Maintained and supported 150+ OLTP database ETL processes and 20+ OLAP data warehouse ETL processes. Resolved 99.9% of production issues, including reports from the app and analytics teams about process failures and data questions.
- Completed a sensitive data migration project covering both PII and non-PII data. The project involved decrypting sensitive information to send a data feed to the cloud and processing the reverse feed from the cloud into an on-premises database.
- Developed an ad-hoc process to detect and reload data missing from the data warehouse, which then passed validation; the missing data had caused inaccuracies that the business and analytics teams questioned.
- Created a keyword search tool for locating programs and scripts quickly, addressing the poor documentation of many legacy database ETL processes; before I took over, many of these processes were known only from memory.
- Found the root cause of performance issues in a large back-end database stored procedure that returned hierarchical JSON data for API usage and caused the API to time out, then resolved the timeout issue.
- Migrated a database and data warehouse to the cloud, then tested and verified that all database and ETL programs worked correctly after the migration.
- Guided and coached junior developers on the overall workflow and tasks; also reviewed essential programs developed and/or modified by junior developers.
- Enhanced the weblog ETL process by using Spark to read JSON logs and write partitioned files to AWS S3.
- Built a high-quality payment estimator data workflow, including an ETL process and stored procedures returning data for web API usage.
- Created helper programs to reduce repetitive manual work, including Python and C# classes for working with the database server, FTP, SFTP, the file system, file security, the cloud, and Spark.
Database Engineer | 2019 - 2019 | Honda
Technologies: SQL, Python, C#, SQL Server Integration Services (SSIS), Data Modeling, Dimensional Modeling
- Completed an ETL project in two weeks and deployed it to production; the project, which loads large text files into a relational database, had been left unfinished by the previous contractor due to its complexity.
- Provided business intelligence solutions by creating and maintaining business intelligence reports and data visualizations.
- Worked with the project team on database design, database programs, and development.
Database Engineer | 2018 - 2019 | Helm360
Technologies: SQL, SQL Server Integration Services (SSIS), C#, Python, Data Modeling, Dimensional Modeling, Azure, Microsoft Power BI
- Assisted a senior database engineer with database development.
- Designed a relational data model to meet project needs.
- Contributed to the development of ETL pipelines and business intelligence reports.
- Wrote SQL functions and stored procedures for use by the API and ETL data pipelines.