Data Analytics Engineer | 2017 - 2020 | PACO Technologies, Inc.
Technologies: Amazon Web Services (AWS), Spark, SQL, Python
- Built an internal data entry system with Python and Flask to improve data quality and eliminate noisy entries.
- Automated the data acquisition process, significantly reducing human error. Configured the server environment on AWS EC2 and RDS with well-defined security groups.
- Designed and developed complex SQL queries, Python scripts, and triggers for ETL jobs. Integrated and maintained data from a variety of sources, ensuring it adhered to data quality and accessibility standards.
- Automated bi-weekly data reports with CloudWatch, Lambda, and SQL/Excel, replacing manual ad hoc queries.
- Developed an internal Power BI KPI dashboard to track company recruiting performance and support decision-making.
- Developed, deployed, and managed a data pipeline (DocumentDB, Athena, Redshift, S3, Lambda) that cleans, transforms, and aggregates unstructured, messy data into databases, enabling reliable collection, storage, and management of large datasets.
- Developed data classifiers, mining algorithms, and models for sentiment analysis, topic mining, and visualization of engineering documents.