SQL Developer in San Francisco, CA, United States
Data Engineer, 2019 - Present, Cisco
Technologies: Python, PySpark, Spark SQL, Hadoop, Hive, JSON, Google Cloud Platform, GitHub, Tidal, JIRA
- Supported machine learning, AI, and sales campaign automation.
- Built data pipeline framework, guidelines, production procedures, data architecture, and code review process.
- Led junior Python and PySpark developers.
Big Data Engineer, 2017 - 2018, Western Digital
Technologies: AWS, Redshift, Python, S3, EC2, Postgres, Bash, Elasticsearch
- Developed and supported Enterprise Data Management big data engineering, including worldwide head, drive, and wafer fab production image and data ETL pipelines.
- Rebuilt, managed, and tuned large production Enterprise Data Management AWS Redshift clusters to support high-volume pipelines and user queries.
- Supported AWS Redshift, Redshift Spectrum, Elasticsearch, Kinesis, S3, EC2, RDS, MySQL, PostgreSQL, Aurora, and CloudWatch. Managed Control-M, Spotfire, and SnapLogic ETL.
- Supported the machine learning platform for wafer image defect models.
- Worked with Slack, Hadoop, Hive, Impala, Python, NumPy, SciPy, SVM, SVD, GitHub, Bitbucket, Jenkins, Tidal, Java, JIRA, wiki, and Confluence.
Lead Data Engineer, 2015 - 2017, ModCloth
Technologies: AWS, Redshift, S3, EC2, Bash, Python, MySQL, Postgres
- Developed and maintained data engineering, data analytics, 25 ETL pipelines, and the data warehouse for an online shopping e-commerce business as the sole data engineer.
- Constructed and managed Salesforce Commerce Cloud (aka Demandware), Square POS, e-commerce replication with Percona FlexCDC, Adobe Omniture Marketing Cloud, Oracle Responsys, ScientiaMobile WURFL, Qualtrics, Zodiac, ShopKeep, Acuity, and RetailNext.
- Developed data pipelines with various vendors using GitHub, Python, C/C++, Java, REST APIs, JSON, XML, CSV, TSV, JIRA, and Slack.
- Designed an Azure migration covering Azure SQL Data Warehouse, Blob Storage, and Linux VMs.
Software System Engineer, 2002 - 2015, Charles Schwab
Technologies: Linux, Oracle, RedHat, Perl, Bash, SQL
- Built a new portfolio accounting system on Linux as its first engineer.
- Led the SPARKS team and built the Cost Basis Accounting System and the Reporting Repository Data Warehouse.
- Built and supported Eagle Investment Systems STAR and PACE products.
- Supported and migrated a mainframe-based system to Red Hat Linux/Solaris VMware servers and 100 TB+ scale Oracle 9/10/11/12 RAC/TAF/EMC/HDS-based Data Guard/GoldenGate environments.
- Developed and supported partitioning, parallel processing, ESP scheduling, high availability/failover, disaster recovery, Tivoli monitoring, Splunk, and Zenoss.
- Implemented and supported both development and production OLTP, OLAP, ETL, distributed messaging (MQ), iPlanet/Apache, application server, Oracle 9/10/11/12 RAC database, and Data Guard environments.
- Built and supported multiple TB-scale development and performance/volume/stress testing environments.
- Developed systems and applications with Java, Perl, Shell, Python, SQL, PL/SQL, and XML.
- Trained the team on SQL and RDBMS, MySQL/SQL Server, and the Data-Driven Documents (D3) library.
- Contact Hub (Development)
Built the data pipeline required for sales campaign data science. The major sources are Hadoop and Hive, primarily email and phone contact data. The targets are Hive, Google Cloud Platform, and Snowflake. The pipeline includes eight tasks: data extraction and ingestion, data deduplication, data transformation, data incremental load, data filtering, offer data generation, offer motion data generation, and data enrichment.
- eCommerce Data Pipeline Migration (Development)
Migrated in-house transaction and ETL pipelines to Salesforce Commerce Cloud.
- Brokerage Portfolio Accounting System (Development)
Built a new Linux- and Oracle-based portfolio accounting system for the largest brokerage firm on the West Coast, with 16 million customers.
- Enterprise Data Management (Development)
Built wafer testing data ETL pipelines from wafer factories all over the world.
Languages: SQL, XML, Python, Java, C, C++, Perl, Bash
Frameworks: Hadoop, AWS EMR
Tools: Cisco Tidal Enterprise Scheduler, Tableau, Jira, Slack, GitHub
Storage: AWS S3, Apache Hive, Elasticsearch, MySQL, PostgreSQL, AWS RDS, JSON, Redshift
Libraries/APIs: PySpark, REST APIs, NumPy, SciPy
Other: Tableau Server, CSV
Platforms: Linux, AWS EC2, Oracle, Talend
- Master of Science degree in Computer Science, 1986 - 1987, Indiana University - Bloomington, Indiana
- Bachelor of Science degree in Engineering, 1978 - 1982, National Taiwan University - Taipei, Taiwan