Shawn Xiao
Verified Expert in Engineering
Big Data Developer
Shawn has been working on data management and data analytics for different industries for the last 15 years. He is also a Microsoft Certified Solutions Expert for data management and analytics, familiar with various technologies such as Azure, AWS, big data, Spark, SQL, Hadoop, BI, DW, and Tableau. Shawn has strong problem-solving and root cause analysis skills.
Preferred Environment
Amazon Web Services (AWS), Azure, Big Data, Python, SQL
The most amazing...
...thing I've optimized and improved is an E2E overnight data process solution, reducing the time from 16 hours to 4.5 hours.
Work Experience
Data Engineer
BitAlpha, Inc.
- Set up and configured a big data processing platform using Databricks on Google Cloud. Designed, built, and maintained Python data pipelines that process data from GCS, Ethereum, and PostgreSQL into a Delta data warehouse, a data lake, and BigQuery.
- Designed and implemented efficient and scalable data models to support reporting, ensuring data quality and integrity by implementing data validation, cleansing, and transformation processes.
- Monitored and optimized the performance of data systems to ensure efficient and reliable data processing. Developed and maintained documentation for data systems and processes to ensure their scalability and sustainability over time.
- Collaborated with developers to understand their data needs and provided data solutions to support their work.
Senior Database Specialist
Woolworths New Zealand
- Upgraded and migrated an on-premises SQL Server instance to Azure Virtual Machines so that the server could scale up and provide a better online shopping experience for about two million customers.
- Conducted optimization and performance tuning of the SQL Server instance. Refactored a query to reduce its execution time from two minutes to 20 seconds.
- Containerized the SQL database instance to increase the developers' productivity.
Senior Data Engineer
Plexure
- Optimized the E2E data process and pipeline to reduce the overnight load from 16.5 hours to 4.5 hours, so that reports could be presented to our key external business partners.
- Designed and developed data pipelines on Azure using Azure Data Factory to process sales data for 200 million users to meet the business's needs.
- Designed and developed data pipelines running on AWS using PySpark, Glue, EMR, Lambda, and Redshift to process mobile app data for key external customers.
Senior Managed Services BI Consultant
Altis Consulting
- Developed data pipelines to extract, load, and transform one million sales transaction records daily from CSV files into a Redshift database using AWS EMR and Lambda for a fast-food chain.
- Developed data pipelines using Azure Data Factory to extract, load, and transform signal data, sent every five minutes via the signal tower API, into an Azure SQL database to track the performance of towers countrywide.
- Developed and maintained business intelligence solutions running on SSIS, SAP Data Services, and IBM Cognos for companies in industries such as utilities, higher education, and transportation.
Database Developer
Fisher & Paykel Healthcare Corporation, Ltd.
- Planned, designed, and implemented the SQL Server Enterprise platform (2012, 2014) to support $100 million business growth for the next three years.
- Designed and implemented a backup and restore strategy, data security strategy, data storage strategy, high availability (HA), and disaster recovery (DR) strategy.
- Monitored, troubleshot, and optimized the database system running on MSSQL Server (2000, 2005, 2008, 2012, 2014) and Azure SQL.
BI Developer | Data Warehouse Developer
Microsoft
- Built and implemented various BI solutions for WoS reports, which helped the VP of the department track inventory and forecast products such as Xbox One and Surface.
- Designed and implemented an E2E solution to help supply chain users track sales and delivery status for Microsoft products (Surface). Designed and developed extraction, transformation, and load (ETL) processes to support data integration needs.
- Developed and optimized the database applications in SQL Server. Performed administrative tasks of the BI and ETL tools and assisted in the deployment of the application code.
Experience
Data Pipeline Development
Developed a data pipeline using Azure Data Factory to extract data from Azure Table storage and land it in Azure Blob storage. An Azure Function was then triggered to copy the data to AWS S3, where an AWS Glue job triggered a Lambda function to write to the S3 sink bucket. An overnight job then picked up the data and wrote it into the Redshift database.
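As an illustration only, the Lambda step that writes to the sink bucket in a flow like this could look roughly like the sketch below. The bucket name, key prefix, function names, and the S3-notification event shape are hypothetical assumptions, not details from the project. The S3 client is passed in as a parameter so the logic can be exercised without AWS access.

```python
from urllib.parse import unquote_plus

# Hypothetical names for illustration only; not taken from the project.
SINK_BUCKET = "example-sink-bucket"
SINK_PREFIX = "landing"


def sink_key_for(source_key: str) -> str:
    """Map a source object key to its key under the sink prefix."""
    filename = source_key.rsplit("/", 1)[-1]
    return f"{SINK_PREFIX}/{filename}"


def copy_to_sink(event: dict, s3_client) -> list[str]:
    """Copy every object named in an S3-style event into the sink bucket.

    Assumes the S3 notification event shape; `s3_client` is expected to
    behave like a boto3 S3 client (only `copy_object` is used).
    """
    copied = []
    for record in event.get("Records", []):
        source_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 notifications are URL-encoded.
        source_key = unquote_plus(record["s3"]["object"]["key"])
        target_key = sink_key_for(source_key)
        s3_client.copy_object(
            Bucket=SINK_BUCKET,
            Key=target_key,
            CopySource={"Bucket": source_bucket, "Key": source_key},
        )
        copied.append(target_key)
    return copied
```

In a deployed function, the handler would typically create the client with boto3.client("s3") and pass it to copy_to_sink along with the incoming event. The overnight Redshift load would then read from the sink prefix.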
Skills
Languages
SQL, Python, TypeScript, Snowflake
Paradigms
ETL, Database Development, Business Intelligence (BI)
Storage
SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Redshift, Azure SQL Databases, Amazon S3 (AWS S3), Data Pipelines, PostgreSQL, Azure Table Storage, SQL Server 2012, SQL Server 2014, Elasticsearch, Database Modeling
Other
Data Management, Performance Tuning, Big Data, Azure Data Factory, ETL Tools, Data Warehousing, Data Engineering, APIs, Data Modeling, Reporting, Data Analytics, Azure Data Lake, Data Warehouse Design, Azure Databricks, SAP BusinessObjects Data Services (BODS), Computer Science, SAP Business Intelligence (BI), SAP, Google BigQuery, Data Architecture, Big Data Architecture, Data Build Tool (dbt)
Frameworks
Windows PowerShell, Spark, Apache Spark
Libraries/APIs
PySpark
Tools
AWS Glue, Amazon Elastic MapReduce (EMR), Microsoft Power BI, Apache Airflow
Platforms
Azure, Amazon Web Services (AWS), Amazon EC2, Docker, AWS Lambda
Education
Bachelor's Degree in Computer Science
Shenzhen Open University - Shenzhen, China
Certifications
Apache Spark Big Data and Python
Databricks, Inc.
MCSE - Data Management and Analytics
Microsoft