Shawn Xiao, Developer in Auckland, New Zealand
Shawn is available for hire

Shawn Xiao

Verified Expert in Engineering

Big Data Developer

Location
Auckland, New Zealand
Toptal Member Since
November 20, 2020

Shawn has been working in data management and data analytics across different industries for the last 15 years. He is a Microsoft Certified Solutions Expert (MCSE) in Data Management and Analytics and is familiar with technologies such as Azure, AWS, big data, Spark, SQL, Hadoop, business intelligence (BI), data warehousing (DW), and Tableau. Shawn has strong problem-solving and root cause analysis skills.

Portfolio

BitAlpha, Inc.
Google BigQuery, Apache Spark, SQL, Elasticsearch, TypeScript, Data Warehousing...
Woolworths New Zealand
Docker, Windows PowerShell, SQL, Data Engineering, PostgreSQL, Amazon EC2
Plexure
Amazon Web Services (AWS), Azure Data Factory, Azure SQL Databases, Redshift...

Experience

Availability

Full-time

Preferred Environment

Amazon Web Services (AWS), Azure, Big Data, Python, SQL

The most amazing...

...thing I've optimized and improved is an end-to-end (E2E) overnight data process, reducing its run time from 16 hours to 4.5 hours.

Work Experience

Data Engineer

2021 - 2022
BitAlpha, Inc.
  • Set up and configured a big data processing platform using Databricks on Google Cloud. Designed, built, and maintained Python data pipelines to process data from Google Cloud Storage (GCS), Ethereum, and PostgreSQL into a Delta data warehouse, a data lake, and BigQuery.
  • Designed and implemented efficient and scalable data models to support reporting, ensuring data quality and integrity by implementing data validation, cleansing, and transformation processes.
  • Monitored and optimized the performance of data systems to ensure efficient and reliable data processing. Developed and maintained documentation for data systems and processes to ensure their scalability and sustainability over time.
  • Collaborated with developers to understand their data needs and provided data solutions to support their work.
Technologies: Google BigQuery, Apache Spark, SQL, Elasticsearch, TypeScript, Data Warehousing, Reporting, Data Architecture, Big Data, Big Data Architecture, Amazon Web Services (AWS), Apache Airflow, Amazon S3 (AWS S3), Amazon Elastic MapReduce (EMR), AWS Glue, AWS Lambda, Azure Databricks, Data Engineering, Spark, PostgreSQL, Amazon EC2

Senior Database Specialist

2020 - 2021
Woolworths New Zealand
  • Upgraded and migrated an on-premises SQL Server instance to Azure Virtual Machines, allowing the server to scale up and improving the online shopping experience for about two million customers.
  • Optimized and performance-tuned the SQL Server instance, including refactoring a query to cut its execution time from two minutes to 20 seconds.
  • Containerized the SQL database instance to increase developer productivity.
Technologies: Docker, Windows PowerShell, SQL, Data Engineering, PostgreSQL, Amazon EC2

Senior Data Engineer

2019 - 2020
Plexure
  • Optimized the end-to-end (E2E) data process and pipeline, reducing the overnight load from 16.5 hours to 4.5 hours so that reports could be delivered to key external business partners.
  • Designed and developed data pipelines on Azure using Azure Data Factory to process sales data for 200 million users to meet business needs.
  • Designed and developed data pipelines running on AWS using PySpark, Glue, EMR, Lambda, and Redshift to process mobile app data for key external customers.
Technologies: Amazon Web Services (AWS), Azure Data Factory, Azure SQL Databases, Redshift, AWS Lambda, Amazon Elastic MapReduce (EMR), AWS Glue, PySpark, SQL, Azure, Data Engineering, Spark, PostgreSQL, Amazon EC2

Senior Managed Services BI Consultant

2016 - 2019
Altis Consulting
  • Developed data pipelines to extract, load, and transform 1,000,000 sales transactions daily from CSV files into a Redshift database using AWS EMR and Lambda for a fast-food chain brand.
  • Developed data pipelines using Azure Data Factory to extract, load, and transform signal data, sent every five minutes via the signal tower API, into an Azure SQL database to track the performance of towers nationwide.
  • Developed and maintained business intelligence solutions running on SSIS, SAP Data Services, and IBM Cognos for companies in industries such as utilities, higher education, and transportation.
Technologies: Azure Data Factory, ETL Tools, SQL Server Integration Services (SSIS), Python, Amazon Elastic MapReduce (EMR), AWS Lambda, SQL, Azure, Data Engineering, Spark

Database Developer

2013 - 2016
Fisher & Paykel Healthcare Corporation, Ltd.
  • Planned, designed, and implemented a SQL Server Enterprise platform (2012 and 2014) to support $100 million in business growth over the next three years.
  • Designed and implemented strategies for backup and restore, data security, data storage, high availability (HA), and disaster recovery (DR).
  • Monitored, troubleshot, and optimized database systems running on Microsoft SQL Server (2000, 2005, 2008, 2012, and 2014) and Azure SQL.
Technologies: SAP BusinessObjects Data Service (BODS), SQL Server 2012, SQL

BI Developer | Data Warehouse Developer

2010 - 2013
Microsoft
  • Built and implemented BI solutions for WoS reports that helped the department's VP track inventory and create forecasts for products such as Xbox One and Surface.
  • Designed and implemented an E2E solution to help supply chain users track sales and delivery status for Microsoft products (Surface). Designed and developed extract, transform, and load (ETL) processes to support data integration needs.
  • Developed and optimized database applications in SQL Server. Administered the BI and ETL tools and assisted with deploying application code.
Technologies: Business Intelligence (BI), SQL Server 2012, SQL

Data Pipeline Development

Designed and developed a data pipeline to extract, load, and transform data from Azure to AWS.

Developed the data pipeline using Azure Data Factory to extract data from Azure Table storage and land it in Azure Blob storage. An Azure Function was then triggered to copy the data to AWS S3, where an AWS Glue job triggered a Lambda function to write it to the S3 sink bucket. Finally, an overnight job picked up the data and wrote it into the Redshift database.
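The hand-offs in this pipeline can be sketched as follows. This is a minimal, hypothetical illustration, not the actual implementation: in-memory dictionaries stand in for Azure Blob storage and the two S3 buckets, each function stands in for one managed service, and the table name, bucket name, object key, and IAM role ARN are placeholders. The last stage only builds a Redshift COPY statement as one plausible way an overnight job could load the sink data; the project description does not say which loading method was used.

```python
"""Hypothetical, in-memory sketch of the Azure-to-AWS hand-offs.

Dicts stand in for Azure Blob storage and the S3 buckets; each function
stands in for one managed service. No real cloud SDK is called.
"""
from dataclasses import dataclass, field


@dataclass
class Stores:
    """In-memory stand-ins for the storage locations (names are illustrative)."""
    azure_blob: dict = field(default_factory=dict)
    s3_landing: dict = field(default_factory=dict)
    s3_sink: dict = field(default_factory=dict)


def adf_extract(table_rows, stores, key):
    """Stage 1 (Azure Data Factory): land Table storage rows in Blob storage as CSV text."""
    lines = [",".join(str(v) for v in row) for row in table_rows]
    stores.azure_blob[key] = "\n".join(lines)


def azure_function_copy(stores, key):
    """Stage 2 (Azure Function): copy the landed blob to an S3 landing location."""
    stores.s3_landing[key] = stores.azure_blob[key]


def glue_and_lambda_transform(stores, key):
    """Stage 3 (Glue job, then Lambda): write cleaned data to the sink bucket.

    Here "cleaning" only trims whitespace and drops blank lines, as a
    placeholder for whatever transformations the real job performed.
    """
    raw = stores.s3_landing[key]
    cleaned = [line.strip() for line in raw.splitlines() if line.strip()]
    stores.s3_sink[key] = "\n".join(cleaned)


def redshift_copy_sql(table, bucket, key, iam_role_arn):
    """Stage 4 (overnight job): build a Redshift COPY statement for the sink object."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;"
    )
```

Running the stages in order on two sample rows (for example, `("store-01", 42)` and `("store-02", 17)`) leaves `"store-01,42\nstore-02,17"` in the sink dictionary, ready for the COPY step. Keeping each stage as a separate function mirrors the event-driven design described above, where each service only needs to know its input and output locations.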

Languages

SQL, Python, TypeScript, Snowflake

Paradigms

ETL, Database Development, Business Intelligence (BI)

Storage

SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Redshift, Azure SQL Databases, Amazon S3 (AWS S3), Data Pipelines, PostgreSQL, Azure Table Storage, SQL Server 2012, SQL Server 2014, Elasticsearch, Database Modeling

Other

Data Management, Performance Tuning, Big Data, Azure Data Factory, ETL Tools, Data Warehousing, Data Engineering, APIs, Data Modeling, Reporting, Data Analytics, Azure Data Lake, Data Warehouse Design, Azure Databricks, SAP BusinessObjects Data Service (BODS), Computer Science, SAP Business Intelligence (BI), SAP, Google BigQuery, Data Architecture, Big Data Architecture, Data Build Tool (dbt)

Frameworks

Windows PowerShell, Spark, Apache Spark

Libraries/APIs

PySpark

Tools

AWS Glue, Amazon Elastic MapReduce (EMR), Microsoft Power BI, Apache Airflow

Platforms

Azure, Amazon Web Services (AWS), Amazon EC2, Docker, AWS Lambda

Education

2006 - 2009

Bachelor's Degree in Computer Science

Shenzhen Open University - Shenzhen, China

Certifications

March 2020 - Present

Apache Spark Big Data and Python

Databricks, Inc.

September 2017 - Present

MCSE - Data Management and Analytics

Microsoft
