Zhihao (Alex) Zhong, Developer in Toronto, ON, Canada

Zhihao (Alex) Zhong

Verified Expert in Engineering

Data Engineer and Developer

Location
Toronto, ON, Canada
Toptal Member Since
April 3, 2023

Alex is a senior technical data engineer whose areas of expertise include ETL pipeline design, performance tuning, metadata management, data modeling, data warehousing, business intelligence design, data profiling, and data visualization. He helped an eCommerce company, Loblaw Digital, architect a data pipeline from a microservice back end to a data warehouse in GCP as part of its data science platform. Alex manages more than 2,000 tables and moves more than 2 TB of data daily in real time.

Availability

Part-time

Preferred Environment

PyCharm

The most amazing...

...system I've architected and implemented from end to end is a real-time ELT pipeline in Google Cloud Platform (GCP) that reduced latency by 80%.

Work Experience

Senior Data Engineer

2020 - PRESENT
Loblaw Digital
  • Worked as a senior member of the data engineering team to build a configurable real-time replication pipeline in GCP capable of handling large-scale, multi-source data.
  • Developed and enhanced features in a custom Python BI framework for ETL/ELT batch jobs.
  • Configured CI/CD pipelines and Docker images for deploying code from the project repository in GitLab.
  • Managed two junior data engineers and acted as a solution architect. Conducted pair programming and code reviews as a senior engineer.
Technologies: SQL, Python 3, Google Cloud Platform (GCP), Google BigQuery, GitLab, Docker, Relational Databases, PySpark, Database Modeling, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, Google Cloud Composer

Cloud Data Engineer

2020 - 2020
Microsoft
  • Migrated the Azure marketing team's data from an on-premises Microsoft SQL Server system to Azure Synapse.
  • Collaborated on designing ETL pipelines with the Azure stack, including Azure Data Factory, a lift-and-shift SQL Server Integration Services (SSIS) package, Databricks, and Synapse.
  • Implemented the designs with two other senior engineers and migrated the ETL processes and all downstream reports to Azure.
Technologies: Azure Data Factory, Azure Synapse, Azure Databricks, Microsoft Data Transformation Services (now SSIS), SQL Server Integration Services (SSIS), Relational Databases, Azure, Database Modeling, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, SQL

Database Administrator

2019 - 2020
Microsoft
  • Constructed database architecture for partner investments and KPIs.
  • Created SSIS packages and ran the ETL processes that kept the data up to date.
  • Exposed the data to Microsoft Power BI for reporting using DirectQuery mode.
Technologies: SQL Server BI, SQL Server Integration Services (SSIS), Microsoft Power BI, Database Modeling, Relational Databases, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, SQL

Data Engineer

2019 - 2020
CAMH
  • Implemented an ETL and data warehousing solution in a Hadoop environment and helped the organization migrate the data warehouse for its neuroinformatics platform from IBM Db2 to Apache Hive.
  • Built data pipelines in Apache NiFi to perform real-time ETL and ELT.
  • Created Spotfire dashboards for research studies.
Technologies: HDFS, Apache NiFi, Apache Hive, IBM Db2, ETL, Hadoop, Spotfire, Relational Databases, Database Modeling, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, SQL

Database Administrator

2018 - 2019
The Bargains Group
  • Implemented and managed the CRM system and Microsoft SQL Server to produce reports for marketing and sales.
  • Updated the opt-out and hard-bounce email lists in Microsoft SQL Server and generated targeted email lists for marketing campaigns.
  • Set up and tested automated email campaigns in the CRM system.
  • Reviewed and tested the connection between the website back end and the CRM system.
Technologies: SQL Server BI, SQL Server Integration Services (SSIS), ETL, Tableau, Microsoft SQL Server, Relational Databases, Database Modeling, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, SQL

Projects

Dataflow Architecture in GCP

This project involved building subscriptions to the Pub/Sub topics exposed by a microservice back end. I developed the pipelines with Apache Beam, packaged them as Flex Templates, and then launched the templates on the Cloud Dataflow runner for execution.

PySpark Jobs in GCP

I developed PySpark jobs by translating business logic into Spark SQL, uploaded the job files to a Cloud Storage (GCS) bucket, and submitted them with Cloud Composer operators. I monitored the job logs with BigQuery and Looker dashboards.

Migration of a Data Pipeline and Data Warehouse to Azure Cloud

I configured Azure Data Factory and a lift-and-shift SSIS package to move data from the on-premises Microsoft SQL Server to Azure Synapse. I wrote a PySpark job in Databricks and configured the connector in Azure Data Factory to run it on a schedule. I also migrated Power BI reports built on an on-premises database so that they use Azure Synapse as their new source.

Education

2013 - 2017

Bachelor of Mathematics in Actuarial Science and Statistics

University of Waterloo - Waterloo, Ontario

Certifications

SEPTEMBER 2021 - SEPTEMBER 2023

Professional Machine Learning Engineer

Google Cloud

JUNE 2021 - JUNE 2023

GCP Professional Data Engineer

Google Cloud

MARCH 2020 - MARCH 2021

Microsoft Certified: Azure Data Engineer Associate

Microsoft

Skills

Libraries/APIs

PySpark

Tools

PyCharm, Cloud Dataflow, BigQuery, Apache Beam, Google Cloud Composer, Apache NiFi, SQL Server BI, Microsoft Power BI, Tableau, Google Cloud Dataproc, GitLab, Spotfire

Languages

SQL, Python 3, Java

Storage

Relational Databases, HDFS, Apache Hive, IBM Db2, SQL Server Integration Services (SSIS), Database Modeling, Microsoft SQL Server

Paradigms

ETL

Platforms

Google Cloud Platform (GCP), Azure, Azure Synapse, Databricks, Docker

Frameworks

Hadoop

Other

Google BigQuery, Pub/Sub, Dataproc, Azure Data Factory, Azure Databricks, Microsoft Data Transformation Services (now SSIS), Azure Stream Analytics, Azure Data Lake, ELT, Performance Tuning, Data Modeling, Data Warehousing, Data Profiling, Data Visualization, Data Engineering
