Bin Wang

Verified Expert in Engineering

Data Engineer and Developer

Location

Melbourne, Victoria, Australia

Toptal Member Since

December 14, 2021

With 15 years of experience working in the data domain, Bin is passionate about all things data. He has held senior roles at consultancies like McKinsey and Ernst and Young, focusing on building solutions on AWS and Azure. Besides cloud services, he has worked with Databricks, Snowflake, dbt, and Airflow. Bin has spent more than ten years on data modeling and data warehousing. He also has extensive software engineering experience with Python and Docker in machine learning projects.

Data Warehouse Design, Data Warehousing, Data Cleaning, Data Engineering, Azure Data Lake, SQL, ETL, Linux, Python, Docker, Data Pipelines, Amazon Web Services (AWS), Pandas, Databricks, Amazon Athena, DBT (Data Build Tool), Azure Synapse, ABAP, AWS Data Pipelines

Portfolio

carsales

Data Build Tool (dbt), Snowflake, Terraform, GitHub Actions, Docker, Python...

AusNet Services

Azure, Azure Data Factory, Databricks, Azure Synapse, Docker, Python...

Officeworks

Amazon Athena, Amazon DynamoDB, Amazon RDS, Apache Airflow, Databricks...

Experience

Data Warehouse Design - 15 years
Python - 5 years
Docker - 4 years
Snowflake - 3 years
Azure - 3 years
Databricks - 2 years
Apache Airflow - 2 years

Availability

Full-time

Preferred Environment

MacOS, Linux, Python, Amazon Web Services (AWS), Azure, Databricks, Snowflake

The most amazing...

...design I've recently built is a reusable spatial data analysis framework on Databricks.

Work Experience

Senior Data Engineer

2021 - PRESENT

carsales

Set up dbt projects for Redshift and Snowflake to enable both local execution using Docker and execution on dbt Cloud.
Set up an Infrastructure as Code project for Snowflake using Terraform and CI/CD pipelines using GitHub Actions to enable automated and repeatable resource deployment.
Proposed and built role-based access control in Snowflake.
Designed and built various data pipelines to support data transfer and transformation in AWS and GCP.
Built an extensible solution to monitor common failures and alert team members, which greatly improved system observability and increased team ownership.

Technologies: Data Build Tool (dbt), Snowflake, Terraform, GitHub Actions, Docker, Python, Apache Airflow, Redshift, Amazon Elastic Container Service (Amazon ECS), Amazon DynamoDB, Google BigQuery, Google Cloud Storage, Data Engineering, APIs, GitHub, Data Cleaning, Data Aggregation

Senior Data Engineer

2021 - 2021

AusNet Services

Designed and built reusable Azure Data Factory pipeline patterns, covering ingestion from SharePoint to a storage account and transformation on Databricks.
Designed and built a spatial data processing framework and associated practices on Databricks.
Mapped out patterns for integrating Azure Machine Learning with the data platform, including storage accounts, Azure Databricks, and a Synapse dedicated SQL pool.
Drafted a Synapse data warehouse design to integrate Azure Machine Learning and a Python application on Azure Kubernetes Service.

Technologies: Azure, Azure Data Factory, Databricks, Azure Synapse, Docker, Python, Azure Machine Learning, GitHub, Data Cleaning, Data Aggregation

Data Platform Delivery Lead

2020 - 2021

Officeworks

Led a team of five data and cloud engineers to deliver a data platform from scratch.
Designed and implemented key components of a data platform.
Reviewed all solutions to ensure architectural standards were met.
Conducted design workshops with implementation and technology partners.
Worked with internal teams to standardize and establish usage patterns of the platform.
Ramped up data analytics team capabilities by building DevOps standards and cross-team knowledge sharing.

Technologies: Amazon Athena, Amazon DynamoDB, Amazon RDS, Apache Airflow, Databricks, Snowflake, Jenkins, Python, GitHub

Principal (Junior) Data Engineer

2018 - 2020

McKinsey & Company

Delivered a large-scale machine learning project to automate the decision-making of plant operations at a mining client.
Designed ETL pipeline architecture, integration strategy, and end-to-end monitoring solution for a multi-tier machine learning application.
Led data management and ETL activities in multiple machine learning projects.
Contributed to building firm-wide reusable assets, including application frameworks for data engineers and scientists.

Technologies: Python, Pandas, Docker, Spark ML, Amazon Web Services (AWS), GitHub

Data Analytics Manager

2017 - 2018

EY

Single-handedly migrated 15 on-premises reports to data pipelines in Azure.
Liaised with multiple finance subsidiaries to define a unified strategy for data consolidation and reporting based on SAP S/4HANA.
Designed and led the development of an end-to-end data warehouse and reporting solution to consolidate financial statements of all four major subsidiaries for the first time at a client.
Engaged in presales and won the bid proposal on a reporting transformation project.

Technologies: Azure Data Factory, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), C#, Python, SAP BW on HANA, AWS Batch

Senior Data Warehouse Developer

2013 - 2017

Australia Post

Led a team of five developers to design and build NIM, the largest data warehouse on SAP HANA in Australia.
Built a custom data management framework in SAP HANA purely based on SQL. This provided a robust and simplified interface for developers and support.
Continuously improved the performance of NIM to support 10 million data points per day and more than 50 reports.

Technologies: SAP Business Warehouse (BW), SAP HANA, Data Warehousing, ETL, SQL

Senior BI Consultant

2010 - 2014

Innogence Limited

Built a data warehousing and reporting solution for an SAP HR system, including employee, leave, and payroll.
Developed a data warehousing and reporting solution for Australia's largest SAP logistics user.
Created a data warehousing and reporting solution for an SAP sales and distribution system, including purchasing, sales, and delivery.

Technologies: SAP Business Warehouse (BW), SAP HANA, Data Warehousing

BI Consultant

2007 - 2010

ECENTA

Single-handedly built a data warehousing and reporting solution for an SAP CRM system, including customer interactions, service incidents, and customer data.
Built heavily customized data extractors in ABAP for an SAP logistics system.
Led two consultants to remotely support the ETL and reporting for an SAP finance system.

Technologies: SAP Business Warehouse (BW), SAP, ABAP

Software Engineer

2003 - 2007

IBM Singapore

Designed and built an IBM order status online site using Spring.
Built the terms and conditions section of the IBM Expressed Management Services site.
Supported a partner software lab on internal web projects.

Technologies: Java, JavaScript, IBM Db2, IBM WebSphere, Apache Tomcat

Experience

Asset Risk Management

ARM aims to produce machine learning models to predict when assets will fail.

As the solution designer and lead data engineer, I designed data access and load patterns that integrate with the machine learning solutions, including:
• Reusable Azure Data Factory pipelines that load data from SharePoint to an Azure storage account, with custom schema evolution governance
• Reusable Azure Data Factory pipelines that perform feature engineering on data in Databricks Delta Lake, supporting both full and incremental options
• Data warehouse design (a Synapse dedicated SQL pool) to store and serve machine learning outputs
• Spatial data processing framework on Databricks, including recommended spatial libraries, an installation process involving Azure Container Registry, a custom Python library for spatial transformation logic, and visualization options
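
As a rough illustration of the full and incremental options mentioned above, the sketch below selects rows for a run based on a last-modified watermark. It is a simplified assumption of how such a pattern could work, not the actual pipeline logic: the function names, the `updated_at` field, and the in-memory row format are all hypothetical.

```python
from datetime import datetime
from typing import Iterable, List, Optional

# Hypothetical row shape: {"id": ..., "updated_at": datetime, ...}
Row = dict


def select_rows(
    rows: Iterable[Row], mode: str, watermark: Optional[datetime] = None
) -> List[Row]:
    """Return the rows to process for a 'full' or 'incremental' run."""
    if mode == "full":
        # Full option: reprocess everything.
        return list(rows)
    if mode == "incremental":
        if watermark is None:
            # No previous run recorded, so fall back to a full load.
            return list(rows)
        # Incremental option: only rows changed since the last run.
        return [r for r in rows if r["updated_at"] > watermark]
    raise ValueError(f"Unknown mode: {mode!r}")


def next_watermark(
    rows: Iterable[Row], previous: Optional[datetime]
) -> Optional[datetime]:
    """Advance the watermark to the latest timestamp seen so far."""
    latest = max((r["updated_at"] for r in rows), default=None)
    if latest is None:
        return previous
    if previous is None:
        return latest
    return max(previous, latest)
```

In this sketch, the caller would store the value returned by `next_watermark` after each successful run and pass it to the next incremental run.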

Officeworks Data Analytics Platform

I led five engineers to build the data platform from scratch.

As the technical lead, I was responsible for designing and building key components, including a data lake on S3, a Snowflake data model, a Databricks Spark job, Airflow pipelines, and integrations of various components.

I also ensured critical non-functional requirements were met, including:
• Logging and monitoring (integration of Airflow with Sumo Logic and Datadog)
• Alerting (integration with xMatters)
• Snowflake role-based access control design
• Databricks security design

To help build an engineering culture in the organization, I promoted community best practices in a few areas, including CI/CD and Python project setup.

Alice — Machine Learning Empowered Pharmaceutical Project

The project aims to find the most effective peptides that carry drugs to the target cells by applying machine learning techniques.

Highlights of my achievements:
• Designed and built an end-to-end data pipeline based on a project-specific customized version of Kedro (https://github.com/quantumblacklabs/kedro)
• Iteratively optimized feature engineering logic to efficiently process 70 million data points
• Programmatically generated synthetic peptides by reverse engineering best-known peptides. The result was so inspiring that it was synthesized and tested in the lab

Skills

Languages

Python, SQL, Snowflake, C#, ABAP, Java, JavaScript

Frameworks

Apache Spark

Libraries/APIs

Pandas, Spark ML

Tools

Amazon Athena, Apache Airflow, Azure Machine Learning, Jenkins, AWS Batch, Apache Tomcat, Terraform, Amazon Elastic Container Service (Amazon ECS), GitHub

Paradigms

ETL, Data Science, DevOps

Platforms

MacOS, Linux, Windows, Azure, Databricks, Docker, Azure SQL Data Warehouse, Amazon Web Services (AWS), Dedicated SQL Pool (formerly SQL DW), Azure Synapse, SAP HANA, IBM WebSphere

Storage

Data Pipelines, Amazon DynamoDB, IBM Db2, Redshift, Google Cloud Storage

Other

Azure Data Factory, SAP BW on HANA, Data Warehouse Design, Data Warehousing, Data Engineering, Azure Data Lake, Data Build Tool (dbt), Data Cleaning, Data Aggregation, Amazon RDS, APIs, Message Queues, Machine Learning, SAP Business Warehouse (BW), SAP, GitHub Actions, Google BigQuery

Education

1999 - 2003

Bachelor's Degree in Computer Science

National University of Singapore - Singapore

Certifications

JULY 2021 - PRESENT

Microsoft Certified: Azure Data Scientist Associate

Microsoft

JUNE 2021 - JUNE 2023

Microsoft Azure Data Engineer Associate

Microsoft

MAY 2019 - MAY 2022

AWS Certified Developer Associate

AWS

SEPTEMBER 2017 - PRESENT

CCA Spark and Hadoop Developer

Cloudera
