Bin Wang

Data Engineer and Developer in Melbourne, Victoria, Australia

Member since December 8, 2021
With 15 years of experience in the data domain, Bin is passionate about all things data. He has held senior roles at consultancies such as McKinsey & Company and Ernst & Young, focusing on building solutions on AWS and Azure. Beyond cloud services, he has worked with Databricks, Snowflake, dbt, and Airflow. Bin has spent more than ten years on data modeling and data warehousing, and he has rich software engineering experience with Python and Docker in a machine learning capacity.








Preferred Environment

macOS, Linux, Python, AWS, Azure, Databricks, Snowflake

The most amazing thing I've recently built is a reusable spatial data analysis framework on Databricks.


Work Experience

  • Senior Data Engineer

2021 - PRESENT
    carsales
    • Set up dbt projects for Redshift and Snowflake to enable both local execution using Docker and execution on dbt Cloud.
    • Set up an infrastructure-as-code project for Snowflake using Terraform and CI/CD pipelines using GitHub Actions to enable automated, repeatable resource deployment.
    • Proposed and built role-based access control in Snowflake (sketched below).
    • Designed and built various data pipelines to support data transfer and transformation in AWS and GCP.
    • Built an extensible solution to monitor common failures and alert team members, which greatly improved system observability and increased team ownership.
    Technologies: dbt, Snowflake, Terraform, GitHub Actions, Docker, Python, Apache Airflow, Redshift, AWS ECS, AWS DynamoDB, Google BigQuery, Google Cloud Storage
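
    For illustration, here is a minimal sketch of the Snowflake role-based access control pattern, using the snowflake-connector-python package. All account, role, and database names are placeholders, not the actual setup:

        import snowflake.connector  # pip install snowflake-connector-python

        # Illustrative grant hierarchy; the real roles and databases differed.
        GRANTS = [
            "CREATE ROLE IF NOT EXISTS ANALYTICS_READ",
            "GRANT USAGE ON DATABASE ANALYTICS TO ROLE ANALYTICS_READ",
            "GRANT USAGE ON ALL SCHEMAS IN DATABASE ANALYTICS TO ROLE ANALYTICS_READ",
            "GRANT SELECT ON ALL TABLES IN DATABASE ANALYTICS TO ROLE ANALYTICS_READ",
            # Future grants cover tables created after the role is set up.
            "GRANT SELECT ON FUTURE TABLES IN DATABASE ANALYTICS TO ROLE ANALYTICS_READ",
            "GRANT ROLE ANALYTICS_READ TO ROLE SYSADMIN",
        ]

        def apply_grants(conn) -> None:
            """Apply the grant hierarchy; each statement is idempotent."""
            with conn.cursor() as cur:
                for stmt in GRANTS:
                    cur.execute(stmt)

        if __name__ == "__main__":
            conn = snowflake.connector.connect(
                account="my_account",   # placeholder
                user="deploy_user",     # placeholder
                password="change-me",   # use a secret manager in practice
                role="SECURITYADMIN",   # a role allowed to manage grants
            )
            try:
                apply_grants(conn)
            finally:
                conn.close()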
  • Senior Data Engineer

    2021 - 2021
    AusNet Services
    • Designed and built reusable Azure Data Factory pipeline patterns that move data from SharePoint to a storage account and transform it on Databricks.
    • Designed and built a spatial data processing framework and practices on Databricks (sketched below).
    • Mapped out patterns for integrating Azure Machine Learning with the data platform, including storage accounts, Azure Databricks, and a Synapse dedicated SQL pool.
    • Drafted a Synapse data warehouse design to integrate Azure Machine Learning and a Python application on Azure Kubernetes Services.
    Technologies: Azure, Azure Data Factory, Databricks, Azure Synapse, Docker, Python, Azure Machine Learning
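
    As a rough illustration of the transformation logic such a framework wraps, here is a minimal spatial-join sketch using GeoPandas; the column and dataset names are hypothetical, and the actual library choices on Databricks varied by use case:

        import geopandas as gpd
        import pandas as pd

        def points_to_zones(readings: pd.DataFrame, zones: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
            """Tag each asset reading with the network zone polygon it falls in."""
            points = gpd.GeoDataFrame(
                readings,
                geometry=gpd.points_from_xy(readings["longitude"], readings["latitude"]),
                crs="EPSG:4326",  # WGS84 latitude/longitude
            )
            # Spatial join: keep readings whose point lies within a zone polygon.
            return gpd.sjoin(points, zones.to_crs("EPSG:4326"), how="inner", predicate="within")

        # Usage (hypothetical files):
        # readings = pd.read_parquet("asset_readings.parquet")
        # zones = gpd.read_file("network_zones.geojson")
        # tagged = points_to_zones(readings, zones)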
  • Data Platform Delivery Lead

2020 - 2021
    Officeworks
    • Led a team of five data and cloud engineers to deliver a data platform from scratch.
    • Designed and implemented key components of a data platform.
    • Reviewed all solutions to ensure architectural standards were met.
    • Conducted design workshops with implementation and technology partners.
    • Worked with internal teams to standardize and establish usage patterns of the platform.
    • Ramped up data analytics team capabilities by building DevOps standards and cross-team knowledge sharing.
    Technologies: AWS Athena, AWS DynamoDB, AWS RDS, Apache Airflow, Databricks, Snowflake, Jenkins, Python
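
    A minimal sketch of an Airflow pipeline with a failure callback, illustrating the platform's monitoring and alerting pattern; the DAG name, schedule, and task body are hypothetical:

        from datetime import datetime, timedelta

        from airflow import DAG
        from airflow.operators.python import PythonOperator

        def notify_on_failure(context):
            """Failure callback; in production this would post to the alerting tool."""
            ti = context["task_instance"]
            print(f"Task {ti.task_id} failed in DAG {ti.dag_id}")

        def load_to_snowflake():
            ...  # e.g., pick up curated files from the S3 lake and load them into Snowflake

        with DAG(
            dag_id="daily_sales_load",  # hypothetical pipeline name
            start_date=datetime(2021, 1, 1),
            schedule_interval="@daily",
            catchup=False,
            default_args={
                "retries": 2,
                "retry_delay": timedelta(minutes=5),
                "on_failure_callback": notify_on_failure,
            },
        ) as dag:
            PythonOperator(task_id="load_to_snowflake", python_callable=load_to_snowflake)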
  • Principal (Junior) Data Engineer

    2018 - 2020
    McKinsey & Company
    • Delivered a large-scale machine learning project to automate the decision-making of plant operations at a mining client.
    • Designed ETL pipeline architecture, integration strategy, and end-to-end monitoring solution for a multi-tier machine learning application.
    • Led data management and ETL activities in multiple machine learning projects.
    • Contributed to building firm-wide reusable assets, including application frameworks for data engineers and scientists.
    Technologies: Python, Pandas, Docker, Spark ML, Scala, AWS
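
    For flavor, a minimal Spark ML pipeline of the kind used in such projects; the sensor columns, label, and model choice are illustrative assumptions, not the client's actual feature set:

        from pyspark.ml import Pipeline
        from pyspark.ml.feature import StandardScaler, VectorAssembler
        from pyspark.ml.regression import RandomForestRegressor
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("plant-ops-demo").getOrCreate()

        # Hypothetical plant telemetry columns.
        features = ["feed_rate", "pressure", "temperature"]

        assembler = VectorAssembler(inputCols=features, outputCol="raw_features")
        scaler = StandardScaler(inputCol="raw_features", outputCol="features")
        model = RandomForestRegressor(featuresCol="features", labelCol="throughput")

        pipeline = Pipeline(stages=[assembler, scaler, model])
        # fitted = pipeline.fit(training_df)        # training_df holds the columns above
        # predictions = fitted.transform(test_df)   # scores unseen plant data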
  • Data Analytics Manager

2017 - 2018
    Ernst & Young
    • Single-handedly migrated 15 on-premises reports to data pipelines in Azure.
    • Liaised with multiple finance subsidiaries to define a unified strategy for data consolidation and reporting based on SAP S/4HANA.
    • Designed and led the development of an end-to-end data warehouse and reporting solution to consolidate financial statements of all four major subsidiaries for the first time at a client.
    • Engaged in presales and won the bid proposal on a reporting transformation project.
    Technologies: Azure Data Factory, Azure SQL Data Warehouse (SQL DW), C#, Python, SAP BW on HANA, AWS Batch
  • Senior Data Warehouse Developer

    2013 - 2017
    Australia Post
    • Led a team of five developers to design and build NIM, the largest data warehouse on SAP HANA in Australia.
    • Built a custom data management framework in SAP HANA based purely on SQL, providing a robust, simplified interface for developers and support staff.
    • Continuously improved the performance of NIM to support 10 million data points per day and more than 50 reports.
    Technologies: SAP BW, SAP HANA, Data Warehousing, ETL, SQL
  • Senior BI Consultant

    2010 - 2014
    Innogence Limited
    • Built a data warehousing and reporting solution for an SAP HR system, covering employee, leave, and payroll data.
    • Developed a data warehousing and reporting solution for Australia's largest SAP logistics user.
    • Created a data warehousing and reporting solution for an SAP sales and distribution system, including purchasing, sales, and delivery.
    Technologies: SAP BW, SAP HANA, Data Warehousing
  • BI Consultant

    2007 - 2010
    • Single-handedly built a data warehousing and reporting solution for an SAP CRM system, including customer interactions, service incidents, and customer data.
    • Built heavily customized data extractors in ABAP for an SAP logistics system.
    • Led two consultants to remotely support the ETL and reporting for an SAP finance system.
    Technologies: SAP BW, SAP, ABAP
  • Software Engineer

    2003 - 2007
    IBM Singapore
    • Designed and built an IBM order status online site using Spring.
    • Built the terms and conditions section of the IBM Expressed Management Services site.
    • Supported a partner software lab on internal web projects.
    Technologies: Java, JavaScript, IBM Db2, IBM WebSphere, Apache Tomcat


Project Highlights

  • Asset Risk Management

    ARM aims to produce machine learning models to predict when assets will fail.

    As the solution designer and lead data engineer, I designed data access and load patterns that integrate with the machine learning solutions, including:
    • Reusable Azure Data Factory pipelines that load data from SharePoint to an Azure storage account, with custom schema evolution governance
    • Reusable Azure Data Factory pipelines that perform feature engineering on data in Databricks Delta Lake, supporting both full and incremental loads (see the sketch below)
    • A data warehouse design, using a Synapse dedicated SQL pool, to store and serve machine learning outputs
    • A spatial data processing framework on Databricks, covering spatial library recommendations, an installation process involving Azure Container Registry, a custom Python library for spatial transformation logic, and visualization options
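
    A minimal sketch of the full-versus-incremental load option on Delta Lake, assuming the delta-spark package and a hypothetical asset_id merge key:

        from delta.tables import DeltaTable
        from pyspark.sql import DataFrame, SparkSession

        def write_features(spark: SparkSession, df: DataFrame, path: str, incremental: bool) -> None:
            """Full refresh overwrites the Delta table; incremental merges on the key."""
            if not incremental or not DeltaTable.isDeltaTable(spark, path):
                df.write.format("delta").mode("overwrite").save(path)
                return
            target = DeltaTable.forPath(spark, path)
            (
                target.alias("t")
                .merge(df.alias("s"), "t.asset_id = s.asset_id")  # hypothetical key
                .whenMatchedUpdateAll()
                .whenNotMatchedInsertAll()
                .execute()
            )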

  • Officeworks Data Analytics Platform

    I led five engineers to build the data platform from scratch.

    As the technical lead, I was responsible for designing and building key components, including a data lake on S3, a Snowflake data model, Databricks Spark jobs, Airflow pipelines, and the integration of the various components.

    I also ensured critical non-functional requirements were met, including:
    • Logging and monitoring: integration of Airflow with Sumo Logic and Datadog
    • Alerting: integration with xMatters
    • Snowflake role-based access control design
    • Databricks security design

    To help build an engineering culture in the organization, I promoted community best practices in a few areas, including CI/CD and Python project setup.

  • Alice — Machine Learning Empowered Pharmaceutical Project

    The project aims to find the most effective peptides that carry drugs to the target cells by applying machine learning techniques.

    Highlights of my achievements:
    • Designed and built an end-to-end data pipeline based on a project-customized version of Kedro (sketched below)
    • Iteratively optimized feature engineering logic to efficiently process 70 million data points
    • Programmatically generated synthetic peptides by reverse engineering the best-known peptides; the results were promising enough that candidates were synthesized and tested in the lab
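
    A minimal sketch of how such a pipeline is wired up in Kedro; the node functions and dataset names are hypothetical stand-ins for the project's actual steps:

        from kedro.pipeline import Pipeline, node

        def engineer_features(peptides):
            """Derive features from raw peptide sequences (details elided)."""
            ...

        def score_candidates(features, model):
            """Rank candidate peptides by predicted carrier effectiveness."""
            ...

        def create_pipeline() -> Pipeline:
            # Dataset names ("raw_peptides", etc.) would live in the Kedro data catalog.
            return Pipeline(
                [
                    node(engineer_features, inputs="raw_peptides", outputs="peptide_features"),
                    node(
                        score_candidates,
                        inputs=["peptide_features", "trained_model"],
                        outputs="ranked_candidates",
                    ),
                ]
            )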


Skills

  • Languages

    Python, Snowflake, C#, SQL, ABAP, Java, JavaScript
  • Frameworks

Apache Spark
  • Libraries/APIs

    Pandas, Spark ML
  • Tools

    AWS Athena, Apache Airflow, Azure Machine Learning, Jenkins, AWS Batch, Apache Tomcat, Terraform, AWS ECS
  • Platforms

macOS, Linux, Windows, Azure, Databricks, Docker, Amazon Web Services (AWS), SAP HANA, IBM WebSphere
  • Storage

    Data Pipelines, AWS DynamoDB, IBM Db2, Redshift, Google Cloud Storage
  • Other

AWS, Azure Data Factory, Azure SQL Data Warehouse (SQL DW), SAP BW on HANA, Data Warehouse Design, Data Engineering, Azure Data Lake, Azure Synapse, AWS RDS, APIs, Message Queues, Machine Learning, SAP BW, Data Warehousing, SAP, dbt, GitHub Actions, Google BigQuery
  • Paradigms

    Data Science, DevOps, ETL


Education

  • Bachelor's Degree in Computer Science
    1999 - 2003
    National University of Singapore - Singapore


Certifications

  • Microsoft Certified: Azure Data Scientist Associate
    JULY 2021 - PRESENT
  • Microsoft Certified: Azure Data Engineer Associate
    JUNE 2021 - JUNE 2023
  • AWS Certified Developer – Associate
    MAY 2019 - PRESENT
  • CCA Spark and Hadoop Developer
