Verified Expert in Engineering
Data Engineer and Developer
With 15 years of experience working in the data domain, Bin is passionate about all things data. He has held senior roles in consultancies like McKinsey and Ernst and Young, focusing on building solutions on AWS and Azure. Besides cloud services, he has worked with Databricks, Snowflake, dbt, and Airflow. Bin has spent more than ten years on data modeling and data warehousing. He also has extensive software engineering experience with Python and Docker in machine learning projects.
macOS, Linux, Python, Amazon Web Services (AWS), Azure, Databricks, Snowflake
The most amazing...
...design I've recently built is a reusable spatial data analysis framework on Databricks.
Senior Data Engineer
- Set up dbt projects for Redshift and Snowflake to enable both local executions using Docker and execution on dbt Cloud.
- Set up an Infrastructure as Code project for Snowflake using Terraform and CI/CD pipelines using GitHub Actions to enable automated and repeatable resource deployment.
- Proposed and built role-based access control in Snowflake.
- Designed and built various data pipelines to support data transfer and transformation in AWS and GCP.
- Built an extensible solution to monitor common failures and alert team members. This greatly improved system observability and increased team ownership.
Senior Data Engineer
- Designed and built reusable Azure Data Factory pipeline patterns, covering ingestion from SharePoint to a storage account and transformation on Databricks.
- Designed and built a spatial data processing framework and practices on Databricks.
- Mapped out patterns for integrating Azure Machine Learning with the data platform, including storage accounts, Azure Databricks, and a Synapse dedicated SQL pool.
- Drafted a Synapse data warehouse design to integrate Azure Machine Learning and a Python application on Azure Kubernetes Service.
Data Platform Delivery Lead
- Led a team of five data and cloud engineers to deliver a data platform from scratch.
- Designed and implemented key components of a data platform.
- Reviewed all solutions to ensure architectural standards were met.
- Conducted design workshops with implementation and technology partners.
- Worked with internal teams to standardize and establish usage patterns of the platform.
- Ramped up data analytics team capabilities by building DevOps standards and cross-team knowledge sharing.
Principal (Junior) Data Engineer
McKinsey & Company
- Delivered a large-scale machine learning project to automate the decision-making of plant operations at a mining client.
- Designed ETL pipeline architecture, integration strategy, and end-to-end monitoring solution for a multi-tier machine learning application.
- Led data management and ETL activities in multiple machine learning projects.
- Contributed to building firm-wide reusable assets, including application frameworks for data engineers and scientists.
Data Analytics Manager
- Single-handedly migrated 15 on-premises reports to data pipelines in Azure.
- Liaised with multiple finance subsidiaries to define a unified strategy for data consolidation and reporting based on SAP S/4HANA.
- Designed and led the development of an end-to-end data warehouse and reporting solution to consolidate financial statements of all four major subsidiaries for the first time at a client.
- Engaged in presales and won the bid proposal on a reporting transformation project.
Senior Data Warehouse Developer
- Led a team of five developers to design and build NIM, the largest data warehouse on SAP HANA in Australia.
- Built a custom data management framework in SAP HANA purely based on SQL. This provided a robust and simplified interface for developers and support.
- Continuously improved the performance of NIM to support 10 million data points per day and more than 50 reports.
Senior BI Consultant
- Built a data warehousing and reporting solution for an SAP HR system, including employee, leave, and payroll.
- Developed a data warehousing and reporting solution for Australia's largest SAP logistics user.
- Created a data warehousing and reporting solution for an SAP sales and distribution system, including purchasing, sales, and delivery.
- Single-handedly built a data warehousing and reporting solution for an SAP CRM system, including customer interactions, service incidents, and customer data.
- Built heavily custom data extractors in ABAP for an SAP logistics system.
- Led two consultants to remotely support the ETL and reporting for an SAP finance system.
- Designed and built an IBM order status online site using Spring.
- Built the terms and conditions section of the IBM Expressed Management Services site.
- Supported a partner software lab on internal web projects.
Asset Risk Management
As the solution designer and lead data engineer, I designed data access and load patterns that integrate with the machine learning solutions, including:
• Reusable Azure Data Factory pipelines that load data from SharePoint to an Azure storage account, with custom schema evolution governance
• Reusable Azure Data Factory pipelines that perform feature engineering on data in Databricks Delta Lake, supporting both full and incremental options
• Data warehouse design, based on a Synapse dedicated SQL pool, to store and serve machine learning outputs
• Spatial data processing framework on Databricks, including spatial library recommendations, an installation process involving Azure Container Registry, a custom Python library for spatial transformation logic, and visualization options
Officeworks Data Analytics Platform
As the technical lead, I was responsible for designing and building key components, including a data lake on S3, a Snowflake data model, a Databricks Spark job, Airflow pipelines, and integrations of various components.
I also ensured critical non-functional requirements were met, including:
• Logging and monitoring (integration of Airflow with Sumo Logic and Datadog)
• Alerting (integration with xMatters)
• Snowflake role-based access control design
• Databricks security design
To help build an engineering culture in the organization, I promoted community best practices in a few areas, including CI/CD and Python project setup.
Alice — Machine Learning Empowered Pharmaceutical Project
Highlights of my achievements:
• Designed and built an end-to-end data pipeline based on a project-customized version of Kedro (https://github.com/quantumblacklabs/kedro)
• Iteratively optimized feature engineering logic to efficiently process 70 million data points
• Programmatically generated synthetic peptides by reverse engineering best-known peptides. The result was promising enough to be synthesized and tested in the lab
Spark, Apache Spark
Pandas, Spark ML
Amazon Athena, Apache Airflow, Azure Machine Learning, Jenkins, AWS Batch, Apache Tomcat, Terraform, Amazon Elastic Container Service (Amazon ECS), GitHub
ETL, Data Science, DevOps
macOS, Linux, Windows, Azure, Databricks, Docker, Azure SQL Data Warehouse, Amazon Web Services (AWS), Dedicated SQL Pool (formerly SQL DW), Azure Synapse, SAP HANA, IBM WebSphere
Data Pipelines, Amazon DynamoDB, IBM Db2, Redshift, Google Cloud Storage
Azure Data Factory, SAP BW on HANA, Data Warehouse Design, Data Warehousing, Data Engineering, Azure Data Lake, Data Build Tool (dbt), Data Cleaning, Data Aggregation, Amazon RDS, APIs, Message Queues, Machine Learning, SAP Business Warehouse (BW), SAP, GitHub Actions, Google BigQuery
Bachelor's Degree in Computer Science
National University of Singapore - Singapore
Microsoft Certified: Azure Data Scientist Associate
Microsoft Azure Data Engineer Associate
AWS Certified Developer Associate
CCA Spark and Hadoop Developer