Wenlong Dong, Developer in Sydney, Australia

Wenlong Dong

Verified Expert in Engineering

Database Developer

Location
Sydney, Australia
Toptal Member Since
January 21, 2022

Wenlong is a senior data engineer with over half a decade of experience building data and ETL solutions, primarily in SQL and Python. He has strong experience building data pipelines and is familiar with tools such as dbt, Snowflake, Redshift, Airflow, Power BI, Excel VBA, and PowerShell. He has led projects including fuzzy mapping in Python, an end-to-end data pipeline with Dataiku and Anaplan, a Salesforce data migration, and omnichannel models in dbt, Redshift, and Airflow.

Portfolio

AstraZeneca
Microsoft Power BI, Snowflake, Apache Airflow, Python 3, SQL, DBeaver, Dataiku...
IBM
Python, Salesforce, SQL, IBM Cloud, GitHub, Data Analysis, Data Engineering...
University of New South Wales
STATA, R, Excel VBA, Data Analysis, Dashboards, SQL, Data Engineering...

Availability

Part-time

Preferred Environment

PyCharm, Windows, SQL Server 2016, Visual Studio Code (VS Code), SQL Server Integration Services (SSIS)

The most amazing...

...project I've independently designed and completed is a complex medical data validation platform with built-in validation rules using Excel VBA.

Work Experience

Data Engineer

2022 - PRESENT
AstraZeneca
  • Supported the analytics team for Microsoft Power BI reporting.
  • Created a Power BI data flow and built report templates.
  • Developed and maintained a Snowflake-based data warehouse via dbt.
  • Administered the Snowflake data warehouse and helped data users troubleshoot issues.
  • Built and maintained Apache Airflow schedules. Completed BAU and troubleshooting tasks.
Technologies: Microsoft Power BI, Snowflake, Apache Airflow, Python 3, SQL, DBeaver, Dataiku, Data Visualization, Data Build Tool (dbt), Data Analytics, Data Analysis, Redshift, Analytics, Data Pipelines, T-SQL (Transact-SQL), SQL DML, Data Queries, SQL Performance, Performance Tuning, Automated Data Flows, Amazon Web Services (AWS), CI/CD Pipelines, ETL Tools, Business Intelligence (BI) Platforms, SQL Stored Procedures, Stored Procedure, JSON, PostgreSQL, Amazon S3 (AWS S3), Excel 2010, Excel 365, Excel 2016, MySQL, ELT, BI Reporting, Databases, Data Transformation, Data Profiling, Dashboard Development, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Data Warehouse Design, Microsoft Word, Windows

Data Engineer

2021 - 2022
IBM
  • Served as the primary data engineer on a Salesforce data migration project using Python, SQL, and Salesforce Apex.
  • Completed training and learning activities in Hadoop and MongoDB.
  • Worked in an Agile team that followed CI/CD development practices.
  • Contributed as the primary data engineer for a data migration project with Python-based development.
Technologies: Python, Salesforce, SQL, IBM Cloud, GitHub, Data Analysis, Data Engineering, SQL Server DBA, SQL Stored Procedures, ETL, Microsoft SQL Server, MongoDB, Database Administration (DBA), T-SQL (Transact-SQL), Docker, ETL Development, Data Warehousing, Data Architecture, Pandas, Data Modeling, ETL Testing, Database Modeling, Schemas, Microsoft Excel, Data Analytics, Analytics, Data Pipelines, SQL DML, Data Queries, SQL Performance, Performance Tuning, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, CI/CD Pipelines, ETL Tools, Stored Procedure, PostgreSQL, Excel 2010, Excel 365, Excel 2016, MySQL, BI Reporting, Databases, Data Transformation, Data Profiling, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Data Warehouse Design, MacOS, Microsoft Word

Data Management Officer

2020 - 2021
University of New South Wales
  • Designed and developed a complete data solution with STATA, including data cleansing modules, data validation, and statistical report generation.
  • Independently designed and developed a medical data collection and validation platform with Excel VBA.
  • Built an R-based model for data cleansing and producing academic reports.
  • Designed and developed SQL Server-based databases and relevant stored procedures.
  • Built a Power BI dashboard on a SQL Server data source to analyze historical genetic test data through interactive reports instead of multiple spreadsheets.
Technologies: STATA, R, Excel VBA, Data Analysis, Dashboards, SQL, Data Engineering, SQL Server DBA, SQL Stored Procedures, Microsoft SQL Server, Database Administration (DBA), T-SQL (Transact-SQL), ETL Development, Data Science, Business Intelligence (BI), Data Architecture, Pandas, Data Modeling, Database Modeling, Schemas, Microsoft Power BI, Reports, Reporting, Microsoft Excel, Data Analytics, Analytics, SQL DML, Data Queries, SQL Performance, Performance Tuning, ETL Tools, Business Intelligence (BI) Platforms, Stored Procedure, PostgreSQL, Excel 2010, Excel 365, Excel 2016, BI Reporting, Databases, Data Transformation, Data Profiling, Dashboard Development, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Data Warehouse Design, Visual Basic for Applications (VBA), Visual Basic, MacOS, Microsoft Word, Windows

PowerShell Developer

2019 - 2020
Macquarie Bank
  • Designed and built SSIS solutions to create an ETL pipeline between the central data warehouse and a financial analysis platform.
  • Developed a file loading system and data processing jobs with Control-M job flows and PowerShell-based functions.
  • Contributed to the data lake project with a Hive data warehouse.
Technologies: Windows PowerShell, SQL Server 2016, Control-M, SourceTree, Jira, SQL Server Integration Services (SSIS), JSON, YAML, SQL, Data Engineering, SQL Server DBA, SQL Stored Procedures, ETL, Microsoft SQL Server, T-SQL (Transact-SQL), ETL Development, Data Warehousing, Data Modeling, ETL Testing, Database Modeling, Schemas, Microsoft Excel, Data Analysis, Analytics, Data Pipelines, SQL DML, Data Queries, SQL Performance, Performance Tuning, Amazon Web Services (AWS), CI/CD Pipelines, ETL Tools, Stored Procedure, PostgreSQL, Amazon S3 (AWS S3), Excel 2010, Excel 365, Excel 2016, ELT, Databases, Data Transformation, Data Profiling, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Data Warehouse Design, Visual Basic, Microsoft Word, Windows

Data Developer

2018 - 2019
CoreLogic AU
  • Completed a major upgrade of the data warehouse and data loading pipeline, driven by updated business rules for Australian property data.
  • Supported all BAU processes for the entire data team and the property data platform, including troubleshooting SQL agent jobs, AWS environments, and SSIS packages.
  • Performed detailed analysis on geographic data items. Built a data loading and validation process for geographic data types in SQL Server.
  • Created dynamic SQL processes to optimize SQL Server performance on large tables with more than one million records.
Technologies: SQL Server 2016, BIML, XML, Jira, Confluence, Agile, Python, Unit Testing, SQL Server Integration Services (SSIS), Data Analysis, Dashboards, SQL, Data Engineering, SQL Server DBA, SQL Stored Procedures, ETL, Tableau, Microsoft SQL Server, T-SQL (Transact-SQL), ETL Development, Data Warehousing, Business Intelligence (BI), Pandas, Data Modeling, ETL Testing, Database Modeling, Schemas, Reports, Reporting, Microsoft Excel, Data Analytics, Analytics, Data Pipelines, SQL DML, Data Queries, SQL Performance, Performance Tuning, Amazon Web Services (AWS), CI/CD Pipelines, ETL Tools, Stored Procedure, PostgreSQL, Amazon S3 (AWS S3), Excel 2010, Excel 365, Excel 2016, ELT, Databases, Data Transformation, Data Profiling, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Data Warehouse Design, Microsoft Word

SyteLine and System Support Officer

2017 - 2018
Le Mac Australia Group
  • Designed and maintained the Infor SyteLine ERP system.
  • Designed Crystal Reports and wrote the relevant SQL Server stored procedures.
  • Analyzed production cost data and performed data calculations using SQL Server and Excel pivot tables.
Technologies: SQL Server 2016, Crystal Reports, SyteLine ERP, C#, Pivot Tables, SQL Server DBA, SQL Stored Procedures, Microsoft SQL Server, Database Administration (DBA), T-SQL (Transact-SQL), Database Modeling, Schemas, Microsoft Excel, SQL DML, Data Queries, SQL Performance, Performance Tuning, SQL, Stored Procedure, PostgreSQL, Excel 2010, Excel 365, Excel 2016, Databases, Data Transformation, Data Profiling, Dashboard Development, Data Cleansing, Information Gathering, Relational Databases, Data Manipulation, Query Optimization, Microsoft Word, Windows

Customer Fuzzy Matching Project in Python and Dataiku

The project aimed to map customer data to government-published datasets using the limited fields available: names, occupations, and business addresses. The data sources included Redshift, CSV files, and XML files. The project's first phase was built exclusively in Python and mapped 60% of all customers. The second phase was built in Dataiku and mapped an additional 20%. I served as a solution designer and builder on the project.
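The matching approach used in the project is not described in detail here. As a rough illustration only, the sketch below scores a customer against reference records on the three available fields using Python's standard-library `difflib`. The field names, the `best_match` helper, and the 0.85 threshold are assumptions made for this example, not details of the actual solution.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not lower the score."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(customer, reference, fields=("name", "occupation", "address"), threshold=0.85):
    """Return (record, score) for the closest reference record.

    The record is None when no candidate reaches the (hypothetical) threshold.
    """
    best, best_score = None, 0.0
    for candidate in reference:
        # Average the per-field similarity; a real pipeline might weight fields differently.
        score = sum(similarity(customer[f], candidate[f]) for f in fields) / len(fields)
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Made-up sample data for illustration.
reference = [
    {"name": "John Smith", "occupation": "Dentist", "address": "12 George Street Sydney"},
    {"name": "Jane Doe", "occupation": "Lawyer", "address": "5 Pitt St Sydney"},
]
customer = {"name": "Jon  Smith", "occupation": "Dentist", "address": "12 George St Sydney"}
match, score = best_match(customer, reference)
```

Comparing every customer against every reference record is quadratic, so a sketch like this would normally sit behind a blocking step (for example, comparing only records that share a postcode) on larger datasets.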

Anaplan Data Integration

A Redshift-based data model that consists of several tables and views of sales data built via dbt. The data objects are refreshed daily or monthly in Airflow. As the project designer and builder, I contributed to building dbt macros to export the tables and views to the S3 bucket as CSV files. We also created Anaplan CloudWorks jobs to consume the CSV files regularly.

Salesforce Data Migration Project

As part of a team, I oversaw the migration of Salesforce data from the source environment to the target environment. The client wished to separate part of its business into an independent Salesforce environment.

I set up the primary Python framework and built the initial version of the data extraction process, from Salesforce to a Python DataFrame. I created the complete solution for identifying and merging duplicate records. I designed and developed the parallel computing process for comparing large volumes of data, as well as the grouping logic based on graph theory. I also designed and built many SQL Server objects, including views, stored procedures, and functions.
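The exact grouping logic is not shown here. One common graph-based way to group duplicates, offered as an assumption rather than a description of the project's code, is to treat each record as a node and each detected duplicate pair as an edge, then merge every connected component into one group. The sketch below does this with a small union-find structure in plain Python; `group_duplicates` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def group_duplicates(record_ids, duplicate_pairs):
    """Group records that are linked, directly or transitively, by duplicate pairs."""
    # Each record starts as its own group representative.
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the representative, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = defaultdict(list)
    for rid in record_ids:
        groups[find(rid)].append(rid)

    # Only groups with more than one record need merging.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Made-up example: A~B and B~C imply A, B, and C are one group even though
# A and C were never compared directly.
ids = ["A", "B", "C", "D", "E", "F"]
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
groups = group_duplicates(ids, pairs)
```

The same grouping can be obtained from a graph library's connected-components routine. The union-find version is used here only to keep the example free of third-party dependencies.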

Excel VBA-based Medical Data Validation Platform

I independently designed and completed a medical data validation platform with Excel VBA. I implemented complex validation rules within the Excel modules so that users could have their data validated automatically and completely in Excel.

This platform has been accepted and used for the data collection process worldwide.

ETL Solution to Update Existing Real Estate Data

A property data ETL project aimed at modifying the existing ETL data flow to meet new government requirements. I was one of the primary SQL Server and SSIS solution developers and completed approximately 50% of the development tasks.

Languages

Python 3, Python, SQL, Excel VBA, T-SQL (Transact-SQL), Snowflake, SQL DML, Stored Procedure, Visual Basic for Applications (VBA), Visual Basic, R, SAS, Java, C, YAML, BIML, XML, C#

Libraries/APIs

Pandas, NetworkX

Tools

STATA, Microsoft Power BI, Jira, Confluence, Spreadsheets, Microsoft Excel, Excel 2010, Excel 2016, Microsoft Word, PyCharm, MATLAB, GitHub, Tableau, Apache Airflow, MySQL Workbench, Control-M, SourceTree, Crystal Reports, CloudWorx

Paradigms

ETL, Business Intelligence (BI), Dimensional Modeling, Data Science, Agile, Unit Testing

Platforms

Visual Studio Code (VS Code), MacOS, Windows, Amazon Web Services (AWS), Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Salesforce, Docker, Azure, Azure PaaS, Azure IaaS, Salesforce SOQL/SOSL, Linux, Windows Server 2016, Amazon EC2, Dataiku, Anaplan

Storage

SQL Server 2016, SQL Server Integration Services (SSIS), Databases, SQL Stored Procedures, SQL Server DBA, MySQL, Microsoft SQL Server, Database Administration (DBA), Database Modeling, Redshift, Data Pipelines, SQL Performance, PostgreSQL, Amazon S3 (AWS S3), Relational Databases, JSON, Database Performance, Azure SQL, Azure Blobs, MongoDB, DBeaver

Other

Data Engineering, Data Warehousing, Data Analysis, Data Cleaning, ETL Development, Data Modeling, ETL Testing, Schemas, Data Analytics, Analytics, Data Queries, Performance Tuning, CI/CD Pipelines, ETL Tools, Excel 365, BI Reporting, Data Transformation, Data Profiling, Dashboard Development, Data Cleansing, Information Gathering, Data Manipulation, Query Optimization, Data Warehouse Design, Statistics, Dashboards, Data Architecture, Reports, Reporting, Data Build Tool (dbt), Automated Data Flows, Business Intelligence (BI) Platforms, ELT, MRP, Knowledge Management, Minitab, Calculus, Linear Algebra, IBM Cloud, IT Service Management (ITSM), Web Scraping, SyteLine ERP, Pivot Tables, Multiprocessing, Data Visualization, Fuzzy Logic

Frameworks

Windows PowerShell

Education

2020 - 2021

Graduate Certificate in Health Data Science

University of New South Wales - Sydney, NSW, Australia

2013 - 2014

Master's Degree in Information Systems

The University of Melbourne - Melbourne, Victoria, Australia

2007 - 2011

Bachelor's Degree in Logistics and Supply Chain Management

Huazhong University of Science and Technology - Wuhan, Hubei, China

Certifications

MARCH 2022 - PRESENT

Microsoft Certified: Azure Fundamentals

Microsoft

MARCH 2017 - PRESENT

ITIL Foundation Certificate in IT Service Management

AXELOS
