Wenlong Dong
Verified Expert in Engineering
Database Developer
Sydney, New South Wales, Australia
Toptal member since January 21, 2022
Wenlong is a senior data engineer with over five years of experience building data and ETL solutions, primarily in SQL and Python. He has extensive experience building data pipelines and is familiar with tools such as dbt, Snowflake, Redshift, Airflow, Power BI, Excel VBA, and PowerShell. Wenlong has led projects including fuzzy matching in Python, an end-to-end data pipeline with Dataiku and Anaplan, a Salesforce data migration, and omnichannel models in dbt, Redshift, and Airflow.
Experience
- SQL Server 2016 - 5 years
- SQL Server Integration Services (SSIS) - 5 years
- Python 3 - 2 years
- GitHub - 2 years
- Visual Studio Code (VS Code) - 2 years
- STATA - 2 years
- Excel VBA - 2 years
- R - 2 years
Preferred Environment
PyCharm, Windows, SQL Server 2016, Visual Studio Code (VS Code), SQL Server Integration Services (SSIS), Snowflake, Redshift, Python 3
The most amazing...
...project I've independently designed and completed is a complex medical data validation platform with built-in validation rules using Excel VBA.
Work Experience
Data Engineer
AstraZeneca
- Supported the analytics team for Microsoft Power BI reporting.
- Created a Power BI data flow and built report templates.
- Developed and maintained a Snowflake-based data warehouse using dbt.
- Administered the Snowflake data warehouse and supported data users with troubleshooting issues.
- Built and maintained Apache Airflow schedules. Completed BAU and troubleshooting tasks.
Data Engineer
IBM
- Participated as the primary data engineer in a Salesforce data migration project using Python, SQL, and Salesforce Apex.
- Completed training and learning activities in Hadoop and MongoDB.
- Worked in an Agile team with a CI/CD development method implemented.
- Contributed as the primary data engineer for a data migration project with Python-based development.
Data Management Officer
University of New South Wales
- Designed and developed a complete data solution with STATA, including data cleansing modules, data validation, and generating statistical reports.
- Independently designed and developed a medical data collection and validation platform with Excel VBA.
- Built an R-based model for data cleansing and producing academic reports.
- Designed and developed SQL Server-based databases and relevant stored procedures.
- Built a Power BI dashboard on a SQL Server data source to analyze historical genetic test data through interactive reports instead of multiple spreadsheets.
PowerShell Developer
Macquarie Bank
- Designed and built SSIS solutions to create an ETL pipeline between the central data warehouse and a financial analysis platform.
- Developed a file loading system and data processing jobs with Control-M job flows and PowerShell-based functions.
- Contributed to the data lake project with a Hive data warehouse.
Data Developer
CoreLogic AU
- Completed a major upgrade of the data warehouse and data loading pipeline to support enhanced business rules for Australian property data.
- Supported all BAU processes for the entire data team and the property data platform, including troubleshooting SQL agent jobs, AWS environments, and SSIS packages.
- Performed detailed analysis on geographic data items. Built a data loading and validation process for geographic data types in SQL Server.
- Created dynamic SQL processes to optimize SQL Server performance on large tables with more than one million records.
SyteLine and System Support Officer
Le Mac Australia Group
- Designed and maintained the Infor SyteLine ERP system.
- Designed Crystal Reports and wrote the relevant SQL Server stored procedures.
- Analyzed production cost data and performed data calculations using SQL Server and Excel PivotTables.
Experience
Customer Fuzzy Matching Project in Python and Dataiku
Anaplan Data Integration
Salesforce Data Migration Project
I set up the primary Python framework and built the initial version of the data extraction process, from Salesforce to a Python DataFrame. I created the complete solution for identifying and merging duplicate records. I designed and developed the parallel computing process for comparing large volumes of data, as well as the grouping logic based on graph theory. I also designed and built many SQL Server objects, including views, stored procedures, and functions.
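As an illustration only, graph-based grouping of duplicates can be sketched as finding connected components: each record is a node, each matched pair is an edge, and every component becomes one merge group. The sketch below is not the project's actual code. The function name `group_pairs` and the record IDs are hypothetical, and it assumes the pairwise comparison step has already produced a list of matched ID pairs.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group record IDs into duplicate clusters (connected components).

    `pairs` is an iterable of (id_a, id_b) tuples that an upstream
    comparison step has flagged as duplicates (an assumption of this sketch).
    """
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root,
        # shortening the path along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Union step: link the roots of every matched pair.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect IDs under their final root.
    groups = defaultdict(list)
    for record_id in parent:
        groups[find(record_id)].append(record_id)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # "A" matches "B" and "B" matches "C", so all three land in one group.
    matched = [("A", "B"), ("B", "C"), ("D", "E")]
    print(group_pairs(matched))  # [['A', 'B', 'C'], ['D', 'E']]
```

Grouping by connected components means a match is treated as transitive: if A matches B and B matches C, all three are merged even when A and C were never compared directly. That choice suits merge workflows but can chain loosely related records together, so a stricter rule may be preferable in some datasets.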
Excel VBA-based Medical Data Validation Platform
This platform has been accepted and used for the data collection process worldwide.
ETL Solution to Update Existing Real Estate Data
Education
Graduate Certificate in Health Data Science
University of New South Wales - Sydney, NSW, Australia
Master's Degree in Information Systems
The University of Melbourne - Melbourne, Victoria, Australia
Bachelor's Degree in Logistics and Supply Chain Management
Huazhong University of Science and Technology - Wuhan, Hubei, China
Certifications
Microsoft Certified: Azure Fundamentals
Microsoft
ITIL Foundation Certificate in IT Service Management
AXELOS
Skills
Libraries/APIs
Pandas, NetworkX
Tools
STATA, Microsoft Power BI, Jira, Confluence, Spreadsheets, Microsoft Excel, Excel 2010, Excel 2016, Microsoft Word, PyCharm, MATLAB, GitHub, Tableau, Apache Airflow, MySQL Workbench, Control-M, SourceTree, Crystal Reports, CloudWorx
Languages
Python 3, Python, SQL, Excel VBA, T-SQL (Transact-SQL), Snowflake, SQL DML, Stored Procedure, Visual Basic for Applications (VBA), Visual Basic, R, SAS, Java, C, YAML, BIML, XML, C#
Paradigms
ETL, Business Intelligence (BI), Dimensional Modeling, Agile, Unit Testing
Platforms
Visual Studio Code (VS Code), macOS, Windows, Amazon Web Services (AWS), Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Salesforce, Docker, Azure, Azure PaaS, Azure IaaS, Salesforce SOQL/SOSL, Linux, Windows Server 2016, Amazon EC2, Dataiku, Anaplan
Storage
SQL Server 2016, SQL Server Integration Services (SSIS), Databases, SQL Stored Procedures, SQL Server DBA, MySQL, Microsoft SQL Server, Database Administration (DBA), Database Modeling, Redshift, Data Pipelines, SQL Performance, PostgreSQL, Amazon S3 (AWS S3), Relational Databases, JSON, Database Performance, Azure SQL, Azure Blobs, MongoDB, DBeaver
Frameworks
Windows PowerShell
Other
Data Engineering, Data Warehousing, Data Analysis, Data Cleaning, ETL Development, Data Modeling, ETL Testing, Schemas, Data Analytics, Analytics, Data Queries, Performance Tuning, CI/CD Pipelines, ETL Tools, Excel 365, BI Reporting, Data Transformation, Data Profiling, Dashboard Development, Data Cleansing, Information Gathering, Data Manipulation, Query Optimization, Data Warehouse Design, Statistics, Dashboards, Data Science, Data Architecture, Reports, Reporting, Data Build Tool (dbt), Automated Data Flows, Business Intelligence (BI) Platforms, ELT, Manufacturing Resource Planning (MRP), Knowledge Management, Minitab, Calculus, Linear Algebra, IBM Cloud, IT Service Management (ITSM), Web Scraping, SyteLine ERP, Pivot Tables, Multiprocessing, Data Visualization, Fuzzy Logic