Yongyong is currently unavailable

Yongyong Li

Verified Expert in Engineering

Data Engineer, Analyst, and Developer

Calgary, AB, Canada

Toptal member since January 6, 2026

Expertise

Data Analysis, Data Science, Data Warehouse, Data Engineering, AWS Cloud, Big Data Architecture, DAX, Dashboard, Data Visualization, SQL, Database, ETL, AWS, Python

Bio

Yongyong is a strategic and results-driven data engineer and senior data analyst with extensive experience building scalable data platforms, ETL/ELT workflows, and analytics solutions across AWS, Azure, Databricks, and enterprise BI ecosystems. She is recognized for her creative problem-solving skills, accuracy under pressure, and strong execution. Yongyong is skilled in modern data stack tools, including data build tool (dbt), Airflow, Matillion, Spark, Kafka, and cloud-native pipelines.

Portfolio

FCC

Data Build Tool (dbt), Teradata, Python, SQL, Snowflake, Apache Airflow, Docker...

Parkland

Data Engineering, Amazon S3 (AWS S3), AWS Glue, AWS Lambda, AWS Step Functions...

Cognizant

AWS Glue, AWS Redshift, Matillion ETL Tool, APIs, Bash Script, SQL, Python...

Experience

Data Analysis - 15 years
SQL - 13 years
Microsoft SQL Server - 9 years
ETL - 8 years
Data Engineering - 8 years
Python - 6 years
Databricks - 3 years
Data Build Tool (dbt) - 2 years

Preferred Environment

macOS

The most amazing...

...thing I've accomplished is transforming complex, fragmented data into reliable, scalable platforms that business teams actually trust and utilize.

Work Experience

Senior Data Analyst

2025 - 2026

FCC

Designed, developed, and deployed scalable data pipelines using dbt with AWS Redshift, ensuring high data quality and accessibility for analytics and business intelligence.
Utilized Jinja within dbt materializations, macros, models, and tests to simplify and modularize code, enhancing reusability and maintainability.
Optimized dbt projects, reducing processing latency and improving performance by approximately 60%.
Implemented slowly changing dimension (SCD Type 2) logic using dbt snapshots to track historical state changes in high-churn datasets, enabling point-in-time "as-of" reporting for executive stakeholders.
Standardized organizational reference data via dbt seeds, eliminating hard-coded mapping logic and ensuring 100% consistency for lookup tables, e.g., currency codes and regional tiers, across all production models.
Architected a custom data observability framework in dbt using on-run-end hooks and sophisticated Jinja macros to capture and persist real-time test metadata into a centralized Redshift audit table.
Utilized Airflow to orchestrate complex data workflows, ensuring reliability, scalability, and timely data delivery across multiple business domains.
Contributed to CI/CD pipelines for dbt projects using GitHub, improving automation and deployment reliability.
Validated datasets between AWS Redshift and Teradata using Python (Jupyter Notebook) to ensure data integrity and accuracy. Debugged existing code to identify and resolve defects.

Technologies: Data Build Tool (dbt), Teradata, Python, SQL, Snowflake, Apache Airflow, Docker, Bash Script, Data Analysis, Data Warehousing, AWS Cloud Architecture, Data Engineering, ETL, PostgreSQL, Microsoft Power BI, Data Analytics, Databases, Big Data, Solution Architecture, Amazon Web Services (AWS)

Senior Data Engineer

2019 - 2024

Parkland

Built and managed data pipelines on the AWS cloud platform, developing end-to-end automation to make datasets readily consumable by internal and external stakeholders.
Leveraged AWS Glue and AWS Lambda to design and implement ETL processes that extracted, transformed, and loaded data from multiple sources (APIs, S3, SQL Server, and user inputs) into Redshift and S3 data lakes.
Utilized AWS Step Functions to orchestrate complex workflows and implemented monitoring, logging, and alerting through CloudWatch to ensure operational reliability.
Developed scalable ETL processes using PySpark and Python, optimizing jobs for performance and cost efficiency.
Designed, maintained, and monitored PostgreSQL databases, including developing SQL queries and performance tuning for customers.
Designed and optimized data pipelines in Databricks, developing Spark jobs, transformations, and processing workflows to support scalable analytics.
Mentored data engineers and data scientists on SQL optimization, data modeling, and ETL best practices, fostering collaboration, technical growth, and operational excellence.

Technologies: Data Engineering, Amazon S3 (AWS S3), AWS Glue, AWS Lambda, AWS Step Functions, Amazon CloudWatch, Amazon Redshift, Amazon Athena, API Integration, Data Modeling, Delta Lake, Data Lakes, PostgreSQL, PySpark, Data Analysis, Databricks, Data Warehousing, AWS Cloud Architecture, ETL, Microsoft SQL Server, Microsoft Power BI, SQL Stored Procedures, Stored Procedure, DAX, Dashboards, Data Analytics, Database Architecture, Databases, Big Data, Solution Architecture, Amazon Web Services (AWS)

Senior Data Engineer

2022 - 2023

Cognizant

Designed and implemented end-to-end cloud ETL pipelines using Matillion, automating the ingestion of complex data sources into AWS Redshift.
Leveraged Matillion orchestration and transformation jobs, environment variables, and grid variables to build dynamic, reusable workflows that scaled across multiple business units.
Optimized component-level logic and SQL Pushdown within Matillion to ensure high-performance execution and cost-efficiency.
Architected scalable data ingestion frameworks using Matillion ETL, integrating diverse APIs and on-premise databases into a centralized AWS data lake.

Technologies: AWS Glue, AWS Redshift, Matillion ETL Tool, APIs, Bash Script, SQL, Python, PySpark, Microsoft SQL Server, Data Analytics, Database Architecture, Databases, Big Data, Amazon Web Services (AWS)

BI and ETL Developer

2017 - 2019

Tervita Corporation

Created 100+ Business Intelligence (BI) reports and dashboards for Marketing, Finance, Production, and HR departments, leveraging DAX formulas and Power Query (M) for advanced Power BI calculations.
Developed and maintained SQL Server and Oracle databases using SSIS packages, T-SQL scripts, and stored procedures to support BI initiatives.
Designed, built, and maintained ETL workflows and data warehouses, integrating multiple data sources such as Oracle, SQL Server, flat files, Excel, XML, and web services.
Built and orchestrated ETL/ELT workflows using Azure Data Factory, including pipeline scheduling, parameterized data ingestion, and integration with cloud storage and SQL systems.
Performed large-scale data analysis using Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics to drive business insights.
Analyzed production and financial data using advanced analytics techniques, including machine learning, statistical modeling, predictive analytics, and data visualization.
Led the migration of 40+ Tableau reports to Power BI within two months, improving report performance and expanding user accessibility.

Technologies: Business Intelligence (BI), ETL, SQL Server Integration Services (SSIS), Azure, Transact-SQL (T-SQL), ELT, Microsoft SQL Server, Data Visualization, Tableau, Microsoft Power BI, Data Analysis, Data Warehousing, Microsoft Data Transformation Services (now SSIS), SQL Stored Procedures, SSIS Custom Components, Stored Procedure, Microsoft Dynamics, Azure Data Factory (ADF), PostgreSQL, DAX, Dashboards, Data Analytics, Database Architecture, Databases

Experience

Customer Behaviour Models

I built customer behavior models using dbt to deliver reliable, analytics-ready datasets for business teams. I designed a layered dbt architecture (raw, staging, and marts), implemented reusable macros, documentation, and data quality tests, and used dbt snapshots to track historical changes. I also orchestrated end-to-end pipelines with Airflow, adding retries and monitoring, and set up GitHub-based CI/CD to automate testing and deployments, improving data reliability and delivery speed.

Education

2000 - 2003

Master's Degree in Petroleum Engineering

China University of Petroleum (CUP) - Beijing, China

Certifications

AUGUST 2025 - PRESENT

Databricks Certified Data Engineer Associate

Databricks

Skills

Libraries/APIs

PySpark

Tools

Microsoft Power BI, AWS Glue, AWS Step Functions, Amazon CloudWatch, Amazon Athena, Apache Airflow, Tableau, Microsoft Dynamics

Languages

SQL, Python, Transact-SQL (T-SQL), Bash Script, Stored Procedure, Snowflake

Paradigms

ETL, Business Intelligence (BI)

Platforms

macOS, Databricks, AWS Lambda, Azure, Amazon Web Services (AWS), Docker, Confluent Kafka

Storage

Teradata, PostgreSQL, Amazon S3 (AWS S3), SQL Server Integration Services (SSIS), Microsoft SQL Server, Data Lakes, Database Architecture, Databases, SQL Stored Procedures

Other

Data Build Tool (dbt), Data Engineering, Data Analysis, Data Warehousing, AWS Cloud Architecture, Amazon Redshift, ELT, Data Visualization, API Integration, Data Modeling, Delta Lake, Microsoft Data Transformation Services (now SSIS), SSIS Custom Components, DAX, Dashboards, Data Analytics, Big Data, Solution Architecture, Engineering, Azure Data Factory (ADF), AWS Redshift, Matillion ETL Tool, APIs
