
Yongyong Li
Verified Expert in Engineering
Data Engineer, Analyst, and Developer
Calgary, AB, Canada
Toptal member since January 6, 2026
Yongyong is a strategic and results-driven data engineer and senior data analyst with extensive experience building scalable data platforms, ETL/ELT workflows, and analytics solutions across AWS, Azure, Databricks, and enterprise BI ecosystems. She is recognized for her creative problem-solving skills, accuracy under pressure, and strong execution. Yongyong is skilled in modern data stack tools, including data build tool (dbt), Airflow, Matillion, Spark, Kafka, and cloud-native pipelines.
Portfolio
Experience
- Data Analysis - 15 years
- SQL - 13 years
- Microsoft SQL Server - 9 years
- ETL - 8 years
- Data Engineering - 8 years
- Python - 6 years
- Databricks - 3 years
- Data Build Tool (dbt) - 2 years
Preferred Environment
macOS
The most amazing...
...thing I've accomplished is transforming complex, fragmented data into reliable, scalable platforms that business teams actually trust and utilize.
Work Experience
Senior Data Analyst
FCC
- Designed, developed, and deployed scalable data pipelines using dbt with AWS Redshift, ensuring high data quality and accessibility for analytics and business intelligence.
- Utilized Jinja within dbt materializations, macros, models, and tests to simplify and modularize code, enhancing reusability and maintainability.
- Optimized dbt projects, reducing processing latency and improving performance by approximately 60%.
- Implemented slowly changing dimension (SCD Type 2) logic using dbt snapshots to track historical state changes in high-churn datasets, enabling point-in-time "as-of" reporting for executive stakeholders.
- Standardized organizational reference data via dbt seeds, eliminating hard-coded mapping logic and keeping lookup tables (e.g., currency codes and regional tiers) consistent across all production models.
- Architected a custom data observability framework in dbt using on-run-end hooks and sophisticated Jinja macros to capture and persist real-time test metadata into a centralized Redshift audit table.
- Utilized Airflow to orchestrate complex data workflows, ensuring reliability, scalability, and timely data delivery across multiple business domains.
- Contributed to CI/CD pipelines for dbt projects using GitHub, improving automation and deployment reliability.
- Validated datasets between AWS Redshift and Teradata using Python (Jupyter Notebook) to ensure data integrity and accuracy. Debugged existing code to identify and resolve defects.
Senior Data Engineer
Parkland
- Built and managed data pipelines on the AWS cloud platform, developing end-to-end automation to make datasets readily consumable by internal and external stakeholders.
- Leveraged AWS Glue and AWS Lambda to design and implement ETL processes that extracted, transformed, and loaded data from multiple sources (APIs, S3, SQL Server, and user inputs) into Redshift and S3 data lakes.
- Utilized AWS Step Functions to orchestrate complex workflows and implemented monitoring, logging, and alerting through CloudWatch to ensure operational reliability.
- Developed scalable ETL processes using PySpark and Python, optimizing jobs for performance and cost efficiency.
- Designed, maintained, and monitored PostgreSQL databases, including writing SQL queries and tuning performance for customers.
- Designed and optimized data pipelines in Databricks, developing Spark jobs, transformations, and processing workflows to support scalable analytics.
- Mentored data engineers and data scientists on SQL optimization, data modeling, and ETL best practices, fostering collaboration, technical growth, and operational excellence.
Senior Data Engineer
Cognizant
- Designed and implemented end-to-end cloud ETL pipelines using Matillion, automating the ingestion of complex data sources into AWS Redshift.
- Leveraged Matillion orchestration and transformation jobs, environment variables, and grid variables to build dynamic, reusable workflows that scaled across multiple business units.
- Optimized component-level logic and SQL pushdown within Matillion to ensure high-performance execution and cost efficiency.
- Architected scalable data ingestion frameworks using Matillion ETL, integrating diverse APIs and on-premises databases into a centralized AWS data lake.
BI and ETL Developer
Tervita Corporation
- Created 100+ Business Intelligence (BI) reports and dashboards for Marketing, Finance, Production, and HR departments, leveraging DAX formulas and Power Query (M) for advanced Power BI calculations.
- Developed and maintained SQL Server and Oracle databases using SSIS packages, T-SQL scripts, and stored procedures to support BI initiatives.
- Designed, built, and maintained ETL workflows and data warehouses, integrating multiple data sources such as Oracle, SQL Server, flat files, Excel, XML, and web services.
- Built and orchestrated ETL/ELT workflows using Azure Data Factory, including pipeline scheduling, parameterized data ingestion, and integration with cloud storage and SQL systems.
- Performed large-scale data analysis using Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics to drive business insights.
- Analyzed production and financial data using advanced analytics techniques, including machine learning, statistical modeling, predictive analytics, and data visualization.
- Led the migration of 40+ Tableau reports to Power BI within two months, improving report performance and expanding user accessibility.
Experience
Customer Behaviour Models
Education
Master's Degree in Petroleum Engineering
China University of Petroleum (CUP) - Beijing, China
Certifications
Databricks Certified Data Engineer Associate
Databricks
Skills
Libraries/APIs
PySpark
Tools
Microsoft Power BI, AWS Glue, AWS Step Functions, Amazon CloudWatch, Amazon Athena, Apache Airflow, Tableau, Microsoft Dynamics
Languages
SQL, Python, Transact-SQL (T-SQL), Bash Script, Stored Procedure, Snowflake
Paradigms
ETL, Business Intelligence (BI)
Platforms
macOS, Databricks, AWS Lambda, Azure, Amazon Web Services (AWS), Docker, Confluent Kafka
Storage
Teradata, PostgreSQL, Amazon S3 (AWS S3), SQL Server Integration Services (SSIS), Microsoft SQL Server, Data Lakes, Database Architecture, Databases, SQL Stored Procedures
Other
Data Build Tool (dbt), Data Engineering, Data Analysis, Data Warehousing, AWS Cloud Architecture, Amazon Redshift, ELT, Data Visualization, API Integration, Data Modeling, Delta Lake, Microsoft Data Transformation Services (now SSIS), SSIS Custom Components, DAX, Dashboards, Data Analytics, Big Data, Solution Architecture, Engineering, Azure Data Factory (ADF), Matillion ETL Tool, APIs