Karanpreet Kaur
Verified Expert in Engineering
Data Engineer and Developer
Toronto, ON, Canada
Toptal member since October 5, 2022
Karanpreet is an experienced data engineer with a solid background working with multiple leading international enterprise clients across the retail and investment banking domains. She combines strong technical and soft skills with rigorous knowledge of extract, transform, and load (ETL) design and data analytics. Karanpreet is also passionate and curious about the latest tech trends and always open to learning new things.
Experience
- PostgreSQL - 4 years
- Azure Data Factory (ADF) - 3 years
- Azure Data Lake - 3 years
- Azure Databricks - 3 years
- PySpark - 3 years
- Azure SQL Databases - 3 years
- ETL Development - 3 years
- Azure SQL Data Warehouse - 3 years
Preferred Environment
Windows 10, Slack, Visual Studio Code (VS Code)
The most amazing project I've developed is the complete hand-coded ETL process for an eCommerce startup's dashboard, which automates the startup's daily product label categorization.
Work Experience
Data Scientist | Data Engineer
The University of British Columbia (Capstone Project)
- Developed an unsupervised machine learning model to help a leading Canadian startup classify around 4,000 products, scraped from different stores across various eCommerce platforms, into subcategories.
- Combined contrastive language-image pre-training (CLIP) methods with multiclass text classification in an ensemble to achieve higher precision for each product category.
- Implemented two hand-coded ETL pipelines (training and prediction) that obtain data for new products daily, invoke the image and text model scripts to predict each product's category, and update product records in production with the predictions.
- Helped reduce manual product category labeling effort from around 180 minutes to around 14 minutes per 4,000 products daily.
Data Engineer Consultant for FMCG
Deloitte
- Developed data transformations for the ETL process in Azure Databricks and designed the execution workflow in Azure Data Factory.
- Identified and removed redundant activities in the Azure Data Factory execution workflow, cutting daily execution time by one hour and the monthly process by 45 minutes while reducing cloud resource consumption.
- Reduced storage and processing time in SQL Data Warehouse by analyzing the duplication of records between Spark SQL and layer, reducing the row count by 86%.
- Implemented and automated the ETL process end-to-end, delivering results one to two days earlier every month and freeing the team from external and manual dependencies for Microsoft Power BI dashboard deliverables.
- Led the design, development, and validation of external data source dashboards and served as the team's single point of contact for process-related queries.
- Reimplemented complex SQL queries from SQL Data Warehouse in Apache Spark (Azure Databricks), saving five hours of execution time and 650GB of storage in the data warehouse.
- Received a reward from client leadership for fine-tuning, optimizing, and reducing the cost of ETL processes, as well as the company's 2020 Live Dot award for outstanding performance and contributions to the FMCG engagement.
Data Engineer Consultant
Deloitte
- Created a chatbot proof of concept (POC) for an Australian investment bank using the Rasa stack with custom components to automate the manual work of finding insights across various sources, enabling a cost reduction of five full-time equivalents (FTEs).
- Collaborated with on-shore client team members to understand the financial back-end logic used to answer ad hoc queries. Took ownership of tracking technical requirements, architectural design documentation, data collection, and data preparation.
- Prepared training data for the Rasa chatbot based on ad hoc queries about expense reports from business users, including senior executives such as the chief experience officer (CXO) and chief technology officer (CTO).
- Designed and implemented an actions module in Python to handle each action defined for chatbot responses, such as the year-to-date (YTD) revenue calculation for personal banking.
- Developed an entity extractor model in Python as a wrapper around the natural language understanding (NLU) text classification model to extract entities from user queries, such as month, year, line of business, and product, helping determine each query's intent.
- Initiated and developed a POC to transform structured data into natural language using Arria NLG Studio, automating the manual effort of writing commentaries for monthly tax and revenue reports and saving one FTE.
- Received the company's 2019 Move the Dot team award for exemplary performance and significant team contributions to the chatbot client engagement.
Intern
STMicroelectronics
- Designed and implemented a generalized Java patch to filter XML error files of over 10,000 lines and convert HTML tables to CSV, with columns for specific error information tags, reducing the manual effort of reading and identifying errors.
- Maintained documentation of releases, test plans, and deployments using the HP application lifecycle management (ALM) tool.
- Identified and solved multiple defects and boundary cases during testing, helping the team fix issues and deliver before the deadline.
Experience
Online Taxi Service ETL Pipeline
https://github.com/karanpreetkaur/online_taxi_service_ETL_Project
A short project description is available to authorized users: https://docs.google.com/presentation/d/1PHT9CrB602qDdVB9q_wBui5OEe7kGHgouY1YtLXDi84/edit#slide=id.gcb9a0b074_1_0.
Education
Master's Degree in Data Science
The University of British Columbia - Vancouver, British Columbia, Canada
Bachelor's Degree in Computer Science
Thapar Institute of Engineering and Technology - Patiala, Punjab, India
Certifications
Azure Data Engineer Associate
Microsoft
Microsoft Azure Data Fundamentals
Microsoft
Microsoft Azure Fundamentals
Microsoft
Skills
Libraries/APIs
NumPy, Pandas, PySpark, Scikit-learn, Tidyverse, Rasa NLU
Tools
Git, Slack, Dplyr, Microsoft Power BI, HP Application Lifecycle Management (ALM)
Languages
Python, R, C, C++, SQL, HTML
Platforms
Azure SQL Data Warehouse, Azure, Databricks, Azure Synapse, Dedicated SQL Pool (formerly SQL DW), Visual Studio Code (VS Code), Azure PaaS, Azure Synapse Analytics
Storage
PostgreSQL, Azure SQL Databases, Azure SQL, Data Lakes, MongoDB, Data Pipelines, Databases, Relational Databases
Frameworks
Apache Spark
Paradigms
ETL
Other
Windows 10, Data Wrangling, Data Structures, Algorithms, Microsoft Azure, Azure Databricks, Azure Data Lake, Azure Data Factory (ADF), Data, ETL Development, Data Engineering, Data Analytics, Data Cleaning, Data Processing, Machine Learning, Data Science, Supervised Machine Learning, Unsupervised Learning, Statistical Methods, Predictive Analytics, Hypothesis Testing, Software Development, Build Pipelines, Data Warehouse Design, Manual Software Testing, Investment Banking Technology, Natural Language Generation (NLG), Technical Requirements, Natural Language Processing (NLP), OpenAI, Text Classification, Azure Stream Analytics, Big Data, Cloud, Data Security, Project Management & Work Tracking Tools, Security, Storage, Generative Pre-trained Transformers (GPT)