Karanpreet Kaur, Developer in Toronto, ON, Canada

Karanpreet Kaur

Verified Expert in Engineering

Data Engineer and Developer

Location
Toronto, ON, Canada
Toptal Member Since
October 5, 2022

Karanpreet is an experienced data engineer with a solid background working with leading international enterprise clients across the retail and investment banking domains. She combines strong technical and soft skills with rigorous knowledge of extract, transform, and load (ETL) design and data analytics, and she is passionate about the latest tech trends and always open to learning new things.

Portfolio

The University of British Columbia (Capstone Project)
Python, PostgreSQL, Machine Learning, OpenAI, Text Classification...
Deloitte
Python, Apache Spark, Azure Databricks, Azure Data Factory, PySpark...
Deloitte
Investment Banking Technology, Python, Natural Language Generation (NLG)...

Experience

Availability

Part-time

Preferred Environment

Windows 10, Slack, Visual Studio Code (VS Code)

The most amazing...

...project I've developed is a complete hand-coded ETL process for an eCommerce startup's dashboard, automating its daily product label categorization.

Work Experience

Data Scientist | Data Engineer

2022 - 2022
The University of British Columbia (Capstone Project)
  • Developed an unsupervised machine learning model to help Canada's leading startup classify around 4,000 scraped products into subcategories from different stores across various eCommerce platforms.
  • Combined contrastive language-image pre-training (CLIP) and multiclass text classification models in an ensemble to achieve higher precision for each product category.
  • Implemented two hand-coded ETL pipelines (training and prediction) to obtain data for new products daily, invoke the image and text model scripts to predict product categories, and update production product records with the predictions.
  • Reduced the daily manual effort of product category labeling from around 180 minutes to around 14 minutes per 4,000 products.
Technologies: Python, PostgreSQL, Machine Learning, OpenAI, Text Classification, Data Engineering, Data Analytics, Data Cleaning, Data Processing
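The daily prediction pipeline above can be sketched roughly as follows. This is a minimal illustration: `classify_product`, the keyword rules, and the in-memory product list are hypothetical stand-ins for the CLIP/text ensemble and the PostgreSQL store, not the actual capstone code.

```python
# Minimal sketch of a daily prediction ETL pipeline. All names and the
# keyword-based classifier below are illustrative stand-ins, not the
# original capstone implementation.

def classify_product(title: str) -> str:
    """Stand-in for the CLIP + text-classification ensemble."""
    keywords = {"shirt": "apparel", "mug": "kitchen", "lamp": "home-decor"}
    for word, category in keywords.items():
        if word in title.lower():
            return category
    return "uncategorized"

def run_prediction_pipeline(new_products: list[dict]) -> list[dict]:
    """Extract new products, predict a category, and return updated records."""
    updated = []
    for product in new_products:                           # extract
        category = classify_product(product["title"])      # transform (predict)
        updated.append({**product, "category": category})  # load step would write back
    return updated

if __name__ == "__main__":
    products = [{"id": 1, "title": "Blue cotton shirt"},
                {"id": 2, "title": "Ceramic coffee mug"}]
    for record in run_prediction_pipeline(products):
        print(record)
```

In the real pipeline, the extract step would query the production database for new products and the load step would write predictions back to it.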

Data Engineer Consultant for FMCG

2018 - 2021
Deloitte
  • Developed data transformations in the ETL process in Azure Databricks and designed execution workflow in Azure Data Factory.
  • Identified and removed redundant activities in the Azure Data Factory execution workflow, leading to a one-hour reduction in daily execution time, a 45-minute reduction in the monthly process, and decreased cloud resource consumption.
  • Reduced storage and processing time in SQL Data Warehouse by analyzing record duplication in the Spark SQL layer, reducing the row count by 86%.
  • Implemented and automated the ETL process end to end, expediting deliverables by 1–2 days every month and making the team independent of external and manual dependencies for Microsoft Power BI dashboard deliverables.
  • Led the design, development, and validation of external data source dashboards as the team's single point of contact for any process-related queries.
  • Replicated complex SQL queries implemented in SQL Data Warehouse in Apache Spark (Azure Databricks), which saved five hours of execution time and cut down 650GB of storage in the data warehouse.
  • Received recognition from client leadership for accomplishments in fine-tuning, optimization, and cost reduction of ETL processes, as well as the company's 2020 Live Dot award for outstanding performance and contribution to the FMCG engagement.
Technologies: Python, Apache Spark, Azure Databricks, Azure Data Factory, PySpark, Azure SQL Databases, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Azure Data Lake, Microsoft Power BI, ETL Development, Data Engineering, Data Lakes, Data Analytics, Data Cleaning, Data Processing
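The deduplication work above was done in Spark SQL on Azure Databricks. As a rough plain-Python illustration of the core idea (keeping only the latest record per key), with hypothetical function and field names:

```python
# Illustrative deduplication: keep only the most recent record per key.
# The original work used Spark SQL in Azure Databricks; this plain-Python
# stand-in with hypothetical field names shows the core logic only.

def deduplicate(records: list[dict], key: str, order_by: str) -> list[dict]:
    """Return one record per key, preferring the highest order_by value."""
    latest: dict = {}
    for record in records:
        k = record[key]
        if k not in latest or record[order_by] > latest[k][order_by]:
            latest[k] = record
    return list(latest.values())
```

In Spark, the equivalent is typically a window function ranking rows per key by load date and keeping rank 1, or `dropDuplicates` when ordering doesn't matter.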

Data Engineer Consultant

2018 - 2021
Deloitte
  • Created a chatbot proof of concept (POC) for an Australian investment bank using the Rasa stack and custom components to automate the manual effort of finding insights from various sources, enabling a cost reduction of five full-time equivalents (FTEs).
  • Collaborated with onshore client team members to understand the financial back-end logic used to answer ad hoc queries. Took ownership of tracking technical requirements, architectural design documentation, data collection, and data preparation.
  • Prepared training data for chatbot solution in RASA based on business users' ad hoc queries on expense reports, including higher management officials such as the chief experience officer (CXO) and chief technology officer (CTO).
  • Designed and implemented an actions module in Python to handle each action defined for chatbot responses, such as the year-to-date (YTD) revenue calculation for personal banking.
  • Developed an entity extractor model in Python as a wrapper around the natural language understanding (NLU) text classification model to extract entities from user queries, including month, year, line of business, and product, to help determine intent.
  • Initiated and developed a POC to transform structured data into natural language using Arria NLG Studio, automating manual effort and saving one FTE spent writing commentaries for monthly tax and revenue reports.
  • Received the company's 2019 Move the Dot team award for exemplary performance and significant contributions through team efforts in chatbot client engagement.
Technologies: Investment Banking Technology, Python, Natural Language Generation (NLG), Rasa NLU, Technical Requirements, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), GPT, Data Engineering, Data Analytics, Data Cleaning
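The YTD calculation behind one such chatbot action can be sketched in plain Python. The "YYYY-MM" data shape and the function name here are illustrative assumptions, not the bank's actual schema or code:

```python
# Hedged sketch of a year-to-date (YTD) revenue calculation, as one chatbot
# action might compute it. The "YYYY-MM" keys and function name are
# illustrative assumptions, not the original implementation.

from datetime import date

def ytd_revenue(monthly_revenue: dict[str, float], as_of: date) -> float:
    """Sum revenue for months in as_of's year, up to and including as_of's month."""
    total = 0.0
    for month_key, amount in monthly_revenue.items():
        year, month = map(int, month_key.split("-"))
        if year == as_of.year and month <= as_of.month:
            total += amount
    return total
```

In the chatbot, the entity extractor would supply the year and month from the user's query, and the action would run a calculation like this over the relevant line of business.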

Intern

2018 - 2018
STMicroelectronics
  • Designed and implemented a generalized Java patch to filter error files of over 10,000 lines in XML format and convert HTML tables to CSV, with columns containing specific error info tags, reducing the manual effort of reading and identifying errors.
  • Maintained documentation of releases, test plans, and deployments using the HP application lifecycle management (ALM) tool.
  • Identified and solved multiple defects and boundary cases during testing, which helped the team to fix and deliver within the deadline.
Technologies: SQL, HTML, Manual Software Testing, HP Application Lifecycle Management (ALM)

Online Taxi Service ETL Pipeline

https://github.com/karanpreetkaur/online_taxi_service_ETL_Project
The project is a hand-coded ETL pipeline that generates data for an online taxi service database and weblogs. It handles erroneous data and uses logging to track ETL metadata, including job start time, finish time, and status. Through data wrangling, it transforms the data into a readable format for reporting and finally performs the initial load of the target datastore.

The project short description is available to authorized users: https://docs.google.com/presentation/d/1PHT9CrB602qDdVB9q_wBui5OEe7kGHgouY1YtLXDi84/edit#slide=id.gcb9a0b074_1_0.
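The metadata tracking described above can be sketched as a thin wrapper around each ETL step. `run_etl_job` and the metadata fields are a simplified illustration of the project's logging approach, not the exact code:

```python
# Simplified sketch of ETL metadata tracking via logging: each job records
# its start time, finish time, and status. Names are illustrative.

import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")

def run_etl_job(name, job):
    """Run one ETL step, logging start/finish timestamps and final status."""
    metadata = {"job": name, "start": datetime.now(timezone.utc), "status": "running"}
    logging.info("job %s started at %s", name, metadata["start"].isoformat())
    try:
        job()
        metadata["status"] = "success"
    except Exception:
        metadata["status"] = "failed"
        logging.exception("job %s raised an error", name)
    metadata["finish"] = datetime.now(timezone.utc)
    logging.info("job %s finished with status %s", name, metadata["status"])
    return metadata
```

A failed step is logged with its traceback but does not crash the pipeline, so downstream steps can decide whether to proceed or halt based on the returned status.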
Education

2021 - 2022

Master's Degree in Data Science

The University of British Columbia - Vancouver, British Columbia, Canada

2014 - 2018

Bachelor's Degree in Computer Science

Thapar Institute of Engineering and Technology - Patiala, Punjab, India

Certifications

NOVEMBER 2022 - NOVEMBER 2023

Azure Data Engineer Associate

Microsoft

AUGUST 2021 - PRESENT

Microsoft Azure Data Fundamentals

Microsoft

JULY 2021 - PRESENT

Microsoft Azure Fundamentals

Microsoft

Libraries/APIs

NumPy, Pandas, PySpark, Scikit-learn, Tidyverse, Rasa NLU

Tools

Git, Slack, Dplyr, Microsoft Power BI, HP Application Lifecycle Management (ALM)

Languages

Python, R, C, C++, SQL, HTML

Platforms

Azure SQL Data Warehouse, Azure, Databricks, Azure Synapse Analytics, Dedicated SQL Pool (formerly SQL DW), Visual Studio Code (VS Code), Azure PaaS

Paradigms

Data Science, ETL

Storage

PostgreSQL, Azure SQL Databases, Azure SQL, Data Lakes, MongoDB, Data Pipelines, Databases, Relational Databases

Frameworks

Apache Spark

Other

Windows 10, Data Wrangling, Data Structures, Algorithms, Microsoft Azure, Azure Databricks, Azure Data Lake, Azure Data Factory, Data, ETL Development, Data Engineering, Data Analytics, Data Cleaning, Data Processing, Machine Learning, Supervised Machine Learning, Unsupervised Learning, Statistical Methods, Predictive Analytics, Hypothesis Testing, Software Development, Build Pipelines, Data Warehouse Design, Manual Software Testing, Investment Banking Technology, Natural Language Generation (NLG), Technical Requirements, Natural Language Processing (NLP), OpenAI, Text Classification, Azure Stream Analytics, Big Data, Cloud, Data Security, Project Management & Work Tracking Tools, Security, Storage, GPT, Generative Pre-trained Transformers (GPT)
