
Naga Deepthi Kanamarlapudi

Verified Expert in Engineering

Data Engineer and Developer

Québec City, QC, Canada

Toptal member since May 21, 2024

Bio

Naga Deepthi is a data engineer with 5+ years of software industry experience across technical domains ranging from data ingestion and warehousing to visualization. Her work in big data analytics has solidified her expertise in PySpark (Apache Spark's Python API) and in TypeScript for data visualization. At Capgemini, Naga transformed raw data into actionable insights, automating ETL pipelines in Palantir Foundry on Azure and building intuitive dashboards for strategic decision-making.

Portfolio

Capgemini
PySpark, SQL, TypeScript, Git, Python, Foundry, Ontology Framework...
Cognizant
PySpark, SQL, TypeScript, Git, Azure Databricks, Foundry, Ontology Framework...

Experience

  • PySpark - 5 years
  • Azure Databricks - 5 years
  • SQL - 5 years
  • Data Engineering - 5 years
  • Azure Data Factory (ADF) - 5 years
  • Python - 5 years
  • Code Review - 4 years
  • Foundry - 4 years

Availability

Part-time

Preferred Environment

Foundry, Azure Databricks, Azure Data Factory (ADF), Azure Data Lake, Azure, Palantir

The most amazing...

...things I've automated are ETL pipelines in Palantir Foundry that feed intuitive dashboards, turning raw data into actionable insights that guide decision-making.

Work Experience

Data Engineer

2022 - PRESENT
Capgemini
  • Developed robust ETL processes to extract, transform, and load data from diverse sources (CSV files, databases, and Parquet) and leveraged Spark DataFrames for structured data storage and manipulation.
  • Collaborated with data analysts and stakeholders to define data requirements and design ETL solutions that aligned with analytical needs and provided application support.
  • Defined object views and actions on categorized datasets to surface meaningful insights from the ontology, translating complex business requirements into technical specifications and data models.
  • Implemented custom UDFs (user-defined functions) in PySpark, in both Foundry and Azure Databricks notebooks, to standardize complex data manipulations, improving the maintainability and efficiency of data pipelines (a minimal sketch follows this role's technology list).
  • Optimized Spark ETL jobs by applying partitioning, caching, and broadcast variables to enhance performance when processing large-scale insurance datasets (see the tuning sketch below).
  • Used Code Workbook and notebooks extensively to analyze and transform datasets and to define relations and object views, and wrote TypeScript functions for visualizations.
  • Gained experience in implementing data quality frameworks and metrics to assess and enhance data quality across various dimensions.
  • Built Azure Data Factory pipelines to load data from Snowflake into Azure Data Lake Storage and integrated ADF with Power BI for report analysis (a notebook equivalent is sketched below).
  • Monitored data pipeline health, including build, job, and sync statuses, to swiftly identify and address bottlenecks, safeguarding data quality and reporting timelines.
Technologies: PySpark, SQL, TypeScript, Git, Python, Foundry, Ontology Framework, Visualization, Code Review, Scheduling, Analysis, Debugging, Hard Coding, Build Pipelines, Task Scheduling, Job Schedulers, Ontologies, Objects, Data Pipelines, Workshops, Data Engineering, Palantir, Azure Databricks, Azure Data Factory (ADF), Azure Data Lake, Delta Live Tables (DLT), Spark, DataFrames, Azure, ETL, Azure SQL Databases, Data Governance, Data Lineage, SQL Server Integration Services (SSIS), Medallion Architecture, Microsoft Power BI, Data Cleaning, APIs, Data Scraping, PostgreSQL, Visual Basic, SQL Server 2019
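
A minimal sketch of the custom-UDF pattern described in this role. The column names, the normalization rule, and the sample rows are hypothetical, not taken from the actual Capgemini pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

# Hypothetical raw policy data; the real pipelines read CSV, database, and Parquet sources.
df = spark.createDataFrame(
    [("  po-1001 ", "Auto"), ("PO-1002", "home")],
    ["policy_id", "line_of_business"],
)

# A UDF centralizes a reusable normalization rule so every pipeline applies it identically.
@F.udf(returnType=StringType())
def normalize_policy_id(raw):
    return raw.strip().upper() if raw is not None else None

clean = df.withColumn("policy_id", normalize_policy_id("policy_id"))
clean.show()
```

When an equivalent built-in exists (here, F.upper(F.trim(...))), it is usually preferable, since Python UDFs bypass Catalyst optimizations; a UDF is the fallback for logic the built-ins cannot express.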
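
The tuning techniques named above (partitioning, caching, and broadcast joins) in one hedged sketch; the paths, dataset shapes, and join keys are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Hypothetical inputs: a large fact table of claims and a small product lookup.
claims = spark.read.parquet("abfss://data@<account>.dfs.core.windows.net/claims")
products = spark.read.parquet("abfss://data@<account>.dfs.core.windows.net/products")

# Repartition on the join key to balance the shuffle across executors.
claims = claims.repartition(200, "product_id")

# Broadcast the small dimension so the join avoids shuffling the large side.
joined = claims.join(F.broadcast(products), "product_id")

# Cache a result that several downstream aggregations reuse.
joined.cache()
by_region = joined.groupBy("region").agg(F.sum("claim_amount").alias("total_claims"))
by_product = joined.groupBy("product_id").agg(F.count("*").alias("n_claims"))
```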
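
The ADF pipeline above was built in the Data Factory UI; a rough Azure Databricks notebook equivalent using the Spark Snowflake connector would look like the following. Every connection value and table name is a placeholder, and spark is the session a Databricks notebook provides.

```python
# Read from Snowflake with the Spark connector bundled in Azure Databricks.
sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",  # placeholder
    "sfUser": "<user>",
    "sfPassword": "<password>",  # use a secret scope in practice
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

raw = (
    spark.read.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "POLICY_FACTS")  # hypothetical table
    .load()
)

# Land the extract in Azure Data Lake Storage Gen2 as Parquet for Power BI to pick up.
raw.write.mode("overwrite").parquet(
    "abfss://landing@<storage-account>.dfs.core.windows.net/snowflake/policy_facts"
)
```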

Data Engineer

2020 - 2022
Cognizant
  • Participated actively in requirements gathering, design, development, and testing.
  • Applied strong PySpark development skills within an Agile framework, contributing to ETL logic design, implementation, and testing throughout the SDLC.
  • Designed and developed Spark/PySpark Core and SQL applications to parse and validate raw input data, applying transformations and storing the output DataFrames in ADLS (see the validation sketch after this role's technology list).
  • Gained proficiency in identifying and resolving a wide range of data quality problems, including completeness, accuracy, consistency, and timeliness issues.
  • Used PySpark transformations and actions efficiently to build ETL logic in Azure Databricks. Led the design and implementation of data pipelines, ensuring seamless data ingestion from diverse sources.
  • Executed data cleansing tasks, including deduplication, outlier removal, and missing value imputation, to improve the accuracy, consistency, and reliability of datasets for analysis (sketched below).
  • Coordinated with offshore and onsite teams to understand the requirements and prepare a high-level and low-level design and analysis from the requirements specification.
  • Coordinated with the team and was involved in code reviews and application support activities.
  • Monitored data pipeline health, including build, job, and sync statuses, to swiftly identify and address bottlenecks, safeguarding data quality and reporting timelines.
  • Created object models and defined dataset views to structure insurance data, integrating ontological mappings to enhance insights and data exploration.
Technologies: PySpark, SQL, TypeScript, Git, Azure Databricks, Foundry, Ontology Framework, Code Review, Apache Hive, Scheduling, Python, Analysis, Debugging, Hard Coding, Build Pipelines, Task Scheduling, Job Schedulers, Jira, WinSCP, Ontologies, Objects, Data Pipelines, Databricks, Data Engineering, Palantir, Spark, DataFrames, ETL, Data Governance, Data Lineage, SQL Server Integration Services (SSIS), Microsoft Power BI, Data Cleaning, APIs, Data Scraping, Visual Basic, SQL Server 2019
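
A sketch of the parse-and-validate step described in this role, under an assumed three-column schema; the real jobs used project-specific schemas, paths, and validation rules.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, DateType

spark = SparkSession.builder.appName("validate-sketch").getOrCreate()

# Assumed schema; enforcing it at read time surfaces malformed rows early.
schema = StructType([
    StructField("record_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("event_date", DateType(), nullable=True),
])

raw = spark.read.csv(
    "abfss://raw@<account>.dfs.core.windows.net/input/", header=True, schema=schema
)

# Keep valid rows and quarantine the rest rather than silently dropping failures.
valid = raw.filter(F.col("record_id").isNotNull() & F.col("amount").isNotNull())
rejected = raw.subtract(valid)

valid.write.mode("append").parquet("abfss://curated@<account>.dfs.core.windows.net/events/")
rejected.write.mode("append").parquet("abfss://quarantine@<account>.dfs.core.windows.net/events/")
```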
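
And a compact sketch of the cleansing tasks listed above (deduplication, outlier removal, and missing-value imputation); the column names, sample rows, and valid range are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cleansing-sketch").getOrCreate()

# Hypothetical messy input with a duplicate, a missing value, and an outlier.
df = spark.createDataFrame(
    [("k1", 120.0, "auto"), ("k1", 120.0, "auto"), ("k2", None, None), ("k3", 9e9, "home")],
    ["key", "value", "category"],
)

# Deduplicate on the business key.
deduped = df.dropDuplicates(["key"])

# Drop outliers outside an assumed valid range, keeping nulls for imputation below
# (real jobs might use IQR or z-scores instead of a fixed range).
bounded = deduped.filter(F.col("value").isNull() | F.col("value").between(0, 1_000_000))

# Impute missing values: column mean for numerics, a sentinel for categoricals.
mean_value = bounded.select(F.avg("value")).first()[0]
imputed = bounded.fillna({"value": mean_value, "category": "unknown"})
imputed.show()
```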

Experience

Stellantis | Center of Excellence

Demonstrated expertise in crafting robust ETL pipelines using Spark DataFrames, Azure Databricks, and Palantir Foundry to extract, transform, and load diverse vehicle information data. I optimized ETL processes for large-scale insurance datasets, leveraging techniques such as partitioning, caching, and broadcast variables, and collaborated closely with stakeholders to ensure data quality and alignment with analytical needs. I built data pipelines using Azure Data Factory, integrated them with Power BI for reporting, and implemented data quality frameworks and metrics to maintain data integrity.

AbbVie | Organizational Units Communication

Demonstrated proficiency across the software development lifecycle, from requirements gathering to testing, within an Agile environment. I contributed to ETL pipeline development using PySpark transformations and actions in Azure Databricks and Palantir Foundry, designing and developing Spark/PySpark applications for data parsing, validation, and transformation. I handled data quality assurance, including data cleansing, deduplication, and outlier removal; built data pipelines using Azure Data Factory; created visualizations for analysis; and collaborated with onshore and offshore teams to ensure successful project delivery.

Education

2015 - 2019

Bachelor's Degree in Information Technology

Bhoj Reddy Engineering College for Women - Hyderabad, Telangana, India

Certifications

MAY 2024 - MAY 2026

Palantir Foundry Data Engineer Associate

Palantir Technologies

MAY 2024 - PRESENT

Foundry Foundations

Palantir Technologies

MAY 2024 - MAY 2025

Academy Accreditation – Databricks Fundamentals

Databricks

MAY 2024 - MAY 2025

Academy Accreditation – Databricks Lakehouse Fundamentals

Databricks

MAY 2024 - PRESENT

Get Started with Databricks for Data Engineering

Databricks

Skills

Libraries/APIs

PySpark

Tools

Microsoft Power BI, Git, Jira, WinSCP

Languages

SQL, Python, Visual Basic, TypeScript, Java 8

Paradigms

ETL

Platforms

Databricks, Azure

Storage

SQL Server 2019, Azure SQL Databases, SQL Server Integration Services (SSIS), PostgreSQL, Apache Hive, Data Pipelines

Frameworks

Spark, Ontology Framework, Delta Live Tables (DLT)

Other

Foundry, Data Engineering, Palantir, Data Lineage, Medallion Architecture, Azure Databricks, Scheduling, Code Review, Azure Data Factory (ADF), Azure Data Lake, DataFrames, Data Governance, Data Cleaning, APIs, Data Scraping, Visualization, Analysis, Debugging, Hard Coding, Build Pipelines, Task Scheduling, Job Schedulers, Ontologies, Objects, Workshops
