
Naga Deepthi Kanamarlapudi
Verified Expert in Engineering
Data Engineer and Developer
Québec City, QC, Canada
Toptal member since May 21, 2024
Naga Deepthi is a data engineer with 5+ years of software industry experience across technical domains ranging from data ingestion and warehousing to visualization. Her big data analytics work has solidified her expertise in PySpark (Apache Spark's Python API) and in TypeScript for data visualization. At Capgemini, Naga transformed raw data into actionable insights, automating ETL pipelines in Palantir Foundry and on Azure and building intuitive dashboards for strategic decision-making.
Experience
- PySpark - 5 years
- Azure Databricks - 5 years
- SQL - 5 years
- Data Engineering - 5 years
- Azure Data Factory (ADF) - 5 years
- Python - 5 years
- Code Review - 4 years
- Foundry - 4 years
Preferred Environment
Foundry, Azure Databricks, Azure Data Factory (ADF), Azure Data Lake, Azure, Palantir
The most amazing...
...things I've automated are ETL pipelines in Palantir Foundry that feed intuitive dashboards, turning data into actionable insights that guide decision-making.
Work Experience
Data Engineer
Capgemini
- Developed robust ETL processes to extract, transform, and load data from diverse sources (CSV, databases, and Parquet) and leveraged Spark DataFrames for structured data storage and manipulation (see the first sketch after this list).
- Collaborated with data analysts and stakeholders to define data requirements and design ETL solutions that aligned with analytical needs and provided application support.
- Defined object views and actions on the categorized dataset to surface meaningful insights from the ontology, translating complex business requirements into technical specifications and data models.
- Implemented custom UDFs (user-defined functions) in PySpark, in both Foundry and Azure Databricks notebooks, to standardize complex data manipulations, improving the maintainability and efficiency of data pipelines (see the second sketch after this list).
- Optimized Spark ETL jobs by applying partitioning, caching, and broadcast variables to enhance performance when processing large-scale insurance datasets (see the third sketch after this list).
- Used Code Workbooks and notebooks extensively to analyze and transform datasets and to define relations and object views; used TypeScript to create functions for visualizations.
- Gained experience in implementing data quality frameworks and metrics to assess and enhance data quality across various dimensions.
- Built Azure Data Factory pipelines to load data from Snowflake into Azure Data Lake Storage and integrated the output with Power BI for report analysis.
- Monitored data pipeline health, including build, job, and sync statuses, to swiftly identify and address bottlenecks, safeguarding data quality and reporting timelines.
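
A minimal sketch of the multi-source ingestion named in the first bullet above; the file paths, JDBC URL, and table name are illustrative assumptions, not actual project values.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("multi_source_ingest").getOrCreate()

# CSV with a header row; an explicit schema would be safer than inference in production
csv_df = (
    spark.read.option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/raw/policies.csv")  # hypothetical path
)

# Parquet carries its own schema, so no options are needed
parquet_df = spark.read.parquet("/data/raw/claims.parquet")

# Relational sources via JDBC; credentials would come from a secret store in practice
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/insurance")  # placeholder URL
    .option("dbtable", "public.customers")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)
```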
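
A hedged sketch of a custom PySpark UDF in the spirit of the UDF bullet above; the column name and the normalization rule are hypothetical stand-ins for the actual business logic.

```python
import re

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf_example").getOrCreate()

@F.udf(returnType=StringType())
def normalize_policy_number(raw):
    # Strip whitespace/punctuation and upper-case, e.g. " pol-00123 " -> "POL00123"
    if raw is None:
        return None
    return re.sub(r"[^A-Za-z0-9]", "", raw).upper()

df = spark.createDataFrame([(" pol-00123 ",), (None,)], ["policy_number"])
df = df.withColumn("policy_number", normalize_policy_number("policy_number"))
df.show()
```

Native Spark SQL functions are generally preferred where they suffice, since Python UDFs serialize rows out to Python workers; a UDF is the fallback for logic the built-ins cannot express.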
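
A minimal sketch of the three optimizations named above, using a broadcast join hint as the DataFrame-API counterpart of broadcast variables; the dataset and column names are assumptions for illustration.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl_tuning").getOrCreate()

claims = spark.read.parquet("/data/raw/claims.parquet")          # large fact table
policy_types = spark.read.parquet("/data/ref/policy_types.parquet")  # small lookup

# Partitioning: co-locate rows by the join key to reduce shuffling downstream
claims = claims.repartition("policy_type_id")

# Caching: persist a DataFrame that several downstream aggregations reuse
claims.cache()

# Broadcast join: ship the small lookup table to every executor instead of shuffling it
enriched = claims.join(F.broadcast(policy_types), on="policy_type_id", how="left")

monthly_counts = enriched.groupBy(
    "policy_type", F.month("claim_date").alias("month")
).count()
monthly_counts.show()
```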
Data Engineer
Cognizant
- Actively participated in requirements gathering, design, development, and testing.
- Applied strong PySpark development skills within an Agile framework, contributing to ETL logic design, implementation, and testing throughout the SDLC.
- Designed and developed Spark/PySpark Core and SQL applications to parse and validate the raw input data, applying transformations and persisting the output DataFrames to ADLS (see the first sketch after this list).
- Gained proficiency in identifying and resolving a wide range of data quality problems, including completeness, accuracy, consistency, and timeliness issues.
- Used PySpark transformations and actions to build efficient ETL logic in Azure Databricks, and led the design and implementation of data pipelines that ensured seamless ingestion from diverse sources.
- Executed data cleansing tasks, including deduplication, outlier removal, and missing-value imputation, to ensure accurate and reliable data for analysis (see the second sketch after this list).
- Coordinated with offshore and onsite teams to understand requirements and to prepare high-level and low-level designs and analyses from the requirements specification.
- Coordinated with the team and was involved in code reviews and application support activities.
- Monitored pipeline health, tracking build, job, and sync statuses, to quickly identify and resolve bottlenecks and protect data quality and reporting timelines.
- Created object models and defined dataset views to structure insurance data, integrating ontological mappings to enhance insights and data exploration.
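
A hedged sketch of the parse-validate-persist flow described above; the container and storage-account names are placeholders, the validation rules are illustrative, and the cluster is assumed to already have ADLS authentication configured.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("validate_to_adls").getOrCreate()

raw = spark.read.option("header", "true").csv("/mnt/raw/claims/*.csv")

# Keep rows whose required key is present and whose amount parses as a number
valid = raw.filter(
    F.col("claim_id").isNotNull()
    & F.col("claim_amount").cast("double").isNotNull()
)

# abfss URI pattern: abfss://<container>@<account>.dfs.core.windows.net/<path>
valid.write.mode("overwrite").parquet(
    "abfss://curated@examplestorageacct.dfs.core.windows.net/claims/"
)
```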
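
A minimal sketch of the cleansing steps named above (deduplication, outlier removal, missing-value imputation) in PySpark; the column names and the 1.5 * IQR outlier fence are illustrative assumptions.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("cleansing").getOrCreate()
df = spark.read.parquet("/mnt/raw/claims.parquet")  # hypothetical path

# Missing-value imputation first, so nulls don't silently drop in later filters
mean_amount = df.agg(F.avg("claim_amount")).first()[0]
df = df.fillna({"claim_amount": mean_amount})

# Deduplication on the business key
df = df.dropDuplicates(["claim_id"])

# Outlier removal with a 1.5 * IQR fence on claim_amount
q1, q3 = df.approxQuantile("claim_amount", [0.25, 0.75], 0.01)
iqr = q3 - q1
df = df.filter(
    F.col("claim_amount").between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
)
```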
Experience
Stellantis | Center of Excellence
AbbVie | Organizational Units Communication
Education
Bachelor's Degree in Information Technology
Bhoj Reddy Engineering College for Women - Hyderabad, Telangana, India
Certifications
Palantir Foundry Data Engineer Associate
Palantir Technologies
Foundry Foundations
Palantir Technologies
Academy Accreditation – Databricks Fundamentals
Databricks
Academy Accreditation – Databricks Lakehouse Fundamentals
Databricks
Get Started with Databricks for Data Engineering
Databricks
Skills
Libraries/APIs
PySpark
Tools
Microsoft Power BI, Git, Jira, WinSCP
Languages
SQL, Python, Visual Basic, TypeScript, Java 8
Paradigms
ETL
Platforms
Databricks, Azure
Storage
SQL Server 2019, Azure SQL Databases, SQL Server Integration Services (SSIS), PostgreSQL, Apache Hive, Data Pipelines
Frameworks
Spark, Ontology Framework, Delta Live Tables (DLT)
Other
Foundry, Data Engineering, Palantir, Data Lineage, Medallion Architecture, Azure Databricks, Scheduling, Code Review, Azure Data Factory (ADF), Azure Data Lake, DataFrames, Data Governance, Data Cleaning, APIs, Data Scraping, Visualization, Analysis, Debugging, Hard Coding, Build Pipelines, Task Scheduling, Job Schedulers, Ontologies, Objects, Workshops