Preethi B, Developer in Nashville, TN, United States

Preethi B

Verified Expert in Engineering

Data Engineer and Developer

Nashville, TN, United States

Toptal member since July 10, 2024

Bio

Preethi is a versatile data engineer with extensive experience across industries, specializing in Azure, AWS, and Informatica. She designs, develops, and maintains robust data pipelines, using Agile methodologies to ensure efficient project delivery. She also excels in cloud application integration and cloud data integration, offering practical guidance for integrating and optimizing data solutions.

Portfolio

Walmart
Azure, Hadoop, SQL, Apache Airflow, Snowflake, Apache Kafka, Azure Data Factory...
Homesite Insurance
Apache Spark, Scala, Spark SQL, Amazon S3 (AWS S3), Apache Kafka, AWS IoT...
Merck Pharma
Amazon EC2, Amazon S3 (AWS S3), Amazon EMR Studio, Informatica Cloud, Python...

Experience

  • Python - 10 years
  • SQL - 10 years
  • Azure - 9 years
  • Azure Data Factory - 9 years
  • Hadoop - 9 years
  • ETL - 9 years
  • Snowflake - 8 years
  • CI/CD Pipelines - 8 years

Availability

Part-time

Preferred Environment

SQL, Kubernetes, Hadoop, Snowflake, PySpark, Azure Databricks, Data Modeling, ETL Implementation & Design, Data Engineering, Microsoft Power BI

The most amazing...

...thing I've done is design, develop, and implement a real-time data analytics pipeline using Azure—streamlining data ingestion and integrating analytics.

Work Experience

Senior Data Engineer

2022 - 2024
Walmart
  • Orchestrated complex data workflows using ADF, integrating batch and streaming data sources, enabling real-time analytics for retail operations such as inventory management and pricing optimization.
  • Designed and implemented automated ADF pipelines to ingest and transform large datasets into Snowflake from multiple retail data sources, enhancing the overall data architecture for sales and inventory tracking.
  • Developed data integration scripts in Python and Scala to handle real-time data streams, integrated them with Azure Event Hubs and Azure Stream Analytics, and enabled real-time insights with a 30% increase in operational efficiency.
Technologies: Azure, Hadoop, SQL, Apache Airflow, Snowflake, Apache Kafka, Azure Data Factory, CI/CD Pipelines, Docker, Apache Spark, Informatica, GitHub, ETL, Exploratory Data Analysis, Python

Senior Data Engineer

2020 - 2022
Homesite Insurance
  • Designed and managed ADF-based ETL pipelines to populate insurance data warehouses with policy, claims, underwriting, and customer data, enabling business intelligence teams to generate accurate reports for operational and regulatory purposes.
  • Integrated the fraud detection system with Snowflake and SQL Server for immediate data access and reporting. Reduced false positive rates by optimizing data processing workflows using Hadoop and Scala.
  • Improved data reliability, reduced downtime by integrating Kafka for real-time data ingestion, and reduced average claims processing time by 60% through optimized data pipelines.
Technologies: Apache Spark, Scala, Spark SQL, Amazon S3 (AWS S3), Apache Kafka, AWS IoT, Exploratory Data Analysis, Informatica PowerCenter, SQL, Python, Kubernetes, Spark, Azure, Tableau

Senior Data Engineer

2018 - 2020
Merck Pharma
  • Developed a scalable data ingestion pipeline that handles millions of patient records daily from diverse sources such as EHRs, lab systems, and patient portals.
  • Implemented a real-time health monitoring system using stream processing technologies to collect and analyze data from wearable devices and IoT sensors, which increased early detection of critical health issues by 30%.
  • Implemented a scalable data lake to store and process large volumes of structured and unstructured healthcare data, which improved data accessibility and query performance by 40%, facilitating advanced analytics and research initiatives.
Technologies: Amazon EC2, Amazon S3 (AWS S3), Amazon EMR Studio, Informatica Cloud, Python, Scala, SQL Server BI, Exploratory Data Analysis, Informatica PowerCenter, SQL, Spark, Spark SQL, Tableau

Data Engineer

2016 - 2018
Grapesoft Solutions
  • Utilized Hadoop for distributed storage and processing of large-scale data, improving data processing capabilities, enabling the handling of terabytes of data, and significantly reducing query response times.
  • Connected Tableau to various data sources, including SQL databases and Hadoop clusters, and created interactive dashboards and reports with intuitive and interactive visualizations, leading to a 25% increase in report utilization by stakeholders.
  • Implemented indexing strategies and refactored SQL queries, reducing query execution times by 50%, which sped up access to critical data and improved overall system performance.
Technologies: Spark SQL, Scala, Tableau, Hadoop, Redshift, Power Query, Exploratory Data Analysis, SQL, Python, Spark

Data Engineer

2014 - 2016
Avon Technologies Pvt Ltd
  • Designed and implemented data models using relational databases (e.g., MySQL, PostgreSQL) or NoSQL databases (e.g., MongoDB), optimizing data storage and retrieval.
  • Integrated data from multiple sources (e.g., APIs, flat files, databases) into a centralized data repository, ensuring data consistency and accuracy.
  • Implemented version control for ETL scripts and maintained comprehensive documentation for data pipelines and processes, ensuring reproducibility and knowledge sharing.
  • Created reports and dashboards using tools like Tableau or Power BI, providing actionable insights to stakeholders and business users.
Technologies: Data Warehouse Design, Scripting, SQL, Python, Software Development Lifecycle (SDLC), Exploratory Data Analysis, Spark, Software Development

Projects

Recommendation System

https://github.com/Preethi-68/Recommendation-Systems
Measuring similarity in data is the secret behind how Amazon, Flipkart, and YouTube suggest alternatives. A recommendation system is a similarity-based modeling technique that associates data points and basket preferences to generate those suggestions.

Outcomes include understanding association-based models, recognizing the patterns behind commonly used recommenders encountered daily, and building basic recommenders that suit business needs.
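
The item-item similarity at the heart of such a recommender can be sketched with cosine similarity over a toy rating matrix; the matrix values below are hypothetical, not taken from the repository:

```python
import numpy as np

# Hypothetical user-item rating matrix: rows are users, columns are items;
# zero means the user has not rated that item.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
], dtype=float)

def cosine_similarity(a, b):
    """Cosine of the angle between two item rating vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def most_similar_item(item_idx):
    """Index of the item whose rating pattern is closest to item_idx."""
    target = ratings[:, item_idx]
    scores = [
        cosine_similarity(target, ratings[:, j]) if j != item_idx else -1.0
        for j in range(ratings.shape[1])
    ]
    return int(np.argmax(scores))

print(most_similar_item(0))  # item 1 has the most similar rating pattern
```

Here item 1's rating pattern is closest to item 0's, so item 1 would be suggested to users who liked item 0.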

Exploratory Data Analysis

https://github.com/Preethi-68/Exploratory-Data-Analysis/tree/main
The aim of this project is to perform comprehensive exploratory data analysis (EDA) on the datasets to understand their underlying structure, identify patterns, detect anomalies, and summarize the main characteristics using both visual and quantitative techniques. The analysis covers univariate, bivariate, and multivariate analysis to gain deeper insights into the data.

Key Features
• Univariate, bivariate, and multivariate analysis
• Categorical variable encoding
• Normalization and scaling
• Missing value handling
• Data visualization and storytelling
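
A few of the listed steps (missing-value imputation, one-hot encoding of categoricals, and min-max scaling) can be sketched in pandas; the column names and values below are illustrative, not taken from the repository:

```python
import pandas as pd

# Toy dataset standing in for the project's data (hypothetical values).
df = pd.DataFrame({
    "age": [25, 32, None, 41, 29],
    "income": [48000, 61000, 52000, None, 58000],
    "segment": ["a", "b", "a", "c", "b"],
})

# Univariate summary of the numeric columns.
print(df.describe())

# Missing-value handling: fill numeric gaps with the column median.
for col in ["age", "income"]:
    df[col] = df[col].fillna(df[col].median())

# Categorical variable encoding via one-hot dummies.
df = pd.get_dummies(df, columns=["segment"], prefix="seg")

# Min-max normalization to scale numeric features into [0, 1].
for col in ["age", "income"]:
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

print(df.head())
```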

Model Deployment

https://github.com/Preethi-68/Model-Deployment-
This project showcases the complete lifecycle of a machine learning model from development to deployment and productionalization.

I serialized the model using Python's pickle library, allowing efficient model storage and reuse.

To facilitate accessibility, I implemented an API using Flask, creating endpoints that handle JSON inputs and outputs for smooth integration. I containerized the Flask application using Docker for development and production by writing and optimizing Dockerfiles to create efficient Docker images.

Additionally, I orchestrated the deployment using Kubernetes, configuring deployments and services for scalability, load balancing, and rolling updates. I integrated monitoring and logging solutions to track the application's performance and quickly address any issues.
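
The pickle-and-serve flow described above can be sketched with the standard library alone; ThresholdModel and handle_request are hypothetical stand-ins for the project's actual estimator and Flask endpoint:

```python
import json
import pickle

class ThresholdModel:
    """Stand-in for the trained model (the real project used a fitted estimator)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return int(x >= self.threshold)

# Serialize the model to bytes with pickle, as done before deployment.
model = ThresholdModel(threshold=0.5)
blob = pickle.dumps(model)

# Later (e.g. inside the web service), load the serialized model and serve it.
restored = pickle.loads(blob)

def handle_request(body: str) -> str:
    """Mimic the JSON-in/JSON-out contract of the prediction endpoint."""
    payload = json.loads(body)
    return json.dumps({"prediction": restored.predict(payload["x"])})

print(handle_request('{"x": 0.7}'))  # → {"prediction": 1}
```

In the deployed service, handle_request's role is played by a Flask route that reads the request's JSON body and returns a JSON response, with the pickled model loaded once at startup.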
Education

2010 - 2014

Bachelor's Degree in Computer Science

Gokaraju Rangaraju Institute of Engineering and Technology - Hyderabad, Telangana, India

Libraries/APIs

PySpark

Tools

GitHub, Tableau, Informatica PowerCenter, Spark SQL, SQL Server BI, Power Query, Azure Machine Learning, Apache Airflow, Microsoft Power BI

Languages

SQL, Python, Snowflake, Scala

Frameworks

Apache Spark, Hadoop, Spark, Flask

Paradigms

ETL, ETL Implementation & Design

Platforms

Azure, AWS IoT, Amazon EC2, Google Cloud Platform (GCP), Kubernetes, Docker, Databricks, Apache Kafka, Amazon Web Services (AWS)

Storage

Amazon S3 (AWS S3), Data Integration, Data Validation, Redshift, Azure Cosmos DB

Other

Big Data, Azure Data Factory, CI/CD Pipelines, Informatica, Amazon EMR Studio, Informatica Cloud, Model Building, Data Warehouse Design, Scripting, Data Cleaning, Feature Engineering, Reporting, Normalization, Azure Databricks, Data Engineering, Software Development, Exploratory Data Analysis, Software Development Lifecycle (SDLC), Visualization, Storytelling, Statistical Analysis, Scaling, Model Deployment, Data Modeling
