Preethi B
Verified Expert in Engineering
Data Engineer and Developer
Nashville, TN, United States
Toptal member since July 10, 2024
Preethi is a versatile data engineer with extensive experience across various industries, specializing in Azure, AWS, and Informatica. She designs, develops, and maintains robust data pipelines, utilizing Agile methodologies to ensure efficient project delivery. Preethi also excels in cloud application integration and cloud data integration, offering valuable insights for seamlessly integrating and optimizing data solutions.
Experience
- Python - 10 years
- SQL - 10 years
- Azure - 9 years
- Azure Data Factory - 9 years
- Hadoop - 9 years
- ETL - 9 years
- Snowflake - 8 years
- CI/CD Pipelines - 8 years
Preferred Environment
SQL, Kubernetes, Hadoop, Snowflake, PySpark, Azure Databricks, Data Modeling, ETL Implementation & Design, Data Engineering, Microsoft Power BI
The most amazing...
...thing I've done is design, develop, and implement a real-time data analytics pipeline using Azure—streamlining data ingestion and integrating analytics.
Work Experience
Senior Data Engineer
Walmart
- Orchestrated complex data workflows using ADF, integrating batch and streaming data sources, enabling real-time analytics for retail operations such as inventory management and pricing optimization.
- Designed and implemented automated ADF pipelines to ingest and transform large datasets into Snowflake from multiple retail data sources, enhancing the overall data architecture for sales and inventory tracking.
- Developed data integration scripts using Python and Scala to handle real-time data streams and integrated them with Azure Event Hubs and Azure Stream Analytics, enabling real-time data insights and a 30% increase in operational efficiency.
Senior Data Engineer
Homesite Insurance
- Designed and managed ADF-based ETL pipelines to populate insurance data warehouses with policy, claims, underwriting, and customer data, enabling business intelligence teams to generate accurate reports for operational and regulatory purposes.
- Integrated the fraud detection system with Snowflake and SQL Server for immediate data access and reporting. Reduced false positive rates by optimizing data processing workflows using Hadoop and Scala.
- Improved data reliability, reduced downtime by integrating Kafka for real-time data ingestion, and reduced average claims processing time by 60% through optimized data pipelines.
Senior Data Engineer
Merck Pharma
- Developed a scalable data ingestion pipeline that handles millions of patient records daily from diverse sources such as EHRs, lab systems, and patient portals.
- Implemented a real-time health monitoring system using stream processing technologies to collect and analyze data from wearable devices and IoT sensors, which increased early detection of critical health issues by 30%.
- Implemented a scalable data lake to store and process large volumes of structured and unstructured healthcare data, which improved data accessibility and query performance by 40%, facilitating advanced analytics and research initiatives.
Data Engineer
Grapesoft Solutions
- Utilized Hadoop for distributed storage and processing of large-scale data, improving data processing capabilities, enabling the handling of terabytes of data, and significantly reducing query response times.
- Connected Tableau to various data sources, including SQL databases and Hadoop clusters, and created interactive dashboards and reports with intuitive visualizations, leading to a 25% increase in report utilization by stakeholders.
- Implemented indexing strategies and refactored SQL queries, which reduced query execution times by 50%, providing quicker access to critical data and improving overall system performance.
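The indexing bullet above can be illustrated with a minimal, self-contained `sqlite3` sketch; the table, data, and index name are hypothetical stand-ins, not the production schema. The point is the query plan switching from a full table scan to an index search once the filtered column is indexed.

```python
import sqlite3

# In-memory stand-in for the warehouse; the schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def query_plan(sql):
    # The last column of EXPLAIN QUERY PLAN says whether SQLite scans or uses an index.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

lookup = "SELECT * FROM orders WHERE customer_id = 42"
plan_before = query_plan(lookup)   # a full table scan

# Indexing the filtered column lets the optimizer seek instead of scan.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(lookup)    # a search using idx_orders_customer
print(plan_before, "->", plan_after)
```

The same before/after plan comparison works in most engines (`EXPLAIN` in PostgreSQL/MySQL) and is a quick way to confirm an index is actually being used.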
Data Engineer
Avon Technologies Pvt Ltd
- Designed and implemented data models using relational databases (e.g., MySQL, PostgreSQL) or NoSQL databases (e.g., MongoDB), optimizing data storage and retrieval.
- Integrated data from multiple sources (e.g., APIs, flat files, databases) into a centralized data repository, ensuring data consistency and accuracy.
- Implemented version control for ETL scripts and maintained comprehensive documentation for data pipelines and processes, ensuring reproducibility and knowledge sharing.
- Created reports and dashboards using tools like Tableau or Power BI, providing actionable insights to stakeholders and business users.
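The multi-source integration bullet above can be sketched with the standard library alone: merge a JSON payload (as from an API) with a CSV flat file into one consistent record per key. The field names and values here are hypothetical.

```python
import csv
import io
import json

# Hypothetical inputs standing in for an API response and a flat-file export.
api_payload = json.loads('[{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]')
flat_file = io.StringIO("id,region\n1,US\n2,EU\n")

# Index the flat-file rows by the shared key so the merge is a single pass.
regions = {int(row["id"]): row["region"] for row in csv.DictReader(flat_file)}

# Centralize both sources into one consistent record per id; missing
# lookups become None instead of raising, so partial data is visible.
repository = [
    {"id": rec["id"], "name": rec["name"], "region": regions.get(rec["id"])}
    for rec in api_payload
]
print(repository)
```

In a real pipeline the dictionary lookup would be a keyed join in the warehouse, but the shape of the problem — normalize each source, join on a shared key, surface gaps explicitly — is the same.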
Portfolio
Recommendation System
https://github.com/Preethi-68/Recommendation-Systems
Outcomes include understanding association-based models, such as the commonly used recommenders encountered daily, and building basic recommenders that suit business needs.
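The association-based approach can be sketched as a tiny co-occurrence recommender; the baskets and item names below are toy data, not the repository's dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction history; each inner list is one user's basket.
baskets = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk", "butter"],
]

# Count item frequencies and how often each pair of items co-occurs.
pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    for a, b in combinations(sorted(set(basket)), 2):
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

def recommend(item, k=2):
    # Rank co-purchased items by confidence = count(item, other) / count(item).
    scored = {
        b: pair_counts[(item, b)] / item_counts[item]
        for (a, b) in pair_counts
        if a == item
    }
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(recommend("bread"))
```

This is the "people who bought X also bought Y" pattern in its simplest form; production recommenders add support/lift thresholds and prune rare pairs.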
Exploratory Data Analysis
https://github.com/Preethi-68/Exploratory-Data-Analysis/tree/main
Key features:
• Univariate, bivariate, and multivariate analysis
• Categorical variable encoding
• Normalization and scaling
• Missing value handling
• Data visualization and storytelling
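Three of the steps listed above — missing value handling, normalization, and categorical encoding — can be sketched in a few lines of pandas. The column names and values here are invented for illustration.

```python
import pandas as pd

# Toy dataset (hypothetical) with a numeric gap and a categorical column.
df = pd.DataFrame({
    "age": [25, 32, None, 47],
    "city": ["NY", "LA", "NY", "SF"],
})

# Missing value handling: impute the numeric gap with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# Normalization: min-max scale the numeric column into [0, 1].
df["age_scaled"] = (df["age"] - df["age"].min()) / (df["age"].max() - df["age"].min())

# Categorical encoding: one-hot encode the city column.
df = pd.get_dummies(df, columns=["city"])
print(df.columns.tolist())
```

Median imputation and min-max scaling are reasonable defaults; skewed columns or tree-based models often call for different choices, which is exactly what the univariate analysis step informs.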
Model Deployment
https://github.com/Preethi-68/Model-Deployment
I serialized the model using Python's pickle library, allowing efficient model storage and reuse.
To facilitate accessibility, I implemented an API using Flask, creating endpoints that handle JSON inputs and outputs for smooth integration. I containerized the Flask application using Docker for development and production by writing and optimizing Dockerfiles to create efficient Docker images.
Additionally, I orchestrated the deployment using Kubernetes, configuring deployments and services for scalability, load balancing, and rolling updates. I integrated monitoring and logging solutions to track the application's performance and quickly address any issues.
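The serialization step can be sketched with the standard library alone; `ThresholdModel` is a hypothetical stand-in for the real trained model, which the write-up does not include.

```python
import pickle

class ThresholdModel:
    """Stand-in for a trained model: its 'learned' state is one threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        # Returns 1 when the input meets the threshold, else 0.
        return int(x >= self.threshold)

model = ThresholdModel(threshold=0.5)

# Serialize to bytes, as would be written to disk at training time.
blob = pickle.dumps(model)

# Deserializing restores the object with its learned state intact —
# this is what a Flask endpoint would do once at startup before serving.
restored = pickle.loads(blob)
print(restored.predict(0.7))
```

One caveat worth noting: pickle executes arbitrary code on load, so deserialize only artifacts you produced yourself — a reason formats like ONNX or joblib-managed stores are common in shared deployments.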
Education
Bachelor's Degree in Computer Science
Gokaraju Rangaraju Institute of Engineering and Technology - Hyderabad, Telangana, India
Skills
Libraries/APIs
PySpark
Tools
GitHub, Tableau, Informatica PowerCenter, Spark SQL, SQL Server BI, Power Query, Azure Machine Learning, Apache Airflow, Microsoft Power BI
Languages
SQL, Python, Snowflake, Scala
Frameworks
Apache Spark, Hadoop, Flask
Paradigms
ETL, ETL Implementation & Design
Platforms
Azure, AWS IoT, Amazon EC2, Google Cloud Platform (GCP), Kubernetes, Docker, Databricks, Apache Kafka, Amazon Web Services (AWS)
Storage
Amazon S3 (AWS S3), Data Integration, Data Validation, Redshift, Azure Cosmos DB
Other
Big Data, Azure Data Factory, CI/CD Pipelines, Informatica, Amazon EMR Studio, Informatica Cloud, Model Building, Data Warehouse Design, Scripting, Data Cleaning, Feature Engineering, Reporting, Normalization, Azure Databricks, Data Engineering, Software Development, Exploratory Data Analysis, Software Development Lifecycle (SDLC), Visualization, Storytelling, Statistical Analysis, Scaling, Model Deployment, Data Modeling