
Rahul Valupadasu
Verified Expert in Engineering
Azure Data Engineer and Developer
Toronto, ON, Canada
Toptal member since December 12, 2024
Rahul is an expert Azure data engineer with over five years of experience delivering big data solutions across the Azure ecosystem. He specializes in Azure Data Factory, Databricks, Delta Lake, Delta Live Tables (DLT), and Spark, including PySpark, Spark SQL, and Scala, for real-time and batch data processing. Rahul designs and optimizes high-performance data systems with robust data governance, ensuring actionable insights and business success.
Portfolio
Experience
- Spark - 5 years
- PySpark - 5 years
- Big Data - 5 years
- Azure - 4 years
- Azure Databricks - 4 years
- Azure Data Factory (ADF) - 4 years
- Microsoft Fabric - 2 years
- Microsoft Power BI - 2 years
Preferred Environment
Azure Databricks, Azure Data Factory (ADF), Azure Data Lake Storage, Azure SQL Databases, Azure Synapse Analytics, Delta Lake, Delta Live Tables (DLT), Unity Catalog, Spark, PySpark
The most amazing...
...thing I've implemented is a DLT pipeline in Azure Databricks using Spark and Azure Data Factory, which reduced data processing time by 30%.
Work Experience
Data Engineer
Scotiabank
- Designed and deployed scalable ETL pipelines using Azure Data Factory and Databricks, improving data ingestion efficiency by 30% for critical business processes.
- Implemented DLT in Databricks, streamlining real-time and batch data processing workflows, reducing latency, and improving data pipeline reliability.
- Migrated data solutions from on-premises to the Azure ecosystem, leveraging Azure Data Factory, Azure Synapse Analytics, and ADLS to improve scalability and reduce storage costs by 20%.
- Optimized Spark-based workflows in Azure Databricks, reducing data processing times by 40% while ensuring high performance for large-scale datasets.
- Implemented Unity Catalog for centralized data governance, enabling streamlined data access management and ensuring regulatory compliance across the enterprise.
- Built real-time streaming pipelines using Azure Event Hubs, Kafka, and Azure Databricks, enabling seamless data integration for near-instant analytics and reporting.
- Enhanced data performance through partitioning and optimization techniques, achieving 30% faster query execution in Delta Lake for reporting and analysis.
Data Engineer
Inkresults-Outsourcing
- Developed PySpark scripts to perform comprehensive data profiling, validation, and quality checks on Azure Data Lake Storage, enhancing data reliability and trust.
- Migrated legacy SSIS packages to Azure Data Factory, modernizing ETL processes and achieving a 30% improvement in scalability and flexibility.
- Coordinated ETL workflows using Apache Airflow and Azure Databricks, ensuring reliable and automated data pipelines with 95% accuracy.
- Built PySpark-based ETL workflows to automate complex transformations, enhancing data processing efficiency by 25% and improving data quality checks.
- Collaborated with stakeholders to develop actionable Power BI visualizations, enabling data-driven decision-making and improving business insights.
- Reduced ETL processing time by 20% by optimizing Azure Databricks pipelines and implementing config-driven solutions for flexible workflows.
- Optimized SQL queries and data models, leading to a 50% improvement in query performance for complex analytical queries.
- Developed Spark programs in PySpark for data transformation and quality checks, ensuring consistency and integrity across multiple datasets.
- Implemented strong data governance practices with role-based access control (RBAC) and data encryption, ensuring compliance with GDPR and industry data privacy standards.
- Implemented data transformation strategies using Python (Pandas) and Azure Data Factory, reducing data cleansing and enrichment time by 30%.
Experience
Enterprise Data Platform Modernization
• Building robust ETL pipelines using Azure Data Factory and Databricks to ingest and process structured and unstructured data from multiple financial systems.
• Implementing Delta Live Tables and PySpark in Databricks for efficient real-time data transformations and incremental loads from bronze to silver layers following the medallion architecture.
• Integrating Unity Catalog to centralize data security, manage access, and enforce compliance across the enterprise.
• Leveraging Azure Data Lake Storage and Azure Synapse Analytics for scalable storage and querying of large datasets, enhancing performance and business insights.
• Achieving a 30% reduction in processing time by optimizing Spark-based workflows and improving data pipeline performance.
This project improved data reliability, accelerated insights, and strengthened governance for critical financial reporting and decision-making.
Revenue Cycle Management Data Integration Project
• Designing and implementing ETL pipelines in Azure Data Factory to extract data from on-premises systems, cloud databases, and APIs.
• Utilizing PySpark and Databricks for advanced data cleansing, enrichment, and validation, ensuring 99% data accuracy.
• Optimizing SQL queries and applying partitioning techniques to improve the performance of complex analytical workloads by 50%.
• Ensuring HIPAA compliance by implementing RBAC and encrypting sensitive data.
• Integrating Power BI dashboards with Azure SQL Database to deliver actionable insights into patient revenue trends, improving decision-making for healthcare administrators.
This project improved operational efficiency, enabled real-time revenue tracking, and reduced reporting latency by 20%.
Education
Postgraduate Degree in Computer Science
Lambton College - Toronto, Canada
Certifications
Microsoft Certified Azure Data Associate
Microsoft
Skills
Libraries/APIs
PySpark, Pandas
Tools
Spark SQL, Microsoft Power BI, Azure Monitor, Apache Airflow, Control-M, Tableau, Jira
Languages
SQL, Python
Frameworks
Spark, Delta Live Tables (DLT), Hadoop, Windows PowerShell
Storage
Azure SQL Databases, IBM Db2, Databases, Database Performance, Microsoft SQL Server, Data Pipelines, Database Administration (DBA), SQL Server DBA, SQL Performance
Paradigms
ETL
Platforms
Azure Data Lake Storage, Azure, Databricks, Linux, Microsoft Fabric, Azure Synapse Analytics, Jupyter Notebook
Other
Azure Databricks, Data Migration, Database Partitioning, Azure Data Factory (ADF), Delta Lake, Unity Catalog, Data Cleansing, Data Profiling, Data Quality, Big Data, Data Engineering, Distributed Systems, Data Cleaning, Data Conversion, Migration, FTP, OneLake