Pawan Warade
Verified Expert in Engineering
Data Engineer and Developer
Tokyo, Japan
Toptal member since May 24, 2024
Pawan is a data engineer with 9+ years of experience in data warehouse design, solution architecture, and offshore team management. He implements end-to-end ETL pipelines in Agile environments using Talend, Informatica, and PySpark and builds modern data lakes with Snowflake. Pawan also works with databases like Teradata, Greenplum, and Hive and drives informed decisions by crafting insightful visualizations with Tableau and Power BI.
Experience
- Talend ETL - 5 years
- Data Modeling - 4 years
- Snowflake - 3 years
- Informatica ETL - 3 years
- Python - 3 years
- Business Intelligence (BI) - 3 years
- Amazon S3 (AWS S3) - 3 years
- Power BI Desktop - 2 years
Preferred Environment
Snowflake, Talend ETL, Informatica ETL, Azure Data Factory, Azure Databricks, Power BI Desktop, Teradata, Greenplum, Business Intelligence (BI), Data Warehousing
The most amazing...
...thing I've designed is a scalable data lake using Snowflake and Talend, enabling Agile and efficient data management and analytics.
Work Experience
Data Engineer
Indo-Sakura Software Japan
- Designed comprehensive ETL workflows and documented requirements, converting them into Talend-specific syntax for implementation. Created reusable frameworks in Talend for similar job types, enhancing maintainability and scalability.
- Developed robust data models, including detailed entity lists and attribute lists, to ensure data integrity and consistency. Implemented data quality checks and validation routines to ensure data accuracy, completeness, and consistency.
- Developed Talend jobs to efficiently import data from REST APIs and ServiceNow, ensuring seamless data integration. Implemented best practices for optimizing data pipelines, including performance tuning.
- Successfully migrated the existing HDFS file system to AWS S3, optimizing data storage and access.
- Designed and implemented Snowflake pipelines using stages, Snowpipe, streams, and tasks to handle semi-structured data ingestion and processing efficiently (a sketch of this pattern appears after this list).
- Created secure and shared views in Snowflake to facilitate controlled data access and collaboration. Utilized the Snowflake COPY command efficiently to load data into Snowflake, ensuring optimal performance.
- Developed automation scripts in Python and Shell for various data engineering tasks, including data extraction, transformation, and loading (ETL).
- Identified key performance indicators (KPIs) and created interactive dashboards in Power BI for actionable insights.
- Analyzed existing Informatica workflows to facilitate their migration to Azure Data Factory, resulting in improved scalability and a 20% reduction in operational costs.
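The Snowflake pipeline pattern referenced above can be sketched as follows. This is an illustrative Python script using the snowflake-connector-python client; the account details, storage integration, warehouse, and all object names (RAW_STAGE, RAW_EVENTS, CURATED_EVENTS, and so on) are hypothetical placeholders, not the project's actual definitions.

import snowflake.connector

# Placeholder credentials; real values would come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="RAW",
)

statements = [
    # External stage over the S3 landing area (storage integration assumed to exist).
    "CREATE STAGE IF NOT EXISTS RAW_STAGE URL = 's3://my-bucket/landing/' "
    "STORAGE_INTEGRATION = S3_INT FILE_FORMAT = (TYPE = 'JSON')",
    # Landing table holding the semi-structured payload as VARIANT.
    "CREATE TABLE IF NOT EXISTS RAW_EVENTS (V VARIANT)",
    # Snowpipe runs COPY INTO automatically as new files arrive on the stage.
    "CREATE PIPE IF NOT EXISTS RAW_EVENTS_PIPE AUTO_INGEST = TRUE AS "
    "COPY INTO RAW_EVENTS FROM @RAW_STAGE",
    # Stream tracks newly ingested rows for incremental processing.
    "CREATE STREAM IF NOT EXISTS RAW_EVENTS_STREAM ON TABLE RAW_EVENTS",
    # Curated target table with typed columns.
    "CREATE TABLE IF NOT EXISTS CURATED_EVENTS "
    "(ID STRING, EVENT_TS TIMESTAMP_NTZ, PAYLOAD VARIANT)",
    # Task flattens the VARIANT payload whenever the stream has new data.
    "CREATE TASK IF NOT EXISTS LOAD_CURATED_EVENTS "
    "WAREHOUSE = ETL_WH SCHEDULE = '5 MINUTE' "
    "WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM') AS "
    "INSERT INTO CURATED_EVENTS "
    "SELECT V:id::STRING, V:ts::TIMESTAMP_NTZ, V FROM RAW_EVENTS_STREAM",
    "ALTER TASK LOAD_CURATED_EVENTS RESUME",
    # Secure view exposing only selected columns for controlled sharing.
    "CREATE SECURE VIEW IF NOT EXISTS SHARED_EVENTS AS "
    "SELECT ID, EVENT_TS FROM CURATED_EVENTS",
]

cur = conn.cursor()
for stmt in statements:
    cur.execute(stmt)
cur.close()
conn.close()

In practice, Snowpipe's auto-ingest depends on S3 event notifications being configured, and the task schedule and transformation logic would vary with the workload.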
ETL and BI Developer
Capgemini India
- Contributed to all project activities from inception, designing and developing ETL jobs to requirements and specifications using Agile methodology and Jira, ensuring timely and accurate data integration.
- Designed ETL mappings and workflows to fetch data from multiple sources (e.g., .xml, .csv, and .txt files) or databases and load it into relational tables or files, increasing data accessibility by 40% (a sketch of this extraction pattern appears after this list).
- Optimized ETL jobs to achieve maximum execution speed and data transfer efficiency by enabling multi-thread execution and utilizing various optimization and parallelization options in Talend, reducing processing time.
- Created Tableau data sources and dashboards based on user requirements, enhancing data visualization and reporting capabilities and leading to a 30% increase in user satisfaction.
- Implemented data blending, customized queries, and complex table calculations in Tableau, enabling more sophisticated data analysis and insights.
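As a rough illustration of the multi-source extraction pattern above (the actual work was implemented as Talend mappings, not Python), here is a minimal pandas and SQLAlchemy sketch; the file names, delimiter, connection string, and staging table are hypothetical.

import pandas as pd
from sqlalchemy import create_engine

# Hypothetical target database and source files; placeholders, not project values.
engine = create_engine("postgresql://etl_user:***@db-host:5432/staging")

frames = [
    pd.read_csv("customers.csv"),                       # comma-delimited source
    pd.read_csv("customers.txt", sep="|"),              # pipe-delimited flat file
    pd.read_xml("customers.xml", xpath=".//customer"),  # XML source
]

# Union the sources and load them into a single relational staging table.
df = pd.concat(frames, ignore_index=True)
df.to_sql("stg_customers", engine, if_exists="append", index=False)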
ETL Developer
Tata Consultancy Services
- Created mappings and mapplets using various transformations, scheduled sessions in Informatica's Workflow Manager, modified Teradata structures, and developed shell scripts for data file management.
- Converted Informatica ETL jobs to Talend and migrated Teradata BTEQ scripts to Greenplum. Managed Talend jobs through Job Conductor and developed jobs for various file formats, automating FTP data retrieval (a sketch of this step appears after this list).
- Collaborated closely with team members, quickly grasping complex concepts and efficiently resolving bugs to enhance project quality.
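The automated FTP retrieval step mentioned above can be illustrated with a minimal Python sketch using ftplib; the host, credentials, directories, and file pattern are hypothetical placeholders, and the actual automation was built with Talend jobs and shell scripts rather than this exact script.

import ftplib
import fnmatch
import os

# Hypothetical FTP host, credentials, and landing directory.
HOST, USER, PASSWORD = "ftp.example.com", "etl_user", "***"
REMOTE_DIR, LOCAL_DIR, PATTERN = "/outbound", "/data/landing", "sales_*.csv"

with ftplib.FTP(HOST, USER, PASSWORD) as ftp:
    ftp.cwd(REMOTE_DIR)
    for name in ftp.nlst():
        if fnmatch.fnmatch(name, PATTERN):
            local_path = os.path.join(LOCAL_DIR, name)
            with open(local_path, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)  # download file for the ETL job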
Project Experience
X360 Data Lake
• Orchestrated Talend job execution using Docker images in Amazon ECS containers, leveraging AWS Step Functions and Amazon EventBridge Scheduler for automation (a sketch of this orchestration appears below).
• Created a Snowflake pipeline using stages, Snowpipe, streams, and tasks for semi-structured data processing.
• Developed secure and shared views in Snowflake to ensure data accessibility and security, and used the Snowflake COPY command efficiently for data loading operations.
• Managed an offshore team using Agile methodology to ensure project milestones were met.
• Identified KPIs and developed dashboards in Power BI for data visualization and analysis.
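A minimal Python and boto3 sketch of that orchestration: a one-off Step Functions execution plus a recurring EventBridge Scheduler schedule that triggers the same state machine (which, in this setup, would launch the Talend container as an ECS task). The ARNs, names, and cron expression are hypothetical placeholders.

import json
import boto3

# Hypothetical ARNs; the real state machine would run the Talend container as an ECS task.
STATE_MACHINE_ARN = "arn:aws:states:ap-northeast-1:123456789012:stateMachine:talend-etl"
SCHEDULER_ROLE_ARN = "arn:aws:iam::123456789012:role/eventbridge-scheduler-role"

# One-off run: start the Step Functions execution that launches the ECS task.
sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn=STATE_MACHINE_ARN,
    input=json.dumps({"job": "x360_daily_load"}),
)

# Recurring run: EventBridge Scheduler triggers the same state machine nightly.
scheduler = boto3.client("scheduler")
scheduler.create_schedule(
    Name="x360-daily-load",
    ScheduleExpression="cron(0 1 * * ? *)",  # 01:00 UTC every day
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        "Arn": STATE_MACHINE_ARN,
        "RoleArn": SCHEDULER_ROLE_ARN,
        "Input": json.dumps({"job": "x360_daily_load"}),
    },
)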
Insurance Campaign Management
Data Transformation and Storage:
• Utilized PySpark to convert unstructured data into structured CSV format (a PySpark sketch of this step follows below).
• Implemented robust data transformation techniques to ensure data quality and consistency.
• Uploaded the transformed CSV files to Azure Data Lake Storage, providing scalable and secure data storage.
• Leveraged the capabilities of Azure Data Lake for efficient data management and retrieval.
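A minimal PySpark sketch of this structuring step; the source path, regular expressions, column names, and ADLS account and container are hypothetical placeholders rather than the project's actual parsing rules.

from pyspark.sql import SparkSession
from pyspark.sql.functions import regexp_extract

spark = SparkSession.builder.appName("campaign-structuring").getOrCreate()

# Read raw, unstructured log lines (hypothetical source path).
raw = spark.read.text("/mnt/raw/campaign_logs/*.log")

# Parse each line into typed columns with regular expressions (patterns are illustrative).
structured = raw.select(
    regexp_extract(raw["value"], r"^(\d{4}-\d{2}-\d{2})", 1).alias("event_date"),
    regexp_extract(raw["value"], r"policy=(\w+)", 1).alias("policy_id"),
    regexp_extract(raw["value"], r"campaign=(\w+)", 1).alias("campaign_code"),
)

# Write structured CSV to Azure Data Lake Storage (account and container are placeholders).
structured.write.mode("overwrite").option("header", True).csv(
    "abfss://curated@mydatalake.dfs.core.windows.net/campaigns/"
)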
ETL Processing:
• Conducted ETL (extract, transform, load) operations on the stored files using Azure Data Factory.
• Loaded the processed data into SQL Server for further analysis and reporting.
Collaboration with BI Team:
• Assisted the Power BI team in accessing the processed data.
• Created views and data marts to support insightful, interactive dashboards (a sketch of one such view appears below).
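One such reporting view can be sketched as follows in Python with pyodbc; the connection string, view definition, and table names are hypothetical placeholders.

import pyodbc

# Hypothetical connection string and object names, for illustration only.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sql-host;DATABASE=campaigns;UID=etl_user;PWD=***"
)
cursor = conn.cursor()

# Reporting view that pre-aggregates campaign responses for Power BI dashboards.
cursor.execute("""
    CREATE OR ALTER VIEW dbo.vw_campaign_summary AS
    SELECT campaign_code,
           event_date,
           COUNT(*) AS responses
    FROM dbo.campaign_events
    GROUP BY campaign_code, event_date
""")
conn.commit()
cursor.close()
conn.close()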
Education
Bachelor's Degree in Information Technology
Nagpur University - Nagpur, India
Certifications
Microsoft Certified: Azure Fundamentals
Microsoft
Skills
Tools
Talend ETL, Informatica ETL, Power BI Desktop, Microsoft Power BI, Tableau
Languages
SQL, Python
Paradigms
Dimensional Modeling, Business Intelligence (BI), Software Testing
Storage
Amazon S3 (AWS S3), Database Management Systems (DBMS), Teradata, Greenplum, HDFS
Frameworks
Data Lakehouse, Spark
Platforms
Azure, Snowflake
Other
Data Modeling, Data Warehousing, Big Data, Data Analysis, Data Governance, Data Visualization, Azure Data Factory, Azure Databricks, Software Development, Informatica, Slowly Changing Dimensions (SCD), Data Architecture