Pappu Yadav, Developer in Gurugram, Haryana, India

Pappu Yadav

Verified Expert in Engineering

Software Engineer and Developer

Gurugram, Haryana, India

Toptal member since June 14, 2022

Bio

Pappu is a software engineer skilled in big data technologies, including Spark, Hadoop, Hive, and Presto. He focuses on building ETL frameworks that process large volumes of data in batches and in real time with Spark Streaming. With expertise in optimizing Spark jobs and experience writing complex SQL queries for debugging and analysis, Pappu builds platform tools used by various business teams, as well as REST APIs in Java using frameworks like Spring and Hibernate.

Portfolio

Airtel India
Java, Spark, Spark Structured Streaming, Apache Hive, Hadoop, Big Data...
Mobileum
Apache Spark, Presto, MySQL, Java, Spark, Spark Structured Streaming, Big Data...
MobiKwik
Java, MySQL, PostgreSQL, Back-end Development, SQL, API Documentation...

Experience

  • Back-end Development - 7 years
  • Java - 7 years
  • Databases - 7 years
  • MySQL - 7 years
  • Hadoop - 4 years
  • Apache Spark - 4 years
  • Big Data - 4 years
  • Spark - 4 years

Availability

Part-time

Preferred Environment

MacOS, Linux

The most amazing...

...thing I've developed from scratch is an ETL Spark framework that enables teams across the organization to run their ETL jobs without worrying about the underlying complexity.

Work Experience

Technical Lead

2020 - 2022
Airtel India
  • Developed a framework that enabled various business teams to run any ETL job on top of Spark and Spark Streaming.
  • Modified the Presto source code to fix a data duplication issue in the query engine.
  • Developed a real-time reconciliation framework that performed reconciliation on any number of input sources based on configured rules in real time.
Technologies: Java, Spark, Spark Structured Streaming, Apache Hive, Hadoop, Big Data, Data Migration, Apache Hudi, Apache Airflow, ETL, MongoDB, Frameworks, Data Engineering, PostgreSQL, Back-end Development, SQL, API Documentation, OOP Designs, Git

Senior Software Engineer

2018 - 2020
Mobileum
  • Created an ETL pipeline from scratch to process real-time data using Spark Streaming.
  • Tracked users roaming outside the country in real time and assigned each user a score based on the quality of calls, SMS, and data. Used Spark Streaming to manage users' trips.
  • Developed a framework to overcome the small-file problem in Spark Streaming jobs through custom compaction of files written to Hadoop.
Technologies: Apache Spark, Presto, MySQL, Java, Spark, Spark Structured Streaming, Big Data, Data Migration, Cloudera, Data Engineering, PostgreSQL, ETL, Back-end Development, SQL, API Documentation, OOP Designs, Git

Senior Software Engineer

2017 - 2018
MobiKwik
  • Built a cab booking API that enabled users to book cabs from the mobile app without having to install the cab aggregator's app. Integrated it with the cab aggregator Ola in the back end for the actual cab booking and ride tracking.
  • Developed the bike rental booking APIs to enable users to rent bikes within the app. Integrated it with a bike rental aggregator in the back end for the actual booking.
  • Built a booking cancellation flow in the hotel booking module, enabling the users to cancel a hotel booking, and integrated the APIs provided by the hotel aggregator.
Technologies: Java, MySQL, PostgreSQL, Back-end Development, SQL, API Documentation, OOP Designs, Git

Software Engineer

2015 - 2017
PayU India
  • Developed a credit card app that enabled users to track transaction history, enable or disable the card, and approve transactions through in-app notifications.
  • Made changes in the dashboards that track payment transactions' health in real time.
  • Developed a payment flow that bypassed the payment gateway and integrated directly with the bank to facilitate user payments.
Technologies: Java, Databases, OOP Designs, API Documentation, Back-end Development, PostgreSQL, Database Design, Spring Boot, Git

Rule Engine

Rule Engine enables business users to configure rules on a configured feed or source; when a rule is breached, a notification is sent to the business user.
Rules can be point queries or aggregated rules, which aggregate data over a configured time window.
The whole framework is built using the Spark Streaming engine.
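
The difference between the two rule types can be illustrated with a minimal sketch in plain Java. It is not the framework's code: the names `Event`, `Rule`, `PointRule`, and `AggregatedRule` are hypothetical, the point rule is assumed to be a per-event threshold check, and the aggregated rule is assumed to sum values over fixed time windows. The real engine runs on Spark Streaming, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two rule types; none of these names come from the framework.
final class RuleEngineSketch {

    // One event from a configured feed: a timestamp in milliseconds and a numeric value.
    record Event(long timestampMs, double value) {}

    // A rule returns one message per breach found in a batch of events.
    interface Rule {
        List<String> evaluate(List<Event> events);
    }

    // Point rule (assumed semantics): checks each event on its own against a threshold.
    record PointRule(String name, double maxValue) implements Rule {
        public List<String> evaluate(List<Event> events) {
            List<String> breaches = new ArrayList<>();
            for (Event e : events) {
                if (e.value() > maxValue) {
                    breaches.add(name + " breached at t=" + e.timestampMs());
                }
            }
            return breaches;
        }
    }

    // Aggregated rule (assumed semantics): sums values per fixed window and checks each sum.
    record AggregatedRule(String name, long windowMs, double maxSum) implements Rule {
        public List<String> evaluate(List<Event> events) {
            Map<Long, Double> sumByWindow = new TreeMap<>();
            for (Event e : events) {
                long windowStart = (e.timestampMs() / windowMs) * windowMs;
                sumByWindow.merge(windowStart, e.value(), Double::sum);
            }
            List<String> breaches = new ArrayList<>();
            for (Map.Entry<Long, Double> entry : sumByWindow.entrySet()) {
                if (entry.getValue() > maxSum) {
                    breaches.add(name + " breached in window starting " + entry.getKey());
                }
            }
            return breaches;
        }
    }

    public static void main(String[] args) {
        List<Event> events = List.of(new Event(0, 5), new Event(500, 20), new Event(1200, 4));
        // In the described system, each breach would trigger a notification to the business user.
        System.out.println(new PointRule("high-value", 10).evaluate(events));
        System.out.println(new AggregatedRule("volume", 1000, 20).evaluate(events));
    }
}
```

In this sketch the notification step is reduced to returning breach messages; how the framework actually delivers notifications is not described in the profile.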

Spark Batch and Streaming ETL Framework

This is a generic framework to run any Spark ETL job.
Each job is configured using a JSON file in which users define the source and sink, the input format, location, and other relevant information.
Source and sink configurations are maintained in different Postgres databases.
The framework can be used across the organization.
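
As an illustration of a configuration-driven job, the sketch below models a job definition with a source and a sink in plain Java. It is only a sketch under assumptions: the field names (`format`, `location`), the types `Endpoint` and `JobConfig`, and the helper methods are hypothetical, JSON parsing is assumed to have happened already, and the Postgres-backed source and sink configurations are not modeled.

```java
import java.util.Map;

// Hypothetical sketch of a config-driven ETL job; the names and fields are illustrative only.
final class EtlJobConfigSketch {

    // Assumed shape of a job file (not taken from the original framework):
    // { "source": {"format": "parquet", "location": "/data/in"},
    //   "sink":   {"format": "orc",     "location": "/data/out"} }

    // A source or sink definition; both fields are required in this sketch.
    record Endpoint(String format, String location) {
        Endpoint {
            if (format == null || format.isBlank()) {
                throw new IllegalArgumentException("format is required");
            }
            if (location == null || location.isBlank()) {
                throw new IllegalArgumentException("location is required");
            }
        }
    }

    record JobConfig(String name, Endpoint source, Endpoint sink) {}

    // Builds a typed config from an already-parsed JSON object.
    static JobConfig fromMap(String name, Map<String, Map<String, String>> parsed) {
        return new JobConfig(name, toEndpoint(parsed, "source"), toEndpoint(parsed, "sink"));
    }

    private static Endpoint toEndpoint(Map<String, Map<String, String>> parsed, String key) {
        Map<String, String> section = parsed.get(key);
        if (section == null) {
            throw new IllegalArgumentException("missing section: " + key);
        }
        return new Endpoint(section.get("format"), section.get("location"));
    }

    // Describes the plan a generic runner would execute for this job.
    static String describe(JobConfig job) {
        return job.name() + ": read " + job.source().format() + " from " + job.source().location()
                + ", write " + job.sink().format() + " to " + job.sink().location();
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> parsed = Map.of(
                "source", Map.of("format", "parquet", "location", "/data/in"),
                "sink", Map.of("format", "orc", "location", "/data/out"));
        System.out.println(describe(fromMap("daily-load", parsed)));
    }
}
```

Validating the configuration up front, as the `Endpoint` constructor does here, is one way a shared framework could reject a malformed job before any Spark resources are used.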

Generic Recon Framework

A generic framework that provides one-to-one mapping of records between any number of sources in real time.
Users can define configurable rules on the basis of which records are later grouped.
The framework also handles late-arriving events, including state management of records within Spark memory.
Users can also view partially reconciled records, which can later be moved to reconciled status.
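
The partial and reconciled states described above can be sketched in plain Java. This is an illustrative sketch, not the framework's implementation: the class `ReconSketch`, its methods, and the source names are hypothetical, the grouping key is assumed to be already computed from the configured rules, and an in-memory `HashMap` stands in for the state that the real framework keeps within Spark memory. State expiry is not modeled.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of keyed reconciliation state; names are illustrative only.
final class ReconSketch {

    enum Status { UNSEEN, PARTIAL, RECONCILED }

    private final Set<String> expectedSources;
    // Per-key state: the sources that have delivered a record for this key so far.
    private final Map<String, Set<String>> seenByKey = new HashMap<>();

    ReconSketch(Set<String> expectedSources) {
        this.expectedSources = Set.copyOf(expectedSources);
    }

    // Records an arrival, on time or late, and returns the key's status afterwards.
    Status accept(String source, String key) {
        if (!expectedSources.contains(source)) {
            throw new IllegalArgumentException("unknown source: " + source);
        }
        seenByKey.computeIfAbsent(key, k -> new HashSet<>()).add(source);
        return statusOf(key);
    }

    // A key is reconciled once every expected source has delivered a record for it.
    Status statusOf(String key) {
        Set<String> seen = seenByKey.get(key);
        if (seen == null) {
            return Status.UNSEEN;
        }
        return seen.containsAll(expectedSources) ? Status.RECONCILED : Status.PARTIAL;
    }

    public static void main(String[] args) {
        ReconSketch recon = new ReconSketch(Set.of("billing", "network", "crm"));
        System.out.println(recon.accept("billing", "txn-1"));
        System.out.println(recon.accept("network", "txn-1"));
        // A late arrival from the last source moves the key from partial to reconciled.
        System.out.println(recon.accept("crm", "txn-1"));
    }
}
```

Because the state is kept per key, a record that arrives late simply updates the existing entry, which is how a partially reconciled record can later become reconciled.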

Education

2011 - 2015

Bachelor's Degree in Computer Science

Delhi Technological University - New Delhi, India

Tools

Git, Apache Airflow, Cloudera, RabbitMQ

Languages

Java, SQL

Frameworks

Spark, Spark Structured Streaming, Presto, Hadoop, Apache Spark

Storage

Databases, Apache Hive, MySQL, PostgreSQL, MongoDB

Paradigms

ETL, Database Design

Platforms

Apache Hudi

Other

API Documentation, Back-end Development, OOP Designs, Data Engineering, Big Data, Frameworks, Spring Boot, Data Migration
