Tapan Bhatt

Verified Expert in Engineering

Data Engineer and Software Developer

Hyderabad, India

Toptal member since December 5, 2022

Bio

Tapan has more than 16 years of experience developing data-driven applications. He writes high-quality, scalable code for large-scale products, data warehouses, and data lakes in both on-premises and cloud setups. He specializes in writing powerful analytical queries and designing high-performance batch and streaming workloads using big data processing frameworks such as Hadoop, Spark, and Kafka.

Portfolio

A5 Labs
Databricks, Apache Spark, Apache Kafka, Amazon Elastic Container Service (ECS)...
JPMorgan Chase
Spark SQL, Apache Hive, Apache Sqoop, Apache Impala, Git, Bitbucket, HBase...
ValueLabs
Spark, Spark SQL, Apache Kafka, Amazon Web Services (AWS), Amazon S3 (AWS S3)...

Experience

  • SQL - 15 years
  • Spark - 8 years
  • Python 3 - 8 years
  • Oracle - 8 years
  • SQL Server 2014 - 8 years
  • Spark SQL - 8 years
  • Amazon Web Services (AWS) - 6 years
  • NoSQL - 5 years

Availability

Full-time

Preferred Environment

Windows, Python 3, SQL, DBeaver, Visual Studio Code (VS Code), Notepad++

The most amazing...

...code I've developed is a complex rule-based engine that performed six times faster than the legacy system using only one-sixth the lines of code.

Work Experience

Senior Data Engineer

2021 - PRESENT
A5 Labs
  • Integrated a CRM module with the company's gaming app by streaming events from Kafka. The operations team used it to design high-budget marketing campaigns, accelerating customer acquisition for the newly launched gaming app.
  • Rewrote the Kafka-to-Databricks pipeline on a new Spark Structured Streaming architecture (a minimal sketch of the pattern follows this role); the performance gains cut AWS resource consumption and saved the company $20,000 annually.
  • Designed from scratch an Apache Spark-based framework that ETLs MySQL and MongoDB datasets into Databricks. Driven by a JSON configuration, it loads data, automatically merges schema changes, and saves job history as metadata.
  • Implemented an org-wide cost dataset using a Databricks notebook, providing insight into the customer acquisition cost for the newly launched gaming app.
Technologies: Databricks, Apache Spark, Apache Kafka, Amazon Elastic Container Service (ECS), MongoDB, Python 3, MySQL, Amazon Web Services (AWS), PySpark, Bitbucket, MongoDB Shell, NoSQL, Data Engineering, Data Warehousing, Python, ETL, Data Pipelines, Amazon S3 (AWS S3)
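
A minimal sketch of the Kafka-to-Delta Structured Streaming pattern referenced above. The broker address, topic name, event schema, and S3 paths are placeholders, not the production values:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    # Requires the spark-sql-kafka connector on the classpath.
    spark = SparkSession.builder.appName("kafka-to-databricks").getOrCreate()

    # Hypothetical event schema; the real pipeline's schema is not public.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("amount", DoubleType()),
    ])

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
           .option("subscribe", "game-events")                # hypothetical topic
           .load())

    # Kafka delivers bytes; decode and parse the JSON payload into columns.
    events = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                 .select("e.*"))

    # Write continuously to a Delta table; the checkpoint makes the job restartable.
    (events.writeStream
           .format("delta")
           .option("checkpointLocation", "s3://bucket/checkpoints/game-events")
           .trigger(processingTime="1 minute")
           .start("s3://bucket/delta/game_events"))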

Senior Associate

2017 - 2021
JPMorgan Chase
  • Served as the enrichment lead and pioneered high-performance, Spark SQL-based enrichment guidelines.
  • Contributed to a Spark SQL-based enrichment engine that leveraged the most advanced features of Spark 2.x.
  • Created a data obfuscation service that masks personally identifiable (PI) attributes for data requirements in lower environments (a hedged masking sketch follows this role).
  • Designed a compression service that compressed 90 TB of data within a matter of hours, achieving 75% storage savings.
Technologies: Spark SQL, Apache Hive, Apache Sqoop, Apache Impala, Git, Bitbucket, HBase, NoSQL, Jira, Data Engineering, Query Optimization, Data Pipelines
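
A hedged sketch of the PI-masking idea: hash or partially redact sensitive columns before copying a table to a lower environment. The database, table, and column names are hypothetical, and the actual service's masking rules are not public:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pi-masking").getOrCreate()

    # Hash the name (deterministic and irreversible) and keep only the last
    # four digits of the SSN so downstream tests still see realistic shapes.
    masked = spark.sql("""
        SELECT account_id,
               sha2(customer_name, 256)            AS customer_name,
               concat('XXX-XX-', substr(ssn, -4))  AS ssn,
               txn_amount
        FROM prod_db.customers
    """)

    masked.write.mode("overwrite").saveAsTable("dev_db.customers")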

Project Lead

2016 - 2017
ValueLabs
  • Designed a new messaging system based on Amazon Kinesis and Spark.
  • Implemented Spark receivers to scan new data arriving in the S3 bucket in real time.
  • Designed a NoSQL (HBase) schema to enable fast scans and high throughput.
  • Implemented real-time algorithms to detect bot activity based on user agents and IP addresses (a simplified sketch follows this role).
Technologies: Spark, Spark SQL, Apache Kafka, Amazon Web Services (AWS), Amazon S3 (AWS S3), Amazon Kinesis, Amazon Athena, Jira, Bitbucket
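
A simplified batch sketch of the bot-detection logic: flag IPs with an implausibly high request rate and user agents matching known crawler signatures. The input path, column names, and thresholds are illustrative; the production system ran in real time and was backed by HBase:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("bot-detection").getOrCreate()

    # Hypothetical clickstream with event_time, ip, and user_agent columns.
    events = spark.read.parquet("s3://bucket/clickstream/")

    # Rule 1: more than 600 requests from one IP in a minute (~10 req/sec).
    rate_suspects = (events
                     .groupBy(F.window("event_time", "1 minute"), "ip")
                     .agg(F.count("*").alias("hits"))
                     .filter(F.col("hits") > 600)
                     .select("ip"))

    # Rule 2: user agents matching common crawler signatures.
    ua_suspects = (events
                   .filter(F.lower("user_agent").rlike("bot|crawler|spider"))
                   .select("ip"))

    blacklist = rate_suspects.union(ua_suspects).distinct()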

Senior Application Engineer

2011 - 2015
Oracle
  • Performed root cause analysis of incoming bugs, reducing the bug backlog by 40% within one year.
  • Developed and automated generic SQL checks for identified data corruption patterns, reducing new bug inflow by 30%.
  • Identified critical performance fixes in procure-to-pay (P2P) processes, resulting in a 50% increase in speed.
Technologies: Oracle 10g

Software Engineer

2006 - 2011
CSSI (now Vertafore)
  • Designed and implemented a core rule-based brokerage calculation engine.
  • Developed ETL routines for the central insurance data warehouse.
  • Created generic stored procedures to calculate core agent metrics such as retention, growth, disenrollment, and persistency.
Technologies: Oracle, SQL Server 2014

Software Engineer

2005 - 2006
Universal Software
  • Developed multi-threaded, auto-correction-enabled data entry applications using C#.NET and SQL Server 2005, which enabled a faster experience for operators.
  • Designed generic T-SQL stored procedures that return matching valid addresses based on a few words entered by the data entry operator.
  • Participated in weekly requirement calls with the product team.
Technologies: SQL Server 2005

Project Experience

Video Analytics Platform

A cloud-based platform that received real-time data from around 80,000 users based in Europe. I developed the Spark-based application that processed data landing in an S3 drop location in real time. I also designed an HBase-backed bot detection engine that identified and blacklisted malicious IP addresses with 97% efficiency.

Brokerage Calculation Engine for an Insurance Firm

A rule-based engine for an insurance product. I served as the database developer and rewrote the entire calculation engine from scratch using T-SQL. The new version performed six times faster than the legacy component and was much easier to maintain because it required only one-sixth the lines of code.

The new version could process 430,000 transactions per hour compared to only 75,000 processed by the legacy version. I also added features such as pre-payments, plan transfers, recurring payments, and deductions.

Customer.io Integration for an Online Gaming Research Firm

Customer.io (CIO) is a CRM platform that marketers use to connect with their users and increase engagement through powerful marketing campaigns. I worked as an integration specialist and supplied real-time Kafka events by calling CIO endpoints (a simplified sketch of the pattern follows). I also worked with the operations team to design the workflow of CRM campaigns used to send welcome bonuses.
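
A simplified sketch of the Kafka-to-Customer.io bridge, assuming the kafka-python client and Customer.io's public track API; the topic name, credentials, and event fields are placeholders, and the actual integration's details are not public:

    import json

    import requests
    from kafka import KafkaConsumer  # kafka-python client

    SITE_ID, API_KEY = "my-site-id", "my-api-key"  # placeholder credentials

    consumer = KafkaConsumer(
        "player-events",                  # hypothetical topic
        bootstrap_servers="broker:9092",  # placeholder broker
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for message in consumer:
        event = message.value
        # Forward each Kafka event to Customer.io's track endpoint so CRM
        # campaigns (e.g., welcome bonuses) can trigger on it.
        response = requests.post(
            f"https://track.customer.io/api/v1/customers/{event['user_id']}/events",
            auth=(SITE_ID, API_KEY),
            json={"name": event["event_name"], "data": event.get("properties", {})},
            timeout=5,
        )
        response.raise_for_status()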

Data Validation Tool for Hadoop

I served as an Apache Spark developer for one of the world's largest banks. Our team was building a Hadoop-based data lake where data from multiple sources was loaded and later used for advanced analysis.

I implemented a Spark-based framework (a minimal sketch follows the list) that:
• Compared the ingested data with the source in an efficient manner.
• Stored a summary of the comparison results for audit purposes, such as row counts, per-column minimums, maximums, and distinct-value counts, and the average of each numerical column.
• Generated alerts if any mismatch was found during validation.
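
A minimal sketch of the summary-comparison idea under stated assumptions: the source extract path and lake table name are placeholders, and the real framework's alerting and audit storage are simplified here to a raised error:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("data-validation").getOrCreate()

    def summarize(df):
        """One-row summary: row count plus per-column min, max, and distinct
        count, and the average of each numeric column."""
        aggs = [F.count(F.lit(1)).alias("row_count")]
        for name, dtype in df.dtypes:
            aggs += [F.min(name).alias(f"{name}_min"),
                     F.max(name).alias(f"{name}_max"),
                     F.countDistinct(name).alias(f"{name}_distinct")]
            if dtype in ("int", "bigint", "float", "double") or dtype.startswith("decimal"):
                aggs.append(F.avg(name).alias(f"{name}_avg"))
        return df.agg(*aggs).first()

    source = summarize(spark.read.parquet("s3://bucket/source_extract/orders"))  # placeholder
    ingested = summarize(spark.read.table("datalake.orders"))  # hypothetical table

    mismatches = {k: (source[k], ingested[k])
                  for k in source.asDict() if source[k] != ingested[k]}
    if mismatches:
        raise ValueError(f"Validation mismatch: {mismatches}")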

This increased stakeholders' confidence in the quality of the data stored in the lake, reduced the time spent identifying root causes of potential issues, and sped up delivery of new ingest requests.

Critical Performance Fixes in a Large-scale ERP Software

I worked as an application engineer on the accounts payable module of a large business suite. The module was responsible for storing invoices in the system and paying suppliers on time. It needed to be upgraded whenever a new major release became available from Oracle and old versions went out of support.

The upgrade process was very slow, especially for large customers where millions of invoices needed to be upgraded. I identified key architectural issues in the process and implemented a new upgrade process that ran parallel threads to leverage multi-core CPUs (a hedged sketch follows below).

The upgrade downtime came down from 12 hours to just 2.5 hours.
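
A hedged sketch of the partitioned, parallel-upgrade idea, not Oracle's actual upgrade code: the connection details, stored procedure name, and invoice-ID ranges are placeholders. Each worker upgrades a disjoint slice of IDs, so the heavy lifting parallelizes inside the database:

    from concurrent.futures import ThreadPoolExecutor

    import oracledb  # python-oracledb driver

    def upgrade_range(bounds):
        lo, hi = bounds
        # Placeholder credentials; the real job ran inside the upgrade tooling.
        with oracledb.connect(user="apps", password="***",
                              dsn="db-host/ORCLPDB1") as conn:
            with conn.cursor() as cur:
                # Hypothetical stored procedure that upgrades one ID slice.
                cur.callproc("ap_upgrade_pkg.upgrade_invoices", [lo, hi])
            conn.commit()

    # Disjoint invoice-ID slices; sizes and counts are illustrative.
    batches = [(lo, lo + 100_000) for lo in range(0, 3_000_000, 100_000)]

    # Threads suffice here: each call blocks on the database, which does
    # the actual parallel work across CPU cores.
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(upgrade_range, batches))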

Education

2002 - 2005

Master's Degree in Computer Science

Department of Computer Science, Bhavnagar University - Bhavnagar, India

1998 - 2002

Bachelor's Degree in Industrial Engineering

SS College of Engineering - Bhavnagar, India

Certifications

JUNE 2021 - PRESENT

AWS Certified Data Analytics – Specialty

Amazon Web Services

Skills

Libraries/APIs

PySpark, Spark Streaming

Tools

Spark SQL, Git, Bitbucket, Jira, Amazon Athena, Apache Sqoop, Apache Impala, Amazon Elastic MapReduce (EMR), Amazon Elastic Container Service (ECS), MongoDB Shell

Languages

SQL, Python 3, Python

Paradigms

ETL

Frameworks

Spark, Apache Spark

Platforms

Oracle, Apache Kafka, Amazon Web Services (AWS), Databricks

Storage

SQL Server 2014, NoSQL, Data Pipelines, Oracle 10g, Amazon S3 (AWS S3), Apache Hive, HBase, SQL Server 2005, PL/SQL, MongoDB, MySQL

Other

Data Engineering, Data Warehousing, Query Optimization, Amazon Kinesis, Customer.io
