
Tapan Bhatt
Verified Expert in Engineering
Data Engineer and Software Developer
Hyderabad, India
Toptal member since December 5, 2022
Tapan has more than 16 years of experience developing data-driven applications. He writes high-quality, scalable code for large-scale products, data warehouses, and data lakes in on-premises and cloud setups. He specializes in writing powerful analytical queries and designing high-performance batch and streaming workloads using big data processing frameworks like Hadoop, Spark, and Kafka.
Portfolio
Experience
- SQL - 15 years
- Spark - 8 years
- Python 3 - 8 years
- Oracle - 8 years
- SQL Server 2014 - 8 years
- Spark SQL - 8 years
- Amazon Web Services (AWS) - 6 years
- NoSQL - 5 years
Preferred Environment
Windows, Python 3, SQL, DBeaver, Visual Studio Code (VS Code), Notepad++
The most amazing...
...code I've developed is a complex rule-based engine that performed six times faster than the legacy system, using only one-sixth as many lines of code.
Work Experience
Senior Data Engineer
A5 Labs
- Integrated a CRM module with the company's gaming app by sending messages from Kafka. It was used in designing high-budget marketing campaigns by the operations team, resulting in faster acquisition of new customers for the newly launched gaming app.
- Rewrote the Kafka-to-Databricks pipeline on a new architecture using Spark Structured Streaming. The performance gain reduced AWS resource consumption, saving the company $20,000 annually.
- Designed from scratch an Apache Spark-based framework for ETL of MySQL/MongoDB datasets into Databricks. It takes a JSON configuration to load data, automatically merge schema changes, and save job history in metadata.
- Implemented an org-wide cost dataset using a Databricks notebook, which provided insights into the company's cost of acquisition for new customers for the newly launched gaming app.
Senior Associate
JPMorgan Chase
- Served as the enrichment lead and pioneered high-performance Spark SQL-based enrichment guidelines.
- Contributed to a Spark SQL-based enrichment engine leveraging the most advanced features of Spark 2.x.
- Created a data obfuscation service that masks PI attributes for data requirements in a lower environment.
- Designed a compression service that compressed 90TB of data within a matter of hours and gained 75% storage savings.
Project Lead
ValueLabs
- Designed a new messaging system based on Amazon Kinesis and Spark.
- Implemented Spark receivers to scan new data arriving in the S3 bucket in real time.
- Designed a NoSQL (HBase) schema to enable fast scan and high throughput.
- Implemented real-time algorithms to detect bot activity based on user agents and IP addresses.
Senior Application Engineer
Oracle
- Performed root cause analysis on incoming bugs, which resulted in a 40% reduction in the bug backlog within one year.
- Developed generic SQL for the identified data corruption patterns and automated it, which reduced new bug inflow by 30%.
- Identified critical performance fixes in P2P processes, which resulted in a 50% increase in speed.
Software Engineer
CSSI (now Vertafore)
- Designed and implemented a core rule-based brokerage calculation engine.
- Developed ETL routines for the central insurance data warehouse.
- Created generic stored procedures to calculate core metrics of agents like retention, growth, disenrollment, and persistency.
Software Engineer
Universal Software
- Developed multi-threaded, auto-correction-enabled data entry applications using C#.NET and SQL Server 2005, which gave operators a faster experience.
- Designed generic T-SQL stored procedures that return matching results of valid addresses based on a few words entered by the data entry operator.
- Participated in weekly requirement calls with the product team.
Experience
Video Analytics Platform
Brokerage Calculation Engine for an Insurance Firm
The new version could process 430,000 transactions in one hour, compared to only 75,000 processed by the legacy version. I also added features in the new version, such as pre-payments, plan transfers, recurring payments, and deductions.
Customer.io Integration for an Online Gaming Research Firm
Data Validation Tool for Hadoop
I implemented a Spark-based framework that:
• Compared the ingested data with the source in an efficient manner.
• Stored a summary of comparison results for audit purposes, such as the number of rows; the minimum, maximum, and number of distinct values of each column; and the average value of each numerical column.
• Generated alerts if any mismatch was found during validation.
This resulted in increased confidence of stakeholders in the quality of data stored in the lake, less time spent on identifying root causes of potential issues, and faster delivery of new ingest requests.
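The validation steps listed above can be illustrated with a minimal sketch. It uses plain Python lists of dictionaries as a stand-in for Spark DataFrames, so it shows only the shape of the idea, not the actual framework. The function names `summarize` and `compare`, the sample column names, and the alert wording are all hypothetical.

```python
from statistics import mean


def summarize(rows):
    """Profile a dataset: row count plus min, max, distinct count per column,
    and the average for columns whose values are all numeric."""
    summary = {"row_count": len(rows), "columns": {}}
    columns = rows[0].keys() if rows else []
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        stats = {
            "min": min(values) if values else None,
            "max": max(values) if values else None,
            "distinct": len(set(values)),
        }
        is_numeric = values and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        if is_numeric:
            stats["avg"] = mean(values)
        summary["columns"][col] = stats
    return summary


def compare(source_summary, target_summary):
    """Return a list of mismatch alerts; an empty list means the profiles agree."""
    alerts = []
    if source_summary["row_count"] != target_summary["row_count"]:
        alerts.append(
            f"row_count: source={source_summary['row_count']} "
            f"target={target_summary['row_count']}"
        )
    for col, src_stats in source_summary["columns"].items():
        tgt_stats = target_summary["columns"].get(col)
        if tgt_stats is None:
            alerts.append(f"{col}: column missing in target")
            continue
        for metric, src_value in src_stats.items():
            tgt_value = tgt_stats.get(metric)
            if src_value != tgt_value:
                alerts.append(f"{col}.{metric}: source={src_value} target={tgt_value}")
    return alerts


# Hypothetical sample data: the ingested copy has lost one row.
source = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
    {"id": 3, "amount": 30.0},
]
ingested = source[:2]

for alert in compare(summarize(source), summarize(ingested)):
    print("ALERT:", alert)
```

Comparing compact per-column summaries rather than full rows keeps the check cheap, and the same summaries can be stored as the audit record. In a Spark setting, the equivalent aggregates would presumably be computed with DataFrame aggregations rather than in-memory loops.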
Critical Performance Fixes in a Large-scale ERP Software
The process responsible for upgrades used to be very slow, especially for large customers where millions of invoices needed to be upgraded. I identified key architectural issues in the upgrade process and implemented a new upgrade process using parallel threads that leveraged a multi-core CPU.
The upgrade downtime came down from 12 hours to just 2.5 hours.
Education
Master's Degree in Computer Science
Department of Computer Science, Bhavnagar University - Bhavnagar, India
Bachelor's Degree in Industrial Engineering
SS College of Engineering - Bhavnagar, India
Certifications
AWS Certified Data Analytics – Specialty
Amazon Web Services
Skills
Libraries/APIs
PySpark, Spark Streaming
Tools
Spark SQL, Git, Bitbucket, Jira, Amazon Athena, Apache Sqoop, Apache Impala, Amazon Elastic MapReduce (EMR), Amazon Elastic Container Service (ECS), MongoDB Shell
Languages
SQL, Python 3, Python
Paradigms
ETL
Frameworks
Spark, Apache Spark
Platforms
Oracle, Apache Kafka, Amazon Web Services (AWS), Databricks
Storage
SQL Server 2014, NoSQL, Data Pipelines, Oracle 10g, Amazon S3 (AWS S3), Apache Hive, HBase, SQL Server 2005, PL/SQL, MongoDB, MySQL
Other
Data Engineering, Data Warehousing, Query Optimization, Amazon Kinesis, Customer.io