Anuj Thakwani, Developer in Noida, Uttar Pradesh, India

Anuj Thakwani

Verified Expert  in Engineering

Software Developer

Location
Noida, Uttar Pradesh, India
Toptal Member Since
June 18, 2020

Anuj is a big data and data warehouse engineer with more than three years of experience. He joined Toptal because freelancing offered exciting project opportunities, but he wanted to focus solely on the work rather than chasing down payments. Anuj specializes in big data and databases, and he is also quite comfortable working with Java, SQL, Scala, Spark, and Kafka.

Portfolio

World’s Leading Online Travel Website
Java, Amazon S3 (AWS S3), Apache Kafka, Presto, Apache Hive, SQL, Spark
Yatra.com
Redshift, Parquet, Amazon S3 (AWS S3), Apache Kafka, Spark, SQL, Java
Snapdeal.com
Vertica, Apache Kafka, Parquet, Amazon S3 (AWS S3), Apache Hive, Spark, SQL...

Experience

Availability

Part-time

Preferred Environment

DBeaver, IntelliJ IDEA, Git, EMR, CentOS, Linux

The most amazing...

...thing I’ve built was a generic, self-service ETL framework that helps users onboard any type of data from any data source to a central S3 data lake.
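The core of such a framework can be sketched as a small source-reader registry: users plug in a reader for a new source type, and the framework lands the events in one lake under a dataset key. This is a minimal illustrative sketch, not the actual project code; names like `register_source` and `onboard`, and the dict standing in for S3, are assumptions.

```python
from typing import Callable, Dict, Iterable

# Hypothetical registry mapping a source kind (e.g. "csv", "rest", "jdbc")
# to a reader that yields event dicts. Names are illustrative only.
SOURCE_READERS: Dict[str, Callable[[dict], Iterable[dict]]] = {}

def register_source(kind: str):
    """Decorator: registering a reader is all it takes to onboard a new source type."""
    def wrap(fn):
        SOURCE_READERS[kind] = fn
        return fn
    return wrap

@register_source("csv")
def read_csv(conf: dict) -> Iterable[dict]:
    # Trivial in-memory stand-in for reading a CSV source.
    header, *rows = conf["lines"]
    cols = header.split(",")
    for row in rows:
        yield dict(zip(cols, row.split(",")))

def onboard(kind: str, conf: dict, lake: dict, dataset: str) -> None:
    """Pull events from any registered source and land them in the 'lake'
    (a dict here; an S3 prefix in a real pipeline) under one dataset key."""
    lake.setdefault(dataset, []).extend(SOURCE_READERS[kind](conf))

lake: dict = {}
onboard("csv", {"lines": ["id,city", "1,Noida", "2,Delhi"]}, lake, "users")
```

The design point is that onboarding a new source requires only a new reader function, not changes to the framework itself.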

Work Experience

Software Development Engineer 2

2017 - PRESENT
World’s Leading Online Travel Website
  • Constructed data marts using Spark SQL and pushed data as S3 Parquet files.
  • Migrated data marts from S3 Parquet files to NoSQL data stores such as MongoDB and HBase. The data stored in these NoSQL data stores is then used by front-end teams for their use cases.
  • Sourced data from various data sources like REST APIs, SQL data stores, NoSQL data stores, S3, and more.
  • Ensured the deployment of Spark jobs in a CI/CD environment.
  • Built cubes on Druid and Apache Kylin.
Technologies: Java, Amazon S3 (AWS S3), Apache Kafka, Presto, Apache Hive, SQL, Spark

Senior Data Engineer

2016 - 2017
Yatra.com
  • Developed new ClickStream funnel metrics using a Spark and Kafka data pipeline.
  • Set up, tuned, and maintained a Tungsten-to-RedShift replicator.
  • Maintained and deployed data marts using Spark SQL.
  • Developed an ETL framework for sourcing data from various sources and dumping the events into a central S3 data lake.
  • Optimized the indexes and projections of data marts to reduce the run times of SQL queries against them.
Technologies: Redshift, Parquet, Amazon S3 (AWS S3), Apache Kafka, Spark, SQL, Java

Senior Data Engineer

2014 - 2016
Snapdeal.com
  • Developed ETL jobs in a big data environment related to the fields of supply chains, seller business health, seller DWH, and seller rating.
  • Actively supported the migration of a DWH fact dimension process from MySQL/Pentaho to Vertica.
  • Thoroughly tested the developed jobs before deploying them to production.
  • Actively supported development team releases that involved database activity, making sure these activities did not affect the DWH ETL process.
  • Implemented log parsing of various Snapdeal systems and reported the API health metrics such as response time and total hits.
Technologies: Vertica, Apache Kafka, Parquet, Amazon S3 (AWS S3), Apache Hive, Spark, SQL, Java
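The log-parsing work above can be sketched as a small aggregator that turns access-log lines into per-endpoint health metrics (total hits and average response time). The line format and regex below are assumptions for illustration; the real Snapdeal log formats are not described in this profile.

```python
import re
from collections import defaultdict

# Assumed log line shape: "<METHOD> <PATH> <STATUS> <RESPONSE_MS>ms".
# This pattern is illustrative only, not the actual production format.
LINE = re.compile(r"(?P<method>\w+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms")

def api_health(lines):
    """Aggregate total hits and average response time per API path."""
    hits = defaultdict(int)
    total_ms = defaultdict(int)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip malformed lines rather than failing the whole run
        path = m["path"]
        hits[path] += 1
        total_ms[path] += int(m["ms"])
    return {p: {"hits": hits[p], "avg_ms": total_ms[p] / hits[p]} for p in hits}

report = api_health([
    "GET /v1/orders 200 120ms",
    "GET /v1/orders 200 80ms",
    "POST /v1/cart 500 40ms",
])
```

In production this aggregation would typically run per time window so that response-time regressions show up as trends rather than a single lifetime average.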

Analyzing User Activity in ClickStream Events

https://github.com/anujthakwani/useractivitybatch
This is a Spark batch processor that calculates a user's sessions based on their activity.

More details are available in the README file that can be found at the link.
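The project's core idea, grouping a user's clickstream events into sessions separated by an inactivity gap, can be sketched in plain Python (the linked repo uses Spark; the 30-minute timeout here is an assumed parameter, not necessarily the one the repo uses).

```python
from typing import Dict, List, Tuple

def sessionize(events: List[Tuple[str, int]], gap: int = 30 * 60) -> Dict[str, list]:
    """Group (user, epoch_seconds) click events into sessions: a new session
    starts whenever the user has been idle for longer than `gap` seconds."""
    sessions: Dict[str, list] = {}
    last_seen: Dict[str, int] = {}
    # Sort by user, then timestamp, so each user's events arrive in order.
    for user, ts in sorted(events, key=lambda e: (e[0], e[1])):
        if user not in last_seen or ts - last_seen[user] > gap:
            sessions.setdefault(user, []).append([ts])  # open a new session
        else:
            sessions[user][-1].append(ts)  # extend the current session
        last_seen[user] = ts
    return sessions

clicks = [("u1", 0), ("u1", 100), ("u1", 4000), ("u2", 50)]
sessions = sessionize(clicks)
```

With a 1,800-second gap, u1's click at t=4000 starts a second session because 4000 − 100 exceeds the gap; in Spark the same logic is usually expressed with a window partitioned by user and ordered by timestamp.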
2009 - 2013

Bachelor of Technology Degree in Computer Science and Engineering

IEC College of Engineering and Technology - Greater Noida, India

Languages

SQL, Java, Scala

Frameworks

Apache Spark, Presto

Paradigms

Agile Software Development

Storage

MySQL, Amazon S3 (AWS S3), Apache Hive, Vertica, Redshift

Tools

Git, IntelliJ IDEA, DBeaver, Spark SQL

Platforms

Linux, CentOS, Apache Kafka

Other

EMR, Parquet
