Anuj Thakwani
Verified Expert in Engineering
Software Developer
Anuj is a big data and data warehouse engineer with more than three years of experience. He joined Toptal because freelancing offered exciting project opportunities, but he wanted to focus solely on the work rather than chase down payments. Anuj specializes in big data and databases, and he’s also comfortable working with Java, SQL, Scala, Spark, and Kafka.
Preferred Environment
DBeaver, IntelliJ IDEA, Git, EMR, CentOS, Linux
The most amazing...
...thing I’ve built was a generic self-service ETL framework that helps users onboard any type of data from any data source to a central S3 data lake.
Work Experience
Software Development Engineer 2
World’s Leading Online Travel Website
- Constructed data marts using Spark SQL and pushed data as S3 Parquet files.
- Migrated data marts from S3 Parquet files to NoSQL data stores such as MongoDB and HBase. The data stored in these NoSQL data stores is then used by front-end teams for their use cases.
- Sourced data from various data sources like REST APIs, SQL data stores, NoSQL data stores, S3, and more.
- Ensured the deployment of Spark jobs in a CI/CD environment.
- Built cubes on Druid and Apache Kylin.
Senior Data Engineer
Yatra.com
- Developed new ClickStream funnel metrics using a Spark and Kafka data pipeline.
- Set up, tuned, and maintained a Tungsten-to-Redshift replicator.
- Maintained and deployed data marts using Spark SQL.
- Developed an ETL framework that sourced data from various sources and loaded the events into a central S3 data lake.
- Optimized the indexes and projections of data marts to reduce the run times of SQL queries against them.
Senior Data Engineer
Snapdeal.com
- Developed ETL jobs in a big data environment related to the fields of supply chains, seller business health, seller DWH, and seller rating.
- Actively supported the migration of a DWH fact dimension process from MySQL/Pentaho to Vertica.
- Thoroughly tested the developed jobs before deploying them to production.
- Actively supported development team releases that involved database activity to make sure they did not affect the DWH ETL process.
- Implemented log parsing of various Snapdeal systems and reported the API health metrics such as response time and total hits.
Experience
Analyzing User Activity in ClickStream Events
https://github.com/anujthakwani/useractivitybatch
More details are available in the README file that can be found at the link.
Education
Bachelor of Technology Degree in Computer Science and Engineering
IEC College of Engineering and Technology - Greater Noida, India
Skills
Languages
SQL, Java, Scala
Frameworks
Apache Spark, Presto
Paradigms
Agile Software Development
Storage
MySQL, Amazon S3 (AWS S3), DBeaver, Apache Hive, Vertica, Redshift
Tools
Git, IntelliJ IDEA, Spark SQL
Platforms
Linux, CentOS, Apache Kafka
Other
EMR, Parquet