Aniket Goel

Verified Expert in Engineering

Data Engineering Developer

Location
Delhi, India
Toptal Member Since
June 18, 2020

Aniket is a data engineer with proven industry experience in data lake development, data analytics, real-time streaming, and back-end application development. His work is used by millions of people in the legal and entertainment industries. Aniket builds exceptionally stable solutions for high-traffic, high-visibility projects, and understands what it takes to ensure products are robust and dependable. He also has expertise in the Hadoop ecosystem, AWS Big Data, Apache Kafka, Java, and SQL.

Portfolio

MedGeo Ventures
Data Pipelines, Data Engineering, APIs, Data Warehousing...
Hopin LTD
SQL, ETL, Data Engineering, Data Lakes, Redshift, Snowflake, Python, Databricks...
TO THE NEW
Apache Kafka, Cassandra, Elasticsearch, Python, SQL, Java, Big Data, Hadoop...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Redshift, Apache Kafka, Hadoop, Databricks, Apache Spark, Looker, SQL, Azure Databricks, Apache Airflow

The most amazing...

...optimized real-time data stream I've developed handled at least 10,000 events per second for a video-domain application used by a million people.

Work Experience

Data Engineer

2023 - 2023
MedGeo Ventures
  • Developed a data factory to find relevant doctors (providers) and organizations (facilities) based on the taxonomies that clients (associations) are interested in; previously, this task was carried out manually using publicly available data.
  • Deduplicated the organizations' and individuals' data by address using Elasticsearch and SQL (sketched below).
  • Developed multiple KNIME workflows for data ingestion and data transformation.
  • Proposed new approaches for generating quality data from public healthcare data.
Technologies: Data Pipelines, Data Engineering, APIs, Data Warehousing, Amazon Web Services (AWS), NoSQL, Google Cloud Platform (GCP), KNIME, Google BigQuery, Java, PostgreSQL, Redshift, Python, MongoDB, Bash, ETL, Data Integration, Apache Airflow, Luigi, Oozie, Scala, Big Data, Databases, Azure, Spatial Databases, GeoPandas, ArcGIS, QGIS, Data Management
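
The address-based deduplication above can be illustrated with a minimal Python sketch using elasticsearch-py (8.x style); the index name, field names, and score threshold are all hypothetical:

```python
# Minimal sketch: find candidate duplicate provider records by address.
# Index and field names are assumptions, not the production schema.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def find_duplicates(record: dict, index: str = "providers") -> list[dict]:
    """Return candidate duplicates for a record, matched on its address."""
    resp = es.search(
        index=index,
        query={
            "bool": {
                "must": [
                    # Fuzzy match tolerates variants like "Elm St." vs. "Elm Street".
                    {"match": {"address.street": {"query": record["street"],
                                                  "fuzziness": "AUTO"}}},
                    {"term": {"address.zip": record["zip"]}},
                ]
            }
        },
    )
    # The score cutoff is a tunable assumption.
    return [h["_source"] for h in resp["hits"]["hits"] if h["_score"] > 10.0]
```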

Senior Data Engineer

2021 - 2023
Hopin LTD
  • Implemented the infrastructure as a service using Terraform and the Serverless Framework, including the setup of the Redshift data warehouse, real-time enrichment framework, Databricks workspace, and user permissions for an online events platform.
  • Handled the implementation of CI/CD pipelines using GitHub Actions and GitLab pipelines for multiple projects, including deployment of lambda code, spark-streaming jobs, and DBT code.
  • Developed a set of batch ETL pipelines for sources like SQL databases, Stripe, NetSuite, Chargebee, Qualtrics, Zendesk, and others using AWS services and Fivetran.
  • Built a set of streaming applications using Apache Spark Streaming, Apache Kafka, and Scala on Databricks to implement a near real-time streaming pipeline based on S3 (sketched below).
  • Developed multiple dashboards on Datadog to track the progress of streaming applications.
  • Constructed multiple Looker dashboards, such as MAU, WAU, and event registration progress.
  • Created an end-to-end enrichment pipeline of user activity tracking events using AWS services like Lambda, Redis, SQS, and Segment.io.
Technologies: SQL, ETL, Data Engineering, Data Lakes, Redshift, Snowflake, Python, Databricks, Data Build Tool (dbt), Serverless Framework, Terraform, Scala, AWS Lambda, Looker, CI/CD Pipelines, Fivetran, Segment.io, Segment, Datadog, Azure Databricks, Data Management
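
The streaming applications above were written in Scala; the following PySpark sketch shows the same Kafka-to-S3 bronze-layer pattern, with the broker, topic, schema, and bucket as placeholder assumptions:

```python
# Minimal sketch of a Kafka-to-S3 "bronze" stream with Spark Structured Streaming.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events-bronze").getOrCreate()

# Hypothetical event schema.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("user_id", StringType()),
    StructField("occurred_at", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "user-events")                # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

(events.writeStream
    .format("parquet")                                 # raw bronze layer on S3
    .option("path", "s3://events-lake/bronze/")        # placeholder bucket
    .option("checkpointLocation", "s3://events-lake/_checkpoints/bronze/")
    .trigger(processingTime="1 minute")
    .start()
    .awaitTermination())
```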

Senior Data Engineer

2017 - 2021
TO THE NEW
  • Designed and implemented a travel domain data lake using the Hadoop ecosystem.
  • Managed data analytics using Apache Hive and Apache Spark.
  • Implemented an optimized real-time streaming platform for an OTT video domain application using Apache Kafka and Cassandra (the produce side is sketched below).
  • Initiated a search engine for an OTT video domain application using Elasticsearch and Java.
  • Developed an ETL solution using AWS Kinesis and AWS Redshift.
Technologies: Apache Kafka, Cassandra, Elasticsearch, Python, SQL, Java, Big Data, Hadoop, Data Management
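
The platform itself was built in Java with Cassandra as the sink; as a rough sketch of the produce side of such a stream, here is a minimal kafka-python example with a hypothetical topic and payload:

```python
# Minimal sketch: publishing playback telemetry to Kafka (kafka-python).
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",  # placeholder broker
    # Keying by user keeps one user's events ordered within a partition.
    key_serializer=str.encode,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    acks="all",    # favor durability for playback telemetry
    linger_ms=5,   # small batching window to sustain high throughput
)

def track_playback(user_id: str, video_id: str, position_s: float) -> None:
    producer.send(
        "playback-events",  # hypothetical topic
        key=user_id,
        value={"user_id": user_id, "video_id": video_id, "position_s": position_s},
    )
```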

Senior Data Engineer

2020 - 2020
Strive VR LLC
  • Developed a web application to capture live video and stream it to different destinations using the Kinesis Video streaming service.
  • Built a real-time video streaming platform, with Twitch streaming, that lets gym trainers monitor sessions remotely, using AWS Kinesis Video Streams, Java, AWS Lambda, and AWS Elemental.
  • Stored all streamed videos in S3 through an optimized API (sketched below).
Technologies: ETL, Stream Processing, Amazon Kinesis, Amazon S3 (AWS S3), AWS Lambda, Amazon Web Services (AWS), Java
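
A minimal boto3 sketch of tailing a Kinesis Video stream and archiving it to S3; the stream, bucket, and key names are placeholders, and the production system also involved Java, AWS Lambda, and AWS Elemental:

```python
# Minimal sketch: read a live Kinesis Video stream and archive it to S3.
import boto3

kv = boto3.client("kinesisvideo")
s3 = boto3.client("s3")

def archive_stream(stream: str, bucket: str, key: str) -> None:
    # Each stream exposes its own data endpoint, resolved per API.
    endpoint = kv.get_data_endpoint(StreamName=stream,
                                    APIName="GET_MEDIA")["DataEndpoint"]
    media = boto3.client("kinesis-video-media", endpoint_url=endpoint)
    resp = media.get_media(
        StreamName=stream,
        StartSelector={"StartSelectorType": "NOW"},  # tail the live stream
    )
    # Stream the MKV fragments straight into a multipart S3 upload.
    s3.upload_fileobj(resp["Payload"], bucket, key)

archive_stream("gym-session-1", "session-archive", "sessions/1.mkv")  # placeholders
```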

Software Engineer

2015 - 2017
Contata Solutions
  • Assisted with the patent application process tracker system for SLW that helps attorneys quickly find information related to their case.
  • Built the databases and UI with Java, JSF, Spring, and Hibernate as the full-stack developer on the team.
  • Implemented a multi-feature search engine for patent applications using Elasticsearch (sketched below).
  • Optimized a legacy application by identifying memory leaks and performance issues.
  • Created a module in Core Java for the daily processing and storage of data for 10+ million patents.
Technologies: Elasticsearch, MySQL, Spring, Java, SQL, Hibernate
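
A minimal sketch of what such a multi-field patent search might look like with elasticsearch-py; the index name, fields, and boosts are assumptions:

```python
# Minimal sketch: multi-field search over patent applications.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_patents(text: str, status: str | None = None, size: int = 20):
    filters = [{"term": {"status": status}}] if status else []
    resp = es.search(
        index="patent-applications",  # hypothetical index
        query={
            "bool": {
                "must": [{
                    "multi_match": {
                        "query": text,
                        # Weight title matches above abstract and claims.
                        "fields": ["title^3", "abstract^2", "claims"],
                    }
                }],
                "filter": filters,
            }
        },
        size=size,
    )
    return resp["hits"]["hits"]
```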

Software Engineer

2014 - 2015
ACT21Softwares Pvt Ltd.
  • Developed an IDE project using the Eclipse RCP plugin.
  • Created modules that generate Java code from drag-and-drop actions and other UI events by building and rewriting the AST (illustrated below).
  • Automated web application development through the IDE's click and drag-and-drop features.
Technologies: Hibernate, Spring, Java
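
The code generation itself used Java ASTs inside Eclipse; as a language-agnostic illustration of the pattern, here is a small Python sketch that parses a template, rewrites its AST, and emits new source (the getter shape is hypothetical):

```python
# Minimal sketch of AST-driven code generation: parse a template,
# rewrite the tree, and emit source from it.
import ast

def make_getter(field: str) -> str:
    tree = ast.parse("def _getter(self):\n    return self._x")
    fn = tree.body[0]
    fn.name = f"get_{field}"             # rename the generated function
    fn.body[0].value.attr = f"_{field}"  # point the return at the field
    return ast.unparse(tree)             # Python 3.9+

print(make_getter("name"))
# def get_name(self):
#     return self._name
```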

Data Lake Setup for Organizing Online Events Using Databricks

This online events organization wanted to move from its Redshift data warehouse to an optimized data lake to save costs and add capabilities, such as analyzing data within an event's live streaming window. A new pipeline implemented SCD Type 2 to capture the complete history of changes using Spark Streaming, Apache Spark, and AWS Database Migration Service (AWS DMS). The data lake was designed in three layers (bronze, silver, and gold), with each layer successively more aggregated.
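
A minimal PySpark sketch of the SCD Type 2 step with Delta Lake on Databricks; the table path, key, and columns are assumptions, and for brevity it presumes the incoming batch holds only changed records:

```python
# Minimal sketch: SCD Type 2 upsert into a Delta "silver" table.
from delta.tables import DeltaTable
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def apply_scd2(updates: DataFrame, path: str = "s3://lake/silver/users") -> None:
    dim = DeltaTable.forPath(spark, path)

    # Step 1: close out the current row for every changed key.
    (dim.alias("d")
        .merge(updates.alias("u"), "d.user_id = u.user_id AND d.is_current = true")
        .whenMatchedUpdate(set={
            "is_current": F.lit(False),
            "valid_to": F.current_timestamp(),
        })
        .execute())

    # Step 2: append the new versions as the current rows.
    (updates
        .withColumn("is_current", F.lit(True))
        .withColumn("valid_from", F.current_timestamp())
        .withColumn("valid_to", F.lit(None).cast("timestamp"))
        .write.format("delta").mode("append").save(path))
```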

Real-Time User Activity Tracking System

The client wanted a user activity tracking system for data analytics. With the help of Segment.io and its plugins, the applications could send different events as users interacted with the app. We built an enrichment pipeline to add attributes that were not part of the raw events. Events arrived at around 200,000 per second on average, and each event was processed in around 70 milliseconds. This was achieved by optimizing the pipeline: mitigating AWS Lambda cold starts, caching in Redis, and pooling RDS connections.
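
A minimal sketch of what the enrichment Lambda might look like, with Redis as the cache and Postgres on RDS as the fallback; the connection details, table, and event shape are assumptions:

```python
# Minimal sketch: enrich tracking events with user attributes.
import json
import os

import psycopg2
import redis

# Module-scope clients are reused across warm Lambda invocations,
# which is what keeps per-event latency in the tens of milliseconds.
cache = redis.Redis(host=os.environ["REDIS_HOST"], port=6379)
db = psycopg2.connect(os.environ["RDS_DSN"])

def handler(event, context):
    enriched = []
    for record in event["Records"]:  # e.g., an SQS batch
        msg = json.loads(record["body"])
        user_id = msg["user_id"]
        attrs = cache.get(f"user:{user_id}")
        if attrs is None:            # cache miss: one round trip to RDS
            with db.cursor() as cur:
                cur.execute("SELECT plan, org_id FROM users WHERE id = %s",
                            (user_id,))
                plan, org_id = cur.fetchone()
            attrs = json.dumps({"plan": plan, "org_id": org_id})
            cache.set(f"user:{user_id}", attrs, ex=3600)  # 1-hour TTL
        msg["user_attributes"] = json.loads(attrs)
        enriched.append(msg)
    return enriched
```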

Data Warehouse for Online Events Data Analysis

The online events organization wanted to set up a data warehouse for data analysis and dashboarding. I helped the team set up the Redshift instance end to end through Terraform and develop ETL pipelines using Fivetran and AWS services like DMS, Lambda, and Glue. Data was consumed from Qualtrics, Stripe, Chargebee, Salesforce, RDS, Zendesk, and other sources. The warehouse served many stakeholders, such as data analysts and engineers, for ad hoc queries, business insights, and dashboard development on Looker. I also contributed to setting up the Looker instance and building its dashboards.
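
As one illustrative load step (not the full Fivetran/DMS setup), a hedged sketch of a Redshift COPY from S3 issued over psycopg2; the DSN, table, S3 path, and IAM role are placeholders:

```python
# Minimal sketch: bulk-load staged Parquet files from S3 into Redshift.
import psycopg2

COPY_SQL = """
    COPY analytics.stripe_charges
    FROM 's3://warehouse-staging/stripe/charges/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
    FORMAT AS PARQUET;
"""

# Placeholder DSN; in practice, credentials come from a secrets manager.
with psycopg2.connect("postgresql://etl:***@cluster:5439/warehouse") as conn:
    with conn.cursor() as cur:
        cur.execute(COPY_SQL)
```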

Application for a Full-fitness Virtual Reality Gym Experience

Strive VR LLC offers remote workout sessions to its users. To optimize its support structure, the client wanted a remote support center that could stream, record, and split each video and store it on the server, building a complete repository of these sessions. I implemented the requirements using Java Swing for the desktop application and JavaScript for the web page, with the streaming handled end to end by AWS Kinesis Video Streams.

Seera Data Lake

Al Tayyar Group is a Saudi-based organization involved in many industries, including leisure, tourism, education, transport, real estate, retail trading, hospitality, aviation, and food and beverages. With many subsidiaries serving millions of customers, they wanted to analyze their data to better understand how to increase revenue and productivity. I built a data lake that consumes data and incrementally lands it in the Hadoop Distributed File System (HDFS), then uses the Hadoop ecosystem and other tools to analyze this massive amount of data and visualize it with D3.js, Qlik Sense, and Tableau.
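
A minimal PySpark sketch of the incremental ingestion step into HDFS; the JDBC source, watermark column, and paths are assumptions:

```python
# Minimal sketch: pull rows newer than the last watermark and land
# them in HDFS, partitioned by load date.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-ingest").getOrCreate()

def ingest(table: str, watermark: str) -> None:
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://source-db:3306/bookings")  # placeholder
          .option("dbtable",
                  f"(SELECT * FROM {table} WHERE updated_at > '{watermark}') t")
          .option("user", "etl").option("password", "***")
          .load())
    (df.withColumn("load_date", F.current_date())
       .write.mode("append")
       .partitionBy("load_date")
       .parquet(f"hdfs:///datalake/raw/{table}"))

ingest("bookings", "2020-01-01 00:00:00")  # placeholder watermark
```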

Data Lake on AWS for a US-based Pharmaceutical Organization

The client is a US-based organization that invests in scientific innovation to create transformative medicines for people with serious diseases. The company has multiple approved medications that treat the underlying cause of cystic fibrosis (CF), a rare, life-threatening genetic disease, and has several ongoing clinical and research programs in CF. With many medicines and research programs serving millions of customers, they wanted to analyze their data to better understand how to increase revenue and productivity.

I built a data lake that consumes data and incrementally lands it in Amazon S3, then uses AWS services and other tools to analyze this massive amount of data and visualize it with Tableau.

Tata Sky OTT Application

https://watch.tataplay.com/
An application for Tata Sky that gives its viewers greater accessibility. Tata Sky is a direct-broadcast satellite television provider in India that uses MPEG-4 digital compression and transmits over the INSAT-4A and GSAT-10 satellites. The company wanted subscribers to be able to watch select live TV channels, catch-up TV, and video on demand (VOD) on their own devices, accessing these options through the Tata Sky application on any platform. The application was developed for Android, iOS, web, dongle, and smart TV platforms.

Languages

SQL, Java, Python 3, Python, Snowflake, Scala, Bash

Frameworks

Hadoop, Apache Spark, Spring Boot, Spring, Hibernate, Serverless Framework, Spark Structured Streaming

Tools

Apache Airflow, Apache NiFi, Terraform, AWS Glue, Amazon Athena, Looker, Oozie

Paradigms

ETL, Spatial Databases

Platforms

Amazon Web Services (AWS), Apache Kafka, Databricks, AWS Lambda, Hortonworks Data Platform (HDP), Google Cloud Platform (GCP), KNIME, Azure

Other

Data Analysis, Data Engineering, Data Architecture, Data Warehousing, Data Build Tool (dbt), Azure Databricks, Data Management, Stream Processing, NiFi, Big Data, Amazon Kinesis, Computer Science, Software Engineering, Data Modeling, Big Data Architecture, CI/CD Pipelines, Fivetran, Segment, Video Streaming, Amazon RDS, AWS Database Migration Service (DMS), APIs, Google BigQuery, GeoPandas, QGIS

Storage

Data Lakes, Apache Hive, MySQL, Amazon S3 (AWS S3), Elasticsearch, Cassandra, Redshift, Datadog, Redis Cache, Data Pipelines, NoSQL, PostgreSQL, MongoDB, Data Integration, Databases

Libraries/APIs

Segment.io, Luigi, ArcGIS

2010 - 2014

Bachelor of Technology Degree in Computer Science

Uttar Pradesh Technical University - Lucknow, Uttar Pradesh, India
