Aniket Goel
Verified Expert in Engineering
Data Engineering Developer
Delhi, India
Toptal member since June 18, 2020
Aniket is a data engineer with proven industry experience in data lake development, data analytics, real-time streaming, and back-end application development. His work is used by millions of people in the legal and entertainment industries. Aniket builds exceptionally stable solutions for high-traffic, high-visibility projects, and understands what it takes to ensure products are robust and dependable. He also has expertise in the Hadoop ecosystem, AWS Big Data, Apache Kafka, Java, and SQL.
Experience
- SQL - 8 years
- Amazon Web Services (AWS) - 6 years
- Apache Airflow - 5 years
- Apache Spark - 4 years
- Apache Kafka - 4 years
- Python 3 - 4 years
- Databricks - 2 years
- Terraform - 2 years
Preferred Environment
Amazon Web Services (AWS), Redshift, Apache Kafka, Hadoop, Databricks, Apache Spark, Looker, SQL, Azure Databricks, Apache Airflow
The most amazing...
...optimized real-time data stream I've developed handled a minimum of 10,000 events per second for a video streaming application used by a million people.
Work Experience
Data Engineer
MedGeo Ventures
- Developed a data factory that finds relevant doctors (providers) and organizations (facilities) based on the taxonomies the client associations are interested in. Previously, this task was done manually using publicly available data.
- Deduplicated organization and individual records by address using Elasticsearch and SQL.
- Developed multiple KNIME workflows for data ingestion and data transformation.
- Proposed new approaches for generating high-quality data from public healthcare data.
Senior Data Engineer
Hopin LTD
- Implemented infrastructure as code using Terraform and the Serverless Framework, including the setup of the Redshift data warehouse, a real-time enrichment framework, the Databricks workspace, and user permissions for an online events platform.
- Implemented CI/CD pipelines using GitHub Actions and GitLab pipelines for multiple projects, including deployment of AWS Lambda code, Spark Streaming jobs, and dbt code.
- Developed batch ETL pipelines for sources such as SQL databases, Stripe, NetSuite, Chargebee, Qualtrics, and Zendesk using AWS services and Fivetran.
- Built a set of streaming applications using Apache Spark Streaming, Apache Kafka, and Scala on Databricks to implement a near real-time streaming pipeline based on S3.
- Developed multiple dashboards on Datadog to track the progress of streaming applications.
- Built multiple Looker dashboards tracking metrics such as monthly active users (MAU), weekly active users (WAU), and event registration progress.
- Created an end-to-end enrichment pipeline of user activity tracking events using AWS services like Lambda, Redis, SQS, and Segment.io.
Senior Data Engineer
TO THE NEW
- Designed and implemented a travel domain data lake using the Hadoop ecosystem.
- Managed data analytics using Apache Hive and Apache Spark.
- Implemented an optimized real-time streaming platform for an OTT video domain application using Apache Kafka and Cassandra.
- Initiated the development of a search engine for an OTT video domain application using Elasticsearch and Java.
- Developed an ETL solution using AWS Kinesis and AWS Redshift.
Senior Data Engineer
Strive VR LLC
- Developed a web application that captures live video and streams it to different destinations using Amazon Kinesis Video Streams.
- Built a real-time video streaming platform, including Twitch streaming, that lets gym trainers monitor sessions remotely, using AWS Kinesis Video Streams, Java, AWS Lambda, and AWS Elemental.
- Stored all streamed videos in S3 using an optimized API.
Software Engineer
Contata Solutions
- Helped build a patent application tracking system for SLW that helps attorneys quickly find information related to their cases.
- Built the databases and UI with Java, JSF, Spring, and Hibernate as the full-stack developer on the team.
- Implemented a multi-feature search engine for patent applications using Elasticsearch.
- Optimized a legacy application by identifying memory leaks and performance issues.
- Created a module in Core Java for the daily processing and storage of data for 10+ million patents.
Software Engineer
ACT21Softwares Pvt Ltd.
- Developed an IDE project built on the Eclipse Rich Client Platform (RCP).
- Created modules that generate Java code from drag-and-drop actions and other UI events by manipulating abstract syntax trees (ASTs).
- Automated web application development through click and drag-and-drop features in the IDE.
Experience
Data Lakes Set-up for Organizing Online Events using Databricks
Real-Time User Activity Tracking System
Data Warehouse for Online Events Data Analysis
Application for a Full-fitness Virtual Reality Gym Experience
Seera Data Lake
Data Lake on AWS for a US-based Pharmaceutical Organization
I built a data lake that ingests data and loads it incrementally into Amazon S3. It then uses AWS services and other tools to analyze this large volume of data and visualizes the results in Tableau.
Tata Sky OTT Application
https://watch.tataplay.com/
Education
Bachelor of Technology Degree in Computer Science
Uttar Pradesh Technical University - Lucknow, Uttar Pradesh, India
Skills
Libraries/APIs
Segment.io, Luigi, ArcGIS
Tools
Apache Airflow, Apache NiFi, Terraform, AWS Glue, Amazon Athena, Looker, Oozie
Languages
SQL, Java, Python 3, Python, Snowflake, Scala, Bash
Frameworks
Hadoop, Apache Spark, Spring Boot, Spring, Hibernate, Serverless Framework, Spark Structured Streaming
Paradigms
ETL, Spatial Databases
Platforms
Amazon Web Services (AWS), Apache Kafka, Databricks, AWS Lambda, Hortonworks Data Platform (HDP), Google Cloud Platform (GCP), KNIME, Azure
Storage
Data Lakes, Apache Hive, MySQL, Amazon S3 (AWS S3), Elasticsearch, Cassandra, Redshift, Datadog, Redis Cache, Data Pipelines, NoSQL, PostgreSQL, MongoDB, Data Integration, Databases
Other
Data Analysis, Data Engineering, Data Architecture, Data Warehousing, Data Build Tool (dbt), Azure Databricks, Data Management, Stream Processing, NiFi, Big Data, Amazon Kinesis, Computer Science, Software Engineering, Data Modeling, Big Data Architecture, CI/CD Pipelines, Fivetran, Segment, Video Streaming, Amazon RDS, AWS Database Migration Service (DMS), APIs, Google BigQuery, GeoPandas, QGIS