Khushali Patel

Big Data Developer in Mumbai, Maharashtra, India

Member since June 24, 2020
Khushali is a detail-oriented data engineer with a get-it-done attitude and a record of on-time, high-quality product delivery. He has over three years of experience designing and developing scalable, robust, and reusable big data products and frameworks for startups and well-known financial firms. Khushali excels in programming (Scala, Java, Python), big data (Hadoop, Spark, Hive, Impala, Druid), and streaming technologies (Kafka, KSQL).

Portfolio

  • Clients (via Toptal)
    Druid.io, Spark, Apache Kafka, Hadoop
  • Uniphi Inc
    Amazon Web Services (AWS), Azure, Spark, Apache Hive, Apache Kafka
  • Morgan Stanley
    Amazon Web Services (AWS), Apache Airflow, Apache Hive, Apache Kafka, AWS EMR...

Experience

Location

Mumbai, Maharashtra, India

Availability

Full-time

Preferred Environment

MySQL Workbench, IntelliJ, Jupyter, PyCharm, Amazon Web Services (AWS)

The most amazing...

...project was consulting with a startup to replace their batch ELT with real-time Kafka-based streaming, providing real-time updates to their clients.

Employment

  • Big Data Consultant

    2016 - PRESENT
    Clients (via Toptal)
    • Consulted with startups and medium-scale organizations to build data lakes for analytics.
    • Advised organizations on building real-time data pipelines using Kafka and Spark; a representative sketch follows this entry.
    • Helped organizations analyze and report on their datasets.
    Technologies: Druid.io, Spark, Apache Kafka, Hadoop
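
    The bullets above describe building data lakes and real-time Kafka-and-Spark pipelines; below is a minimal PySpark Structured Streaming sketch of that pattern. The broker address, topic, schema, and S3 paths are hypothetical placeholders (not details from any client engagement), and the Spark-Kafka connector package must be on the classpath.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, from_json
        from pyspark.sql.types import DoubleType, StringType, StructField, StructType

        spark = SparkSession.builder.appName("kafka-to-lake-sketch").getOrCreate()

        # Schema of the incoming JSON events (hypothetical).
        schema = StructType([
            StructField("event_id", StringType()),
            StructField("source", StringType()),
            StructField("value", DoubleType()),
        ])

        # Subscribe to a Kafka topic; broker and topic names are placeholders.
        raw = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")
               .option("subscribe", "events")
               .load())

        # Kafka delivers bytes, so decode the payload and parse the JSON.
        events = (raw.selectExpr("CAST(value AS STRING) AS json")
                  .select(from_json(col("json"), schema).alias("e"))
                  .select("e.*"))

        # Land parsed events in the lake as Parquet; the checkpoint lets the
        # query recover its position in the Kafka topic after a restart.
        query = (events.writeStream
                 .format("parquet")
                 .option("path", "s3a://example-lake/events/")
                 .option("checkpointLocation", "s3a://example-lake/_checkpoints/events/")
                 .start())
        query.awaitTermination()
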
  • Data Modeling/Data Analyst (Hive and Kafka)

    2020 - 2020
    Uniphi Inc
    • Analyzed the problem statement thoroughly, explored the available options in the data engineering landscape, and proposed the most optimized and stable architecture.
    • Developed a configuration-driven streaming engine that auto-detects changes in well-known distributed stores such as AWS S3, Azure Blob, and GCP File System and ingests them into a streaming queue (Kafka); a sketch of the S3 path follows this entry.
    • Completed the handover to the in-house development team with proper knowledge transfer and supported end-to-end functionality for a week.
    Technologies: Amazon Web Services (AWS), Azure, Spark, Apache Hive, Apache Kafka
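
    A minimal sketch of the configuration-driven ingestion idea, showing only the AWS S3 path (Azure Blob and GCP detectors would follow the same source/sink contract). The bucket, prefix, topic, and broker names are hypothetical, and a production engine would persist its ingestion state durably rather than keep it in memory.

        import json
        import time

        import boto3                     # AWS SDK, used to list S3 objects
        from kafka import KafkaProducer  # kafka-python client

        # Hypothetical source/sink mapping; the real engine read such
        # definitions from external configuration, not from code.
        CONFIG = {
            "source": {"bucket": "example-bucket", "prefix": "incoming/"},
            "sink": {"topic": "file-events", "bootstrap": "broker:9092"},
            "poll_seconds": 30,
        }

        producer = KafkaProducer(
            bootstrap_servers=CONFIG["sink"]["bootstrap"],
            value_serializer=lambda v: json.dumps(v).encode("utf-8"),
        )
        s3 = boto3.client("s3")
        seen = set()  # in-memory only; durable state belongs in a real store

        while True:
            # Detect objects that appeared under the configured prefix.
            pages = s3.get_paginator("list_objects_v2").paginate(
                Bucket=CONFIG["source"]["bucket"],
                Prefix=CONFIG["source"]["prefix"])
            for page in pages:
                for obj in page.get("Contents", []):
                    if obj["Key"] not in seen:
                        seen.add(obj["Key"])
                        # Publish a change event to the streaming queue.
                        producer.send(CONFIG["sink"]["topic"],
                                      {"bucket": CONFIG["source"]["bucket"],
                                       "key": obj["Key"]})
            producer.flush()
            time.sleep(CONFIG["poll_seconds"])
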
  • Senior Data Engineer

    2016 - 2020
    Morgan Stanley
    • Received consecutive promotions over four years for exceptional performance.
    • Earned MD recognition for exceptional deliverables on the real-time data ingestion initiative.
    • Worked on an analytics system that currently processes 10,000 records per minute on a 10-node Spark cluster.
    • Managed and nurtured a team of six working on the next-generation real-time cyber analytics engine.
    Technologies: Amazon Web Services (AWS), Apache Airflow, Apache Hive, Apache Kafka, AWS EMR, AWS Glue, Hadoop, HDFS, Spark, Spark Structured Streaming, Spark Streaming

Experience

  • Cyber Analytics Platform

    CAP is a scalable, real-time advanced analytics system designed to analyze and detect cybersecurity threats and patterns in extremely high-volume structured, semi-structured, and unstructured streams from various applications and types of network hardware, and to alert cybersecurity applications in near real time. A sketch of the windowed-detection idea follows.
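
    A hedged PySpark sketch of windowed threat detection: count failed authentication events per host over sliding windows and emit an alert when a burst suggests a brute-force pattern. The topic names, schema, and threshold are illustrative assumptions, not CAP's actual rules.

        from pyspark.sql import SparkSession
        from pyspark.sql.functions import col, count, from_json, struct, to_json, window
        from pyspark.sql.types import StringType, StructField, StructType, TimestampType

        spark = SparkSession.builder.appName("cap-style-sketch").getOrCreate()

        # Hypothetical shape of a parsed authentication event.
        schema = StructType([
            StructField("host", StringType()),
            StructField("outcome", StringType()),  # e.g. "FAILED" or "OK"
            StructField("ts", TimestampType()),
        ])

        events = (spark.readStream.format("kafka")
                  .option("kafka.bootstrap.servers", "broker:9092")
                  .option("subscribe", "auth-events")
                  .load()
                  .selectExpr("CAST(value AS STRING) AS json")
                  .select(from_json(col("json"), schema).alias("e"))
                  .select("e.*"))

        # Count failures per host in 5-minute windows sliding every minute;
        # the watermark bounds state kept for late-arriving events.
        alerts = (events
                  .where(col("outcome") == "FAILED")
                  .withWatermark("ts", "10 minutes")
                  .groupBy(window(col("ts"), "5 minutes", "1 minute"), col("host"))
                  .agg(count("*").alias("failures"))
                  .where(col("failures") > 100))  # illustrative threshold

        # Publish alerts back to Kafka for downstream security applications.
        query = (alerts
                 .select(to_json(struct("window", "host", "failures")).alias("value"))
                 .writeStream.format("kafka")
                 .option("kafka.bootstrap.servers", "broker:9092")
                 .option("topic", "cap-alerts")
                 .option("checkpointLocation", "/tmp/cap-alerts-checkpoint")
                 .outputMode("update")
                 .start())
        query.awaitTermination()
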

  • DataStream: Generic Streaming Framework

    DataStream is a configuration-driven, any-to-any data integration and transformation framework that fulfills the real-time data integration, transformation, and distribution needs of applications built around Kafka, Kafka Connect, Kafka Streams, and Spark Streaming. The platform facilitates quick and easy setup and management of streaming ETL without writing a single line of code; a minimal sketch of the configuration-driven idea follows.
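
    A minimal sketch of what "configuration-driven" can mean in practice: a declarative pipeline definition interpreted at runtime, so new source-to-sink flows need only new config. The topics, transform operations, and field names here are invented for illustration and are far simpler than the real framework.

        import json

        from kafka import KafkaConsumer, KafkaProducer

        # Hypothetical declarative pipeline; the framework would load this
        # from a config store rather than embed it in code.
        PIPELINE = json.loads("""
        {
          "source":    {"type": "kafka", "topic": "orders-raw"},
          "transform": [{"op": "rename", "from": "amt", "to": "amount"},
                        {"op": "filter", "field": "amount", "min": 0}],
          "sink":      {"type": "kafka", "topic": "orders-clean"}
        }
        """)

        def apply_transforms(record, steps):
            """Interpret the declarative transform steps against one record."""
            for step in steps:
                if step["op"] == "rename":
                    record[step["to"]] = record.pop(step["from"], None)
                elif step["op"] == "filter":
                    if (record.get(step["field"]) or 0) < step["min"]:
                        return None  # drop the record
            return record

        consumer = KafkaConsumer(
            PIPELINE["source"]["topic"],
            bootstrap_servers="broker:9092",
            value_deserializer=lambda b: json.loads(b.decode("utf-8")))
        producer = KafkaProducer(
            bootstrap_servers="broker:9092",
            value_serializer=lambda v: json.dumps(v).encode("utf-8"))

        # Route each record through the configured transforms to the sink.
        for message in consumer:
            out = apply_transforms(message.value, PIPELINE["transform"])
            if out is not None:
                producer.send(PIPELINE["sink"]["topic"], out)
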

  • DIF | Data Integration Fabric and Data Integration Framework

    DIF is a generic, metadata-driven, any-to-any data integration framework that serves as a one-stop shop for all integration needs. It comprises lightweight, governed data integration utilities that enable data integration between various DWH components such as TD, SQL Server, Hadoop, file systems, and Greenplum; a metadata-driven sketch follows.
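
    A sketch of the metadata-driven idea using Spark's generic JDBC connector: one job, driven by a catalog entry, moves a table between any two JDBC-reachable systems. The connection URLs, table names, and credentials are placeholders, and the matching JDBC drivers must be on the Spark classpath.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.appName("dif-style-sketch").getOrCreate()

        # Hypothetical metadata record; a DIF-style catalog would hold one
        # of these per integration feed.
        feed = {
            "source_url": "jdbc:sqlserver://src-host:1433;databaseName=sales",
            "source_table": "dbo.orders",
            "target_url": "jdbc:postgresql://gp-host:5432/warehouse",
            "target_table": "staging.orders",
            "mode": "append",
        }

        # Read from whichever source the metadata names...
        df = (spark.read.format("jdbc")
              .option("url", feed["source_url"])
              .option("dbtable", feed["source_table"])
              .option("user", "etl_user")
              .option("password", "***")
              .load())

        # ...and write to the named target, so one governed job serves any
        # source/target pair the catalog describes.
        (df.write.format("jdbc")
         .option("url", feed["target_url"])
         .option("dbtable", feed["target_table"])
         .option("user", "etl_user")
         .option("password", "***")
         .mode(feed["mode"])
         .save())
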

  • Next Generation Trade Analytics System

    Built a centralized data warehouse by aggregating and transforming data from various internal and external data providers. The analytics engine leverages this data to predict and automate trading strategies, and an intuitive reporting tool lets researchers and traders visualize current and past market behavior.

  • Trade Analytics and Prediction System

    • Designed and built a data platform on the AWS data ecosystem.
    • Set up an ETL pipeline that ingests data from 200+ external and 50+ internal sources daily.
    • Tools and technologies used include AWS Glue, Amazon Managed Workflows for Apache Airflow (MWAA), AWS Lambda, Amazon Redshift, AWS Athena, AWS S3, Apache Spark, Pandas, AWS Kinesis, and Apache Kafka; an orchestration sketch follows this list.
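
    A hedged sketch of how such a daily pipeline can be orchestrated with Airflow (the managed MWAA flavor runs standard DAGs): a scheduled task triggers a Glue job through boto3. The DAG id, Glue job name, and schedule are hypothetical.

        from datetime import datetime, timedelta

        import boto3
        from airflow import DAG
        from airflow.operators.python import PythonOperator

        def run_glue_job(**_):
            """Start the (hypothetical) Glue job that loads the day's feeds."""
            glue = boto3.client("glue")
            glue.start_job_run(JobName="load-market-data")

        with DAG(
            dag_id="daily_market_ingest",
            start_date=datetime(2021, 1, 1),
            schedule_interval="@daily",
            catchup=False,
            default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
        ) as dag:
            ingest = PythonOperator(task_id="run_glue_job",
                                    python_callable=run_glue_job)
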

Skills

  • Frameworks

    Hadoop, Spark, Spark Structured Streaming, AWS EMR
  • Platforms

    Apache Kafka, Amazon Web Services (AWS), Azure, AWS Kinesis
  • Other

    Data Engineering, Stream Processing, Big Data, Streaming, Computer Science, StreamSets, Streaming Data
  • Languages

    SQL, Java, Python, Scala
  • Libraries/APIs

    Spark Streaming, Pandas
  • Tools

    Jupyter, Kafka Streams, Spark SQL, AWS Glue, Apache Airflow, Apache Impala, Ansible, Apache NiFi, PyCharm, IntelliJ, MySQL Workbench
  • Storage

    Apache Hive, HDFS, Amazon S3 (AWS S3), MongoDB, Druid.io, Redshift
  • Paradigms

    B2B

Education

  • Master of Technology Degree in Computer Science
    2014 - 2016
    Nirma University - Ahmedabad, India
  • Bachelor of Technology Degree in Computer Science
    2011 - 2014
    Ganpat University - Mehsana, India

Certifications

  • AWS Certified Solutions Architect
    AUGUST 2020 - PRESENT
    Amazon Web Services
