Khushali Patel, Big Data Developer in Mumbai, Maharashtra, India

Member since September 14, 2019
Khushali is a detail-oriented data engineer committed to delivering high-quality products on time. He has over three years of experience designing and developing scalable, robust, and reusable big data products and frameworks for many startups and well-known financial firms. Khushali excels in programming (Scala, Java, Python), big data (Hadoop, Spark, Hive, Impala, Druid), and streaming technology (Kafka, KSQL).

Portfolio

Location

Mumbai, Maharashtra, India

Availability

Part-time

Preferred Environment

SQL Workbench, IntelliJ, Jupyter, PyCharm

The most amazing...

...project was consulting with a startup to replace their batch ELT with real-time, Kafka-based streaming so that their client receives updates in real time.

Employment

  • Big Data Consultant

    2016 - PRESENT
    Clients (via Toptal)
    • Consulted with startups and medium-scale organizations to build data lakes for analytics.
    • Advised organizations on building real-time data pipelines using Kafka and Spark.
    • Helped organizations to analyze and report on their datasets.
    Technologies: Druid.io, Spark, Apache Kafka, Hadoop
  • Senior Data Engineer

    2016 - 2020
    Morgan Stanley
    • Received consecutive promotions for four years for exceptional performance.
    • Received MD recognition for exceptional deliverables on the real-time data ingestion initiative.
    • Worked on an analytics system that currently processes 10K records per minute on a 10-node Spark cluster.
    • Managed and nurtured a team of six people to work on the next-generation real-time cyber analytics engine.
    Technologies: Amazon Web Services (AWS), Apache Airflow, Apache Hive, Apache Kafka, AWS EMR, AWS Glue, Hadoop, HDFS, Spark, Spark Structured Streaming, Spark Streaming

Experience

  • Cyber Analytics Platform

    CAP is a real-time, scalable advanced analytics system designed to analyze and detect cybersecurity threats and patterns in very high-volume structured, semistructured, and unstructured streams from various applications and network hardware types. It then alerts cybersecurity applications in near real time.

  • DataStream: Generic Streaming Framework

    DataStream is a configuration-driven, any-to-any data integration and transformation framework that fulfills the real-time data integration, transformation, and distribution needs of applications built around Kafka, Kafka Connect, Kafka Streams, and Spark Streaming. The platform enables quick and easy setup and management of stream ETL without writing a single line of code.

  • DIF | Data Integration Fabric and Data Integration Framework

    DIF is a generic, metadata-driven, any-to-any data integration framework that serves as a one-stop shop for all integration needs. It comprises lightweight, governed data integration utilities that enable data integration between various DWH components such as TD, SQL Server, Hadoop, file systems, and Greenplum.

  • Next-generation Trade Analytics System

    Built a centralized data warehouse by aggregating and transforming data from various internal and external data providers. The analytics engine will leverage this data to predict and automate trading strategy. An intuitive reporting tool will allow researchers and traders to visualize the current and past behavior of the market.

Skills

  • Frameworks

    Hadoop, Spark, Spark Structured Streaming, AWS EMR
  • Platforms

    Apache Kafka, Amazon Web Services (AWS)
  • Other

    Data Engineering, Data Stream Processing, Big Data, Streaming, Computer Science, StreamSets, Streaming Data
  • Languages

    SQL, Java, Python, Scala
  • Libraries/APIs

    Spark Streaming
  • Tools

    Jupyter, Kafka Streams, Spark SQL, AWS Glue, Apache Airflow, Apache Impala, Ansible, Apache NiFi, PyCharm, IntelliJ
  • Storage

    Apache Hive, HDFS, AWS S3, MongoDB, Druid.io, SQL Workbench, Redshift
  • Paradigms

    B2B

Education

  • Master of Technology Degree in Computer Science
    2014 - 2016
    Nirma University - Ahmedabad, India
  • Bachelor of Technology Degree in Computer Science
    2011 - 2014
    Ganpat University - Mehsana, India

Certifications

  • AWS Certified Solutions Architect
    AUGUST 2020 - PRESENT
    Amazon Web Services
