Sushil Pangeni

Data Mining Developer in Fremont, CA, United States

Member since April 11, 2017
For more than six years, Sushil has been working in core back-end infrastructure development in C. During his career, he's filed four patents related to data and query infrastructure and has designed and developed core distributed systems for data and query processing. Sushil is experienced in all stages of the product development cycle, from design and development through deployment and maintenance.



  • C 7 years
  • Data Mining 6 years
  • Networks 6 years
  • Data Engineering 6 years
  • Parallel & Distributed Computing 4 years
  • Distributed Systems 4 years
  • C++ 3 years
  • Java 2 years





Preferred Environment

Git, SVN, Linux/Unix, Eclipse, Vim

The most amazing...

...thing I’ve created was the C++ code that helped a robot navigate through a door; it used equations involving angular/linear velocity, acceleration, and obstacle distance.


  • Staff Software Engineer

    2011 - PRESENT
    Zscaler, Inc.
    • Developed a generic querying framework in C, including a custom SQL-like query language. More details about this project can be found in my portfolio.
    • Worked with a framework that included a Zscaler Interface Definition language (ZIDL), an AVRO-like schema/metadata definition language for the query framework.
    • Created a query planner and engine implementing common SQL functionality; the query engine flushed intermediate memory to disk files and compacted/merged those files in the background.
    • Supported the output of a query as a micro-database (KeyStore files).
    • Created a distributed framework, with built-in fault tolerance and resilience, to run queries on multiple remote nodes and collect the resultant data.
    • Enabled the distributed framework to run any kind of command supported by the remote hosts; for SQL-based queries, ZQL and a KeyStore framework were used to run MapReduce jobs.
    • Set up the distributed framework so that it supports a wide variety of options for merging resultant data.
    • Built a key-value store written in C that was inspired by Google's Sorted String Table (SSTable) concept.
    • Implemented the key-value store so that it supported composite keys and stored them hierarchically as a prefix tree; values were stored in sorted order based on the composite key.
    • Implemented the following features in the key-value store: a custom file format with configurable block-based compression and an ability to merge similar key-value stores.
    • Developed a data-mining pipeline to detect potential tunneling traffic that uses DNS as a transport layer.
    • Found common features and characteristics seen in tunnel traffic that uses DNS as a transport layer; implemented an algorithm in Spark to flag potential DNS prefixes creating the tunnel. The algorithm ran over large datasets containing DNS logs.
    • Designed and implemented an approach for the real-time streaming of compressed log data from cloud to customer premises; the service connected to SIEMs for final delivery of data and included filtering and formatting capabilities.
    Technologies: C, Java, Perl, Spark, Airflow, FreeBSD


  • Patent | Optimized Exclusion Filters for a Multistage Filter Process in Queries (Other amazing things)

    A computer-implemented method for querying a data source using an optimized exclusion filter expression created from a full-filter expression is described.

    The method includes the following:
    • Receiving one or more queries defined by the full-filter expression, wherein one or more queries are for obtaining an output from the data source.
    • Performing a reduction in the full-filter expression to determine the optimized exclusion filter expression.
    • Applying the optimized exclusion filter expression in the data source to exclude the data.
    • Applying the full-filter expression to the data not excluded by the optimized exclusion filter expression.

  • Patent | Optimized Query Process Using Aggregates with Varying Grain Sizes (Other amazing things)

    A computer-implemented method and system for querying aggregates in a database including the following features:
    • Maintaining aggregates based on a dimension in the database with at least two grain sizes.
    • Receiving a query of the aggregates for a defined range of the dimension.
    • Finding a start and an end for a read operation for a larger grain size of the at least two grain sizes of the aggregates for the defined range.
    • Reading the first set from the start to the end in the database of the larger grain size of the at least two grain sizes of the aggregates.
    • Reading a second set comprising the smaller grain size of the at least two grain sizes of the aggregates, based on the defined range and the start and the end.
    • Adjusting the first set with the second set.


  • Languages

    C, C++, Java, Perl, Python
  • Paradigms

    Parallel & Distributed Computing, Asynchronous Programming
  • Platforms

    Unix, Linux
  • Other

    Data Mining, Distributed Systems, Networks, Data Engineering
  • Frameworks

    WebApp, Jersey
  • Tools

    Spark SQL


  • Bachelor of Engineering degree in Computer Science and Engineering
    2007 - 2011
    PEC University of Technology - Chandigarh, India
