Sushil Pangeni, Developer in Fremont, CA, United States

Sushil Pangeni

Verified Expert in Engineering

Data Mining Developer

Location
Fremont, CA, United States
Toptal Member Since
November 16, 2017

For more than six years, Sushil has worked on core back-end infrastructure development in C. During his career, he has filed four patents related to data and query infrastructure and has designed and developed core distributed systems for data and query processing. Sushil is experienced in all stages of the product development cycle, from design and development to deployment and maintenance.

Availability

Part-time

Preferred Environment

Vim Text Editor, Eclipse, Unix, Linux, Subversion (SVN), Git

The most amazing...

...thing I’ve created was the C++ code that helped a robot navigate through a door; it used equations involving angular and linear velocity, acceleration, and obstacle distance.

Work Experience

Staff Software Engineer

2011 - PRESENT
Zscaler, Inc.
  • Developed a generic querying framework in C, including a custom SQL-like query language. More details about this project can be found in my portfolio.
  • Worked with a framework that included the Zscaler Interface Definition Language (ZIDL), an Avro-like schema/metadata definition language for the query framework.
  • Created a query planner and engine that implemented common SQL functionality; the query engine flushed intermediate in-memory data to disk files and compacted and merged those files in the background.
  • Supported the output of a query as a micro-database (KeyStore files).
  • Created a distributed framework, with built-in fault tolerance and resilience, to run queries on multiple remote nodes and collect the resulting data.
  • Enabled the distributed framework to run any kind of command supported by the remote hosts; for SQL-based queries, ZQL and the KeyStore framework were used to run MapReduce jobs.
  • Set up the distributed framework to support a wide variety of options for merging the resulting data.
  • Built a key-value store in C that was inspired by Google's sorted string table (SSTable) concept.
  • Implemented the key-value store so that it supported composite keys and stored them hierarchically as a prefix tree; the values were stored in sorted order based on the composite key.
  • Implemented the following features in the key-value store: a custom file format with configurable block-based compression and an ability to merge similar key-value stores.
  • Developed a data-mining pipeline to detect potential tunneling traffic that uses DNS as a transport layer.
  • Identified features and characteristics common to tunnel traffic that uses DNS as a transport layer; implemented an algorithm in Spark to flag DNS prefixes that were potentially creating a tunnel. The algorithm ran over large datasets containing DNS logs.
  • Designed and implemented an approach for real-time streaming of compressed log data from the cloud to customer premises; the service connected to SIEMs for final delivery of data and included filtering and formatting capabilities.
Technologies: FreeBSD, Apache Airflow, Spark, Perl, Java, C
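
The key-value store bullets above mention composite keys stored hierarchically as a prefix tree, with values kept in sorted order by composite key. The following is only a minimal in-memory sketch of that idea, not the actual C implementation: the class name `CompositeKeyStore` and the methods `put` and `scan` are hypothetical, and the on-disk file format and block compression are left out entirely.

```python
from typing import Any, Dict, Iterator, Tuple

_MISSING = object()  # marks a tree node that holds no value of its own


class _Node:
    """One level of the prefix tree: children keyed by the next key part."""
    __slots__ = ("children", "value")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.value: Any = _MISSING


class CompositeKeyStore:
    """Hypothetical store: composite keys are tuples of strings."""

    def __init__(self) -> None:
        self._root = _Node()

    def put(self, key: Tuple[str, ...], value: Any) -> None:
        # Walk (and create) one tree level per key component.
        node = self._root
        for part in key:
            node = node.children.setdefault(part, _Node())
        node.value = value

    def scan(self, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
        # Descend to the prefix, then yield everything below it in key order.
        node = self._root
        for part in prefix:
            child = node.children.get(part)
            if child is None:
                return
            node = child
        yield from self._walk(node, tuple(prefix))

    def _walk(self, node: _Node, key: Tuple[str, ...]):
        if node.value is not _MISSING:
            yield key, node.value
        for part in sorted(node.children):
            yield from self._walk(node.children[part], key + (part,))
```

Because shared leading components are stored once per tree level, a scan over a key prefix only visits the subtree under that prefix.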

Patent | Optimized Exclusion Filters for a Multistage Filter Process in Queries

https://www.google.com/patents/US20160299947
A computer-implemented method for querying a data source using an optimized exclusion filter expression created from a full-filter expression is described.

The method includes the following:
• Receiving one or more queries defined by the full-filter expression, wherein the one or more queries are for obtaining an output from the data source.
• Performing a reduction of the full-filter expression to determine the optimized exclusion filter expression.
• Applying the optimized exclusion filter expression to the data source to exclude data.
• Applying the full-filter expression to the data not excluded by the optimized exclusion filter expression.
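
The four steps above can be illustrated with a small sketch. It is one possible reading, not the patented method: it assumes the full filter is a conjunction of range predicates, that the data source is split into blocks carrying per-field (min, max) summaries, and that the "reduction" simply keeps the predicates that can be checked against those summaries. All names (`Predicate`, `Block`, `make_block`, `query`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Dict[str, int]


@dataclass(frozen=True)
class Predicate:
    """One conjunct of the full filter: lo <= row[field] <= hi."""
    field: str
    lo: int
    hi: int

    def matches(self, row: Row) -> bool:
        return self.lo <= row[self.field] <= self.hi


@dataclass
class Block:
    """A chunk of rows plus per-field (min, max) summaries (assumed layout)."""
    rows: List[Row]
    ranges: Dict[str, Tuple[int, int]]


def make_block(rows: List[Row], summarized: Set[str]) -> Block:
    ranges = {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in summarized}
    return Block(rows, ranges)


def reduce_to_exclusion_filter(full_filter: List[Predicate], summarized: Set[str]) -> List[Predicate]:
    # Reduction step: keep only conjuncts that block summaries can answer.
    return [p for p in full_filter if p.field in summarized]


def block_excluded(block: Block, exclusion_filter: List[Predicate]) -> bool:
    # Safe for a conjunction: if one conjunct cannot match, no row can.
    for p in exclusion_filter:
        bmin, bmax = block.ranges[p.field]
        if bmax < p.lo or bmin > p.hi:
            return True
    return False


def query(blocks: List[Block], full_filter: List[Predicate], summarized: Set[str]):
    """Return (matching rows, number of blocks actually scanned)."""
    exclusion_filter = reduce_to_exclusion_filter(full_filter, summarized)
    out: List[Row] = []
    scanned = 0
    for block in blocks:
        if block_excluded(block, exclusion_filter):
            continue  # excluded without touching its rows
        scanned += 1
        out.extend(r for r in block.rows if all(p.matches(r) for p in full_filter))
    return out, scanned
```

The exclusion filter is deliberately cheaper and less precise than the full filter: it may let non-matching rows through, but it never drops a matching one, so the full filter applied afterward produces the same result while reading less data.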

Patent | Optimized Query Process Using Aggregates with Varying Grain Sizes

http://www.google.com.gi/patents/US20160048558
A computer-implemented method and system for querying aggregates in a database including the following features:
• Maintaining aggregates based on a dimension in the database with at least two grain sizes.
• Receiving a query of the aggregates for a defined range of the dimension.
• Finding a start and an end for a read operation for a larger grain size of the at least two grain sizes of the aggregates for the defined range.
• Reading a first set from the start to the end in the database of the larger grain size of the at least two grain sizes of the aggregates.
• Reading a second set comprising the smaller grain size of the at least two grain sizes of the aggregates based on the defined range and the start and the end.
• Adjusting the first set with the second set.
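
As a rough illustration of the steps above, the sketch below keeps counts at two grain sizes (hours and days) and answers a range query by reading whole days where possible, then adjusting with hourly values at the edges. This is an assumed interpretation: the time dimension, the 24:1 grain ratio, adjustment by addition, and the names `build_daily` and `range_total` are all hypothetical and not taken from the patent text.

```python
from typing import Dict, Tuple

HOURS_PER_DAY = 24  # ratio between the two grain sizes in this sketch


def build_daily(hourly: Dict[int, int]) -> Dict[int, int]:
    """Roll hourly counts (keyed by hour index) up into daily counts."""
    daily: Dict[int, int] = {}
    for hour, count in hourly.items():
        day = hour // HOURS_PER_DAY
        daily[day] = daily.get(day, 0) + count
    return daily


def range_total(hourly: Dict[int, int], daily: Dict[int, int],
                start_hour: int, end_hour: int) -> Tuple[int, int]:
    """Total for hours in [start_hour, end_hour); also returns reads used."""
    # Start and end of the coarse read: whole days fully inside the range.
    first_day = -(-start_hour // HOURS_PER_DAY)  # ceiling division
    last_day = end_hour // HOURS_PER_DAY         # exclusive
    if first_day >= last_day:
        # No whole day fits, so only the fine grain is read.
        hours = range(start_hour, end_hour)
        return sum(hourly.get(h, 0) for h in hours), len(hours)

    # First set: coarse-grain aggregates.
    days = range(first_day, last_day)
    total = sum(daily.get(d, 0) for d in days)

    # Second set: fine-grain aggregates for the partial days at each edge.
    head = range(start_hour, first_day * HOURS_PER_DAY)
    tail = range(last_day * HOURS_PER_DAY, end_hour)
    total += sum(hourly.get(h, 0) for h in head)
    total += sum(hourly.get(h, 0) for h in tail)
    return total, len(days) + len(head) + len(tail)
```

With four days of data and a query spanning hours 20 to 75, this sketch reads two daily values and seven hourly values instead of 55 hourly values.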

Languages

C, C++, Java, Perl, Python

Paradigms

Distributed Computing, Parallel Computing, Asynchronous Programming

Platforms

Unix, Linux, Eclipse, FreeBSD

Storage

MySQL

Other

Data Mining, Distributed Systems, Networks, Data Engineering

Frameworks

Spark, WebApp, Jersey

Libraries/APIs

JAX-RS

Tools

Git, Subversion (SVN), Vim Text Editor, Apache Airflow, Spark SQL

2007 - 2011

Bachelor of Engineering Degree in Computer Science and Engineering

PEC University of Technology - Chandigarh, India
