Sushil Pangeni

Data Mining Developer in Fremont, CA, United States

Member since April 11, 2017
For more than six years, Sushil has been working in core back-end infrastructure development in C. During his career, he's filed four patents related to data and query infrastructure and has designed and developed core distributed systems for data and query processing. Sushil is experienced in all stages of the product development cycle, from design and development through deployment and maintenance.



  • C 7 years
  • Data Mining 6 years
  • Networks 6 years
  • Data Engineering 6 years
  • Parallel & Distributed Computing 4 years
  • Distributed Systems 4 years
  • C++ 3 years
  • Java 2 years





Preferred Environment

Git, SVN, Linux/Unix, Eclipse, Vim

The most amazing...

...thing I’ve created was the C++ code that helped a robot navigate through a door; it used equations involving angular/linear velocity, acceleration, and obstacle distance.


  • Staff Software Engineer

    2011 - PRESENT
    Zscaler, Inc.
    • Developed a generic querying framework in C, including a custom SQL-like query language. More details about this project can be found in my portfolio.
    • Worked with a framework that included a Zscaler Interface Definition language (ZIDL), an AVRO-like schema/metadata definition language for the query framework.
    • Created a query planner and engine implementing common SQL functionality; the query engine flushed intermediate memory to disk files and compacted/merged those files in the background.
    • Supported the output of a query as a micro-database (KeyStore files).
    • Created a distributed framework, with built-in fault tolerance and resilience, to run queries on multiple remote nodes and collect the resultant data.
    • Enabled the distributed framework to run any kind of command supported by the remote hosts; for SQL-based queries, ZQL and a KeyStore framework were used to run MapReduce jobs.
    • Set up the distributed framework so that it supports a wide variety of options for merging resultant data.
    • Built a key-value store written in C that was inspired by Google's Sorted String Table (SSTable) concept.
    • Implemented the key-value store so that it supported composite keys and stored them hierarchically as a prefix tree; values were stored in sorted order based on the composite key.
    • Implemented the following features in the key-value store: a custom file format with configurable block-based compression and an ability to merge similar key-value stores.
    • Developed a data-mining pipeline to detect potential tunneling traffic that uses DNS as a transport layer.
    • Found common features and characteristics seen in tunnel traffic that uses DNS as a transport layer; implemented an algorithm in Spark to flag potential DNS prefixes creating the tunnel. The algorithm ran over large datasets containing DNS logs.
    • Designed and implemented an approach for the real-time streaming of compressed log data from cloud to customer premises; the service connected to SIEMs for final delivery of data and included filtering and formatting capabilities.
    Technologies: C, Java, Perl, Spark, Airflow, FreeBSD


  • Patent | Optimized Exclusion Filters for a Multistage Filter Process in Queries (Other amazing things)

    A computer-implemented method for querying a data source using an optimized exclusion filter expression created from a full-filter expression is described.

    The method includes the following:
    • Receiving one or more queries defined by the full-filter expression, wherein one or more queries are for obtaining an output from the data source.
    • Performing a reduction in the full-filter expression to determine the optimized exclusion filter expression.
    • Applying the optimized exclusion filter expression in the data source to exclude the data.
    • Applying the full-filter expression to the data not excluded by the optimized exclusion filter expression.

  • Patent | Optimized Query Process Using Aggregates with Varying Grain Sizes (Other amazing things)

    A computer-implemented method and system for querying aggregates in a database including the following features:
    • Maintaining aggregates based on a dimension in the database with at least two grain sizes.
    • Receiving a query of the aggregates for a defined range of the dimension.
    • Finding a start and an end for a read operation for a larger grain size of the at least two grain sizes of the aggregates for the defined range.
    • Reading the first set from the start to the end in the database of the larger grain size of the at least two grain sizes of the aggregates.
    • Reading a second set comprising the smaller grain size of the at least two grain sizes of the aggregates, based on the defined range and the start and the end.
    • Adjusting the first set with the second set.


  • Languages

    C, C++, Java, Perl, Python
  • Paradigms

    Parallel & Distributed Computing, Asynchronous Programming
  • Platforms

    Unix, Linux
  • Other

    Data Mining, Distributed Systems, Networks, Data Engineering
  • Frameworks

    WebApp, Jersey
  • Tools

    Spark SQL


  • Bachelor of Engineering degree in Computer Science and Engineering
    2007 - 2011
    PEC University of Technology - Chandigarh, India
