Staff Software Engineer | 2011 - PRESENT | Zscaler, Inc.
Technologies: C, Java, Perl, Spark, Airflow, FreeBSD
- Developed a generic query framework in C, including a custom SQL-like query language. More details about this project can be found in my portfolio.
- Worked with the Zscaler Interface Definition Language (ZIDL), an Avro-like schema/metadata definition language used by the query framework.
- Created a query planner and engine implementing common SQL functionality; the engine flushed intermediate in-memory data to disk files and compacted/merged those files in the background.
- Supported the output of a query as a micro-database (KeyStore files).
- Created a distributed framework, with built-in fault tolerance and resilience, to run queries on multiple remote nodes and collect the resulting data.
- Enabled the distributed framework to run any command supported by the remote hosts; for SQL-based queries, ZQL and the KeyStore framework were used to run MapReduce jobs.
- Set up the distributed framework to support a wide variety of options for merging the resulting data.
- Built a key-value store in C, inspired by Google's Sorted String Table (SSTable) concept.
- Implemented the key-value store so that it supported composite keys and stored them hierarchically as a prefix tree; the values were stored in sorted order based on the composite key.
- Implemented the following features in the key-value store: a custom file format with configurable block-based compression and the ability to merge similar key-value stores.
- Developed a data-mining pipeline to detect potential tunneling traffic that uses DNS as a transport layer.
- Identified common features and characteristics of tunnel traffic that uses DNS as a transport layer; implemented an algorithm in Spark to flag the DNS prefixes potentially creating the tunnel. The algorithm ran over large datasets of DNS logs.
- Designed and implemented real-time streaming of compressed log data from the cloud to customer premises; the service connected to SIEMs for final delivery of data and included filtering and formatting capabilities.