Verified Expert in Engineering
Git, Emacs, Ubuntu, Visual Studio Code (VS Code)
The most amazing product I've developed was a real-time microseismic imaging app, running on a GPU microcluster, that reduced turnaround time from weeks to 75 seconds.
Platform Software Developer
- Added Cholesky factorization in Poplar and TensorFlow.
- Improved support for convolutional image resizing.
- Worked on improving TensorFlow support on Graphcore IPUs.
- Collaborated on a new internal compiler project for our next-generation hardware and software stack.
C++ Software Developer
Motion Signal Technologies
- Reviewed code and recommended performance improvements to the existing codebase.
- Added generic C/C++ optimizations to the data-loading and filtering code, significantly improving performance.
- Reviewed generated assembly to investigate ARMv8-specific optimizations.
Back-end Common Lisp (CL) Developer
- Identified new features and enhancements based on discussions with the client and implemented them in an existing Common Lisp web app.
- Integrated the application with an external service that provided social media exploration services.
- Added support for exporting billing data to Google Sheets.
Principal Software Developer
- Optimized the 4D MTMI stacking application, speeding up processing sixfold and cutting costs on the rented HPC cluster.
- Reorganized and refactored the Make-based build system to make it more efficient and extensible.
- Optimized and parallelized various microseismic algorithms to run across a cluster.
- Optimized the 4D MTMI stacking application for NVIDIA CUDA GPUs, achieving a sixfold speedup over the multithreaded version running on high-end Xeon CPUs of comparable cost.
- Refactored and optimized the offline microseismic workflow into a single application running on a GPU microcluster with OpenMP, Boost.Thread, MPI, and CUDA for use as a field system, reducing the event display latency from weeks to around 75 seconds.
- Wrote a distributed version of Map() that runs on our PBS-based cluster, transparently parallelizing the Python scripts that implement our seismic workflows.
- Wrote a genetic algorithm framework in Python and used it to implement a travel-time table optimizer.
- Integrated OpenInventor's heightfield implementation into the G3D graphics engine used to build the DecisionSpace platform.
- Developed deserialization and reserialization code for 3D objects and integrated connectivity between DecisionSpace and the Unity3D engine to support VR and AR using Unity3D and Microsoft HoloLens.
- Integrated Valve's OpenVR SDK to support a VR display using SteamVR headsets with DecisionSpace.
- Developed a LEMS (https://github.com/LEMS/pylems, http://neuroml.org/lems) API and simulator in Python. The simulator reads in and parses a LEMS model specification and then dynamically generates executable Python code corresponding to the model.
- Developed a real-time courtroom transcript streaming application using Node.js and Socket.IO that captures input from a transcript stream and streams it live to multiple clients in multiple locations (courtroom, client locations) connected to multiple interlinked servers.
- Integrated this application with Magnum 2, an application for managing legal document bundles (http://www.opus2.com/magnum/case-analysis-software#realtime).
Senior Software Developer | Software Architect
- Ported and optimized a Fortran-based reverse time migration (RTM) codebase from Shell to run on NVIDIA's Tesla C2050 card. Handled the initial effort estimation, the design, and phase 1 of the optimization work.
- Ported and optimized the 3D inversion code in Seismic Unix (suinvzco3d) to run on an NVIDIA Tesla C1060 card, achieving speedups of 48x to 52x.
- Ported and optimized image processing algorithms to CUDA for an Olympus microscope imaging application, achieving speedups of 35x to 40x (120x for the core interpolation algorithms).
- Led a team developing and optimizing VC-1 decoder firmware for the Quartics QV1500 platform, a 12-core VLIW-based SoC. The firmware for each DSP was written in assembly language and optimized to decode HD video.
- Led a team to develop audio and video components for an automotive infotainment system running on VxWorks.
- Ported and optimized Wipro's H.263 encoder IP for TI's DM642 DSP and used it to implement a multichannel streaming camera, including developing an RTSP/RTP stack from scratch.
- Ported and optimized Wipro's H.264 decoder IP for TI's C64x and ARM's ARM11 architectures, including C and assembly-language optimizations, as well as pipeline and cache-usage optimizations for ARM11.
Real-time Courtroom Transcript Streaming
OpenMP, Socket.IO, Pandas, React, MPI, OpenGL, Node.js, SystemC, Pthreads, Keras, TensorFlow
Distributed Computing, High-performance Computing, Parallel Computing, Object-oriented Programming (OOP), Functional Programming, Functional Reactive Programming, Reactive Programming
Linux, NVIDIA CUDA, VxWorks, Windows, Ubuntu, Steam, Android, Xamarin, Google Chrome, Raspberry Pi, Visual Studio Code (VS Code)
Graphics Processing Unit (GPU), GPU Computing, Optimization, ARM, VC-1, H.264, Eclipse CDT, Algorithms, Mathematics, TI DSP C2000, Visual Composer, OpenVR, Multiprocessing, Multithreading, Neuroscience, Deep Learning, Computational Linguistics, MPEG, AAC, Digital Signal Processing, Linear Algebra, Machine Learning, Natural Language Processing (NLP), IPU, Generative Pre-trained Transformers (GPT)
.NET, Unity, Unity3D, PowerPC, Boost
Microsoft Visual Studio, Git, Subversion (SVN), Emacs, Eclipse IDE, MATLAB, NGINX, Torque, CMake, NEURON, GNU Debugger, GNUMake, Makefile, Make, Apache Ant
Oracle RDBMS, PostgreSQL, SQLite
Master of Science in Computational Neuroscience and Machine Learning
University of Edinburgh - Edinburgh, UK
Bachelor of Technology/Engineering in Computer Science
University of Kerala - Kollam, Kerala, India