Gautham Ganapathy, Developer in Shrivenham, United Kingdom

Gautham Ganapathy

Verified Expert in Engineering

Software Developer

Shrivenham, United Kingdom
Toptal Member Since
May 16, 2017

Gautham has spent 18+ years developing and optimizing software—primarily for parallel-programmed HPC apps (GPU, multicore, clusters) and embedded systems. He has spent over a decade working with NVIDIA CUDA GPUs and is also familiar with JavaScript, Node.js, Socket.IO, and React. Gautham has an academic background in machine learning and computational neuroscience, and a personal interest in Haskell and Rust.



Preferred Environment

Git, Emacs, Ubuntu, Visual Studio Code (VS Code)

The most amazing...

...product I've developed was a real-time microseismic imaging app that ran on a GPU microcluster and reduced turnaround time from weeks to 75 seconds.

Work Experience

Platform Software Developer

2020 - PRESENT
  • Added Cholesky factorization in Poplar and TensorFlow.
  • Improved the support for convolutional image resizing.
  • Worked on improving TensorFlow support on Graphcore IPUs.
  • Collaborated on a new internal compiler project for our next-generation hardware and software stack.
Technologies: C++, Python, TensorFlow, IPU

C++ Software Developer

2022 - 2022
Motion Signal Technologies
  • Reviewed the existing code and recommended performance improvements.
  • Added generic C/C++ optimizations in the data load and filtering code, with significant improvements in performance.
  • Reviewed the generated assembly to investigate ARMv8-specific optimizations.
Technologies: C++, Raspberry Pi, Optimization, OpenMP

Back-end Common Lisp (CL) Developer

2019 - 2020
  • Identified new features and enhancements based on discussions with the client and implemented them in an existing Common Lisp web app.
  • Integrated the application with an external service that provided social media exploratory services.
  • Added support for exporting billing data to Google Sheets.
Technologies: Common Lisp (CL), Python

Principal Software Developer

2013 - 2020
  • Optimized the 4D MTMI stacking application, making processing six times faster and yielding financial gains on the rented HPC cluster.
  • Reorganized and refactored the Make-based build system to make it more efficient and extensible.
  • Optimized and parallelized various microseismic algorithms to run across a cluster.
  • Optimized the 4D MTMI stacking application for NVIDIA CUDA GPUs, resulting in a sixfold improvement over the multithreaded version running on high-end Xeon CPUs of comparable cost.
  • Refactored and optimized the offline microseismic workflow into a single application running on a GPU microcluster with OpenMP, Boost.Thread, MPI, and CUDA for use as a field system, reducing the event display latency from weeks to around 75 seconds.
  • Wrote a distributed version of Map() that runs over our PBS-based cluster, providing transparent clustering for Python scripts that implement seismic workflows.
  • Wrote a genetic algorithm framework in Python and used it to implement a travel-time table optimizer.
  • Integrated OpenInventor's heightfield implementation into the G3D graphics engine used to build the DecisionSpace platform.
  • Developed deserialization and reserialization code for 3D objects and integrated a connectivity back end with DecisionSpace and the Unity3D engine to support VR and AR on Microsoft HoloLens.
  • Integrated Valve's OpenVR SDK to support a VR display using SteamVR headsets with DecisionSpace.
Technologies: OpenVR, Unity3D, OpenGL, Java, MPI, OpenMP, NVIDIA CUDA, Python, C, C++
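
The distributed Map() mentioned above can be illustrated with a minimal sketch. This is not the original implementation: the names `cluster_map`, `_chunks`, `_run_chunk`, and `travel_time` are hypothetical, and a local thread pool stands in for the PBS-based cluster. The idea shown is only the general pattern: split the inputs into chunks, submit each chunk as one job, and gather the results in input order so that the caller can treat it like an ordinary map.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def _chunks(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def _run_chunk(func: Callable[[T], R], chunk: List[T]) -> List[R]:
    """Work done by a single job: apply func to every item in the chunk."""
    return [func(x) for x in chunk]


def cluster_map(func: Callable[[T], R], items: Iterable[T],
                executor: Executor, chunk_size: int = 4) -> List[R]:
    """Behave like list(map(func, items)), but run chunks as separate jobs.

    `executor` is any concurrent.futures.Executor. A cluster-backed
    executor is an assumption here; this sketch only uses a thread pool.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    futures = [executor.submit(_run_chunk, func, chunk)
               for chunk in _chunks(items, chunk_size)]
    results: List[R] = []
    for future in futures:  # iterate in submission order to preserve ordering
        results.extend(future.result())
    return results


def travel_time(depth_m: float) -> float:
    """Hypothetical per-item workload: time to travel depth_m at 3000 m/s."""
    return depth_m / 3000.0


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(cluster_map(travel_time, [1500.0, 3000.0, 6000.0], pool))
```

Passing the executor in as a parameter keeps the calling script unchanged when the back end is swapped, which is one way to read "transparent clustering".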

Software Developer

2012 - 2013
Textensor Ltd
  • Developed a LEMS API and a simulator in Python. The simulator reads in and parses a LEMS model specification, and then dynamically generates executable Python code corresponding to the model.
  • Developed a real-time courtroom transcript streaming application using Node.js and Socket.IO that captures input from a transcript stream and streams it live to multiple clients in multiple locations (courtroom, client locations) connected to multiple interlinked servers.
  • Integrated the above application with the Magnum 2 application used for managing legal document bundles.
Technologies: NGINX, PHP, Socket.IO, Node.js, JavaScript, Python
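
The "parse a model, then generate executable Python" approach described for the simulator can be sketched as follows. This is an illustration under stated assumptions, not the simulator's actual code: the dict-based `spec` format and the names `generate_step_source` and `compile_step` are hypothetical, and a real LEMS model is an XML document with a much richer structure. The sketch turns derivative expressions into the source of a forward-Euler step function and compiles it with the built-in `exec`.

```python
from typing import Callable, Dict


def generate_step_source(spec: dict, func_name: str = "step") -> str:
    """Return Python source for a forward-Euler update of the model state."""
    lines = [f"def {func_name}(state, params, dt):"]
    # Bind parameters and state variables to local names so the
    # derivative expressions can refer to them directly.
    for name in spec["parameters"]:
        lines.append(f"    {name} = params[{name!r}]")
    for var in spec["derivatives"]:
        lines.append(f"    {var} = state[{var!r}]")
    lines.append("    return {")
    for var, expr in spec["derivatives"].items():
        lines.append(f"        {var!r}: {var} + dt * ({expr}),")
    lines.append("    }")
    return "\n".join(lines) + "\n"


def compile_step(spec: dict, func_name: str = "step") -> Callable[[Dict, Dict, float], Dict]:
    """Compile the generated source and return the resulting function.

    exec runs arbitrary code, so this is only safe for trusted model specs.
    """
    namespace: dict = {}
    exec(generate_step_source(spec, func_name), namespace)
    return namespace[func_name]


if __name__ == "__main__":
    # Hypothetical leaky-integrator model: dv/dt = (v_rest - v) / tau
    spec = {
        "parameters": ["tau", "v_rest"],
        "derivatives": {"v": "(v_rest - v) / tau"},
    }
    print(generate_step_source(spec))
    step = compile_step(spec)
    print(step({"v": -55.0}, {"tau": 10.0, "v_rest": -65.0}, 1.0))
```

Generating source once and compiling it avoids re-interpreting the model description on every time step, which is a common motivation for this design.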

Senior Software Developer | Software Architect

2000 - 2010
Wipro Technologies
  • Ported and optimized a Fortran-based reverse time migration (RTM) codebase from Shell to run on NVIDIA’s Tesla C2050 card. I handled the initial effort estimation and design, and phase 1 of the optimization work.
  • Ported and optimized the 3D inversion code in Seismic Unix (suinvzco3d) to run on an NVIDIA Tesla C1060 card. Speedups achieved were in the range of 48x-52x.
  • Ported and optimized image processing algorithms on CUDA for a microscope imaging application for Olympus. The speedups achieved were in the range of 35x-40x (120x for the core interpolation algorithms).
  • Led a team developing and optimizing VC1 decoder firmware on the Quartics QV1500 platform, a 12-core VLIW-based SoC. The firmware for each DSP was developed in assembly language and then optimized to decode HD videos.
  • Led a team to develop audio and video components for an automotive infotainment system running on VxWorks.
  • Ported and optimized Wipro's H.263 encoder IP for TI's DM642 DSP, and used it to implement a multi-channel streaming camera, including developing an RTSP/RTP stack from scratch.
  • Ported and optimized Wipro’s H.264 decoder IP for TI’s C64x and ARM’s ARM11 architectures, including C and assembly-language optimizations, as well as optimal pipeline and cache usage optimizations for ARM11.
Technologies: Visual Composer, PowerPC, TI DSP C2000, Digital Signal Processing, ARM, Assembly Language, SystemC, VxWorks, AAC, MPEG, H.264, MPI, OpenMP, Fortran, OpenGL, NVIDIA CUDA, C++, C

A LEMS simulator written in Python that can be used to run NeuroML2 models.

Real-time Courtroom Transcript Streaming

This streams courtroom transcripts in real time into a legal document management system, where they are simultaneously visible to multiple clients both in the courtroom and offsite. Additionally, viewers can annotate the document in real time and chat with other viewers within the same workspace.


Python, C++, C, C#, Java, Bash, JavaScript, Haskell, Assembly Language, TypeScript, PHP, Rust, Fortran, R, Perl, Common Lisp (CL), Emacs Lisp


OpenMP, Socket.IO, Pandas, React, MPI, OpenGL, Node.js, SystemC, Pthreads, Keras, TensorFlow


Distributed Computing, High-performance Computing, Parallel Computing, Object-oriented Programming (OOP), Functional Programming, Functional Reactive Programming, Reactive Programming


Linux, NVIDIA CUDA, VxWorks, Windows, Ubuntu, Steam, Android, Xamarin, Google Chrome, Raspberry Pi, Visual Studio Code (VS Code)


Graphics Processing Unit (GPU), GPU Computing, Optimization, ARM, VC-1, H.264, Eclipse CDT, Algorithms, Mathematics, TI DSP C2000, Visual Composer, OpenVR, Multiprocessing, Multithreading, Neuroscience, Deep Learning, Computational Linguistics, MPEG, AAC, Digital Signal Processing, Linear Algebra, Machine Learning, Natural Language Processing (NLP), IPU, GPT, Generative Pre-trained Transformers (GPT)


.NET, Unity, Unity3D, PowerPC, Boost


Microsoft Visual Studio, Git, Subversion (SVN), Emacs, Eclipse IDE, MATLAB, NGINX, Torque, CMake, NEURON, GNU Debugger, GNUMake, Makefile, Make, Apache Ant


Oracle RDBMS, PostgreSQL, SQLite

2010 - 2011

Master of Science in Computational Neuroscience and Machine Learning

University of Edinburgh - Edinburgh, UK

1996 - 2000

Bachelor of Technology/Engineering in Computer Science

University of Kerala - Kollam, Kerala, India