Gautham Ganapathy, Developer in Shrivenham, United Kingdom

Gautham Ganapathy

Verified Expert in Engineering

Software Developer

Location
Shrivenham, United Kingdom
Toptal Member Since
May 16, 2017

Gautham has spent over 18 years developing and optimizing software, primarily parallel HPC applications (GPUs, multicore CPUs, and clusters) and embedded systems. He has worked with NVIDIA CUDA GPUs for more than a decade and is also familiar with JavaScript, Node.js, Socket.IO, and React. Gautham has an academic background in machine learning and computational neuroscience, and a personal interest in Haskell and Rust.

Availability

Part-time

Preferred Environment

Git, Emacs, Ubuntu, Visual Studio Code (VS Code)

The most amazing...

...product I've developed was a real-time microseismic imaging app, running on a GPU microcluster, that reduced turnaround time from weeks to around 75 seconds.

Work Experience

Platform Software Developer

2020 - PRESENT
Graphcore
  • Added Cholesky factorization to Poplar and TensorFlow.
  • Improved support for convolutional image resizing.
  • Worked on improving TensorFlow support on Graphcore IPUs.
  • Collaborated on a new internal compiler project for our next-generation hardware and software stack.
Technologies: C++, Python, TensorFlow, IPU

C++ Software Developer

2022
Motion Signal Technologies
  • Reviewed existing code and recommended performance improvements.
  • Added generic C/C++ optimizations to the data loading and filtering code, significantly improving performance.
  • Reviewed the generated assembly to investigate ARMv8-specific optimizations.
Technologies: C++, Raspberry Pi, Optimization, OpenMP

Back-end Common Lisp (CL) Developer

2019 - 2020
SocialFlight
  • Identified new features and enhancements based on discussions with the client and implemented them in an existing Common Lisp web app.
  • Integrated the application with an external service that provided social media exploratory services.
  • Added support for exporting billing data to Google Sheets.
Technologies: Common Lisp (CL), Python

Principal Software Developer

2013 - 2020
Halliburton
  • Optimized the 4D MTMI stacking application, speeding up processing six-fold and reducing the cost of running it on the rented HPC cluster.
  • Reorganized and refactored the Make-based build system to make it more efficient and extensible.
  • Optimized and parallelized various microseismic algorithms to run across a cluster.
  • Optimized the 4D MTMI stacking application for NVIDIA CUDA GPUs, achieving a six-fold speedup over the multithreaded version running on high-end Xeon CPUs of comparable cost.
  • Refactored and optimized the offline microseismic workflow into a single application running on a GPU microcluster (using OpenMP, Boost.Thread, MPI, and CUDA) for use as a field system, reducing event display latency from weeks to around 75 seconds.
  • Wrote a distributed version of map() that ran over our PBS-based cluster, transparently distributing the work of the Python scripts that implemented seismic workflows.
  • Wrote a genetic algorithm framework in Python and used it to implement a travel-time table optimizer.
  • Integrated Open Inventor's heightfield implementation into the G3D graphics engine used to build the DecisionSpace platform.
  • Developed serialization and deserialization code for 3D objects and integrated connectivity between DecisionSpace and the Unity3D engine to support VR and AR using Unity3D and Microsoft HoloLens.
  • Integrated Valve's OpenVR SDK to support a VR display using SteamVR headsets with DecisionSpace.
Technologies: OpenVR, Unity3D, OpenGL, Java, MPI, OpenMP, NVIDIA CUDA, Python, C, C++

Software Developer

2012 - 2013
Textensor Ltd
  • Developed a Python API and simulator for LEMS (https://github.com/LEMS/pylems, http://neuroml.org/lems) as part of an EPSRC-funded project. The simulator reads a model specification and dynamically generates executable code to run the model.
  • Developed a real-time courtroom transcript streaming application using Node.js and Socket.IO that captures a transcript stream and relays it live to multiple clients connected to multiple interlinked servers (at the courtroom and at client locations).
  • Integrated the above application with Magnum 2 (http://www.opus2.com/magnum/case-analysis-software#realtime), an application for managing legal document bundles, to enable discussions over shared legal documents, including streamed transcripts.
Technologies: NGINX, PHP, Socket.IO, Node.js, JavaScript, Python

Senior Software Developer | Software Architect

2000 - 2010
Wipro Technologies
  • Ported and optimized a Fortran-based reverse time migration (RTM) codebase from Shell to run on an NVIDIA Tesla C2050 card. I handled the initial effort estimation, the design, and phase 1 of the optimization work.
  • Ported and optimized the 3D inversion code in Seismic Unix (suinvzco3d) to run on an NVIDIA Tesla C1060 card, achieving speedups of 48x-52x.
  • Ported and optimized image processing algorithms to CUDA for an Olympus microscope imaging application, achieving speedups of 35x-40x (120x for the core interpolation algorithms).
  • Led a team to develop and optimize VC-1 decoder firmware on the Quartics QV1500 platform, a 12-core VLIW-based SoC. The firmware for each DSP was written in assembly language and optimized to decode HD video.
  • Led a team to develop audio and video components for an automotive infotainment system running on VxWorks.
  • Ported and optimized Wipro's H.263 encoder IP for TI's DM642 DSP and implemented a multi-channel streaming camera, including developing an RTSP/RTP stack from scratch.
  • Ported and optimized Wipro's H.264 decoder IP for TI's C64x and ARM's ARM11 architectures, including C and assembly-language optimizations and optimal pipeline and cache usage optimizations for ARM11.
Technologies: PowerPC, TI DSP C2000, Digital Signal Processing, ARM, Assembly Language, SystemC, VxWorks, AAC, MPEG, H.264, MPI, OpenMP, Fortran, OpenGL, NVIDIA CUDA, C++, C

Portfolio

PyLEMS

https://github.com/LEMS/pylems
Developed a simulator called PyLEMS for LEMS (Lems.github.io/LEMS), the underlying compute specification language for NeuroML v2, a language for describing models of individual and connected neurons. Developed while working at Textensor as part of a research project funded by an EPSRC grant.

Real-time Courtroom Transcript Streaming

https://www.opus2.com/transcripts
This application streams courtroom transcripts in real time into a legal document management system, where they are simultaneously visible to multiple clients both in the courtroom and offsite. Viewers can also annotate documents in real time and chat with other viewers in the same workspace.

Education

2010 - 2011

Master of Science in Computational Neuroscience and Machine Learning

University of Edinburgh - Edinburgh, UK

1996 - 2000

Bachelor of Technology/Engineering in Computer Science

University of Kerala - Kollam, Kerala, India

Libraries/APIs

OpenMP, Socket.IO, Pandas, React, MPI, OpenGL, Node.js, SystemC, Pthreads, Keras, TensorFlow

Tools

Microsoft Visual Studio, Git, Subversion (SVN), Emacs, Eclipse IDE, MATLAB, NGINX, Torque, CMake, NEURON, GNU Debugger (GDB), GNUMake, Makefile, Make, Apache Ant

Languages

Python, C++, C, C#, Java, Bash, JavaScript, Haskell, Assembly Language, TypeScript, PHP, Rust, Fortran, R, Perl, Common Lisp (CL), Emacs Lisp, Python 3

Platforms

Linux, NVIDIA CUDA, VxWorks, Windows, Ubuntu, Steam, Android, Xamarin, Google Chrome, Raspberry Pi, Visual Studio Code (VS Code)

Paradigms

Distributed Computing, High-performance Computing (HPC), Parallel Computing, Object-oriented Programming (OOP), Functional Programming, Functional Reactive Programming, Reactive Programming

Storage

Oracle RDBMS, PostgreSQL, SQLite

Frameworks

.NET, Unity, Unity3D, PowerPC, Boost

Other

Graphics Processing Unit (GPU), GPU Computing, Optimization, ARM, VC-1, H.264, Eclipse CDT, Algorithms, Mathematics, TI DSP C2000, Visual Composer, OpenVR, Multiprocessing, Multithreading, Neuroscience, Deep Learning, Computational Linguistics, MPEG, AAC, Digital Signal Processing, Linear Algebra, Machine Learning, Natural Language Processing (NLP), IPU, Generative Pre-trained Transformers (GPT), Computational Neuroscience, Computer Science, Data Structures, Operating Systems
