Gautham Ganapathy

Verified Expert in Engineering

Software Developer

Location

Shrivenham, United Kingdom

Toptal Member Since

May 16, 2017

Gautham has spent 18+ years developing and optimizing software, primarily parallel HPC applications (GPU, multicore, clusters) and embedded systems. He has worked with NVIDIA CUDA GPUs for over a decade and is also familiar with JavaScript, Node.js, Socket.IO, and React. Gautham has an academic background in machine learning and computational neuroscience and a personal interest in Haskell and Rust.

Linux, C++, C, NVIDIA CUDA, Python, OpenMP, Socket.IO, Algorithms, ARM, Object-oriented Programming (OOP), Windows, Git, Microsoft Visual Studio, Bash, Unity3D

Portfolio

Graphcore

C++, Python, TensorFlow, IPU

Motion Signal Technologies

C++, Raspberry Pi, Optimization, OpenMP

SocialFlight

Common Lisp (CL), Python

Experience

C++ - 20 years, Parallel Computing - 12 years, Python - 9 years, Unity - 3 years, Haskell - 3 years, TypeScript - 2 years, Rust - 2 years, Machine Learning - 2 years

Availability

Part-time

Preferred Environment

Git, Emacs, Ubuntu, Visual Studio Code (VS Code)

The most amazing...

...product I've developed was a real-time microseismic imaging app, running on a GPU microcluster, that reduced turnaround time from weeks to 75 seconds.

Work Experience

Platform Software Developer

2020 - PRESENT

Graphcore

Added Cholesky factorization to Poplar and TensorFlow.
Improved the support for convolutional image resizing.
Worked on improving TensorFlow support on Graphcore IPUs.
Collaborated on a new internal compiler project for our next-generation hardware and software stack.

Technologies: C++, Python, TensorFlow, IPU

C++ Software Developer

2022 - 2022

Motion Signal Technologies

Reviewed existing code and recommended performance improvements.
Added generic C/C++ optimizations to the data-loading and filtering code, significantly improving performance.
Reviewed the generated assembly to investigate ARMv8-specific optimizations.

Technologies: C++, Raspberry Pi, Optimization, OpenMP

Back-end Common Lisp (CL) Developer

2019 - 2020

SocialFlight

Identified new features and enhancements based on discussions with the client and implemented them in an existing Common Lisp web app.
Integrated the application with an external social media exploration service.
Added support for exporting billing data to Google Sheets.

Technologies: Common Lisp (CL), Python

Principal Software Developer

2013 - 2020

Halliburton

Optimized the 4D MTMI stacking application, making processing six times faster and reducing costs on the rented HPC cluster.
Reorganized and refactored the Make-based build system to make it more efficient and extensible.
Optimized and parallelized various microseismic algorithms to run across a cluster.
Optimized the 4D MTMI stacking application for NVIDIA CUDA GPUs, achieving a sixfold speedup over the multithreaded version running on high-end Xeon CPUs of comparable cost.
Refactored and optimized the offline microseismic workflow into a single application running on a GPU microcluster with OpenMP, Boost.Thread, MPI, and CUDA for use as a field system, reducing event display latency from weeks to around 75 seconds.
Wrote a distributed version of map() that runs over our PBS-based cluster, providing transparent clustering for the Python scripts that implement seismic workflows.
Wrote a genetic algorithm framework in Python and used it to implement a travel-time table optimizer.
Integrated Open Inventor's heightfield implementation into the G3D graphics engine used to build the DecisionSpace platform.
Developed deserialization and reserialization code for 3D objects and integrated back-end connectivity between DecisionSpace and the Unity3D engine to support VR and AR using Unity3D and Microsoft HoloLens.
Integrated Valve's OpenVR SDK to support VR display in DecisionSpace using SteamVR headsets.

Technologies: OpenVR, Unity3D, OpenGL, Java, MPI, OpenMP, NVIDIA CUDA, Python, C, C++

Software Developer

2012 - 2013

Textensor Ltd

Developed a LEMS (https://github.com/LEMS/pylems, http://neuroml.org/lems) API and a simulator in Python as part of an EPSRC-funded project. The simulator reads a model specification and then dynamically generates executable code to run the model.
Developed a real-time courtroom transcript streaming application using Node.js and Socket.IO that captures input from a transcript stream and streams it live to multiple clients connected to multiple interlinked servers (courtroom, client locations).
Integrated the above application with Magnum 2 (http://www.opus2.com/magnum/case-analysis-software#realtime), an application for managing legal document bundles, to enable discussions over shared legal documents, including streamed transcripts.

Technologies: NGINX, PHP, Socket.IO, Node.js, JavaScript, Python

Senior Software Developer | Software Architect

2000 - 2010

Wipro Technologies

Ported and optimized a Fortran-based reverse time migration (RTM) codebase from Shell to run on NVIDIA's Tesla C2050 card. I handled the initial effort estimation, the design, and phase 1 of the optimization work.
Ported and optimized the 3D inversion code in Seismic Unix (suinvzco3d) to run on an NVIDIA Tesla C1060 card, achieving speedups of 48x to 52x.
Ported and optimized image processing algorithms to CUDA for an Olympus microscope imaging application, achieving speedups of 35x to 40x (120x for the core interpolation algorithms).
Led a team to develop and optimize VC-1 decoder firmware on the Quartics QV1500 platform, a 12-core VLIW-based SoC. The firmware for each DSP was written in assembly language and optimized to decode HD video.
Led a team to develop audio and video components for an automotive infotainment system running on VxWorks.
Ported and optimized Wipro's H.263 encoder IP for TI's DM642 DSP and implemented a multi-channel streaming camera, including developing an RTSP/RTP stack from scratch.
Ported and optimized Wipro's H.264 decoder IP for TI's C64x and ARM's ARM11 architectures, including C and assembly-language optimizations as well as pipeline and cache-usage optimizations for ARM11.

Technologies: PowerPC, TI DSP C2000, Digital Signal Processing, ARM, Assembly Language, SystemC, VxWorks, AAC, MPEG, H.264, MPI, OpenMP, Fortran, OpenGL, NVIDIA CUDA, C++, C

Experience

PyLEMS

https://github.com/LEMS/pylems

Developed a simulator called PyLEMS for LEMS (Lems.github.io/LEMS), the underlying compute specification language for NeuroML v2, a language for describing models of individual and connected neurons. PyLEMS was developed at Textensor as part of a research project funded by an EPSRC grant.

Real-time Courtroom Transcript Streaming

https://www.opus2.com/transcripts

This application streams courtroom transcripts in real time into a legal document management system, where they are simultaneously visible to multiple clients both in the courtroom and offsite. Viewers can also annotate the document in real time and chat with other viewers in the same workspace.

Skills

Languages

Python, C++, C, C#, Java, Bash, JavaScript, Haskell, Assembly Language, TypeScript, PHP, Rust, Fortran, R, Perl, Common Lisp (CL), Emacs Lisp, Python 3

Libraries/APIs

OpenMP, Socket.IO, Pandas, React, MPI, OpenGL, Node.js, SystemC, Pthreads, Keras, TensorFlow

Paradigms

Distributed Computing, High-performance Computing, Parallel Computing, Object-oriented Programming (OOP), Functional Programming, Functional Reactive Programming, Reactive Programming

Platforms

Linux, NVIDIA CUDA, VxWorks, Windows, Ubuntu, Steam, Android, Xamarin, Google Chrome, Raspberry Pi, Visual Studio Code (VS Code)

Other

Graphics Processing Unit (GPU), GPU Computing, Optimization, ARM, VC-1, H.264, Eclipse CDT, Algorithms, Mathematics, TI DSP C2000, Visual Composer, OpenVR, Multiprocessing, Multithreading, Neuroscience, Deep Learning, Computational Linguistics, MPEG, AAC, Digital Signal Processing, Linear Algebra, Machine Learning, Natural Language Processing (NLP), IPU, GPT, Generative Pre-trained Transformers (GPT), Computational Neuroscience, Computer Science, Data Structures, Operating Systems

Frameworks

.NET, Unity, Unity3D, PowerPC, Boost

Tools

Microsoft Visual Studio, Git, Subversion (SVN), Emacs, Eclipse IDE, MATLAB, NGINX, Torque, CMake, NEURON, GNU Debugger, GNUMake, Makefile, Make, Apache Ant

Storage

Oracle RDBMS, PostgreSQL, SQLite

Education

2010 - 2011

Master of Science in Computational Neuroscience and Machine Learning

University of Edinburgh - Edinburgh, UK

1996 - 2000

Bachelor of Technology/Engineering in Computer Science

University of Kerala - Kollam, Kerala, India