Nemanja Grujic, Performance Optimization Developer in Niš, Serbia
Member since April 12, 2016
Nemanja is a software engineer with over 11 years of industry experience in C++, CUDA, computer vision, machine learning, performance optimization, and more. He is passionate about programming, both professionally and privately, and strives to write top-quality, high-performance code.



  • C++ 16 years
  • Computer Vision 8 years
  • Performance Optimization 8 years
  • CUDA 7 years
  • Performance 7 years
  • GPGPU 7 years
  • Low Latency 7 years
  • C++17 2 years





Preferred Environment

C++17, C++, Git, Visual Studio, Windows

The most amazing...

...thing I've made is an autonomous poker program that uses Bayesian inference to estimate the opponent's style after just a few hands and win even against humans.


  • R&D Lead Engineer

    2013 - 2019
    • Wrote an automatic optimizer for data-intensive algorithms in C++ that generates multi-core and vector optimizations.
    • Worked closely with other researchers on design, implementation, and optimization of computer vision, image processing, video processing, machine learning, AI, and deep learning algorithms.
    • Envisioned an algorithm for creating mosaic images from surveillance footage or sets of aerial images. Guided the team to the successful implementation of the algorithm.
    • Successfully modernized C++ codebase by porting it to C++11, C++14, and C++17. Made code more secure and less prone to errors and memory leaks.
    • Established coding style guidelines, introduced good programming practices to the team, including pair programming and code reviews, and supported teamwork.
    • Led development of a 3D GIS project, similar to Google Earth, with real-time video stream rendering on top of the 3D globe.
    • Managed the R&D division of the company. Monitored all major R&D projects, reported on progress, and provided technical guidance to keep them on track.
    Technologies: OpenGL, 3D Graphics, GIS, Agile, Video Processing, Image Processing, Deep Learning, Machine Learning, Computer Vision, C++17, C++14, C++11, C++
  • R&D Engineer - Senior Software Engineer

    2008 - 2013
    • Improved the existing multi-frame super-resolution algorithm by making it resilient to the ghosting effects it exhibited at the time.
    • Ported most of the company's video processing algorithms to CUDA (GPGPU), including super-resolution, de-blurring, contrast enhancement, frame-rate adjustment, and more.
    • Optimized all the above-mentioned algorithms and enabled real-time performance.
    • Created a testing framework for video processing algorithms to ensure successful regression testing under change.
    • Ported an extremely challenging MSER feature detector to CUDA (filed a patent).
    • Created an RAII-based GPU memory management system that hides memory allocation latency and further improves performance.
    Technologies: Windows, OpenCL/GPU, Git, Caching, Attention to Detail, Parallel Programming, Multithreading, Performance Optimization, Visual Studio, Performance, Optimization, Profiling, Memory Management, Unit Testing, Test-driven Development (TDD), Low Latency, Real-time Systems, Image Processing, Video Processing, Computer Vision, GPGPU, OpenCL, CUDA, C++
  • Junior Researcher

    2007 - 2008
    Deutsche Telekom Laboratories
    • Researched the problem of real-time human head pose estimation in the field of computer vision (CV).
    • Implemented a novel approach to this problem. Used OpenCV, C++, and Linux.
    • Published a paper on the proposed method at the Automatic Face and Gesture Recognition conference.
    Technologies: OpenCV, Computer Vision, Linux, C++
  • Software Development Intern

    2005 - 2005
    Faculty of Electronic Engineering, University of Niš
    • Developed a 3D graphics engine for massive landscape rendering in C++ and OpenGL.
    • The engine was able to render gigabytes of terrain texture data in real time by automatically adjusting the level of detail on a per-frame basis.
    • Optimized engine performance to achieve real-time rendering.
    Technologies: Low Latency, Real-time Systems, Performance, Optimization, Visual Studio, 3D Graphics Engines, 3D Graphics, OpenGL, C++


  • Poker Playing Bot

    An autonomous poker-playing program. The program won against humans and used Bayesian inference to estimate the opponents' style of play after just a few hands. The program won first place at the Annual Computer Poker Competition (ACPC) 2018 in the Six-Player No-Limit Texas Hold'em category and second place at ACPC 2017 in the Heads-up No-Limit Texas Hold'em category.


  • Languages

    C++, C++11, C++17, C++14, C, Embedded C, Python, C#.NET, HTML, SQL, C#
  • Paradigms

    Real-time Systems, Parallel Programming, GPGPU, Unit Testing, Clean Code, Scrum, Test-driven Development (TDD), Agile
  • Platforms

    CUDA, Windows, Linux
  • Other

    Low Latency, Computer Science, Performance Optimization, Performance, Multithreading, Optimization, Attention to Detail, Video Processing, Memory Management, OpenCL/GPU, Algorithms, Machine Learning, Data Structures, Computer Vision, Deep Learning, Computer Graphics, Mathematics, Applied Mathematics, Image Processing, Artificial Intelligence (AI), Embedded Software, Caching, Profiling, 3D Graphics, 3D Graphics Engines, Bayesian Inference & Modeling, Naive Bayes, Poker, Linear Algebra
  • Tools

    Visual Studio, GIS, Microsoft Visual Studio, Git, C#.NET WinForms, Intel IPP, MATLAB
  • Libraries/APIs

    OpenGL, OpenCV


  • M.Sc. degree in Computer Science
    2000 - 2006
    Nis University - Nis
