Andrija Gajic, Developer in Belgrade, Serbia

Andrija Gajic

Verified Expert in Engineering

Bio

Andrija is an AI engineer specializing in machine learning projects, including perception, computer vision, image processing, and NLP. Previously, he was the first engineer at AIM Intelligent Machines, where he served as perception lead while the company's valuation grew from $7 million to $120 million. He also gained machine learning experience at Microsoft and specialized in computer vision at Nokia Bell Labs.

Portfolio

AIM Intelligent Machines
Machine Learning, Python, PyTorch, Graphics Processing Unit (GPU), Point Clouds
Microsoft
Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP)...
Universidad Autonoma de Madrid
PyTorch, Computer Vision, Semantic Segmentation

Experience

  • Python - 6 years
  • Machine Learning - 4 years
  • Keras - 4 years
  • Image Processing - 3 years
  • Computer Vision - 3 years
  • Deep Learning - 3 years
  • Semantic Segmentation - 2 years
  • PyTorch - 2 years

Availability

Full-time

Preferred Environment

Windows, Linux, Python, Visual Studio Code (VS Code), C++, PyTorch, Keras

The most amazing...

...project I've worked on was making giant bulldozers understand their surroundings and move accordingly.

Work Experience

Lead Perception Engineer

2021 - PRESENT
AIM Intelligent Machines
  • Joined the company as its first employee when its valuation was $7 million; the company now has 20 employees and is valued at $120 million.
  • Designed and developed the perception stack from scratch, both building the algorithms and selecting hardware components: LiDARs, stereo cameras, and navigational sensors.
  • Developed localization based on information from the machine's sensors, refined by multi-scale ICP registration algorithms.
  • Built algorithms for the real-time update of the machine's surroundings for both bulldozers and excavators, using information from sensors and ICP registration algorithms.
  • Developed a safety stack consisting of obstacle detection based on slopes in front of the machine, point cloud object detection, and camera object detection.
  • Built sensor fusion by combining data from multiple LiDARs and stereo cameras.
  • Participated in onboarding and tutoring new hires into the team and led meetings with vendors.
  • Constructed and graded assignments on both algorithms and perception.
Technologies: Machine Learning, Python, PyTorch, Graphics Processing Unit (GPU), Point Clouds
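
The multi-scale ICP registration mentioned above can be illustrated with a minimal, single-scale point-to-point ICP sketch in NumPy. This is an illustrative sketch, not the actual AIM perception stack; the function name and structure are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=30):
    """Minimal point-to-point ICP: rigidly align a source cloud to a target cloud."""
    src = source.copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    tree = cKDTree(target)
    for _ in range(iters):
        # 1. Find nearest-neighbour correspondences in the target cloud.
        _, idx = tree.query(src)
        matched = target[idx]
        # 2. Solve for the best rigid transform with the Kabsch algorithm (SVD).
        src_c, tgt_c = src.mean(0), matched.mean(0)
        H = (src - src_c).T @ (matched - tgt_c)
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:          # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = tgt_c - R @ src_c
        # 3. Apply the incremental transform and accumulate it.
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total, src
```

A multi-scale variant would run this loop on progressively denser downsamplings of the clouds, using each level's result to initialize the next.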

Machine Learning Intern

2020 - 2021
Microsoft
  • Developed a real-time object detection algorithm for object grouping in PowerPoint slides. The network was based on a single-shot detector (SSD), which was further optimized by running the lottery ticket hypothesis using structured pruning.
  • Built a hierarchical model used for multitask learning for paragraph role detection and document type classification. The model consists of two transformer-based networks to process text in paragraphs and merge paragraphs into documents.
  • Collaborated with teams in the United States, Microsoft Research Asia (China), India, and Belgrade.
  • Worked with labelers to create a new dataset for training algorithms.
  • Used Azure Machine Learning for training and reached the optimal state of parameters by performing hyperparameter sweeps.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Object Detection, Recurrent Neural Networks (RNNs), BERT, Transformer Models, Models, Machine Learning, Azure Machine Learning

Master's Thesis Student

2020 - 2020
Universidad Autonoma de Madrid
  • Designed an architecture that combines RGB, depth, and semantic data extracted from RGB data for RGB-D scene recognition. Each data modality was processed in a separate branch and then merged before a final classifier using an attention mechanism.
  • Surpassed the previous state-of-the-art in the RGB-D scene recognition task on all available datasets.
  • Analyzed the developed model by introducing random perturbations in the input for each modality and measuring their impact on the output, revealing the importance of the depth and semantic modalities for each sample.
  • Prepared and published a paper entitled "Visualizing the Effect of Semantic Classes in the Attribution of Scene Recognition Models" at the International Conference on Pattern Recognition and started working on another paper.
Technologies: PyTorch, Computer Vision, Semantic Segmentation

Computer Vision Intern

2020 - 2020
Nokia Bell Labs
  • Assisted in implementing a real-time semantic segmentation network in Keras, based on ThunderNet, that segments each pixel as either human body or background.
  • Prepared and published a paper titled "Egocentric Human Segmentation for Mixed Reality" at a CVPR 2020 workshop.
  • Created a semisynthetic dataset by blending different egocentric images of human arms taken by headset with backgrounds using an alpha matting algorithm.
  • Extracted depth information from the stream and incorporated another branch for processing depth in addition to RGB information.
Technologies: Python, Keras, Semantic Segmentation, Computer Vision, Image Processing, Mixed Reality (MR)

Undergraduate Assistant

2016 - 2018
University of Belgrade, School of Electrical Engineering
  • Learned and applied OOP and computer organization concepts.
  • Defined homework tasks, wrote test functions for grading homework, and evaluated students' knowledge of basic OOP concepts and C++.
  • Explained laboratory exercises related to computer organization, assisted students during lab exercises, and evaluated their knowledge of computer processor architecture.
Technologies: C++, Object-oriented Programming (OOP)

Experience

Tennis Analysis

In collaboration with professional tennis players, I created an application that analyzes full tennis match videos. It extracts valuable information about players' and opponents' patterns to help prepare for matches, combining several computer vision tasks: object detection, object tracking, video classification, person re-identification, homography matrix extraction, and more.

It uses TrackNet for ball and court detection and YOLOv5 plus ReID for player detection. A homography matrix is computed from the court detection results, projecting the players and the ball onto a 2D court view. Bounces and hits are detected with the R3D video recognition network, while OCR reads the scoreboard.
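
The homography step above can be sketched as follows: given four detected court corners in the image and their known positions on a 2D court template, a direct linear transform (DLT) recovers the homography, which then maps detected player positions into court coordinates. The pixel coordinates below are made-up placeholders; only the singles-court dimensions (23.77 m by 8.23 m) are real.

```python
import numpy as np

def homography_from_points(src, dst):
    """Estimate the 3x3 homography H mapping src points to dst points via DLT."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)          # null-space vector = homography entries
    return H / H[2, 2]

def project(H, pts):
    """Apply homography H to Nx2 points (with the homogeneous divide)."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]

# Detected court corners in image space (pixels, illustrative values) and
# the corresponding corners of the 2D court template (metres).
image_corners = [(310, 210), (970, 215), (1180, 690), (105, 680)]
court_corners = [(0, 0), (8.23, 0), (8.23, 23.77), (0, 23.77)]
H = homography_from_points(image_corners, court_corners)
player_feet = np.array([[640.0, 650.0]])  # bottom centre of a player's bounding box
print(project(H, player_feet))            # player position on the 2D court
```

In the actual pipeline the corner correspondences would come from the court detector rather than being hard-coded.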

Egocentric Human Segmentation for Mixed Reality

https://arxiv.org/pdf/2005.12074.pdf
Augmented virtuality systems merge real-world objects into virtual surroundings. The typical objects merged are a user's arms and body, which is done by performing semantic segmentation on the pixel level, determining whether each pixel corresponds to the body or background. Before this work, methods used color image processing for segmentation.

In our work, we proposed the usage of deep neural networks for semantic segmentation for mixed reality. Because the segmentation has to be done in real time, we used the ThunderNet architecture as a starting point, further optimizing some of its bottlenecks for our use case, such as adding long skip connections between the encoder and decoder and changing the pyramid pooling module.

Since we moved to deep neural networks for semantic segmentation, we also needed a large dataset, and none was available online for mixed reality. I was therefore involved in creating a semi-synthetic dataset: recording egocentric videos of a user performing actions in front of a green chroma screen, extracting the foreground mask, and blending it with recorded egocentric videos of backgrounds using an alpha matting algorithm.
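
The blending step reduces to a standard alpha composite once the matting algorithm has produced a soft mask. A minimal sketch, assuming the soft matte is already given:

```python
import numpy as np

def alpha_blend(foreground, background, alpha):
    """Composite an egocentric foreground (e.g. a user's arms) over a background.

    foreground, background: HxWx3 uint8 images
    alpha: HxW float matte in [0, 1], e.g. produced by an alpha matting algorithm
    """
    a = alpha[..., None]  # broadcast the matte over the colour channels
    out = a * foreground.astype(float) + (1.0 - a) * background.astype(float)
    return out.round().astype(np.uint8)
```

Soft (fractional) alpha values along the arm boundaries are what make the composite look natural compared to a hard binary mask.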

Visualizing the Effect of Semantic Categories in the Attribution of Scene Recognition Models

http://www-vpu.eps.uam.es/publications/SemanticEffectSceneRecognition/
The performance of convolutional neural networks for image classification has vastly increased in recent years. This success goes hand in hand with the need to explain and understand their decisions.

The problem of attribution deals specifically with the characterization of the response of convolutional neural networks by identifying the input features responsible for the model's decision. Perturbation-based attribution methods measure the effect of perturbations applied to the input image on the model's output. In this paper, we discussed the limitations of existing approaches and proposed a novel perturbation-based attribution method guided by semantic segmentation.

Our method inhibits specific image areas according to their assigned semantic label to link perturbations with a semantic meaning. The proposed semantic-guided attribution method enables us to delve deeper into scene recognition interpretability by obtaining the sets of relevant, irrelevant, and distracting semantic labels for each scene class.

Experimental results suggest that the method can boost research by increasing the understanding of convolutional neural networks while uncovering dataset biases that may have been included inadvertently.
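
The semantic-guided perturbation idea can be sketched as follows: inhibit the pixels of one semantic label at a time and record how much the score of the target scene class drops. The `model` callable, fill value, and function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def semantic_attribution(model, image, seg_map, target_class, fill=0.0):
    """Score each semantic label by the output drop when its pixels are inhibited.

    model: callable mapping an HxWx3 image to a vector of class scores (assumed)
    seg_map: HxW integer map of semantic labels from a segmentation network
    Returns {label: score_drop}; a large positive drop marks a relevant label,
    a negative drop marks a distracting one.
    """
    base = model(image)[target_class]
    drops = {}
    for label in np.unique(seg_map):
        perturbed = image.copy()
        perturbed[seg_map == label] = fill  # inhibit this semantic region
        drops[int(label)] = float(base - model(perturbed)[target_class])
    return drops
```

Sorting the resulting dictionary by drop yields the relevant, irrelevant, and distracting label sets discussed above.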

Ball Tracking in FIFA 21

https://github.com/andrijagajic/ball_tracking
An OpenCV project consisting of two parts: detection and tracking. Detection initializes the tracker and reinitializes it once the tracker loses the target. Template matching is used for detection, with three ball templates at different scales. Once detection succeeds, tracking takes over, using the channel and spatial reliability tracking (CSRT) algorithm. Tracking precision is monitored, and if the difference between the template and the tracker's prediction grows too large, the algorithm is reinitialized. It runs at approximately 11 FPS.

Classification of Cancers Based on Genetic Code of Patient

The goal of the project was to classify patients by primary tumor type using information from their genetic code. To prepare, I completed a series of bioinformatics courses covering the main algorithms used in bioinformatics, such as alignment and assembly, and the most commonly used data types. I then became familiar with the Cancer Cell Line Encyclopedia (CCLE), which contains 947 BAM files of different cancer cells, and generated VCF files from them using Samtools.

The idea was to use the generated files and, based on the variations present in them, perform classification of the primary tumor type to either cervical or lung cancer. The classification was performed based on the mutations that happened in receptor tyrosine kinases (RTKs). The features selected were locations in the DNA code where the mutations occurred, the number of mutations in each of the RTKs, and the frequency of each alternative base compared to the other bases in one sample. The classification was performed using neural networks in the scikit-learn library.
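
The feature-extraction step described above can be sketched as follows. The gene coordinates, VCF line format, and function name are simplified placeholders, not the project's actual code; real VCF parsing and genomic positions are considerably more involved.

```python
# Placeholder RTK gene regions (start, end positions) -- NOT real genomic coordinates.
RTK_REGIONS = {"EGFR": (100, 200), "ALK": (300, 400), "MET": (500, 600)}

def mutation_features(vcf_lines):
    """Build a per-sample feature vector: mutation count in each RTK region.

    vcf_lines: iterable of simplified VCF data lines of the form 'CHROM POS REF ALT'.
    """
    counts = {gene: 0 for gene in RTK_REGIONS}
    for line in vcf_lines:
        if line.startswith("#"):          # skip VCF header lines
            continue
        _chrom, pos, _ref, _alt = line.split()[:4]
        pos = int(pos)
        for gene, (start, end) in RTK_REGIONS.items():
            if start <= pos <= end:
                counts[gene] += 1
    return [counts[g] for g in RTK_REGIONS]
```

Vectors like these, extended with mutation locations and alternative-base frequencies, can then be fed to a neural network classifier such as scikit-learn's MLPClassifier.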

Education

2019 - 2020

Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision

Universite de Bordeaux - Bordeaux, France

2019 - 2019

Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision

Universidad Autonoma de Madrid - Madrid, Spain

2018 - 2018

Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision

Pazmany Peter Catholic University - Budapest, Hungary

2014 - 2018

Bachelor's Degree in Electrical Engineering and Computer Science

University of Belgrade, School of Electrical Engineering - Belgrade, Serbia

Certifications

NOVEMBER 2018 - PRESENT

Deep Learning Specialization

Coursera

JULY 2017 - PRESENT

Machine Learning Course

Coursera

Skills

Libraries/APIs

NumPy, PyTorch, Keras, Pandas, TensorFlow, OpenCV

Tools

Azure Machine Learning

Languages

Python, C++, Java

Platforms

Windows, Visual Studio Code (VS Code), Linux

Paradigms

Object-oriented Programming (OOP), Variational Methods

Industry Expertise

Bioinformatics

Other

Computer Vision, Deep Learning, Semantic Segmentation, Image Analysis, Machine Learning, Image Processing, 3D Reconstruction, Probability Theory, Statistics, Bayesian Statistics, Artificial Intelligence (AI), Electrical Engineering, Stochastic Modeling, Data Science, Medical Imaging, FPGA, Video Processing, Object Detection, Generative Adversarial Networks (GANs), 3D Image Processing, Image Reconstruction, Natural Language Processing (NLP), Recurrent Neural Networks (RNNs), BERT, Transformer Models, Object Tracking, Models, Deep Neural Networks (DNNs), Mixed Reality (MR), Convolutional Neural Networks (CNNs), Generative Pre-trained Transformers (GPT), Graphics Processing Unit (GPU), Point Clouds, Optical Character Recognition (OCR)
