
Andrija Gajic
Verified Expert in Engineering
AI Engineer and Developer
Belgrade, Serbia
Toptal member since September 24, 2021
Andrija is an AI engineer specializing in machine learning projects, including perception, computer vision, image processing, and NLP. Previously, he was the first engineer and perception lead at AIM Intelligent Machines, where the company's valuation grew from $7 million to $120 million. Andrija also gained machine learning experience at Microsoft and specialized in computer vision at Nokia Bell Labs.
Portfolio
Experience
- Python - 6 years
- Machine Learning - 4 years
- Keras - 4 years
- Image Processing - 3 years
- Computer Vision - 3 years
- Deep Learning - 3 years
- Semantic Segmentation - 2 years
- PyTorch - 2 years
Preferred Environment
Windows, Linux, Python, Visual Studio Code (VS Code), C++, PyTorch, Keras
The most amazing...
...project I've worked on was making giant bulldozers understand their surroundings and move accordingly.
Work Experience
Lead Perception Engineer
AIM Intelligent Machines
- Joined the company as the first employee when the valuation was $7 million. The company currently has 20 employees and is valued at $120 million.
- Designed and developed the perception stack from scratch, covering both the algorithms and the selection of hardware components: LiDARs, stereo cameras, and navigational sensors.
- Developed localization based on information from machine sensors, refined by multi-scale ICP registration algorithms (a minimal sketch follows this list).
- Built algorithms for the real-time update of the machine's surroundings for both bulldozers and excavators, using information from sensors and ICP registration algorithms.
- Developed a safety stack consisting of obstacle detection based on slopes in front of the machine, point cloud object detection, and camera object detection.
- Built sensor fusion by combining data from multiple LiDARs and stereo cameras.
- Participated in onboarding and tutoring new hires into the team and led meetings with vendors.
- Constructed and graded assignments on both algorithms and perception.
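Below is a minimal, illustrative sketch of the kind of multi-scale ICP refinement mentioned above, written with Open3D. The voxel sizes, thresholds, and function name are assumptions for illustration, not the actual AIM perception stack.

```python
# Hypothetical sketch of multi-scale ICP refinement using Open3D;
# voxel sizes and correspondence thresholds are illustrative only.
import numpy as np
import open3d as o3d

def multiscale_icp(source, target, voxel_sizes=(1.0, 0.5, 0.25), init=np.eye(4)):
    """Refine an initial pose estimate coarse-to-fine with point-to-plane ICP."""
    transform = init
    for voxel in voxel_sizes:
        src = source.voxel_down_sample(voxel)
        tgt = target.voxel_down_sample(voxel)
        # Point-to-plane ICP needs normals on the target cloud.
        tgt.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        result = o3d.pipelines.registration.registration_icp(
            src, tgt,
            max_correspondence_distance=voxel * 1.5,
            init=transform,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPlane())
        transform = result.transformation
    return transform
```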
Machine Learning Intern
Microsoft
- Developed a real-time object detection algorithm for object grouping in PowerPoint slides. The network was based on a single-shot detector (SSD), which was further optimized by applying the lottery ticket hypothesis with structured pruning (see the sketch after this list).
- Built a hierarchical model used for multitask learning on paragraph role detection and document type classification. The model consists of two transformer-based networks: one processes text within paragraphs, and the other merges paragraph representations into documents.
- Collaborated with teams in the United States, Microsoft Research Asia (China), India, and Belgrade.
- Worked with labelers to create a new dataset for training algorithms.
- Used Azure Machine Learning for training and reached the optimal state of parameters by performing hyperparameter sweeps.
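As a rough illustration of the lottery-ticket-style structured pruning referenced above, here is a hedged PyTorch sketch; the model, training callback, pruning ratio, and number of rounds are placeholders rather than the production SSD pipeline.

```python
# Hedged sketch of lottery-ticket-style iterative structured pruning in PyTorch;
# train_fn, rounds, and amount_per_round are hypothetical placeholders.
import copy
import torch
import torch.nn.utils.prune as prune

def lottery_ticket_prune(model, train_fn, rounds=3, amount_per_round=0.2):
    """Iteratively prune whole output channels, rewinding weights each round."""
    initial = copy.deepcopy(model.state_dict())  # dense weights to rewind to
    for _ in range(rounds):
        train_fn(model)  # placeholder: train the model to convergence
        for module in model.modules():
            if isinstance(module, torch.nn.Conv2d):
                # Structured pruning: zero out entire filters by their L2 norm.
                prune.ln_structured(module, name="weight",
                                    amount=amount_per_round, n=2, dim=0)
        # Rewind surviving weights to their initial values (lottery ticket).
        for name, module in model.named_modules():
            if hasattr(module, "weight_orig"):
                with torch.no_grad():
                    module.weight_orig.copy_(initial[f"{name}.weight"])
    return model
```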
Master's Thesis Student
Universidad Autonoma de Madrid
- Designed an architecture that combines RGB, depth, and semantic data extracted from the RGB input for RGB-D scene recognition. Each data modality was processed in a separate branch, and the branches were merged with an attention mechanism before a final classifier (sketched after this list).
- Surpassed the previous state-of-the-art in the RGB-D scene recognition task on all available datasets.
- Analyzed the developed model by introducing random perturbations in the input of each modality and measuring their impact on the output, which quantified the importance of the depth and semantic modalities for each sample.
- Prepared and published a paper entitled "Visualizing the Effect of Semantic Classes in the Attribution of Scene Recognition Models" at the International Conference on Pattern Recognition and started working on another paper.
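For illustration, a minimal PyTorch sketch of attention-based fusion of three modality branches is shown below; the feature dimension, number of scene classes, and module name are hypothetical and simpler than the published architecture.

```python
# Illustrative PyTorch sketch of attention-based fusion of RGB, depth, and
# semantic branch features; layer sizes are placeholders.
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, feat_dim=512, num_classes=19):
        super().__init__()
        # One attention score per modality, computed from its own feature vector.
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, rgb_feat, depth_feat, sem_feat):
        feats = torch.stack([rgb_feat, depth_feat, sem_feat], dim=1)  # (B, 3, D)
        weights = torch.softmax(self.attn(feats), dim=1)              # (B, 3, 1)
        fused = (weights * feats).sum(dim=1)                          # (B, D)
        return self.classifier(fused)
```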
Computer Vision Intern
Nokia Bell Labs
- Assisted in implementing a real-time semantic segmentation network in Keras, based on ThunderNet, that labels each pixel as either human body or background (a minimal sketch follows this list).
- Prepared and published a paper titled "Egocentric Human Segmentation for Mixed Reality" at a workshop of the CVPR 2020 conference.
- Created a semi-synthetic dataset by blending egocentric images of human arms, captured with a headset, with background images using an alpha matting algorithm.
- Extracted depth information from the stream and incorporated another branch for processing depth in addition to RGB information.
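The following is a hedged Keras sketch of a lightweight encoder-decoder with long skip connections, in the spirit of the real-time binary segmentation described above; the layer widths, input size, and function name are illustrative assumptions, not the actual ThunderNet-based network.

```python
# Minimal Keras sketch of a lightweight encoder-decoder segmenter with long
# skip connections; all sizes are illustrative placeholders.
from tensorflow import keras
from tensorflow.keras import layers

def tiny_segmenter(input_shape=(256, 256, 3)):
    inputs = keras.Input(shape=input_shape)
    # Encoder
    e1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(e1)
    e2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(e2)
    # Bottleneck
    b = layers.Conv2D(128, 3, padding="same", activation="relu")(p2)
    # Decoder with long skip connections from the encoder
    u1 = layers.Concatenate()([layers.UpSampling2D()(b), e2])
    d1 = layers.Conv2D(64, 3, padding="same", activation="relu")(u1)
    u2 = layers.Concatenate()([layers.UpSampling2D()(d1), e1])
    d2 = layers.Conv2D(32, 3, padding="same", activation="relu")(u2)
    # One output channel: per-pixel probability of "human body"
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(d2)
    return keras.Model(inputs, outputs)
```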
Undergraduate Assistant
University of Belgrade, School of Electrical Engineering
- Learned and applied OOP and computer organization concepts.
- Defined homework tasks, wrote test functions for grading homework, and evaluated students' knowledge of basic OOP concepts and C++.
- Explained laboratory exercises related to computer organization, assisted students during lab exercises, and evaluated their knowledge of the architecture of computer processors.
Experience
Tennis Analysis
The system uses TrackNet for ball and court detection, as well as YOLOv5 and ReID for player detection. A homography matrix is computed from the court detection results, and the players and the ball are projected into the 2D court frame. Bounces and hits are detected using the R3D video recognition network, while OCR is used for scoreboard reading.
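As a rough illustration of the homography step described above, here is an OpenCV sketch that maps detected player boxes onto a 2D court model; the function name and the assumption that the court-detection output is available as point correspondences are mine, not details from the project.

```python
# Illustrative OpenCV sketch: project player positions onto a 2D court model
# using a homography estimated from court keypoint correspondences.
import cv2
import numpy as np

def project_players_to_court(image_court_pts, court_model_pts, player_boxes):
    """Map player foot points from image coordinates to top-down court coordinates."""
    # Homography from the image plane to the canonical 2D court model
    # (requires at least four point correspondences).
    H, _ = cv2.findHomography(np.float32(image_court_pts),
                              np.float32(court_model_pts))
    # Use the bottom-center of each bounding box as the player's ground point.
    feet = np.float32([[(x1 + x2) / 2.0, y2] for (x1, y1, x2, y2) in player_boxes])
    projected = cv2.perspectiveTransform(feet.reshape(-1, 1, 2), H)
    return projected.reshape(-1, 2)
```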
Egocentric Human Segmentation for Mixed Reality
https://arxiv.org/pdf/2005.12074.pdf
In our work, we proposed using deep neural networks for semantic segmentation in mixed reality. Because the segmentation has to run in real time, we used the ThunderNet architecture as a starting point and further optimized some of its bottlenecks for our use case, such as adding long skip connections between the encoder and decoder and changing the pyramid pooling module.
Since we moved to deep neural networks for semantic segmentation, we also needed a large dataset, and no such dataset was available online for mixed reality. Therefore, I was involved in creating a semi-synthetic dataset. This involved recording egocentric videos of a user performing actions in front of a green chroma screen, extracting the foreground mask, and then blending it with recorded egocentric background videos using an alpha matting algorithm.
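A minimal OpenCV sketch of the compositing step is shown below, assuming the alpha matte has already been estimated from the chroma-key footage; the file paths and function name are hypothetical.

```python
# Hedged sketch of compositing an egocentric foreground onto a background
# frame with an alpha matte; paths are hypothetical placeholders.
import cv2
import numpy as np

def blend_with_alpha(foreground_path, background_path, alpha_path):
    """Composite: out = alpha * foreground + (1 - alpha) * background."""
    fg = cv2.imread(foreground_path).astype(np.float32)
    bg = cv2.imread(background_path).astype(np.float32)
    # Alpha matte in [0, 1], e.g., estimated from the chroma-key foreground mask.
    alpha = cv2.imread(alpha_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    bg = cv2.resize(bg, (fg.shape[1], fg.shape[0]))
    alpha = alpha[..., None]  # broadcast the matte over the color channels
    composite = alpha * fg + (1.0 - alpha) * bg
    return composite.astype(np.uint8)
```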
Visualizing the Effect of Semantic Categories in the Attribution of Scene Recognition Models
http://www-vpu.eps.uam.es/publications/SemanticEffectSceneRecognition/
The problem of attribution deals specifically with characterizing the response of convolutional neural networks by identifying the input features responsible for the model's decision. Perturbation-based attribution methods measure the effect of perturbations applied to the input image on the model's output. In this paper, we discussed the limitations of existing approaches and proposed a novel perturbation-based attribution method guided by semantic segmentation.
Our method inhibits specific image areas according to their assigned semantic label to link perturbations with a semantic meaning. The proposed semantic-guided attribution method enables us to delve deeper into scene recognition interpretability by obtaining the sets of relevant, irrelevant, and distracting semantic labels for each scene class.
Experimental results suggest that the method can boost research by increasing the understanding of convolutional neural networks while uncovering dataset biases that may have been included inadvertently.
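As a hedged illustration of the idea, the PyTorch sketch below inhibits all pixels of one semantic label at a time and measures the drop in the target scene score; the model interface, inhibition value, and function name are assumptions, not the exact protocol from the paper.

```python
# Illustrative semantic-guided perturbation attribution: mask one semantic
# label at a time and record the change in the target class score.
import torch

@torch.no_grad()
def semantic_attribution(model, image, semantic_map, target_class, fill_value=0.0):
    """Return {label: score_drop} for each semantic label present in the image."""
    model.eval()
    base_score = torch.softmax(model(image.unsqueeze(0)), dim=1)[0, target_class]
    attributions = {}
    for label in semantic_map.unique().tolist():
        perturbed = image.clone()
        mask = (semantic_map == label)   # pixels belonging to this semantic label
        perturbed[:, mask] = fill_value  # inhibit that semantic region
        score = torch.softmax(model(perturbed.unsqueeze(0)), dim=1)[0, target_class]
        attributions[label] = (base_score - score).item()
    return attributions
```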
Ball Tracking in FIFA 21
https://github.com/andrijagajic/ball_tracking
Classification of Cancers Based on Genetic Code of Patient
The idea was to use the generated files and, based on the variations present in them, classify the primary tumor type as either cervical or lung cancer. The classification was based on the mutations that occurred in receptor tyrosine kinases (RTKs). The selected features were the locations in the DNA code where the mutations occurred, the number of mutations in each of the RTKs, and the frequency of each alternative base compared to the other bases in a sample. The classification was performed using neural networks from the scikit-learn library.
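A minimal scikit-learn sketch of such a classifier is shown below; the feature matrix, labels, and network size are synthetic placeholders standing in for the mutation-derived features described above.

```python
# Minimal scikit-learn sketch: a small neural network over mutation-derived
# features; the data here is random and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data: rows = patients, columns = features such as per-RTK
# mutation counts and alternative-base frequencies.
rng = np.random.default_rng(0)
X = rng.random((200, 30))
y = rng.integers(0, 2, size=200)  # 0 = cervical, 1 = lung (illustrative labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))
clf.fit(X_train, y_train)
print("Held-out accuracy:", clf.score(X_test, y_test))
```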
Education
Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision
Universite de Bordeaux - Bordeaux, France
Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision
Universidad Autonoma de Madrid - Madrid, Spain
Erasmus Mundus Joint Master's Degree in Image Processing and Computer Vision
Pazmany Peter Catholic University - Budapest, Hungary
Bachelor's Degree in Electrical Engineering and Computer Science
University of Belgrade, School of Electrical Engineering - Belgrade, Serbia
Certifications
Deep Learning Specialization
Coursera
Machine Learning Course
Coursera
Skills
Libraries/APIs
NumPy, PyTorch, Keras, Pandas, TensorFlow, OpenCV
Tools
Azure Machine Learning
Languages
Python, C++, Java
Platforms
Windows, Visual Studio Code (VS Code), Linux
Paradigms
Object-oriented Programming (OOP), Variational Methods
Industry Expertise
Bioinformatics
Other
Computer Vision, Deep Learning, Semantic Segmentation, Image Analysis, Machine Learning, Image Processing, 3D Reconstruction, Probability Theory, Statistics, Bayesian Statistics, Artificial Intelligence (AI), Electrical Engineering, Stochastic Modeling, Data Science, Medical Imaging, FPGA, Video Processing, Object Detection, Generative Adversarial Networks (GANs), 3D Image Processing, Image Reconstruction, Natural Language Processing (NLP), Recurrent Neural Networks (RNNs), BERT, Transformer Models, Object Tracking, Models, Deep Neural Networks (DNNs), Mixed Reality (MR), Convolutional Neural Networks (CNNs), Generative Pre-trained Transformers (GPT), Graphics Processing Unit (GPU), Point Clouds, Optical Character Recognition (OCR)