Milos Grubjesic, Developer in Novi Sad, Vojvodina, Serbia

Milos Grubjesic

Verified Expert in Engineering

Data Engineer and Developer

Location
Novi Sad, Vojvodina, Serbia
Toptal Member Since
November 17, 2015

Milos is a data scientist and engineer with 15+ years of experience in big data and machine learning, working with Python, Scala, Spark, and other data engineering technologies. He has delivered data solutions for Christie's Auction House, Syndigo (retail), car pricing prediction, fraud detection, commodities trading, and more. Milos's industry experience is backed by a master's degree in computer science.

Portfolio

PepsiCo
Python, Kubernetes, Pytest, Kubeflow, GitHub...
Databricks
Databricks, PySpark, QualysGuard, Jira REST API, GitHub...
GameAnalytics
Python, gRPC, Protobuf, Docker, GitHub, GitFlow, APIs, API Integration...

Experience

Availability

Part-time

Preferred Environment

Databricks, Linux, Python, Spark, PyCharm, Docker, macOS, Machine Learning Operations (MLOps)

The most amazing...

...complete system I've implemented fights scams and fake users on an online dating site, and catching some really bad guys felt really good.

Work Experience

Machine Learning Operations Engineer

2022 - PRESENT
PepsiCo
  • Migrated a machine learning project to Kubernetes and Kubeflow.
  • Set up a machine learning operations platform used by PepsiCo's eCommerce team.
  • Worked on the migration of the machine learning product from AWS to Azure.
Technologies: Python, Kubernetes, Pytest, Kubeflow, GitHub, Machine Learning Operations (MLOps), Snowflake, PyCharm, Hydra, Kubernetes Operations (kOps), Nexus, Amazon Web Services (AWS), Datadog, KServe, MLMD, Azure, OpenAI GPT-4 API

Python Engineer

2022 - 2022
Databricks
  • Created a system for retrieving Qualys vulnerability scan reports for containers and DBS. The process involved extracting data from the Qualys API, triaging it, and creating and updating Jira tickets.
  • Created a system for extracting GitHub alerts through GraphQL API calls, merging them with internal data, and triaging and cleaning the data.
  • Implemented migration and updates to various Spark jobs.
  • Used Qualys API to run on-demand scans for container bundles, retrieved vulnerability data, and augmented it with internal data sources.
Technologies: Databricks, PySpark, QualysGuard, Jira REST API, GitHub, Vulnerability Management, Python, Pandas, APIs, REST APIs, GraphQL, Spark SQL, Data Pipelines, ETL

Python Back-end Engineer

2021 - 2022
GameAnalytics
  • Developed the API integrations for an analytics platform, including Unity Ads, AppsFlyer, and Adjust.
  • Created Docker images and integrated the services into a client's infrastructure.
  • Analyzed the retrieved data and extracted meaningful knowledge.
  • Created ETL data ingestion pipelines with Dagster.
Technologies: Python, gRPC, Protobuf, Docker, GitHub, GitFlow, APIs, API Integration, AppsFlyer, Adjust, Apache Avro, Pandas, Back-end, CI/CD Pipelines, REST APIs, Poetry, PyCharm, Wireshark, Dagster, Code Climate, Codecov, YAML, BigQuery, Linux, ETL, Python 3, Git, Data Pipelines, Test-driven Development (TDD), Amazon Web Services (AWS), Containerization, Containers

Scala and Spark Developer

2020 - 2021
Syndigo
  • Developed Scala and Spark data pipelines for an enterprise client to process large amounts of data daily.
  • Created hundreds of notebooks on Azure Databricks and set up the ETL process to clean, deduplicate, and aggregate data using Scala and Spark.
  • Built a custom Python framework to quickly update notebook batches, saving the client a lot of time and money.
Technologies: Scala, User-defined Functions (UDF), Azure, Databricks, Python, ETL, Data Science, Databases, Data Engineering, Spark, Data Analysis, Azure Data Lake, Artificial Intelligence (AI), Spark SQL, Back-end, IntelliJ IDEA, Business Intelligence (BI), Data, Azure Databricks

Machine Learning and Machine Learning Operations Engineer

2019 - 2020
AlgoDriven
  • Created various ML models for a used car dealership application used throughout the GCC countries, including Saudi Arabia, Kuwait, the United Arab Emirates, Qatar, Bahrain, and Oman.
  • Deployed various ML models to production, ensuring services were scalable and not interrupted by updates and fixes.
  • Created automated pipelines for data retrieval, processing, cleaning, deduplication, augmentation, model building, validation, and deployment to production.
Technologies: Docker, Python, Git, GitHub, SQL, Scikit-learn, Machine Learning, Predictive Analytics, Artificial Intelligence (AI), Data Science, Data Analysis, Databases, Data Engineering, Predictive Modeling, Statistical Analysis, MySQL, Pandas, NumPy, Python 3, Amazon EC2, PyCharm, REST APIs, Back-end, Data Pipelines, Data Visualization, Jupyter Notebook, Regression, Amazon Web Services (AWS), Data, Machine Learning Operations (MLOps), Kubernetes

Data Scientist (Freelance)

2018 - 2019
Jaumo
  • Organized an administration team and created an automated system for Jaumo, a popular online dating platform, to quickly identify and address threats such as fake users and scams.
  • Created and defined a machine learning operations process.
  • Analyzed large amounts of data and delivered insights to business owners.
  • Implemented metrics for estimating a fake user ratio, developed an artificial user classifier, and introduced local interpretability modeling. All this was executed in the Python ecosystem.
Technologies: Python, Apache Cassandra, Jupyter Notebook, SQL, Pandas, Scikit-learn, Data Science, Databases, Data Analysis, Artificial Intelligence (AI), Statistical Analysis, Linux, ETL, Python 3, Jupyter, NoSQL, Git, PyCharm, APIs, SciPy, REST APIs, Back-end, Data Pipelines, Data Visualization, Classification, Regression, Amazon Web Services (AWS), Analytics, Machine Learning Operations (MLOps)

Data Scientist

2016 - 2017
Christie's (Freelance)
  • Analyzed fine art data and gained insights to develop algorithms in the Python ecosystem for this famous auction house.
  • Developed algorithms, such as an artist's index, popularity index, demand index, and fine art comparables.
  • Enabled matchmaking for artists, customer analyses, recommendations, artwork collection value estimation, and more.
  • Leveraged data from multiple sources to help the marketing team find new customers.
Technologies: Python, Flask, Docker, Git, SQL, Pandas, Data Science, Databases, Data Engineering, Data Analysis, Artificial Intelligence (AI), Statistical Analysis, Linux, ETL, Python 3, Jupyter, PyCharm, REST APIs, Back-end, Jupyter Notebook, Classification, Regression, Amazon Web Services (AWS)

Java Software Developer

2006 - 2009
Custom Software and IT Services Companies
  • Implemented Java web applications and conducted white hat penetration testing on one of them.
  • Handled a complex Java application related to diseases for Danish customers and tested it under a high load of requests.
  • Maintained Linux machines as a junior administrator.
  • Sniffed network traffic and used various tools to collect passwords, immediately informing IT support of the findings to strengthen security.
Technologies: Linux, Web Applications, Algorithms, Penetration Testing, SQL, MySQL, PostgreSQL

Data Scientist | Predictive Analytics

A project for commodity market predictive analytics, including time series forecasting and statistical analysis. I implemented classifiers and regressions to identify the best trading strategies and to backtest them.

Data Scientist | Linear TV Viewership Forecasts

https://videoamp.com/
I predicted future TV viewership based on extended research into forecasting techniques, comparing classical statistical approaches against deep learning. I also collaborated with the rest of the data science team on an Oscar prediction algorithm that produced an accurate forecast for the 2018 Oscars.

Press release: Globenewswire.com/en/news-release/2018/03/05/1415186/0/en/VideoAmp-s-Oscar-Prediction-Algorithm-Proves-Accurate.html

Python Library for Kubeflow Pipelines

I co-authored a Python library (Prometej, Serbian for Prometheus) that drastically increased the data science team's efficiency. This library enables users to quickly implement and run Kubeflow pipelines from their local machines, reuse components, and introduce tests and good practices during pipeline creation.

Languages

Python, Python 3, Scala, SQL, R, YAML, GraphQL, Snowflake

Frameworks

Spark, Flask, ASM, Apache Spark, gRPC, Hydra

Libraries/APIs

Pandas, NumPy, SciPy, Scikit-learn, Caret, REST APIs, Spark ML, Protobuf, PySpark, Jira REST API, CatBoost

Tools

PyCharm, Jupyter, Spark SQL, IntelliJ IDEA, Git, GitHub, Apache Avro, Wireshark, Code Climate, Codecov, BigQuery, Pytest, Sentry, Amazon SageMaker, AWS Glue, Amazon Athena, Azure Machine Learning

Other

Machine Learning, Back-end, APIs, Regression, Classification, Artificial Intelligence (AI), Time Series Analysis, Predictive Analytics, Data Analysis, Data Engineering, Algorithms, Statistical Analysis, Data, Data Visualization, Predictive Learning, Predictive Modeling, Data Modeling, Data Mining, Time Series, Forecasting, Neural Networks, User-defined Functions (UDF), Apache Cassandra, Computer Science, Big Data, Web Applications, Azure Data Lake, GitFlow, API Integration, Adjust, CI/CD Pipelines, Poetry, Dagster, Vulnerability Management, Containerization, Containers, Analytics, Machine Learning Operations (MLOps), Kubernetes Operations (kOps), Amazon Machine Learning, Azure Databricks, KServe, MLMD, OpenAI GPT-4 API

Paradigms

ETL, Data Science, Functional Programming, Aspect-oriented Programming, Penetration Testing, Test-driven Development (TDD), Business Intelligence (BI)

Platforms

Jupyter Notebook, Amazon EC2, Linux, Docker, Amazon Web Services (AWS), Azure, Databricks, RStudio, AppsFlyer, QualysGuard, Kubernetes, Kubeflow, Nexus, macOS

Storage

Databases, Data Pipelines, NoSQL, PostgreSQL, MySQL, MongoDB, Datadog

1998 - 2005

Master's Degree in Computer Science

Faculty of Technical Sciences - Novi Sad, Serbia

MARCH 2018 - PRESENT

Deep Learning Nanodegree

Udacity

AUGUST 2015 - PRESENT

Scalable Machine Learning

EdX

JULY 2013 - PRESENT

Machine Learning

Coursera
