Daniel Gafni, Developer in Kotor, Kotor Municipality, Montenegro
Daniel Gafni

Verified Expert in Engineering

Machine Learning and Operations Developer

Location
Kotor, Kotor Municipality, Montenegro
Toptal Member Since
October 4, 2023

Daniel is a senior machine learning engineering and operations expert with extensive experience across a wide range of machine learning tasks. His work ranges from training large-scale custom transformer-based recommender systems that serve millions of users and items to orchestrating production ML systems in cloud environments. Specializing in tabular data, deep learning, and MLOps, Daniel brings a broad skill set to complex machine learning challenges.

Portfolio

Generative Alpha
Terraform, Terragrunt, Argo CD, Kubernetes, Amazon EKS, GitLab CI/CD...
Sanas.ai
GitLab CI/CD, Python 3, Docker, Kubernetes, Ray, Dagster, Polars, PyTorch...
Toptal
GitHub, BigQuery, PyTorch, Docker, Kubernetes, PySpark, Polars, Dagster...

Availability

Part-time

Preferred Environment

Linux, PyCharm, GitLab, PyTorch, Kubernetes, PySpark, FastAPI, Dagster, Polars

The most amazing...

...project I've worked on was distributed model training and inference on millions of audio files for a feature store at Sanas.ai, an AI company.

Work Experience

Senior MLOps Engineer

2024 - PRESENT
Generative Alpha
  • Managed multi-regional and multi-tier infrastructure with Terraform/Terragrunt.
  • Deployed to multiple Kubernetes clusters (EKS) with Argo CD.
  • Backfilled several years of historical crypto and stock data for hundreds of assets.
  • Built and deployed a strategy backtesting service using historical data.
  • Created real-time, low-latency streaming pipelines with Redis and Kubernetes for hundreds of crypto assets.
Technologies: Terraform, Terragrunt, Argo CD, Kubernetes, Amazon EKS, GitLab CI/CD, GitHub Actions, Litestar, REST APIs, Python 3, Docker, Redis, Polars, Helm, Dagster

Senior Machine Learning Operations Consultant

2023 - PRESENT
Sanas.ai
  • Led the adoption of Machine Learning Operations (MLOps) practices at the company.
  • Created a standard cookiecutter template to start new projects.
  • Designed and implemented MLOps pipelines, including CI/CD, ETL for batch inference, and model training and fine-tuning on new clients' data using the data mesh architecture pattern.
  • Built Docker-based CI/CD in GitLab, which took minutes to pass and was shared among 15+ projects.
  • Used Kubernetes, Dagster, and Ray to run neural networks on over 30 million audio files as part of the daily ETL process, quickly and cost-efficiently, with over a thousand pods running at a time and exactly-once guarantees.
Technologies: GitLab CI/CD, Python 3, Docker, Kubernetes, Ray, Dagster, Polars, PyTorch, Hydra, Kubeflow, Training, Deep Learning, Big Data, Batch File Processing, Poetry, Machine Learning, Artificial Intelligence (AI), Deep Neural Networks, Ray.io, Distributed Training, Distributed Computing, Amazon Web Services (AWS), Cloud, Software Architecture, Bash Script, SQL, Technical Leadership, ETL, Data Modeling, Data Engineering, Data Science, Supervised Learning, Unit Testing, CI/CD Pipelines, Neural Networks

Senior Data Scientist

2022 - 2022
Toptal
  • Upgraded several projects to use modern tools such as Poetry, Docker, and containerized CI/CD.
  • Built and deployed a custom transformer neural network designed to predict talent job request acceptance from sequences of tabular data. The new model performed over 30% better than the previous CatBoost-based model.
  • Created a data platform for the data science team based on Dagster.
  • Led a few knowledge-sharing sessions on Python packaging using Poetry, ETL using Airflow and Dagster, and Polars. The team readily adopted these tools, resulting in a significant improvement in development speed.
  • Developed a FastAPI endpoint for similar talent recommendations.
  • Engineered a Streamlit web app that produces GPT-4-generated templates tailored to the company's needs.
Technologies: GitHub, BigQuery, PyTorch, Docker, Kubernetes, PySpark, Polars, Dagster, Machine Learning, Artificial Intelligence (AI), Deep Neural Networks, Cloud, Software Architecture, Bash Script, SQL, Time Series Analysis, Large Language Models (LLMs), ETL, Data Modeling, Data Engineering, Data Science, Supervised Learning, Classifier Development, Regression, Unit Testing, CI/CD Pipelines, Poetry, GitLab CI/CD, Neural Networks, Recommendation Systems, Machine Learning Operations (MLOps), Time Series, Google Cloud, LangChain

Middle Machine Learning Engineer

2020 - 2022
SberMarket
  • Engineered the core ML models for the Recommender System (RecSys), utilizing both classical ML techniques and deep learning methodologies, specifically RankNet with LambdaRank loss.
  • Leveraged PySpark on Dataproc, GitLab CI, Docker, Airflow, and Redis to develop the RecSys engineering core.
  • Built internal Python packages and implemented a CI pipeline to publish them to a private GitLab PyPI repository.
  • Created a collection of standard CI templates that were reused across multiple projects.
  • Designed the standard Python QA stack, including pre-commit hooks for formatters, linters, and package managers, Docker build templates, and a standardized project structure encompassing Python scripts, Airflow DAGs, and CI.
  • Developed a configuration system for Airflow DAGs that relies heavily on the XCom feature.
  • Trained a custom TabNet-based recommender system on millions of users and items.
Technologies: GitLab CI/CD, Docker, Kubernetes, Ray, PyTorch, Polars, Recommendation Systems, Apache Airflow, Python 3, Poetry, Elasticsearch, CatBoost, LightGBM, Machine Learning, Artificial Intelligence (AI), Deep Neural Networks, Cloud, Software Architecture, Bash Script, SQL, Time Series Analysis, MLflow, Spark, Large Language Models (LLMs), ETL, Data Modeling, XGBoost, Data Engineering, Data Science, Supervised Learning, Classifier Development, Regression, Unit Testing, CI/CD Pipelines, Neural Networks, Machine Learning Operations (MLOps), Time Series, Google Cloud

dagster-polars

https://github.com/danielgafni/dagster-polars
Authored the Polars integration library for Dagster. This library gained community traction and eventually got merged into the main Dagster project. dagster-polars has been successfully used in production.

RaifHack DS - Real Estate Price Prediction

https://github.com/danielgafni/RAIFHACK
The winning RaifHack DS solution in the popular vote category.

I used a custom Siamese TabNet neural network architecture to predict commercial real estate prices based on similar residential real estate objects retrieved by FAISS.

Freak

https://github.com/danielgafni/freak
Freak is a Python package for interacting with a program's state remotely. You define the state object as a Pydantic model and use Freak to expose it over HTTP. It supports nested models, partial updates, and data validation, and it uses FastAPI to run the web server. It is useful for quickly setting up control over long-running programs such as bots or neural network training.
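
The idea of remotely controllable state can be illustrated with a minimal sketch. This is not Freak's API: it uses only the Python standard library (dataclasses and http.server) in place of Pydantic and FastAPI, and every name in it (TrainingState, Optimizer, apply_patch, make_handler, serve) is hypothetical. It shows a nested state object, a validated partial update, and a small GET/PATCH handler that exposes the state.

```python
import json
from dataclasses import asdict, dataclass, field, fields, is_dataclass, replace
from http.server import BaseHTTPRequestHandler, HTTPServer


@dataclass(frozen=True)
class Optimizer:  # hypothetical nested state
    lr: float = 0.001
    momentum: float = 0.9


@dataclass(frozen=True)
class TrainingState:  # hypothetical top-level state
    epoch: int = 0
    paused: bool = False
    optimizer: Optimizer = field(default_factory=Optimizer)


def apply_patch(state, patch):
    """Return a copy of `state` with `patch` merged in; reject bad names or types."""
    if not isinstance(patch, dict):
        raise ValueError("patch must be a JSON object")
    known = {f.name for f in fields(state)}
    changes = {}
    for name, value in patch.items():
        if name not in known:
            raise ValueError(f"unknown field: {name}")
        current = getattr(state, name)
        if is_dataclass(current):
            # Nested model: merge recursively so untouched fields are kept.
            changes[name] = apply_patch(current, value)
            continue
        expected = type(current)
        if expected is float and type(value) is int:
            value = float(value)  # JSON has a single number type
        if type(value) is not expected:
            raise ValueError(f"{name}: expected {expected.__name__}")
        changes[name] = value
    return replace(state, **changes)


def make_handler(holder):
    """Build a request handler that reads and updates holder["state"]."""

    class Handler(BaseHTTPRequestHandler):
        def _send(self, code, payload):
            body = json.dumps(payload).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self):
            self._send(200, asdict(holder["state"]))

        def do_PATCH(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                patch = json.loads(self.rfile.read(length))
                holder["state"] = apply_patch(holder["state"], patch)
            except ValueError as exc:  # also covers malformed JSON
                self._send(422, {"error": str(exc)})
            else:
                self._send(200, asdict(holder["state"]))

    return Handler


def serve(holder, port=8000):
    """Block forever, serving the state on localhost (port is an assumption)."""
    HTTPServer(("127.0.0.1", port), make_handler(holder)).serve_forever()
```

In this sketch, a long-running program would keep its state in a shared holder such as {"state": TrainingState()}, run serve(holder) in a background thread, and read holder["state"] in its main loop. A PATCH body like {"optimizer": {"lr": 0.01}} changes only the learning rate and leaves the other fields untouched.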

Education

2020 - 2022

Master's Degree in Physics

Lomonosov Moscow State University - Moscow, Russia

2016 - 2020

Bachelor's Degree in Physics

Lomonosov Moscow State University - Moscow, Russia

Libraries/APIs

PyTorch, PySpark, XGBoost, CatBoost, Terragrunt, REST APIs

Tools

GitLab, GitHub, GitLab CI/CD, Apache Airflow, PyCharm, Slack, BigQuery, Terraform, Amazon EKS, Helm

Frameworks

Hydra, LightGBM, Spark, Litestar

Languages

Python 3, Python, Bash Script, SQL

Paradigms

Unit Testing, ETL, Data Science, Distributed Computing

Platforms

Linux, Docker, Kubernetes, Kubeflow, Amazon Web Services (AWS)

Storage

Google Cloud, Elasticsearch, Redis

Other

Dagster, Neural Networks, Polars, High Code Quality, CI/CD Pipelines, Poetry, Ray, Deep Learning, Machine Learning, Artificial Intelligence (AI), Deep Neural Networks, Ray.io, Machine Learning Operations (MLOps), Time Series, Time Series Analysis, Data Engineering, Supervised Learning, Classifier Development, Regression, FastAPI, Computer Vision, Recommendation Systems, Big Data, Batch File Processing, Cloud, Software Architecture, Data Modeling, Deep Reinforcement Learning, Training, GitHub Actions, Sphinx, Distributed Training, Technical Leadership, MLflow, Large Language Models (LLMs), FAISS, Processing & Threading, NixOS, LangChain, Argo CD