Dawid Smoleń, Developer in Kraków, Poland

Dawid Smoleń

Bio

Dawid has successfully delivered 30+ machine learning projects, building scalable systems grounded in MLOps best practices. With deep expertise in cloud-native technologies, he automates the entire ML lifecycle, from data gathering to CI/CD pipelines and continuous training. Recently, Dawid has been applying this expertise to LLM-based solutions at scale, helping clients unlock cutting-edge AI capabilities.

Portfolio

Freelance
Python, Scikit-learn, Docker, Amazon Web Services (AWS)...
Sinch
Kubernetes, Google Kubernetes Engine (GKE), Docker, Helm, GitOps...
Toptal
Python, PyTorch, Machine Learning Algorithms, Machine Learning...

Experience

  • Scikit-learn - 10 years
  • Machine Learning Automation - 6 years
  • PyTorch - 6 years
  • Generative Pre-trained Transformers (GPT) - 6 years
  • CI/CD Pipelines - 5 years
  • Azure - 4 years
  • Kubeflow - 2 years
  • Agentic AI - 1 year

Preferred Environment

Python, Scikit-learn, PyTorch, Kubernetes, MongoDB, Cloud, OpenAI, Temporal Cloud

The most amazing...

...professional achievement was engineering a high-density model deployment platform that runs over 1,000 ML models in production.

Work Experience

ML Consultant | MLOps Engineer

2018 - PRESENT
Freelance
  • Deployed modeling services to Kubernetes clusters, including Amazon EKS and Google Kubernetes Engine (GKE).
  • Introduced tracking servers to existing projects to improve model observability and understanding of the underlying problem.
  • Developed an end-to-end solution from data investigation to a deployed model that monitors daily statistics and business metrics regarding user experience in eCommerce.
  • Consulted for an ECG-focused company in Latin America and helped design and implement crucial Holter analysis steps.
  • Prepared NFT market analysis tools based on machine learning valuation of traits.
  • Prepared a deduplication service for a real estate website scraper.
  • Acted as a data science trainer for two training companies and conducted training for around seven teams from various enterprises.
Technologies: Python, Scikit-learn, Docker, Amazon Web Services (AWS), Digital Signal Processing, ECG, Training, Data Science, Jupyter Notebook, Artificial Intelligence (AI), Machine Learning, Regression Modeling, Classification Algorithms, Kubeflow, Kubernetes, CI/CD Pipelines, Machine Learning Operations (MLOps), Data Scraping, Data Engineering, Front-end, Data Analysis, Non-fungible Tokens (NFT), Retrieval-augmented Generation (RAG), Large Language Models (LLMs), Large Language Model Operations (LLMOps), SQL, MongoDB, APIs, AI Agents, Continuous Integration (CI), Machine Learning Automation, Azure, REST APIs, DevOps, Temporal Cloud, Events, FastAPI, Amazon EC2, Amazon S3 (AWS S3), ML Pipelines, Infrastructure as Code (IaC), Model Deployment, Monitoring, Observability Tools, DSP

MLOps Engineer

2022 - 2025
Sinch
  • Introduced MLOps best practices at Chatlayer, managing thousands of models in production. Maintained the models and significantly reduced their cost while improving their speed.
  • Drove the adoption of AI solutions across multiple departments at Sinch, including automated campaign analytics, anomaly detection in messaging systems (email and SMS), system integration, and the development of unified standards.
  • Participated in several LLM and agentic projects running at large scale.
  • Worked with cutting-edge technologies, including GitOps and event-based architecture, as well as workflow automation using Temporal and Argo Workflows.
  • Migrated massive projects between popular cloud providers.
  • Created an anomaly detection and prediction system for the mailing industry. It processes 2-3 billion emails daily.
  • Enhanced observability by adding tools at multiple levels.
  • Implemented a custom high-density model deployment platform (a Kubernetes engine for 2,000 models).
Technologies: Kubernetes, Google Kubernetes Engine (GKE), Docker, Helm, GitOps, Large-scale Projects, Artificial Intelligence (AI), Data Engineering, Python, LangChain, Prompt Engineering, OpenAI, Amazon Web Services (AWS), SQL, MongoDB, Vector Databases, LangGraph, Generative Artificial Intelligence (GenAI), APIs, AI Agents, Continuous Integration (CI), Machine Learning Automation, Google Cloud, REST APIs, DevOps, Temporal Cloud, Argo Workflows, Events, FastAPI, Databricks, ML Pipelines, Terraform, Model Deployment, Monitoring, Observability Tools, Agentic AI

Machine Learning Consultant

2021 - 2022
Toptal
  • Introduced MLOps design patterns, including pipelines, observability tools, monitoring solutions, and deployment to a Kubernetes cluster.
  • Built a PoC for an automatic real estate valuation system. Set up the codebase and pipelines, conducted research, and developed a few competitive prototypes.
  • Participated in establishing a team by hiring and training members who later took over the project.
Technologies: Python, PyTorch, Machine Learning Algorithms, Machine Learning, Amazon SageMaker, REST APIs, DevOps, FastAPI, ML Pipelines, Model Deployment, Monitoring, Observability Tools

Machine Learning Engineer

2020 - 2021
Grape Up
  • Developed an end-to-end deep learning automotive project together with full automation (CI, CD, and CT) and infrastructure. Worked on machine learning best practices using modern tools and solutions.
  • Created PoCs and demos in machine learning and data science, together with simple UI demos and an API-first approach.
  • Built a VIN recognition system: https://medium.com/grapeup/leveraging-ai-to-improve-vin-recognition-how-to-accelerate-and-automate-operations-in-the-12eac5286b1d
  • Wrote a blog post, "Building Intelligent Document Processing Systems": https://grapeup.com/blog/introduction-to-building-intelligent-document-processing-systems/
Technologies: Azure, Convolutional Neural Networks (CNNs), Deep Learning, Dataiku, Python, Digital Signal Processing, DevOps, Machine Learning Operations (MLOps), REST APIs, React, Deep Neural Networks (DNNs), Metaflow, Data Science, Jupyter Notebook, Artificial Intelligence (AI), Predictive Modeling, Machine Learning, Amazon Web Services (AWS), Data Scraping, APIs, Continuous Integration (CI), Machine Learning Automation, ETL, PDF Scraping, Data Extraction, FastAPI, Amazon EC2, Amazon S3 (AWS S3), ML Pipelines, Model Deployment, Monitoring, Observability Tools

Deep Learning Engineer

2017 - 2018
Lekta
  • Created a library for user intent classification that follows industry best practices and makes millions of predictions a month in a demanding real-time environment.
  • Developed a novel speech recognition system based on state-of-the-art papers that beat the current market in some areas in terms of accuracy or performance.
  • Researched numerous topics in the areas of speech recognition, voice-based gender recognition, intent classification, sentence representation, and text representation.
  • Developed machine learning algorithms for both voice bots and chatbots.
Technologies: C++, Audio Processing, Digital Signal Processing, Python, PyTorch, TensorFlow, Deep Learning, Speech Recognition, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Scikit-learn, Data Science, Jupyter Notebook, Chatbots, Artificial Intelligence (AI), Predictive Modeling, Amazon Web Services (AWS), Machine Learning, Data Scraping, Data Engineering, Machine Learning Automation, ML Pipelines, Model Deployment, Monitoring, Observability Tools, DSP

Machine Learning Engineer

2016 - 2017
Aspel SA
  • Created a brand new QRS detector tested on many benchmarks and real-world monitoring tests.
  • Developed clustering algorithms that can efficiently cluster long Holter monitor tests, focusing on user experience.
  • Developed embedded resampling algorithms for ECG devices.
  • Contributed to QRS morphology classifiers that greatly improved the work of doctors and met AMA standards.
  • Helped develop user experience-related algorithms that simplify the work of the doctors and technicians.
Technologies: C++, Python, MATLAB, Scikit-learn, SciPy, Artificial Intelligence (AI), Predictive Modeling, Classification Algorithms, Regression Modeling, Data Analysis, Model Deployment, Monitoring, Observability Tools, Anomaly Detection, DSP

NLP Engineer

2015 - 2016
WitKom – Virtual Translator of Sign Communication
  • Developed the first Polish to Polish Sign Language translation system on the language level.
  • Built the first Polish Sign Language to Polish translation system on the language level using Seq2Seq models.
  • Created huge artificial datasets for sign languages based on heuristics, rules, and DL technology.
Technologies: Python, Natural Language Toolkit (NLTK), Deep Learning, Sequence Models, TensorFlow, Data Analysis, Monitoring

Experience

Gomrade — Play Go Against AI on a Real, Physical Board

https://github.com/smolendawid/Gomrade
This repository allows you to play Go against a strong AI on a real board. Gomrade analyzes the board state from images captured by a computer camera and announces the AI's moves using a synthesized voice. The example video of Gomrade in action is under development.

Speech Representation and Exploration Notebook

https://www.kaggle.com/davids1992/speech-representation-and-data-exploration
This is one of the top 15 Kaggle notebooks ever, with more than 100,000 views. I introduced a few basic concepts about speech representation and performed data analysis looking for the most interesting examples from the dataset.

The Simplest Python Cache for Data Scientists

https://github.com/smolendawid/cacha
The simplest Python cache for data scientists.

Contrary to many other tools, cacha boasts the following features:

• It is used at the function call, not the definition. Many packages implement a @cache decorator that has to be applied where the function is defined, which is not convenient enough.
• It stores the cache on disk, which means you can use the cache between runs. This is convenient in data science work.
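
To make the two features above concrete, here is a minimal sketch of call-site caching with results stored on disk. It is written against the Python standard library only and is not cacha's actual API: the helper name `cached_call`, the `CACHE_DIR` location, and the `slow_square` function are all hypothetical and exist only for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = Path(tempfile.gettempdir()) / "call_site_cache_demo"


def cached_call(func, *args, cache_dir=CACHE_DIR, **kwargs):
    """Call func(*args, **kwargs), reusing a result stored on disk if one exists."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the entry on the function's qualified name and its pickled arguments.
    payload = pickle.dumps(
        (func.__module__, func.__qualname__, args, sorted(kwargs.items()))
    )
    path = cache_dir / (hashlib.sha256(payload).hexdigest() + ".pkl")
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = func(*args, **kwargs)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


def slow_square(x):
    # Stand-in for an expensive computation; note that it carries no decorator.
    slow_square.calls += 1
    return x * x


slow_square.calls = 0
```

With this sketch, `cached_call(slow_square, 12)` computes the value on the first call and reads it from disk afterwards, including in a later run of the script, while a plain `slow_square(12)` still bypasses the cache. The function itself stays untouched, which is the point of caching at the call rather than at the definition.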

Drifting – The Most Flexible Drift Detection Server

https://github.com/sign-ai/drifting
The most flexible Drift Detection framework for everyone! Python-first, API-first, user-friendly, and open-source!

PYTHON-FIRST
Communicate with the Drift Detection server using a super simple Python client. No additional management needed!

EASY INTEGRATIONS
Using drifting is simple thanks to standardized, MLServer-based integrations like Kafka, OpenAPI, and gRPC.

FLEXIBLE
One server for managing many models, projects, versions, and features without any further tools.

STATE-OF-THE-ART
An open-source project built upon top-tier libraries: Alibi Detect, MLServer, and more!
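
For readers unfamiliar with drift detection, the idea can be sketched in a few lines. The example below is an illustration only and does not use the Drifting client or the Alibi Detect API: it assumes a single numeric feature and flags drift with a two-sample Kolmogorov-Smirnov test, comparing the statistic against the standard large-sample critical value. The function names `ks_statistic` and `has_drifted` and the synthetic data are hypothetical.

```python
import math
import random


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        x = min(ref[i], cur[j])
        # Advance past every value equal to x so ties are handled together.
        while i < len(ref) and ref[i] == x:
            i += 1
        while j < len(cur) and cur[j] == x:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d


def has_drifted(reference, current, alpha=0.05):
    """Flag drift when the KS statistic exceeds the large-sample critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-math.log(alpha / 2) / 2) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical


# Synthetic data: training-time values versus production values with a shifted mean.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(500)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(500)]
```

Calling `has_drifted(reference, shifted)` compares production data against the training-time reference. A server-based tool keeps that reference, runs such checks per model, version, and feature, and exposes the result over an API, so the sketch covers only the statistical core.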

My blogging

https://signai.substack.com/
I share thoughts that I find valuable or unique on both Medium and Substack:
- https://medium.com/@smolendawid - a few blog posts migrated from my old personal blog
- https://signai.substack.com/ - more up-to-date blog posts.

Education

2019 - 2021

PhD in Electrical and Electronics Engineering

AGH University of Science and Technology - Cracow, Poland

2011 - 2016

Master's Degree in Acoustical Engineering

AGH University of Science and Technology - Cracow, Poland

Certifications

AUGUST 2021 - PRESENT

ML Practitioner

Dataiku

AUGUST 2021 - PRESENT

Core Designer

Dataiku

SEPTEMBER 2017 - PRESENT

Machine Learning

Coursera

SEPTEMBER 2017 - PRESENT

Neural Networks and Deep Learning

Coursera

SEPTEMBER 2017 - PRESENT

Structuring Machine Learning Projects

Coursera

SEPTEMBER 2017 - PRESENT

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization

Coursera

Skills

Libraries/APIs

Scikit-learn, PyTorch, TensorFlow, REST APIs, Node.js, React, OpenCV, Keras, SciPy, Natural Language Toolkit (NLTK), Pandas

Tools

Observability Tools, Google Kubernetes Engine (GKE), MATLAB, Helm, Amazon SageMaker, Terraform

Languages

Python, C++, SQL

Paradigms

Continuous Integration (CI), DevOps, Anomaly Detection, ETL

Platforms

Azure, Kubeflow, Jupyter Notebook, Amazon Web Services (AWS), Kubernetes, Temporal Cloud, Dataiku, Docker, Cloud Native, Databricks, Amazon EC2

Storage

Google Cloud, Amazon S3 (AWS S3), MongoDB

Frameworks

Metaflow, LangGraph

Other

Machine Learning Automation, Audio Processing, Natural Language Processing (NLP), Deep Neural Networks (DNNs), Deep Learning, Machine Learning, Sequence Models, Machine Learning Operations (MLOps), ECG, Data Science, Artificial Intelligence (AI), CI/CD Pipelines, Generative Pre-trained Transformers (GPT), Large Language Model Operations (LLMOps), APIs, FastAPI, ML Pipelines, Model Deployment, Monitoring, Temporal, Workflows Orchestration, GitOps, Data Scraping, Data Engineering, Data Analysis, Large Language Models (LLMs), Prompt Engineering, OpenAI, Argo Workflows, Infrastructure as Code (IaC), Agentic AI, Acoustics, Digital Signal Processing, Speech Recognition, Convolutional Neural Networks (CNNs), Training, Chatbots, Predictive Modeling, Regression Modeling, Classification Algorithms, Large-scale Projects, Front-end, Non-fungible Tokens (NFT), Acoustical Engineering, Retrieval-augmented Generation (RAG), IT Project Management, Lecturing, LangChain, Vector Databases, Generative Artificial Intelligence (GenAI), AI Agents, Machine Learning Algorithms, Events, PDF Scraping, Data Extraction, DSP, Cloud
