Filip Boltuzic, Developer in Zagreb, Croatia


Verified Expert in Engineering

Bio

Filip is a machine learning engineer with several years of professional experience. He's worked on large-scale problems at Amazon Web Services as a software developer and built natural language processing models as a research associate at the University of Zagreb. Filip's main interests are machine learning and natural language processing, with an emphasis on building text classification models.

Portfolio

Online freelance agency
Machine Learning, Supervised Machine Learning, Reinforcement Learning...
Inflexion
ChatGPT, Haystack, Natural Language Processing (NLP), Machine Learning...
Agnostiq
Machine Learning, Python, DevOps, Data Science, Amazon Web Services (AWS)...

Experience

  • Python - 10 years
  • Linux - 7 years
  • Natural Language Processing (NLP) - 6 years
  • Scikit-learn - 6 years
  • Data Science - 6 years
  • Machine Learning - 6 years
  • Generative Pre-trained Transformers (GPT) - 3 years
  • NumPy - 3 years

Availability

Part-time

Preferred Environment

Java, Git, Linux, Docker, Apache Solr, Django, PyTorch, Pandas, NumPy, Scikit-learn, Python

The most amazing...

...machine learning model I've developed was an LSTM-CRF model that segments text into argumentative claims, which I built as part of my Ph.D. thesis.

Work Experience

Research Advisor

2022 - PRESENT
Online freelance agency
  • Investigated, researched, and documented software caching methods.
  • Reproduced the most popular methods from research papers for predicting cache time-to-live (TTL).
  • Built a simulator and a reinforcement learning model that addresses TTL prediction for object caching.
Technologies: Machine Learning, Supervised Machine Learning, Reinforcement Learning, Deep Reinforcement Learning, Data Science, NumPy

RAG/GPT-4 Expert

2024 - 2024
Inflexion
  • Improved an existing RAG-based tool to help the team search internal documentation more efficiently.
  • Built a document processing system for different types of content (emails, DOCX attachments, Excel spreadsheets, etc.).
  • Used RAG to implement the first version of multi-question answering.
Technologies: ChatGPT, Haystack, Natural Language Processing (NLP), Machine Learning, OpenAI GPT-4 API, Azure, Amazon Web Services (AWS), Retrieval-augmented Generation (RAG), Generative Pre-trained Transformers (GPT)

Technical Blog Writer

2023 - 2024
Agnostiq
  • Wrote several technical blog posts on topics such as machine learning, quantum computing, cloud computing, and large language models.
  • Implemented reproducible workflows across three cloud providers: AWS, Google Cloud, and Azure.
  • Contributed to Covalent, an open-source workflow orchestration library.
Technologies: Machine Learning, Python, DevOps, Data Science, Amazon Web Services (AWS), Azure, Google Cloud Platform (GCP)

AI and ML Developer

2023 - 2023
Aggieland Software
  • Developed a LangChain bot powered by a large language model (LLM) to generate software requirements.
  • Built and deployed to the cloud a multi-process application, exposed via an API, that chats with users to generate software requirements.
  • Collaborated with two teams to integrate the LLM app via APIs, giving both the web and mobile applications access to it.
Technologies: Artificial Intelligence (AI), Machine Learning, Azure Machine Learning, Large Language Models (LLMs), Llama 2, FastAPI, LangChain

AI Expert

2023 - 2023
PD4 Solutions LLC
  • Developed an LLM-based solution to determine which scientific articles are related to free-text criteria entered by the user.
  • Evaluated the LLM solution's performance and reported metrics showing a considerable improvement over the previous solution.
  • Worked with ML engineers to deploy the solution and define an optimal architecture for running it.
Technologies: Artificial Intelligence (AI), Machine Learning, Python, Natural Language Processing (NLP), Language Models, Text Classification, Unsupervised Learning, LangChain, Amazon Web Services (AWS), Git, Generative Pre-trained Transformers (GPT), Text Generation

Senior Data Scientist

2021 - 2023
Freelance for Lionbridge (via Newfire Global Partners)
  • Developed a machine learning sequence labeling model for text data that achieved an F1 score above 0.9.
  • Decreased the inference time of a previously developed machine learning model without sacrificing its F1 score.
  • Used PySpark and Databricks to perform large-scale data analysis that the company used to guide future business decisions.
  • Developed multiple highly scalable Python web services that are currently serving production traffic.
Technologies: Python, Agile, Scrum, Web Services, JSON, PyTorch, SpaCy, Natural Language Toolkit (NLTK), PySpark, Jupyter, Databricks, Open Neural Network Exchange (ONNX), Neural Networks, LSTM, Pandas, Data Science, NumPy, Git, Natural Language Processing (NLP), Data Analysis, Azure Databricks

Data Science Engineer

2022 - 2022
BJS
  • Developed prototype product recommenders that surfaced customer purchasing patterns.
  • Built simple AWS Lambda functions to run an ETL workflow.
  • Processed large datasets (over 100 GB of historical purchases) with PySpark.
Technologies: Python, Machine Learning, Spark ML, Scikit-learn, PySpark, Amazon Web Services (AWS), Git

Machine Learning Engineer

2020 - 2021
Alchemy V Ltd (via Toptal)
  • Created a marketing slogan text generator using Hugging Face Transformers text generation pipelines and customer-provided data.
  • Created a data ingestion and reporting process via multiple Google Cloud services: BigQuery, Cloud Functions, Cloud Endpoints, and Dataproc.
  • Ported existing R reporting code to a Python web service.
Technologies: Google Cloud, Google Cloud API, Google BigQuery, R, Python, Text Generation, SQL, Git

Natural Language Processing (NLP) Consultant

2020 - 2021
Granville Knowledge Management (via Toptal)
  • Developed a scraper to download a large, diverse set of around 20,000 legal documents (from 1990 to the present) from a European public repository.
  • Used machine learning to build a text classification model that automatically assigns categories to documents based on their content.
  • Created a dataset of legal documents and used it to train and evaluate the text classification model. Shared the results via Google Colab so that customers could interactively test the model's performance on their own held-out data.
Technologies: Python, Scrapy, Web Scraping, PyTorch, Jupyter, Google Colaboratory (Colab), Text Classification, Natural Language Processing (NLP)

Research Associate

2018 - 2020
TakeLab at the University of Zagreb
  • Developed a search engine for Croatian legal documents.
  • Built a named entity recognition model in PyTorch by combining an LSTM with a CRF.
  • Mentored several students on internship projects and wrote my master's thesis on natural language processing.
Technologies: Scikit-learn, PyTorch, Apache Solr, Django, Python, Torch, Pandas, Data Science, Git, Natural Language Processing (NLP)

Software Development Engineer

2014 - 2017
Amazon Web Services (AWS)
  • Contributed to developing a scalable time-series database solution in Java and C++ that served around 1 million requests per second.
  • Served as the team's scrum master and product owner.
  • Designed and implemented a network correlation engine microservice that handles networking events from the entire Amazon network, which resulted in a patent award (https://patents.justia.com/inventor/filip-boltuzic).
Technologies: Amazon Web Services (AWS), C++, Python, Java, Algorithms, Programming, Agile, Git, Web Services

Business Intelligence Analyst

2012 - 2014
Zagrebacka banka Unicredit Group
  • Developed SQL reports on a data warehouse to identify promising retail strategies.
  • Built an interactive tool in Java to speed up processes in Oracle Data Integrator.
  • Developed small web applications for the accounting department using PL/SQL and Oracle APEX.
Technologies: Java, SQL, Data Science

Search Engine for Croatian Legal Documents

A Django and Apache Solr web application.

I was the lead developer on this project and proposed the system's architecture as a set of microservices. The documents were stored and indexed in Solr, while the Django front end served requests and communicated with Solr.

Retail Sale Forecasting

The goal of the project was to design a model that predicts sales amounts from historical order data, previous sales, and regions. Forecasting was done at both the regional and global levels and was framed as a time-series prediction problem. I experimented with several time-series prediction techniques, such as ARIMA and SARIMA models.

Ulpian

http://ulpian.eu
Developed a ChatGPT-like tool for legal professionals in the EU. It lets users get answers to legal questions, with source citations. It draws on official domestic legal documents, EU laws, and domestic and EU case law.

The tool is currently under development as part of a startup named Ulpian.

Education

2012 - 2020

Ph.D. Degree in Natural Language Processing

University of Zagreb - Zagreb, Croatia

2010 - 2012

Master's Degree in Computer Science

University of Zagreb - Zagreb, Croatia

2010 - 2011

Erasmus Exchange Study in Computer Science

KTH Royal Institute of Technology - Stockholm, Sweden

2007 - 2010

Bachelor's Degree in Computer Science

University of Zagreb - Zagreb, Croatia

Certifications

NOVEMBER 2017 - PRESENT

Convolutional Neural Networks

Coursera

Skills

Libraries/APIs

Scikit-learn, NumPy, Pandas, PyTorch, Google Cloud API, SpaCy, Natural Language Toolkit (NLTK), PySpark, LSTM, Spark ML

Tools

Vim Text Editor, Solr, Apache Solr, Git, Oh My Zsh, Boto, Jupyter, Open Neural Network Exchange (ONNX), LaTeX, ARIMA, Azure Machine Learning, ChatGPT, Haystack

Languages

Python, SQL, Haskell, Java, C++, R

Platforms

Amazon Web Services (AWS), Linux, Docker, Databricks, SolrCloud, Azure, Google Cloud Platform (GCP)

Frameworks

Django, Scrapy, Streamlit

Paradigms

Anomaly Detection, Agile, Scrum, Business Intelligence (BI), DevOps

Storage

Elasticsearch, Google Cloud, JSON, PostgreSQL

Other

Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Machine Learning, Back-end, Data Science, OpenAI GPT-3 API, Data Analysis, Azure Databricks, Retrieval-augmented Generation (RAG), Clustering Algorithms, Clustering, Classification Algorithms, Text Classification, Torch, Web Scraping, Google Colaboratory (Colab), Google BigQuery, Text Generation, Web Services, Neural Networks, Research, Student Engagement, Supervised Machine Learning, Time Series, LangChain, OpenAI, Reinforcement Learning, Deep Reinforcement Learning, Algorithms, Programming, Heuristics, Optimization, Evolutionary Computation, Genetic Algorithms, Convolutional Neural Networks (CNNs), Sorting Algorithms, Pattern Recognition, Language Models, Unsupervised Learning, Big Data, Unstructured Data Analysis, Large Language Models (LLMs), Llama 2, FastAPI, Prompt Engineering, OpenAI GPT-4 API, Pinecone, FAISS
