Senior Data Scientist
2021 - PRESENT
Freelance for Lionbridge (via Newfire Global Partners)
- Developed a machine learning sequence labeling model on text data that achieved an F1 score above 0.9.
- Decreased the inference time of a previously developed machine learning model without sacrificing its F1 score.
- Used PySpark and Databricks to perform a large-scale data analysis that the company used to drive future business decisions.
- Developed multiple highly scalable Python web services that are currently serving production traffic.
Technologies: Python, Agile, Scrum, Web Services, JSON, PyTorch, SpaCy, NLTK, PySpark, Jupyter, Databricks, Open Neural Network Exchange (ONNX), Neural Networks, LSTM, Pandas

Machine Learning Engineer
2020 - 2021
Alchemy V Ltd (via Toptal)
- Created a marketing slogan text generator using Hugging Face transformers/text generation pipelines and customer-provided data.
- Created a data ingestion and reporting process via multiple Google Cloud services: BigQuery, Cloud Functions, Cloud Endpoints, and Dataproc.
- Ported existing R reporting code to a Python web service.
Technologies: Google Cloud, Google Cloud API, Google BigQuery, R, Python, Text Generation, SQL

Natural Language Processing (NLP) Consultant
2020 - 2021
Granville Knowledge Management (via Toptal)
- Developed a scraper to download a large, diverse collection of legal documents (around 20,000, dating from 1990 to the present) from a European public repository.
- Used machine learning to build a text classification model that automatically assigns categories to documents based on their content.
- Created a dataset of legal documents and used it to train and evaluate the text classification model. Shared results via Google Colab so that customers could interactively try the model on their own held-out data.
Technologies: Python, Scrapy, Web Scraping, PyTorch, Jupyter, Google Colaboratory (Colab), Text Classification

Research Associate
2018 - 2020
TakeLab at the University of Zagreb
- Developed a search engine for Croatian legal documents.
- Built a named entity recognition model in PyTorch by combining LSTM with a CRF.
- Mentored several students on internship projects and wrote my master's thesis on natural language processing.
Technologies: Scikit-learn, PyTorch, Apache Solr, Django, Python, Torch, Pandas

Software Development Engineer
2014 - 2017
Amazon Web Services (AWS)
- Contributed to developing a scalable time-series database solution in Java and C++, which served around 1 million requests/second.
- Served as the team scrum master and product owner.
- Designed and implemented a network correlation engine microservice to handle networking events from the entire Amazon network (patent award https://patents.justia.com/inventor/filip-boltuzic).
Technologies: Amazon Web Services (AWS), C++, Python, Java

Business Intelligence Analyst
2012 - 2014
Zagrebacka banka Unicredit Group
- Developed SQL reports on data warehouse data to identify promising retail strategies.
- Built an interactive tool in Java to speed up processes in Oracle Data Integrator.
- Developed small web applications for the accounting department using PL/SQL and Oracle APEX.
Technologies: Java, SQL