Faruk Pasalic, Developer in Sarajevo, Federation of Bosnia and Herzegovina, Bosnia and Herzegovina

Faruk Pasalic

Verified Expert in Engineering

Software Developer

Sarajevo, Federation of Bosnia and Herzegovina, Bosnia and Herzegovina

Toptal member since March 11, 2021

Bio

Faruk is a software developer with over 20 years of experience, specializing in Python back-end development. His expertise spans big data technologies like Hadoop and Spark, PostgreSQL, C, and C++. Faruk also has years of experience in machine learning and a hobby-level interest in firmware and hardware design, bridging software and hardware systems.

Portfolio

Freelance Job
Python 3, Milvus, OpenAI API, Retrieval-augmented Generation (RAG), PDF, OpenAI...
RiskFinTech Ltd
Python, Pandas, SciPy, NumPy, Jupyter, Matplotlib, Amazon Web Services (AWS)...
Freelance Clients
Django, React, REST, APIs, HTML, JavaScript, Django REST Framework

Experience

  • Computer Science - 15 years
  • Linux - 10 years
  • GitHub - 10 years
  • REST APIs - 8 years
  • PyCharm - 5 years
  • Data Science - 5 years
  • Python - 4 years
  • Pandas - 4 years

Availability

Part-time

Preferred Environment

Linux, PyCharm, Git, TensorFlow, Keras, Python, Embedded Systems, Embedded Hardware

The most amazing...

...work I've done was designing and implementing a robust system for PDF parsing, text extraction, and layout understanding using machine learning techniques.

Work Experience

Full-stack Developer and AI Developer

2024 - 2024
Freelance Job
  • Implemented text extraction from PDF documents related to laws, using ML clustering algorithms to extract text paragraphs, titles, and subtitles.
  • Used the OpenAI API to convert paragraphs, titles, and sections into vector embeddings and stored them in the Milvus vector database.
  • Implemented a search over the Milvus database to retrieve paragraphs and references similar to those in the PDF files.
  • Implemented a simple web app to retrieve search results and download documents.
Technologies: Python 3, Milvus, OpenAI API, Retrieval-augmented Generation (RAG), PDF, OpenAI, Prompt Engineering, OpenAI GPT-3 API
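
The retrieval step described above can be illustrated with a minimal sketch. It is not the project's code: the embeddings are hand-made three-dimensional vectors rather than OpenAI embeddings, and a plain in-memory list stands in for Milvus. The names `cosine_similarity`, `top_k`, and `store` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, store, k=3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine_similarity(query, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical, hand-made "embeddings"; a real system would get these from
# an embedding model and keep them in a vector database.
store = [
    ("Article 1: scope of the law", [0.9, 0.1, 0.0]),
    ("Article 2: definitions", [0.1, 0.9, 0.0]),
    ("Article 3: penalties", [0.0, 0.2, 0.9]),
]

results = top_k([0.8, 0.2, 0.0], store, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

A vector database performs the same ranking with an index instead of a full scan, which is what makes it practical for large document sets.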

Python Developer and Data Engineer (via Toptal)

2023 - 2024
RiskFinTech Ltd
  • Extended the core library with new features and fixed existing issues.
  • Maintained application updates, deployments, and environments.
  • Wrote user documentation and developer documentation, including architecture diagrams and process charts, among others.
  • Discussed new features with owners based on clients' feedback.
Technologies: Python, Pandas, SciPy, NumPy, Jupyter, Matplotlib, Amazon Web Services (AWS), ETL, Java, Scraping, Data Analysis, Data Analytics

Senior Developer

2023 - 2023
Freelance Clients
  • Developed a calendar application with specific requirements. Used Django for a production-ready project. Learned Django quickly using my prior knowledge of web services with other platforms, specifically Spring Boot and Java Play.
  • Created a calendar application from scratch using React and Django.
  • Increased my knowledge of JavaScript and Python. Used Python previously on projects regarding ML and AI.
Technologies: Django, React, REST, APIs, HTML, JavaScript, Django REST Framework

Embedded Developer

2023 - 2023
Fox Montgomery Limited
  • Investigated livestock scales and how to connect them to the cloud.
  • Researched RS232 to TTL converters and RFID readers suitable for the project.
  • Wrote simple code in Python to connect sensors to the Raspberry Pi and sent data to the Azure cloud.
Technologies: C++, Embedded Systems, Embedded Hardware, Internet of Things (IoT), Hardware Design, Embedded Development, Embedded C, Embedded Software, Technical Consulting, System Service & Hardware Control, ESP32, Microcontroller Programming, I2C, MicroPython

Python/Data Engineer Developer

2022 - 2023
RiskFinTech Ltd
  • Implemented new features for the Jupyter Notebook application.
  • Fixed bugs in Jupyter Notebook and Java applications.
  • Worked on the design and refactoring of client applications.
Technologies: Python, Pandas, SciPy, NumPy, Jupyter, Matplotlib, Amazon Web Services (AWS), ETL, Java

Python Developer and Data Engineer

2022 - 2022
RiskFinTech Ltd
  • Developed a transformation engine for processing financial portfolios and creating different kinds of reports.
  • Wrote or fixed forecasting models based on documentation.
  • Conducted demonstrations of the application for the clients and members of the company.
Technologies: Python, Pandas, SciPy, NumPy, Jupyter, Matplotlib, Amazon Web Services (AWS), ETL, Java, Algorithms, Agile, Scrum, GitHub

Python Developer and Data Engineer

2021 - 2022
RiskFinTech Ltd
  • Created a transformation engine for processing financial portfolios and writing various reports.
  • Converted financial rules to code or rules to be run on an internal transformation engine.
  • Wrote or fixed forecasting models based on documentation.
  • Conducted application demonstrations for the clients and members of the company.
Technologies: Python, Pandas, SciPy, NumPy, Jupyter, Matplotlib, Amazon Web Services (AWS), ETL, Java, Algorithms, Agile, Scrum, GitHub

Lead Software Engineer

2019 - 2021
Atlantbh
  • Designed and developed a system for ingesting large volumes of data from XML files into Spark DataFrames.
  • Optimized Spark jobs for better performance.
  • Orchestrated the execution order of Spark jobs to reduce total run time.
  • Exported data from Spark to big XML files in a user-defined format.
  • Mentored new members of the team and organized tasks for new members.
Technologies: Java, Spark, RabbitMQ, Jenkins, OpenShift, Linux, IntelliJ IDEA, Git, Computer Science, PostgreSQL, REST, Amazon S3 (AWS S3), APIs, Back-end, Spring, JSON, XML, SQL, Apache Kafka, Spring Boot, Kubernetes, MinIO, Data Structures, Amazon Web Services (AWS), Big Data, REST APIs, Multithreading, Hibernate, Algorithms, Docker, Agile, Scrum, GitHub, Transmission Control Protocol (TCP), UDP, Networking

Machine Learning Engineer

2016 - 2018
Atlantbh
  • Designed new features for the in-house built product with machine learning algorithms using Python, TensorFlow 1.x, TensorFlow 2, and TensorFlow Serving.
  • Implemented record deduplication and matching using an unsupervised learning algorithm, written in Python with distributed processing and auto-scaling capabilities on AWS.
  • Extracted structured data from unstructured text, such as addresses from HTML or plain text. Implemented as an NLP model, an LSTM neural network built with TensorFlow and deployed using TensorFlow Serving.
  • Created text classification using NLP and supervised learning algorithms. Implemented text classification with LSTM recurrent network in Python and TensorFlow.
  • Categorized websites using NLP and supervised learning algorithms. Categorization is done using Python and DNNs with word embeddings in the background.
  • Mentored company interns in machine learning and data science. Used Python, TensorFlow, and Keras for the project. Some projects included expression detection using CNN, a spell checker for the Bosnian language, and driver-level prediction.
Technologies: Python 3, PyCharm, Linux, Amazon Web Services (AWS), Hadoop, TensorFlow, Java, Convolutional Neural Networks (CNNs), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning, Neural Networks, Git, Mathematics, Computer Science, PostgreSQL, TensorBoard, Amazon S3 (AWS S3), Amazon EC2, Data Science, Python, Pandas, Matplotlib, Clustering Algorithms, Artificial Intelligence (AI), NumPy, Back-end, Spring, Deep Learning, JSON, Jupyter, Jupyter Notebook, Spring Boot, Kubernetes, Data Structures, Natural Language Toolkit (NLTK), Algorithms, Docker, Agile, Scrum, GitHub, Web Scraping, AI Programming, Data Scientist, Scikit-learn, Scraping, Website Data Scraping, Data Analysis, Data Analytics

Senior Software Engineer

2011 - 2016
Atlantbh
  • Developed an ingestion system for a big data platform.
  • Created ETL tools for data preprocessing in HDFS.
  • Built ORM tools for mapping between HDFS and Java.
  • Developed message-based communication between different systems on the same platform.
  • Configured a REST service to store and retrieve configurations for different subsystems.
  • Processed logs of real-time data collected using Flume and Scribe collectors. I also worked on MapReduce graph algorithms to connect logs from different stages.
Technologies: Java, Hadoop, RabbitMQ, REST, Eclipse IDE, JBoss, Apache Tomcat, HDFS, ETL, Linux, IntelliJ IDEA, Git, Computer Science, PostgreSQL, Amazon Web Services (AWS), Amazon EC2, APIs, Back-end, Spring, JSON, XML, SQL, Spring Boot, Data Structures, REST APIs, Multithreading, Hibernate, Algorithms, gRPC, Docker, Agile, Scrum, GitHub, Large Data Sets, Transmission Control Protocol (TCP), UDP

Senior Software Engineer

2009 - 2011
Atlantbh
  • Developed location-based services for a client and implemented communication between different subsystems, message storage, and processing facilities.
  • Processed POIs from the supplier's input files and normalized them into the client-specific format stored in HDFS. Implemented a tool for transforming different file formats (XML and JSON files) into a single file format.
  • Created tools for verifying input data based on the JBoss Drools engine.
Technologies: Java, Messaging, REST, JBoss, Linux, IntelliJ IDEA, Git, Computer Science, PostgreSQL, Amazon S3 (AWS S3), Amazon EC2, APIs, Back-end, Spring, JSON, XML, SQL, Spring Boot, Data Structures, Esri, GIS, PostGIS, Amazon Web Services (AWS), Big Data, REST APIs, Multithreading, Hibernate, Redis, C++, Algorithms, Docker, Agile, Scrum, GitHub, Large Data Sets, Beautiful Soup

Software Engineer

2009 - 2009
Freelance
  • Developed a desktop application for processing JPEG images created by a police radar system.
  • Extracted JPEG metadata and used OCR to detect the car's license plates captured on the image.
  • Created a report of the traffic offense in DOC and PDF formats.
Technologies: Computer Science, PostgreSQL, Eclipse IDE, Spring, Data Structures, Algorithms, Agile, Scrum, GitHub, Selenium, Beautiful Soup, JPEG, Optical Character Recognition (OCR)

Software Engineer

2007 - 2009
Atlantbh
  • Implemented geocoding and reverse geocoding algorithms based on GIS data provided by the client.
  • Developed map rendering from GIS data provided by the client. Maps were partitioned into tiles. Created tile algorithms and produced and supervised tile rendering. Implemented on-demand tile rendering using the Decarta server.
  • Implemented routing algorithms and drew routes on maps based on GIS data provided by the client.
Technologies: Git, Computer Science, REST, APIs, Spring, JSON, XML, Data Structures, GIS, Esri, PostGIS, Hibernate, Algorithms, Agile, GitHub, Large Data Sets
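
As an illustration of how a map can be partitioned into tiles, here is a minimal sketch of the widely used Web Mercator XYZ tiling scheme. This is an assumption made for the example: the profile does not say which tiling scheme or projection the project used, and `lonlat_to_tile` is a hypothetical name.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Map a WGS84 longitude/latitude to XYZ tile indices at a zoom level.

    Assumes the common Web Mercator scheme: 2**zoom tiles per axis,
    tile (0, 0) in the top-left corner.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that edge coordinates (e.g. lon == 180) stay inside the grid.
    x = max(0, min(n - 1, x))
    y = max(0, min(n - 1, y))
    return x, y


# Tile containing central Sarajevo at zoom level 12.
print(lonlat_to_tile(18.4131, 43.8563, 12))
```

On-demand rendering then reduces to rendering only the tile whose indices were requested, instead of the whole map.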

Software Engineer

2005 - 2007
Atlantbh
  • Worked as a junior software developer on a system for delivering venue maps to the client. The maps contained POI data stored in a database. Charged with maintaining the codebase and adding new features.
  • Contributed to the library for importing different graphical and nongraphical formats such as DXF and PDF into the system and used imported data to create venue maps. Worked on detecting map parts based on object labels from the input file.
  • Created different output files (PDF, SWF) out of venue maps stored in the system.
Technologies: IntelliJ IDEA, Git, Computer Science, Eclipse IDE, APIs, Back-end, Data Structures, Image Processing, Selenium, Beautiful Soup, PDF

Experience

PDF Text Extraction Library

https://github.com/farukpasalic/pdfmage
PDFMage is a Python library designed to extract text from PDF files. It offers a simple and efficient way to parse PDF documents and retrieve their textual content. The library supports extraction from specific pages of a PDF, providing flexibility for users. One of the key features of PDFMage is its configurable interface, which allows users to fine-tune the extraction process according to their needs. This includes setting parameters for the DBSCAN algorithm used in clustering words and columns.

Additionally, PDFMage includes debugging options that create visualizations of the extraction process, aiding in understanding and troubleshooting. The library also provides a Config class for customizing various aspects of the text extraction process, such as the path for output storage and the colors used in debug images. Overall, PDFMage is a comprehensive tool for PDF text extraction, offering high accuracy and customization options to cater to various user requirements.
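
To show the idea behind clustering words with DBSCAN, here is a minimal, self-contained sketch. It is not PDFMage's API or implementation: the function `dbscan`, its parameters, and the word coordinates are hypothetical, and a real project would more likely rely on a library implementation.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical word centers (x, y) on a page: two text columns and a stray mark.
words = [(50, 100), (55, 112), (52, 124),
         (300, 100), (305, 112), (302, 124),
         (180, 400)]
print(dbscan(words, eps=20, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```

The `eps` and `min_samples` values play the role of the tunable clustering parameters mentioned above: a larger `eps` merges nearby columns, while a larger `min_samples` treats more isolated words as noise.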

Web Scraping Library - Python

https://github.com/farukpasalic/skrap
I developed a configurable web scraping library. It is simple to use and provides several features for extracting data from websites. It supports different loading mechanisms (Python requests, web drivers). It can be configured to scrape a single article from the web, a list, or tables. It can also be configured to limit the number of elements to scrape and to scrape paginated data.
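
The combination of table scraping, an element limit, and pagination can be sketched with the standard library alone. This is not skrap's API: `TableScraper` and `scrape_tables` are hypothetical names, and the pages are passed in as HTML strings instead of being fetched with requests or a web driver.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of table cells row by row, up to max_rows."""

    def __init__(self, max_rows=None):
        super().__init__()
        self.max_rows = max_rows
        self.rows = []
        self._row = None   # cells of the row being parsed
        self._cell = None  # text fragments of the cell being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self.max_rows is None or len(self.rows) < self.max_rows:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_tables(pages, max_rows=None):
    """Scrape rows from a sequence of HTML pages, one per paginated result."""
    rows = []
    for html in pages:
        remaining = None if max_rows is None else max_rows - len(rows)
        if remaining is not None and remaining <= 0:
            break  # limit reached, skip the remaining pages
        parser = TableScraper(max_rows=remaining)
        parser.feed(html)
        rows.extend(parser.rows)
    return rows


pages = [
    "<table><tr><th>Name</th><th>Price</th></tr>"
    "<tr><td>A</td><td>1</td></tr></table>",
    "<table><tr><td>B</td><td>2</td></tr><tr><td>C</td><td>3</td></tr></table>",
]
rows = scrape_tables(pages, max_rows=3)
print(rows)  # [['Name', 'Price'], ['A', '1'], ['B', '2']]
```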

Data Validation, Forecasting and Reporting Engine

I worked as a senior software developer and architect on the project. My role was to discuss, develop, and maintain features for the data validation, forecasting, and reporting application. The system consists of two applications. The one written in Python was my main responsibility. On the other application, I helped align features and made minor code changes.

Livestock Scales

I worked mostly as a consultant on the development of a livestock scale. My main role was to investigate options for scales and RFID readers and to create the project's architecture. I also evaluated different hardware for the solution.

Machine Learning and Data Science Project

I developed an address extraction module involving data crawling, labeling, and preprocessing.

I also utilized CNN, RNN models, and multilayer LSTM on synthetic datasets from crawled websites and integrated features into the system.

I then implemented business name extraction using the Smith-Waterman algorithm, combining URL, title, and website content. I also created a website classification module using TensorFlow, with an LSTM-based classifier performing best among the models tested. Finally, I designed a profanity filter for the English language using an LSTM network with character-level and word-level embeddings. It generalized well even to distorted or misspelled words.
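
For readers unfamiliar with Smith-Waterman, the sketch below shows the local alignment idea on a domain name and a page title. It is an illustration under assumptions, not the project's code: the scoring values, the function name `smith_waterman`, and the sample strings are hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Local alignment of a and b.

    Returns (best_score, part_of_a), where part_of_a is the slice of `a`
    covered by the best-scoring local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the scoring matrix; scores never drop below zero.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,
                              score[i - 1][j - 1] + step,  # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    end = i
    while i > 0 and j > 0 and score[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + step:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, a[i:end]


# Hypothetical inputs: a name taken from a URL and a lowercased page title.
domain = "acmebakery"
title = "welcome to acme bakery"
print(smith_waterman(domain, title))  # (19, 'acmebakery')
```

Because the alignment is local, surrounding words such as "welcome to" do not lower the score, which is what makes the technique useful for locating a name inside longer page text.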

Java and Spark Project

As a senior engineer, I led a Java and Spark project focused on ingestion of data from XML/JSON files into Spark, transformation of data for custom reports, and interfacing with external services.

My responsibilities included implementing Spark jobs, maintaining the REST API, and writing documentation.

Django and React Project

I led a small-scale project involving Python, Django, and React to develop a customizable calendar application.
My responsibilities encompassed creating custom functions, implementing a unique color scheme, and integrating other personalized features into the application.

Web Scraping Project

I managed a web scraping project involving the extraction of textual and tabular data from websites. I used PhantomJS, Selenium, and web drivers for data retrieval, processing, and persistence for subsequent analysis.

Education

1998 - 2003

Bachelor's Degree in Mathematics and Computer Science

University of Sarajevo - Sarajevo, Bosnia and Herzegovina

1995 - 1998

High School Diploma in Mathematics and Computer Science

Gymnasium Bosanska Krupa - Bosanska Krupa, Bosnia and Herzegovina

Skills

Libraries/APIs

TensorFlow, Pandas, Matplotlib, NumPy, REST APIs, Scikit-learn, Beautiful Soup, Keras, Natural Language Toolkit (NLTK), SciPy, OpenCV, React, PyQt 5, WebDriver, OpenAI API

Tools

GitHub, PyCharm, RabbitMQ, Jupyter, GIS, Esri, IntelliJ IDEA, Bitbucket, Git, Jenkins, Eclipse IDE, Apache Tomcat, TensorBoard

Languages

Java, XML, Python 3, Python, SQL, Embedded C, HTML, JavaScript, C, C++, MicroPython, Scala

Paradigms

Agile, Scrum, REST, MapReduce, ETL

Storage

JSON, PostGIS, PostgreSQL, HDFS, Amazon S3 (AWS S3), MySQL, Redis

Frameworks

Spring, Spring Boot, Hibernate, Selenium, Django REST Framework, Hadoop, Spark, gRPC, Django

Platforms

Linux, Docker, Amazon Web Services (AWS), Arduino, Jupyter Notebook, OpenShift, JBoss, Amazon EC2, Apache Kafka, Kubernetes, Raspberry Pi 3 GPIO

Other

Data Science, Machine Learning, Neural Networks, Natural Language Processing (NLP), Data Structures, Generative Pre-trained Transformers (GPT), Mathematics, Computer Science, Clustering Algorithms, Artificial Intelligence (AI), APIs, Back-end, Deep Learning, MinIO, Arduino IDE, Image Processing, Big Data, Multithreading, Jupiter, Algorithms, Embedded Systems, Embedded Hardware, Embedded Development, Embedded Software, Microcontroller Programming, AI Programming, Large Data Sets, Data Scientist, Transmission Control Protocol (TCP), UDP, Scraping, Website Data Scraping, OpenAI, Prompt Engineering, OpenAI GPT-3 API, Computer Vision, Applied Mathematics, Messaging, Web Scraping, Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), ESP32, Scripting, Internet of Things (IoT), Hardware Design, Technical Consulting, System Service & Hardware Control, I2C, Networking, PDF, JPEG, Optical Character Recognition (OCR), GitHub Actions, Hugging Face, Large Language Models (LLMs), Data Analysis, Data Analytics, Milvus, Retrieval-augmented Generation (RAG)
