Andrei Apostol, Developer in Iași, Iași County, Romania

Andrei Apostol

Verified Expert in Engineering

Artificial Intelligence Developer

Iași, Iași County, Romania
Toptal Member Since
September 1, 2022

Andrei studied computer science in his hometown in Romania and completed his master's degree in AI at the University of Amsterdam. He has accumulated years of practical AI experience in model training, data processing pipeline development, and deployment. He is an engineer who is always looking forward to new challenges. Andrei also has academic experience, having published two papers on neural pruning and quantization that were well received by the academic community.






Preferred Environment

Linux, Visual Studio Code (VS Code), Slack, Rocket.Chat, Google Cloud/Suite

The most amazing...

...thing I've accomplished is earning the Best Paper award at BeneLearn 2020 for the paper based on my master's dissertation, which presents a novel pruning algorithm I developed.

Work Experience

Machine Learning Engineer

2021 - PRESENT
  • Built a custom, flexible BERT-like architecture for multi-class document classification and trained it on data from various clients, obtaining 90-94% average accuracy.
  • Combined a traditional NLP augmentation model with the GPT-3 large language model to perform data augmentation for clients, reducing the error rate by 50%.
  • Used Hugging Face datasets based on Apache Arrow to handle large volumes of data that normally would not fit in memory and implemented an efficient and replicable data processing pipeline with batching and multiprocessing.
  • Held regular client meetings, giving high-level overviews of our technical solution and explaining our core metrics.
Technologies: Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Hugging Face, Transformers, BERT, DVC, Docker, Docker Compose, FastAPI, Machine Learning Operations (MLOps), Language Models, Python, Software Engineering, Machine Learning, Artificial Intelligence (AI), Large Language Models (LLMs), Proof of Concept (POC), Minimum Viable Product (MVP), Natural Language Understanding (NLU), Pandas, Matplotlib, Contract, GitHub, Information Extraction, Open Neural Network Exchange (ONNX), Generative Artificial Intelligence (GenAI), PDF, ChatGPT, OpenAI GPT-3 API, OpenAI API, Data Visualization

Machine Learning Engineer

2020 - PRESENT
  • Trained a YOLOv5 object detection network for waste management and recycling, obtaining mean-average-precision scores of over 95% on over 40 classes of objects with a speed of over 200 frames per second on a conventional GPU.
  • Built an app around the object detection network using FastAPI to expose the endpoints and Streamlit for the UI, converting the network to the ONNX format after training for faster inference time.
  • Created a module for out-of-distribution detection using the CLIP pre-trained model, obtaining over 97% accuracy in the in-/out-of-distribution classification while allowing the class taxonomy to be changed without re-training.
  • Held regular meetings with project stakeholders, led demos and presentations for the client to provide estimates for the next milestones, and conducted sprint planning sessions, keeping track of progress.
  • Published the paper "Highlights of AI Research in Europe" in a special edition of the European Journal of AI, demonstrating that pruning and quantization can deliver greater acceleration without sacrificing accuracy.
  • Implemented object tracking using the SORT algorithm with re-identification to follow the trajectories of objects over time. Obtained over 85% MOTA (multi-object tracking accuracy) and 80% IDF1.
Technologies: Python 3, NVIDIA Triton, FastAPI, Streamlit, Computer Vision, Object Detection, PyTorch, Docker, Docker Compose, Deep Learning, You Only Look Once (YOLO), Neural Network Pruning, Quantization, Detection Engineering, Open Neural Network Exchange (ONNX), Agile, Bash, Python, Software Engineering, Machine Learning, Artificial Intelligence (AI), Proof of Concept (POC), Minimum Viable Product (MVP), Pandas, Matplotlib, Contract, PyTorch Lightning, GitHub, OpenCV, NVIDIA TensorRT, Image Processing, Hugging Face, Generative Artificial Intelligence (GenAI), Data Visualization
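
For readers unfamiliar with the two tracking metrics quoted above, here is a minimal sketch of how MOTA and IDF1 are commonly defined. The function names and the example counts are hypothetical and chosen only for illustration; they are not figures from this project.

```python
def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects


def idf1(id_true_positives, id_false_positives, id_false_negatives):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denominator = 2 * id_true_positives + id_false_positives + id_false_negatives
    if denominator == 0:
        return 0.0
    return 2 * id_true_positives / denominator


if __name__ == "__main__":
    # Arbitrary illustrative counts, not measurements from any real tracker.
    print(f"MOTA: {mota(60, 30, 10, 1000):.2f}")
    print(f"IDF1: {idf1(900, 100, 100):.2f}")
```

MOTA penalizes misses, false alarms, and identity switches together, while IDF1 focuses on how consistently each object keeps the same identity over time, which is why the two are usually reported side by side.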

Machine Learning Engineer

2023 - 2023
Mantis NLP
  • Designed a BERT-based architecture for medical research document tagging. Implemented an efficient and scalable training pipeline, able to process 15M+ documents in a matter of hours. Achieved a state-of-the-art micro-F1 score of 70%.
  • Created a user-facing application that allows for easy creation, deployment, and removal of machine learning models in production using Amazon SageMaker. Included monitoring, alerts, and a complete test suite to ensure quality and reliability.
  • Used LangChain with ChatGPT and FAISS to create a personal assistant that could answer a user's questions based on their collection of notes. Performed prompt engineering to obtain better-quality responses.
  • Held close contact with key clients and stakeholders to ensure we were aligned on requirements and created the highest quality deliverables at all stages of the project.
Technologies: ChatGPT, BERT, Natural Language Processing (NLP), Machine Learning, OpenAI GPT-3 API, OpenAI GPT-4 API, LangChain, Chatbots, AWS IoT, Amazon SageMaker, FAISS, Generative Pre-trained Transformers (GPT), Generative Pre-trained Transformer 3 (GPT-3), Prompt Engineering, Amazon Web Services (AWS), OpenAI API, Data Visualization

Machine Learning Research Internship

2019 - 2020
  • Researched and built expertise in neural network pruning techniques as part of my master's dissertation.
  • Developed a novel pruning algorithm that obtains state-of-the-art results for high sparsity scenarios and other properties such as the ability to prune during training, computational tractability, and hyperparameter invariance.
  • Received the Best Paper award for a scientific paper on this algorithm, published at the BeneLearn 2020 conference held in Belgium, the Netherlands, and Luxembourg.
Technologies: PyTorch, TensorBoard, Python 3, Scientific Computing, Research, Computer Vision, Deep Neural Networks, Neural Network Pruning, Quantization, Docker, Docker Compose, Python, Software Engineering, Machine Learning, Artificial Intelligence (AI), Pandas, Matplotlib, GitHub, OpenCV, Image Processing, Data Visualization, TensorFlow

Data Scientist

2019 - 2019
  • Built a time series forecasting model with the SARIMA method, achieving a low mean squared error for all predictions within the confidence bounds.
  • Created an encoder/decoder gated recurrent unit network for document part classification, obtaining over 90% accuracy.
  • Deployed the trained models to production by exposing the core functionality via RESTful APIs and monitored the performance in production.
Technologies: Flask, Long Short-term Memory (LSTM), Time Series, ARIMA Models, REST, APIs, Data Science, Python, Software Engineering, Machine Learning, Artificial Intelligence (AI), Natural Language Understanding (NLU), Pandas, Matplotlib, GitHub, Information Extraction

Research Scientist Intern

2017 - 2017
  • Analyzed customer behavior on the platform and developed a random forest model with a high ROC-AUC score.
  • Handled large volumes of data using Apache Spark and created data processing pipelines to filter and prepare data using the Python and Scala APIs and Spark SQL.
  • Conducted A/B testing and integrated the resulting model into several Amazon sites.
Technologies: Python 3, Apache Spark, Zeppelin, Random Forests, Machine Learning, Spark SQL, Scikit-learn, Python, Software Engineering, Artificial Intelligence (AI), GitHub, Amazon Web Services (AWS)

FlipOut | Uncovering Redundant Weights via Sign Flipping
A neural network pruning method that obtains state-of-the-art results in terms of the trade-off between accuracy and sparsity. It works by monitoring weights during training and removing those that oscillate around zero, under the assumption that such oscillations indicate a local optimum for those weights.

It can remove over 98% of the connections in common networks with little to no impact on accuracy, allowing for large speed gains. Compared to baselines from literature, this method can prune during training, is insensitive to the selection of hyperparameters, and allows for selecting the sparsity level directly.

I wrote a paper around this method and published it in BeneLearn 2020, obtaining the Best Paper award.
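
The idea described above can be illustrated with a simplified sketch. This is not the published FlipOut implementation: it only counts how often each weight changes sign across recorded training steps and zeroes out the most frequently flipping fraction. The function names, the tie-breaking rule, and the toy weight history are all assumptions made for illustration.

```python
def count_sign_flips(history):
    """Count how many times each weight changes sign across snapshots.

    history: list of weight snapshots, each a list of floats of equal length.
    """
    flips = [0] * len(history[0])
    for prev, curr in zip(history, history[1:]):
        for i, (p, c) in enumerate(zip(prev, curr)):
            if p * c < 0:  # strict sign change; exact zeros are ignored
                flips[i] += 1
    return flips


def prune_by_flips(weights, flips, sparsity):
    """Zero out the `sparsity` fraction of weights with the most sign flips."""
    n_prune = int(len(weights) * sparsity)
    # Most flips first; ties are broken by smaller magnitude (an assumption).
    order = sorted(range(len(weights)), key=lambda i: (-flips[i], abs(weights[i])))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    # Toy history: weights 0 and 1 drift steadily, weights 2 and 3 oscillate around zero.
    history = []
    for step in range(10):
        sign = 1 if step % 2 == 0 else -1
        history.append([0.5 + 0.01 * step, -0.8 - 0.01 * step, sign * 0.05, -sign * 0.02])

    flips = count_sign_flips(history)
    print("sign flips per weight:", flips)
    print("weights after pruning:", prune_by_flips(history[-1], flips, sparsity=0.5))
```

Passing the target sparsity directly as an argument mirrors the property mentioned above, that the sparsity level can be selected directly rather than tuned through a threshold.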
2018 - 2020

Master's Degree in Artificial Intelligence

University of Amsterdam - Amsterdam, The Netherlands

2015 - 2018

Bachelor's Degree in Computer Science

Alexandru Ioan Cuza University - Iași, Romania


Model Parallelism: Building and Deploying Large Neural Networks



IELTS Academic Certificate (Native Level)

British Council


PyTorch, Matplotlib, OpenCV, Scikit-learn, Pandas, PyTorch Lightning, TensorFlow, NumPy


TensorBoard, You Only Look Once (YOLO), Slack, GitHub, ChatGPT, Git, Docker Compose, Open Neural Network Exchange (ONNX), Spark SQL, Amazon SageMaker


Flask, Streamlit, Apache Spark


Python 3, Python, Bash, SQL


Visual Studio Code (VS Code), Rocket.Chat, Linux, Docker, Zeppelin, AWS IoT, Amazon Web Services (AWS)


Object-oriented Programming (OOP), Data Science, Agile, REST


Machine Learning, Deep Learning, Natural Language Processing (NLP), Computer Vision, Long Short-term Memory (LSTM), Deep Neural Networks, Neural Network Pruning, Quantization, FastAPI, Object Detection, Transformers, BERT, Classification, English, Google Cloud/Suite, Artificial Intelligence (AI), Proof of Concept (POC), Minimum Viable Product (MVP), Natural Language Understanding (NLU), Contract, Information Extraction, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Data Visualization, Statistics, Software Engineering, Scientific Data Analysis, Random Forests, APIs, Scientific Computing, Research, NVIDIA Triton, Detection Engineering, Hugging Face, DVC, Machine Learning Operations (MLOps), Language Models, Algorithms, Data Analytics, Large Language Models (LLMs), NVIDIA TensorRT, Image Processing, Generative Adversarial Networks (GANs), Generative Artificial Intelligence (GenAI), OCR, PDF, Chatbots, OpenAI API, Information Retrieval, Cluster Computing, Time Series, ARIMA Models, DeepSpeed, 3D, OpenAI GPT-4 API, LangChain, FAISS, Generative Pre-trained Transformer 3 (GPT-3), Prompt Engineering
