Alex Burlacu
Verified Expert in Engineering
Machine Learning Developer
Chișinău, Moldova
Toptal member since May 24, 2022
Over the years, as an experienced machine learning engineer, Alex has dealt with diverse problems, ranging from computer vision to natural language processing and time series forecasting. He has several times worked as the sole engineer on a project and, despite scarce data and limited computational resources, succeeded where others had failed. For the past few years, he has acted as a machine learning team lead. In his spare time, Alex enjoys independent lecturing and ML research.
Experience
- Python 3 - 8 years
- Machine Learning - 7 years
- Scikit-learn - 7 years
- Docker - 6 years
- Deep Learning - 6 years
- PyTorch - 5 years
- Team Mentoring - 5 years
- Machine Learning Operations (MLOps) - 5 years
Preferred Environment
Ubuntu, Python 3, Visual Studio Code (VS Code), Git, Docker, PyTorch, Neural Networks
The most amazing...
...thing I've made is a multilingual BERT model, trained with active learning, used in document tagging to identify tender attributes and speed up document processing.
Work Experience
ML/MLOps Consultant
Self-employed
- Consulted for a news startup, helping them establish an MLOps culture and processes to retrain their NLP models reliably. Enabled a team of data scientists to use LLMs for news summarization, news fact-checking, and synthetic data generation.
- Advised a team of senior data scientists on optimizing their cloud costs through a combination of software optimizations, right-sizing, and a move to a simpler setup spanning AWS, Azure, and on-premises infrastructure (for both batch inference and training).
- Implemented a multimodal (vision and text) photo editing application using GPT-4 and DALL-E 3, deployed on Google Cloud Run using Docker (see the sketch after this list).
- Helped a startup right-size its GCP VMs for CNN fine-tuning. Used Terraform and Ansible to provision VMs reliably. Defined the architecture to support multi-model placement on A100 GPUs for training and the use of Spot VMs to improve cost efficiency.
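As a rough illustration of the photo editing flow mentioned above, here is a minimal sketch assuming the official openai Python client: GPT-4 rewrites the user's instruction into an image prompt and DALL-E 3 renders the result. The edit_photo helper, model names, and prompts are illustrative assumptions, not the production code, and the vision step that reads the original photo is omitted.

```python
# Hypothetical sketch: GPT-4 turns an editing instruction into an image prompt,
# then DALL-E 3 renders the edited scene. Not the production application.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def edit_photo(instruction: str, photo_description: str) -> str:
    # Ask GPT-4 to merge the original scene and the requested edit
    # into a single, self-contained image-generation prompt.
    chat = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Rewrite photo-editing requests as one DALL-E prompt."},
            {"role": "user", "content": f"Photo: {photo_description}\nEdit: {instruction}"},
        ],
    )
    prompt = chat.choices[0].message.content

    # Render the edited scene with DALL-E 3 and return the image URL.
    image = client.images.generate(model="dall-e-3", prompt=prompt, size="1024x1024", n=1)
    return image.data[0].url


if __name__ == "__main__":
    print(edit_photo("make the sky look like a sunset", "a beach with two palm trees"))
```

A stateless handler like this is straightforward to containerize and deploy on Cloud Run, as described above.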
NLP Expert
Undetectable AI
- Developed a custom self-hosted LLM solution for the project, deployed it using Terraform on AWS ECS, and optimized its serving on EC2 G5 instances using vLLM (see the sketch after this list). Also developed load tests to assess, profile, and optimize the inference stack.
- Ran multiple rounds of exploratory text analysis, covering POS tags, semantic coherence, n-grams, and text readability statistics, to identify relevant structural and semantic patterns that we later used to adjust the core product's performance.
- Researched non-LLM-based solutions and automatic mining of textual patterns from documents for text enhancement, in an effort to create a new iteration of the product.
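A minimal sketch of the vLLM serving core referenced above, assuming offline batched generation; the model name, prompts, and sampling settings are placeholders, and the real deployment added an HTTP layer, ECS orchestration, and load tests.

```python
# Hypothetical sketch of the vLLM serving core: load a model once on the GPU(s)
# of a G5 instance, then batch-generate completions. Parameters are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

prompts = [
    "Rewrite this sentence in a more natural tone: the report was written by the team.",
    "Summarize the following paragraph: machine learning systems require monitoring.",
]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```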
Software Lead
ClearML
- Led the implementation of the ClearGPT project, a set of no-code tools to train and deploy self-hosted LLMs for enterprises. Actively shaped the project's design and roadmap at different stages, specifically MVP, demo, and customer PoC.
- Tuned and deployed multiple LLaMA-based and FLAN-T5 models on AWS G5 instances for an optimal price-to-performance ratio. Also worked with multi-GPU and multi-node training using Hugging Face Accelerate.
- Built tools to generate Q&A datasets from documentation pages, a RAG-aware dataset generation pipeline, and a custom trainer that oversamples the worst-performing examples so the model focuses on improving hard cases (see the sketch after this list).
- Led the ClearML SDK team that developed features and ensured the timely release of both open-source and enterprise versions of packages. Actively involved in prioritizing and planning features for future releases.
- Contributed to community and enterprise support activities. Handled technical onboarding and actively advised clients on how best to leverage ClearML for their MLOps needs and ClearGPT for their enterprise GenAI setting.
- Debugged multiple issues related to dataset metadata management, pipelines API, distributed LLM training, and environment tracking for reproducibility.
- Acted as the go-to person for Google Cloud-related issues. Helped with the creation of custom machine images and with Spot instance support. Provided solution architecture support for enterprise customers running ClearML on GCP.
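The hard-example oversampling idea behind the custom trainer can be sketched with a plain PyTorch WeightedRandomSampler. This is a simplified stand-in, not the ClearGPT trainer itself; the dataset and per-example loss bookkeeping are assumed placeholders.

```python
# Hypothetical sketch: after an evaluation pass produces per-example losses,
# resample the training set so the worst-performing examples are drawn more often.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler


def make_oversampled_loader(dataset, per_example_loss, batch_size=16):
    # Higher loss -> higher sampling weight; normalization keeps weights well scaled.
    weights = torch.as_tensor(per_example_loss, dtype=torch.float)
    weights = weights / weights.sum()
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)


# Usage (placeholders): losses come from a full pass over the training set.
# loader = make_oversampled_loader(train_dataset, losses_from_last_epoch)
# for batch in loader: ...  # hard examples now appear more frequently
```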
Machine Learning Team Lead
DevelopmentAid
- Used machine learning (ML) and deep learning for natural language processing (NLP) on documents to make data entry more efficient.
- Developed and put into production multiple ML microservices, including one that classifies and tags documents via named entity recognition using PyTorch and BERT, and another that handles imbalanced multi-output text classification using scikit-learn.
- Defined and wrote programs for fast data annotation and synthetic data enrichment for named entity recognition (NER). Increased the dataset size from a handful of well-annotated documents to more than a hundred.
- Guided the development of new ML models and implemented practices such as ML code review, cross-validation, and replicable experiments.
- Defined MLOps practices, mainly model serving with Ray Serve (see the sketch after this list) and experiment tracking with MLflow.
- Established an observability infrastructure that reduced the number of unreported errors and cut bug discovery time from a few days to about 10 minutes. Used Jaeger and the ELK stack and helped drive the adoption of Prometheus and Grafana.
- Defined and documented the deployment process, reducing the time to deploy trained models to less than 10 minutes. Managed a Jenkins instance and implemented the process as Jenkins pipelines.
- Established code reviews, periodic one-on-one meetings, explicit coding best practices, and agile processes like iteration planning, planning poker, and standup meetings, reducing feature cycle time by 5x and new bugs per iteration to 0.3.
- Led a team of three junior engineers since July 2020, developing an automated data entry solution, building and deploying new ML models, and maintaining our observability and CI infrastructure.
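A minimal Ray Serve sketch of how a document-tagging model might be exposed as a microservice, in the spirit of the serving setup above; the Tagger class and its placeholder predict logic are illustrative, not the DevelopmentAid code.

```python
# Hypothetical sketch of serving a text-tagging model with Ray Serve.
# The predict logic is a placeholder for the real PyTorch/BERT pipeline.
from ray import serve
from starlette.requests import Request


@serve.deployment(num_replicas=2)
class Tagger:
    def __init__(self):
        # In the real service, the fine-tuned BERT NER model would be loaded here.
        self.model = lambda text: {"entities": []}

    async def __call__(self, request: Request):
        payload = await request.json()
        return self.model(payload["text"])


app = Tagger.bind()
# serve.run(app)  # exposes the deployment over HTTP on a running Ray cluster
```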
Research Intern
Universite Sorbonne Paris Nord
- Increased the sample efficiency of deep learning algorithms by mixing techniques from self-supervised, semi-supervised, and few-shot learning, applicable to images and other data sources.
- Used Google Colab notebooks to run experiments, then switched to Google Cloud Platform. Provisioned the infrastructure with Terraform and Ansible, creating a graphics processing unit (GPU) worker and a tracking server with a single Bash command in one to two minutes.
- Used MLflow for experiment tracking and a combination of Papermill and Optuna for hyperparameter optimization.
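A minimal sketch of pairing Optuna search with MLflow tracking, as in the setup above; the objective, search space, and the train_and_evaluate helper are assumed placeholders (the real experiments were additionally parameterized through Papermill notebooks).

```python
# Hypothetical sketch: each Optuna trial is logged as a nested MLflow run.
import mlflow
import optuna


def train_and_evaluate(lr, batch_size):
    # Placeholder for the real training loop; returns a dummy validation score.
    return 1.0 - abs(lr - 1e-3) - batch_size * 1e-4


def objective(trial):
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    with mlflow.start_run(nested=True):
        mlflow.log_params({"lr": lr, "batch_size": batch_size})
        accuracy = train_and_evaluate(lr, batch_size)
        mlflow.log_metric("val_accuracy", accuracy)
    return accuracy


with mlflow.start_run(run_name="optuna-search"):
    study = optuna.create_study(direction="maximize")
    study.optimize(objective, n_trials=30)
```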
University Assistant
Technical University of Moldova
- Recreated and taught the network programming course and two lab projects focusing on concurrency primitives and networking protocols.
- Authored and delivered the real-time programming course and three lab projects covering message-based concurrency, including the actor model and CSP, as well as message-oriented integration patterns and protocols like MQTT and XMPP.
- Overhauled and led the distributed systems and network programming courses and labs. Updated the real-time programming course and taught it as well.
- Covered diverse topics in the distributed systems course, such as data processing systems, distributed databases, microservice design patterns, and core problems of distributed systems like consensus, time, and exactly-once delivery.
- Mentored five final-year students on their semester projects; two of them chose me as their bachelor's thesis supervisor. Led labs for over 40 students per semester.
Summer Intern
CERN
- Participated in the EP-SFT group as an associate partner, receiving a grant from the UK Science and Technology Facilities Council (STFC).
- Developed a project to benchmark the TMVA package against TensorFlow on event-by-event inference performance targeting multi-layered perceptrons for high-energy physics (HEP).
- Searched for the bottlenecks and future directions of optimization for the TMVA subpackage of the ROOT scientific package.
- Concluded that, for one-by-one and small-batch (<32) inference modes, TMVA is up to two orders of magnitude faster than TensorFlow 1.8 built from source with AVX-512 enabled and used through its C++ inference API.
- Presented a poster about this work at a session at the EEML 2019 Summer School in Bucharest.
Machine Learning Engineer
Redox Entertainment
- Researched and developed neural networks for medical image analysis of oocytes for IVF. Created over ten bespoke neural network architectures using techniques like pre-training with autoencoders and Siamese networks for self-supervised learning.
- Mentored and trained a PhD intern for three months, who then joined the team and also worked on deep learning-related projects.
- Developed a specialized architecture for a small-sized, low-variance dataset of medical images with a performance on par with Google's AutoML Vision.
- Debugged a data pre-processing issue that leaked the test set and produced misleadingly high accuracy during evaluation. Prevented the release of the broken model, protecting the company's reputation.
Co-founder and CTO
BookVoyager
- Developed a search and content-based recommendation system for fiction books that extracts features from raw text and provides recommendations based on those features.
- Implemented logging for faster troubleshooting and defined the architecture as a multiservice system.
- Built the feature extraction and recommendation sub-systems based on token-level and whole-text analysis with SpaCy.
- Participated in customer interviews, defined both business and development processes, and pitched the project at various venues.
- Used profiling to identify the bottleneck and sped up the computation of recommendation results 85x with a pre-allocated array.
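The pre-allocation trick behind that speedup can be sketched as follows: scores are written into a NumPy array allocated once instead of being accumulated in growing Python lists. The shapes and similarity measure are illustrative, not the original BookVoyager code.

```python
# Hypothetical sketch: write similarity scores into a pre-allocated array
# instead of appending to a list, which was the profiled bottleneck.
import numpy as np


def score_candidates(query_vec: np.ndarray, book_vecs: np.ndarray) -> np.ndarray:
    n = book_vecs.shape[0]
    scores = np.empty(n, dtype=np.float32)  # allocate once, reuse every call
    for i in range(n):                      # could also be fully vectorized
        scores[i] = float(query_vec @ book_vecs[i])
    return scores


# Top-10 recommendations for a query embedding (placeholders):
# best = np.argsort(score_candidates(query, catalog))[::-1][:10]
```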
Experience
Serverless Platform
To enrich the platform's functionality, I added a few other services, such as RabbitMQ, MinIO, PostgreSQL, MongoDB, and Apache Tika. To make it easier to use, I wrote an API gateway-like service: a TCP server that translates HTTP requests into messages and returns the responses to the caller as HTTP responses (a minimal sketch of this request/reply flow appears below).
The project later became the basis of an independently taught course on distributed systems design. It was a free course with 25 students enrolled, 11 of whom received certificates of completion.
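A minimal sketch of the gateway's request/reply core using pika over RabbitMQ; the queue names and payloads are placeholders, and the real service also handled the raw TCP/HTTP translation that is omitted here.

```python
# Hypothetical sketch of the gateway core: forward a request as a RabbitMQ
# message and wait for the worker's reply, which is then returned over HTTP.
import uuid
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

responses = {}


def on_reply(ch, method, props, body):
    # Match replies to requests via the correlation id.
    responses[props.correlation_id] = body


channel.basic_consume(queue=reply_queue, on_message_callback=on_reply, auto_ack=True)


def call_service(routing_key: str, payload: bytes) -> bytes:
    corr_id = str(uuid.uuid4())
    channel.basic_publish(
        exchange="",
        routing_key=routing_key,  # e.g., the target function's work queue
        properties=pika.BasicProperties(reply_to=reply_queue, correlation_id=corr_id),
        body=payload,
    )
    while corr_id not in responses:           # block until the worker replies
        connection.process_data_events(time_limit=1)
    return responses.pop(corr_id)
```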
Alex's Occasional Blog Posts | Personal Blog
https://alexandruburlacu.github.io
I created it using Jekyll, customized some of the templates, and added Google Analytics and Google Tag Manager.
Lightweight MLOps Template for AI Research
Moldova's National Python and AI Curriculum
https://mecc.gov.md/sites/default/files/curriculum_ia_aprobat_cnc.pdf
Education
Master's Degree in Computer Science
Stefan cel Mare University - Suceava, Romania
Master's Degree in Computer Science
Technical University of Moldova - Chisinau, Moldova
Bachelor's Degree in Computer Science
Technical University of Moldova - Chisinau, Moldova
Certifications
Google Cloud Certified Professional Machine Learning Engineer
Google Cloud
Google Cloud Certified Professional Cloud Architect
Google Cloud
Certified Kubernetes Application Developer (CKAD)
The Cloud Native Computing Foundation (CNCF)
Deep Learning Engineer
Workera
Skills
Libraries/APIs
Scikit-learn, REST APIs, PyTorch, TensorFlow, Jenkins Pipeline, Pandas, Keras, Vue, OpenCV, SpaCy
Tools
Git, Docker Compose, RabbitMQ, Jekyll, Google Analytics, Jenkins, Grafana, Scikit-image, Terraform, Ansible, Bazel, Kustomize, Helm, BigQuery, AWS CLI, Amazon Elastic Container Service (ECS), Amazon SageMaker
Languages
Python 3, Python, Elixir, Bash, SQL, C++, C, Python 2, Lisp, HTML, CSS, Java 8, Erlang, Scala
Paradigms
REST, Functional Programming, DevOps, Unit Testing, Object-oriented Analysis & Design (OOAD), Object-oriented Programming (OOP), Agile Software Development, Serverless Architecture, Parallel Programming, Actor Model, Microservices, Design Patterns
Platforms
Docker, ClearML, Ubuntu, Amazon EC2, Amazon Web Services (AWS), Kubernetes, Jupyter Notebook, Google Cloud Platform (GCP), Visual Studio Code (VS Code), Azure, Cloud Run
Frameworks
Flask, Ray, Optuna
Storage
JSON, Google Cloud, MongoDB, XML-RPC, PostgreSQL, Amazon S3 (AWS S3)
Other
Deep Learning, Machine Learning, Machine Learning Operations (MLOps), Natural Language Processing (NLP), Data Science, Artificial Intelligence (AI), BERT, Neural Networks, Generative Pre-trained Transformers (GPT), Fine-tuning, Large Language Models (LLMs), Language Models, University Teaching, Team Mentoring, FastAPI, Self-supervised Learning, Computer Vision, Team Leadership, Hugging Face, Graphics Processing Unit (GPU), Generative Artificial Intelligence (GenAI), Prompt Engineering, Data Synthesis, OpenAI, Open-source LLMs, Cloud, Large Language Model Operations (LLMOps), Distributed Systems, Cloud Computing, MinIO, Serverless, Transmission Control Protocol (TCP), HTTP, Coding, HATEOAS, Jaeger, Prometheus, Transformers, MLflow, Medical Imaging, Few-shot Learning, Hyperparameter Optimization, ROOT, HTTP 2, Message Queues, Mentorship, Image Processing, Sentiment Analysis, Data Engineering, Multi-GPU Training, Llama, Flan-T5, Question Generation, Q&A Bots, Retrieval-augmented Generation (RAG), OpenAI GPT-3 API, Debugging, Research, Mistral AI, GCP VMs, Google Cloud AutoML, Llama 3, Convolutional Neural Networks (CNNs), LLM Serving, Profiling