Leonid Ganeline

Verified Expert in Engineering

Machine Learning Developer

Vancouver, BC, Canada

Toptal member since September 18, 2024

Expertise

NLP Artificial Intelligence Machine Learning Deep Learning RAG LLM Python JSON Software Development Algorithms Cloud Engineering Data Engineering Data Science Chatbot Development XSLT

Bio

Leonid is a machine learning and data science engineer proficient in data exploration, experimentation, model training, and fine-tuning using Python, SQL, and Cloud ML. With experience in natural language processing (NLP) and anomaly detection, as well as expertise in building ML teams, Leonid is ready for his next challenge.

Portfolio

Renes Trading LLC

Python, Machine Learning, Predictive Analytics, Recommendation Systems...

Stealth Startup

Natural Language Processing (NLP), Software Design...

Tigera

NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn...

Experience

Python - 10 years
Natural Language Processing (NLP) - 9 years
Machine Learning - 9 years
Data Science - 8 years
Pandas - 6 years
Named-entity Recognition (NER) - 3 years
Anomaly Detection - 3 years
LangChain - 2 years

Preferred Environment

Linux, PyCharm, Jira, Slack, GitHub, Python

The most amazing thing I've done is become one of the top 10 contributors to the LangChain package.

Work Experience

AI/ML Engineer

2025 - 2026

Renes Trading LLC

Created a 3D investor classification that enables flexible calculation of risk scores and ROI (see the blog post at https://www.serava.com/blog/real-estate-investor-dna-framework).
Developed an AI investor adviser that generates investor reports to support property investment decisions.
Developed and trained a price trend model for properties.

Technologies: Python, Machine Learning, Predictive Analytics, Recommendation Systems, API Integration, Modeling, Large Language Models (LLMs), Artificial Intelligence (AI), Data Science, Django

Senior Machine Learning Engineer

2023 - 2024

Stealth Startup

Created a retrieval-augmented generation (RAG) chat that uses private and public data in multiple formats.
Productized the chat using the Chroma vector store and open-source large language models (LLMs).
Performed extensive evaluation of synthetic data generated by LLMs.

Technologies: Natural Language Processing (NLP), Software Design, Large Language Models (LLMs), Data Science, LangChain, Software Development, Retrieval-augmented Generation (RAG), Vector Stores, Linux, PyCharm, Jira, Slack, Pandas, Scikit-learn, Python, Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Reinforcement Learning, Open-source LLMs, Prompt Engineering, ChatGPT, OpenAI, Artificial Intelligence (AI), Natural Language Toolkit (NLTK), Hugging Face Transformers, Pinecone, Algorithms, Model Tuning, API Integration, Make (formerly Integromat), AI Model Training, Deep Learning, OpenAI API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, LlamaIndex, ChatGPT Prompts, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Analytics, Vector Databases, Document Parsing, Text Classification, JSON, Data Engineering, APIs, Mathematical Statistics, Generative Systems, Cloud, Software Architecture, Model Validation, Fine-tuning, AI Prompts, AI Chatbots, Vector Search, Context Engineering, AI Architecture, Model Evaluation, Model Deployment, RAG Architecture, AI Integration, Git, AI-assisted Development, AI Enablement, Software Development Lifecycle (SDLC), AI Tools, System Development Life Cycle (SDLC), AI Pipeline

Senior Machine Learning Engineer

2020 - 2023

Tigera

Created an anomaly detection framework for the Calico Enterprise and Calico Cloud products, including productizing ML models in Calico Kubernetes clusters.
Developed classification models based on CatBoost and tokenizers, with novel data preprocessing.
Built time-series models based on GluonTS neural networks, Isolation Forest, and local outlier factor (LOF), as well as ensemble clustering models.

Technologies: NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Elasticsearch, Data Science, GluonTS, Pandas, Docker, Poetry, Linux, PyCharm, Jira, Slack, Anomaly Detection, Linear Algebra, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Supervised Learning, Open-source LLMs, Algorithms, Clustering, Clustering Algorithms, Model Tuning, API Integration, Make (formerly Integromat), AI Model Training, Deep Learning, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Amazon SageMaker, Document Parsing, Unsupervised Learning, Small Language Models (SLMs), System Architecture Design, Leadership, JSON, Data Engineering, APIs, Mathematical Statistics, Statistics, Cloud, Software Architecture, Gradient Boosting, LightGBM, XGBoost, Time Series Data, Model Validation, Fine-tuning, Vector Search, Context Engineering, AI Architecture, Model Evaluation, Model Monitoring, Model Deployment, AI Integration, Git, Taxonomy, AI Enablement, Software Development Lifecycle (SDLC), AI Tools, System Development Life Cycle (SDLC), AI Pipeline, Natural Language Understanding (NLU), Pydantic

Senior Machine Learning Engineer

2018 - 2020

SkyHive

Joined SkyHive as its first data scientist. Initiated data science and machine learning projects, and created and owned the entire machine learning technology stack, from concept to production.
Developed production services and applications. Utilized word2vec, fastText, and embeddings from language models (ELMo) for classification and text similarity. Established workflows for data labeling, model evaluations, and regression testing.
Labeled and evaluated training datasets using Amazon Mechanical Turk (MTurk).
Implemented REST services and deployed them with Azure DevOps pipelines and Kubernetes in Azure, Google Cloud, and AWS. Reviewed code and hired for the ML team.

Technologies: NumPy, Statistics, Keras, SpaCy, Natural Language Processing (NLP), Information Retrieval, SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Azure DevOps, Data Science, MongoDB, Pandas, Docker, Linux, PyCharm, Linear Algebra, FFT, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Supervised Learning, Azure, Natural Language Toolkit (NLTK), Algorithms, Clustering, Model Tuning, AWS Lambda, API Integration, Make (formerly Integromat), AI Model Training, Computer Vision, Deep Learning, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Vector Databases, Document Parsing, Text Classification, Small Language Models (SLMs), System Architecture Design, Leadership, JSON, Data Engineering, APIs, Data Scraping, Mathematical Statistics, Large Language Models (LLMs), Cloud, Ontologies, Software Architecture, Model Validation, Fine-tuning, Vector Search, Context Engineering, AI Architecture, Model Evaluation, Model Monitoring, Model Deployment, RAG Architecture, AI Integration, Git, Web Scraping, Taxonomy, AI Enablement, Software Development Lifecycle (SDLC), AI Tools, System Development Life Cycle (SDLC), AI Pipeline, Natural Language Understanding (NLU)

BizTalk Developer

2005 - 2006

Visiphor Corporation (formerly Sunaptic Solutions)

Developed complex XML transformations using Extensible Stylesheet Language Transformations (XSLT) and XML Schema Definition (XSD).
Built SQL queries and stored procedures used by the BizTalk adapters.
Designed message orchestrations to transfer messages between systems.

Technologies: Software Development, SQL, XSLT, .NET, C#, BizTalk Server, XML, XSD, Azure, Natural Language Toolkit (NLTK), Algorithms, API Integration, Document Processing, Proof of Concept (POC), Data Pipelines, Data Preprocessing, Architecture, Feature Engineering, Data Analytics, REST APIs, System Architecture Design, JSON, Data Mapping, Data Engineering, HL7 FHIR Standard, Healthcare, APIs, Cloud, Software Architecture, Software Development Lifecycle (SDLC), System Development Life Cycle (SDLC)

Experience

LangChain Project Contributor

https://github.com/langchain-ai/langchain

LangChain is a framework for developing applications powered by large language models (LLMs) that covers the entire application lifecycle. It has 125,000 stars on GitHub and is one of the top frameworks for AI applications. As a content contributor, I placed in the top 10 among 3,800 contributors.

https://github.com/langchain-ai/langchain/graphs/contributors
https://python.langchain.com/docs/people/

Processing Legal Case Bundles

A UK-based legal company processes case bundles. Each bundle is a single PDF containing all case-related documents, usually numbering in the hundreds.

The project's goal was to parse these large PDF files and to extract and analyze all case information.

Tools:
• Hugging Face Gradio - for the chat UI
• LangGraph - for the agent interactions
• Google Gemini models - for parsing, extraction, and analysis

Density Prediction API

https://github.com/leo-gan/density_prediction

This project provides a FastAPI-based service that predicts density using a transformer model. Accurate estimates of thermospheric density are essential for all spacecraft operations in low Earth orbit, and density estimation is part of the space weather prediction process.

Legal Document Processing

An AI-driven REST service on Google Cloud for legal case filings across multiple court systems. The project involved reviewing and selecting a machine learning model to parse various types of legal documents, extract key elements, and apply meta-tags for e-filing.

Code Correcting Agent

This application corrects and describes code. It includes a human-in-the-loop agent built with custom code and the LangGraph and LangChain Python packages, along with unit tests and deployment and test scripts.

DGA_detection

https://github.com/leo-gan/DGA_detection

This project covers model training for DGA anomaly detection and includes the best-performing models.

Domain generation algorithms (DGAs) (see Wikipedia) are algorithms used by various malware families to periodically generate a large number of domain names that can serve as rendezvous points with their command-and-control servers. Because infected computers attempt to contact some of these domains every day to receive updates or commands, the sheer number of potential rendezvous points makes it difficult for law enforcement to shut down botnets effectively. In addition, the use of public-key cryptography in malware code makes it infeasible for law enforcement and other actors to mimic commands from the malware controllers, since some worms automatically reject any update not signed by the controllers.
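To make the rendezvous mechanism concrete, here is a minimal toy sketch, assuming a simple date-seeded scheme. The function `toy_dga` is hypothetical: it is not taken from the DGA_detection repository or from any real malware family. It only shows how a generator and its controller can derive the same domain list independently from the date, which is also why the resulting names look random to a detector.

```python
import hashlib
from datetime import date


def toy_dga(day: date, count: int = 5, tld: str = ".com") -> list[str]:
    """Toy date-seeded DGA (hypothetical, for illustration only).

    Anyone who knows the algorithm and the date computes the same list,
    so infected hosts and the controller agree on rendezvous domains
    without exchanging any messages.
    """
    domains = []
    for i in range(count):
        # Seed combines the calendar date and a per-domain counter.
        seed = f"{day.isoformat()}:{i}".encode("utf-8")
        digest = hashlib.sha256(seed).hexdigest()
        # Map each of the first 12 hex digits (0-f) to a letter a-p.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga(date(2024, 1, 1)):
        print(name)
```

Because the list changes every day, blocking yesterday's domains does not help; a detector has to recognize the generated names themselves.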

AI Agent to Develop Software

This online IDE is integrated with AI to support multiple languages and frameworks, such as React Native, React for the web, and Python. The platform includes advanced features for contextualizing codebases, improving code indexing, and optimizing search. I used OpenAI GPT models and local Phi models, the LangGraph agent framework, and the Chroma vector store.

AI-based Real Estate Platform

http://serava.com

An AI-powered SaaS platform helping real estate investors make smarter, data-driven decisions. It combines predictive analytics and proprietary valuation models to simplify complex investment data and empower both professional and high-net-worth users.

Education

1995 - 2001

Master's Degree in Electronic Engineering (Signal Processing)

Samara State Aerospace University - Samara, Russia

Certifications

AUGUST 2025 - PRESENT

Reinforcement Fine-Tuning LLMs With GRPO

DeepLearning.AI

JUNE 2025 - PRESENT

Post-training of LLMs

DeepLearning.AI

JUNE 2025 - PRESENT

Fundamentals of MCP

Hugging Face

MAY 2025 - PRESENT

Fundamentals of Agents

Hugging Face

MAY 2025 - PRESENT

Build Apps with Windsurf’s AI Coding Agents

DeepLearning.AI

FEBRUARY 2025 - PRESENT

Practical Multi AI Agents and Advanced Use Cases with crewAI

DeepLearning.AI

FEBRUARY 2025 - PRESENT

AI Agentic Design Patterns with AutoGen

DeepLearning.AI

JANUARY 2025 - PRESENT

Building AI Applications With Haystack

DeepLearning.AI

JANUARY 2025 - PRESENT

Long-Term Agentic Memory With LangGraph

DeepLearning.AI

NOVEMBER 2024 - PRESENT

Functions, Tools and Agents with LangChain

DeepLearning.AI

SEPTEMBER 2024 - PRESENT

Quantization Fundamentals with Hugging Face

DeepLearning.AI

JULY 2024 - PRESENT

Retrieval Optimization: Tokenization to Vector Quantization

DeepLearning.AI

MARCH 2024 - PRESENT

Vector Databases: from Embeddings to Applications

DeepLearning.AI

MARCH 2024 - PRESENT

LangChain for LLM Application Development

DeepLearning.AI

JANUARY 2024 - PRESENT

LangChain Chat with Your Data

DeepLearning.AI

DECEMBER 2023 - PRESENT

Large Language Models with Semantic Search

DeepLearning.AI

DECEMBER 2023 - PRESENT

How Diffusion Models Work

DeepLearning.AI

JUNE 2017 - PRESENT

Data Manipulation at Scale: Systems and Algorithms

University of Washington

DECEMBER 2016 - PRESENT

Neural Networks for Machine Learning

University of Toronto

MAY 2016 - PRESENT

Machine Learning

Stanford University

Skills

Libraries/APIs

NumPy, Scikit-learn, Pandas, SpaCy, Natural Language Toolkit (NLTK), Hugging Face Transformers, OpenAI API, XGBoost, Pydantic, REST APIs, PyTorch, Keras, Gradio

Tools

PyCharm, Jira, GitHub, Named-entity Recognition (NER), ChatGPT, Make (formerly Integromat), AI Prompts, Git, Slack, Amazon SageMaker, Windsurf, Haystack, Claude Code

Languages

Python, SQL, C, XSLT, C#, XML, XSD

Platforms

Azure, Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes, Docker, AWS Lambda, CrewAI

Storage

JSON, Data Pipelines, Elasticsearch, MongoDB, Google Cloud

Industry Expertise

System Development Life Cycle (SDLC), Healthcare

Frameworks

LangGraph, Agentic Frameworks, LightGBM, .NET, MXNet, LlamaIndex, AutoGen, Django

Paradigms

Anomaly Detection, Azure DevOps, MapReduce, HL7 FHIR Standard, Model Context Protocol (MCP)

Other

BizTalk Server, Natural Language Processing (NLP), Data Science, Large Language Models (LLMs), LangChain, Retrieval-augmented Generation (RAG), Artificial Intelligence (AI), Machine Learning, API Integration, Deep Learning, Natural Language Understanding (NLU), AI-assisted Development, Software Development Lifecycle (SDLC), AI Tools, FFT, Algorithms, Radio, Software Development, Cloud Computing, Embedding Models, Open-source LLMs, AI Agents, Signal Processing, Generative Artificial Intelligence (GenAI), Supervised Learning, Prompt Engineering, OpenAI, Clustering, Clustering Algorithms, Model Tuning, AI Model Training, OpenAI GPT-4 API, Document Processing, Gradient Boosting, Data Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, ChatGPT Prompts, ChatGPT API, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Vector Databases, Document Parsing, Text Classification, Unsupervised Learning, System Architecture Design, Leadership, Data Mapping, Data Engineering, APIs, Logistic Regression, Cloud, Software Architecture, Time Series Data, Model Validation, Quantization, AI Chatbots, Vector Search, Context Engineering, AI Architecture, Model Evaluation, RAG Architecture, AI Integration, Web Scraping, Taxonomy, AI Enablement, Chatbots, Chatbot Conversation Design, Statistics, Linear Algebra, GluonTS, Poetry, Information Retrieval, Software Design, Vector Stores, Ruff, Deep Neural Networks (DNNs), Neural Networks, Evaluation, Weaviate, Digital Signal Processing, Electronics, Mathematics, Mathematical Analysis, Reinforcement Learning, Pinecone, Computer Vision, FastAPI, Transformers, Small Language Models (SLMs), Agentic AI, Generative Systems, Data Scraping, Mathematical Statistics, smolagents, JSON Patch, Ontologies, PDF, ChromaDB, Qdrant, Tokenization, GRPO, Fine-tuning, Reinforcement Learning from Human Feedback (RLHF), Google Gemini, Google GenAI, Model Monitoring, Model Deployment, Cursor AI, Data Anonymization, Optical Character Recognition (OCR), AI Pipeline, Predictive Analytics, Recommendation Systems, Modeling