Leonid Ganeline, Developer in Vancouver, BC, Canada

Leonid Ganeline

Verified Expert in Engineering

Natural Language Processing (NLP) Developer

Vancouver, BC, Canada

Toptal member since September 18, 2024

Bio

Leonid is a machine learning and data science engineer proficient in data exploration, experimentation, model training, and fine-tuning using Python, SQL, and Cloud ML. With experience in natural language processing (NLP), anomaly detection, and expertise in building ML teams, Leonid is ready for his next challenge.

Portfolio

Stealth Startup
Natural Language Processing (NLP), Software Design...
Tigera
NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn...
SkyHive
NumPy, Statistics, Keras, SpaCy, Natural Language Processing (NLP)...

Experience

  • Natural Language Processing (NLP) - 8 years
  • Python - 8 years
  • Data Science - 8 years
  • Machine Learning - 8 years
  • Pandas - 6 years
  • Named-entity Recognition (NER) - 3 years
  • Anomaly Detection - 3 years
  • LangChain - 2 years

Availability

Part-time

Preferred Environment

Linux, PyCharm, Jira, Slack, GitHub, Python

The most amazing...

...thing I've done is become one of the top 10 contributors working on the LangChain package.

Work Experience

Senior Machine Learning Engineer

2023 - 2024
Stealth Startup
  • Created a chat based on retrieval-augmented generation (RAG) using private and public data in different formats.
  • Productized this chat with the Chroma vector store and open-source large language models (LLMs).
  • Performed extensive evaluation of synthetic data generated by LLMs.
Technologies: Natural Language Processing (NLP), Software Design, Large Language Models (LLMs), Data Science, LangChain, Software Development, Retrieval-augmented Generation (RAG), Vector Stores, Linux, PyCharm, Jira, Slack, Pandas, Scikit-learn, Python, Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Reinforcement Learning, Open-source LLMs, Prompt Engineering, ChatGPT, OpenAI, Artificial Intelligence (AI), Natural Language Toolkit (NLTK), Hugging Face Transformers, Pinecone, Algorithms, Model Tuning, API Integration, Make (formerly Integromat), AI Model Training, Deep Learning, OpenAI API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, LlamaIndex, ChatGPT Prompts, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Analytics, Vector Databases, Document Parsing, Text Classification, JSON, Data Engineering, APIs, Mathematical Statistics, Generative Systems

Senior Machine Learning Engineer

2020 - 2023
Tigera
  • Created an anomaly detection model framework for the Calico Enterprise and Calico Cloud products. It included productizing ML models into the Calico Kubernetes clusters.
  • Developed classification models based on CatBoost and tokenizers with novel data preprocessing.
  • Built time-series models based on GluonTS neural networks, Isolation Forest, local outlier factor (LOF), and ensemble clustering models.
Technologies: NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Elasticsearch, Data Science, GluonTS, Pandas, Docker, Poetry, Linux, PyCharm, Jira, Slack, Anomaly Detection, Linear Algebra, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Supervised Learning, Open-source LLMs, Algorithms, Clustering, Clustering Algorithms, Model Tuning, API Integration, Make (formerly Integromat), AI Model Training, Deep Learning, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Amazon SageMaker, Document Parsing, Unsupervised Learning, Small Language Models (SLMs), System Architecture Design, Leadership, JSON, Data Engineering, APIs, Mathematical Statistics, Statistics

Senior Machine Learning Engineer

2018 - 2020
SkyHive
  • Engaged as the first data scientist at SkyHive. Initiated data science and machine learning projects and created and owned the entire machine learning technology stack, from envisioning to production.
  • Developed production services and applications. Utilized word2vec, fastText, and embeddings from language models (ELMo) for classification and text similarity. Established workflows for data labeling, model evaluations, and regression testing.
  • Performed labeling and the evaluation of training datasets with Amazon Mechanical Turk (MTurk).
  • Implemented REST services and deployed them with Azure DevOps pipelines and Kubernetes in Azure, Google Cloud, and AWS. Reviewed code and hired for the ML team.
Technologies: NumPy, Statistics, Keras, SpaCy, Natural Language Processing (NLP), Information Retrieval, SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Azure DevOps, Data Science, MongoDB, Pandas, Docker, Linux, PyCharm, Linear Algebra, FFT, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Supervised Learning, Azure, Natural Language Toolkit (NLTK), Algorithms, Clustering, Model Tuning, AWS Lambda, API Integration, Make (formerly Integromat), AI Model Training, Computer Vision, Deep Learning, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Vector Databases, Document Parsing, Text Classification, Small Language Models (SLMs), System Architecture Design, Leadership, JSON, Data Engineering, APIs, Data Scraping, Mathematical Statistics, Large Language Models (LLMs)

BizTalk Developer

2005 - 2006
Visiphor Corporation (formerly Sunaptic Solutions)
  • Developed complex XML transformations using Extensible Stylesheet Language Transformations (XSLT) and XML Schema Definition (XSD).
  • Built SQL queries and stored procedures used in BizTalk adapters.
  • Designed message orchestrations to transfer messages between systems.
Technologies: Software Development, SQL, XSLT, .NET, C#, BizTalk Server, XML, XSD, Azure, Natural Language Toolkit (NLTK), Algorithms, API Integration, Document Processing, Proof of Concept (POC), Data Pipelines, Data Preprocessing, Architecture, Feature Engineering, Data Analytics, REST APIs, System Architecture Design, JSON, Data Mapping, Data Engineering, HL7 FHIR Standard, Healthcare, APIs

Experience

Contributor Work on the LangChain Project

https://github.com/langchain-ai/langchain
LangChain is a framework for developing applications powered by large language models (LLMs). With 100,000 stars on GitHub, it is a standard framework for AI-related applications and simplifies the entire application lifecycle. I am a content contributor, ranked in the top 10 among 3,000 contributors.
https://github.com/langchain-ai/langchain/graphs/contributors
https://python.langchain.com/docs/people/

Density Prediction API

https://github.com/leo-gan/density_prediction
This project provides a FastAPI-based service for density prediction using a transformer model. Accurate density estimations of the thermosphere are essential for all spacecraft operations in low-earth orbit. Density estimation is a part of the space weather prediction process.

DGA_detection

https://github.com/leo-gan/DGA_detection
This project covers model training for DGA anomaly detection and includes the best-performing models.

Domain generation algorithms (DGAs) are algorithms seen in various families of malware that periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets, since infected computers will attempt to contact some of these domain names every day to receive updates or commands. The use of public-key cryptography in malware code makes it infeasible for law enforcement and other actors to mimic commands from the malware controllers, as some worms will automatically reject any updates not signed by the malware controllers.
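
As a rough illustration of why machine-generated names can often be told apart from human-chosen ones, the sketch below computes a few simple lexical features of a domain name. It is not taken from the DGA_detection repository: the function names `shannon_entropy` and `domain_features` and the choice of features are assumptions made for this example, and a real detector would likely feed such features (or raw characters) into a trained model.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Return the Shannon entropy, in bits per character, of a string."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Compute simple lexical features for the leftmost label of a domain.

    Hypothetical feature set for illustration only.
    """
    label = domain.lower().split(".")[0]
    digits = sum(ch.isdigit() for ch in label)
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / len(label) if label else 0.0,
    }


if __name__ == "__main__":
    # Compare a dictionary-like name with a random-looking one.
    for name in ("example.com", "xq7f2kz9pl0vbn3t.net"):
        print(name, domain_features(name))
```

Random-looking labels tend to be longer and to spread their characters more evenly, which pushes the entropy and digit ratio up. On their own these features are weak signals, so they are usually one input among many.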

Code Correcting Agent

This application is used for code correction and code description. It includes a human-in-the-loop agent built with custom code and the LangGraph and LangChain Python packages. It also includes unit tests, deployment, and test scripts.

Education

1995 - 2001

Master's Degree in Electronic Engineering (Signal Processing)

Samara State Aerospace University - Samara, Russia

Certifications

JUNE 2025 - PRESENT

1. Fundamentals of MCP

Hugging Face

MAY 2025 - PRESENT

Fundamentals of Agents

Hugging Face

MAY 2025 - PRESENT

Build Apps with Windsurf’s AI Coding Agents

DeepLearning.AI

FEBRUARY 2025 - PRESENT

Practical Multi AI Agents and Advanced Use Cases with crewAI

DeepLearning.AI

FEBRUARY 2025 - PRESENT

AI Agentic Design Patterns with AutoGen

DeepLearning.AI

JANUARY 2025 - PRESENT

Building AI Applications With Haystack

DeepLearning.AI

JANUARY 2025 - PRESENT

Long-Term Agentic Memory With LangGraph

DeepLearning.AI

NOVEMBER 2024 - PRESENT

Functions, Tools and Agents with LangChain

DeepLearning.AI

MARCH 2024 - PRESENT

Vector Databases: from Embeddings to Applications

DeepLearning.AI

MARCH 2024 - PRESENT

LangChain for LLM Application Development

DeepLearning.AI

JANUARY 2024 - PRESENT

LangChain Chat with Your Data

DeepLearning.AI

DECEMBER 2023 - PRESENT

Large Language Models with Semantic Search

DeepLearning.AI

DECEMBER 2023 - PRESENT

How Diffusion Models Work

DeepLearning.AI

JUNE 2017 - PRESENT

Data Manipulation at Scale: Systems and Algorithms

University of Washington

DECEMBER 2016 - PRESENT

Neural Networks for Machine Learning

University of Toronto

MAY 2016 - PRESENT

Machine Learning

Stanford University

Skills

Libraries/APIs

NumPy, Scikit-learn, Pandas, SpaCy, Natural Language Toolkit (NLTK), Hugging Face Transformers, OpenAI API, REST APIs, PyTorch, Keras

Tools

PyCharm, Jira, GitHub, Named-entity Recognition (NER), ChatGPT, Make (formerly Integromat), Slack, Amazon SageMaker, Haystack

Platforms

Azure, Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes, Docker, AWS Lambda, CrewAI

Storage

JSON, Data Pipelines, Elasticsearch, MongoDB

Languages

SQL, Python, C, XSLT, C#, XML, XSD

Frameworks

LangGraph, .NET, MXNet, LlamaIndex, AutoGen

Paradigms

Anomaly Detection, Azure DevOps, MapReduce, HL7 FHIR Standard, Model Context Protocol (MCP)

Industry Expertise

Healthcare

Other

Natural Language Processing (NLP), Data Science, Large Language Models (LLMs), LangChain, Artificial Intelligence (AI), Machine Learning, API Integration, Deep Learning, FFT, Algorithms, Software Development, Retrieval-augmented Generation (RAG), Embedding Models, Open-source LLMs, AI Agents, Generative Artificial Intelligence (GenAI), Supervised Learning, Prompt Engineering, OpenAI, Clustering, Clustering Algorithms, Model Tuning, AI Model Training, OpenAI GPT-4 API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, ChatGPT Prompts, ChatGPT API, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Data Classification, Data Analytics, Vector Databases, Document Parsing, Text Classification, Unsupervised Learning, System Architecture Design, Leadership, Data Mapping, Data Engineering, APIs, Logistic Regression, Statistics, Linear Algebra, Radio, BizTalk Server, Cloud Computing, GluonTS, Poetry, Information Retrieval, Software Design, Vector Stores, Ruff, Deep Neural Networks (DNNs), Neural Networks, Evaluation, Weaviate, Signal Processing, Digital Signal Processing, Electronics, Mathematics, Mathematical Analysis, Reinforcement Learning, Pinecone, Computer Vision, Gradient Boosting, FastAPI, Data Processing, Transformers, Small Language Models (SLMs), Agentic AI, Generative Systems, Windsurf, Data Scraping, Mathematical Statistics, smolagents, Json Patch
