Leonid Ganeline, Developer in Vancouver, BC, Canada

Leonid Ganeline

Verified Expert in Engineering

Natural Language Processing (NLP) Developer

Vancouver, BC, Canada

Toptal member since September 18, 2024

Bio

Leonid is a machine learning and data science engineer proficient in data exploration, experimentation, model training, and fine-tuning using Python, SQL, and Cloud ML. With experience in natural language processing (NLP), anomaly detection, and expertise in building ML teams, Leonid is ready for his next challenge.

Portfolio

Stealth Startup
Natural Language Processing (NLP), Software Design...
Tigera
NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn...
SkyHive
NumPy, Statistics, Keras, SpaCy, Natural Language Processing (NLP)...

Experience

  • Natural Language Processing (NLP) - 8 years
  • Python - 8 years
  • Data Science - 8 years
  • Machine Learning - 8 years
  • Pandas - 6 years
  • Named-entity Recognition (NER) - 3 years
  • Anomaly Detection - 3 years
  • LangChain - 2 years

Availability

Part-time

Preferred Environment

Linux, PyCharm, Jira, Slack, GitHub, Python

The most amazing...

...thing I've done is become one of the top 10 contributors working on the LangChain package.

Work Experience

Senior Machine Learning Engineer

2023 - 2024
Stealth Startup
  • Created a chat based on retrieval-augmented generation (RAG) using private and public data in different formats.
  • Productized this chat with the Chroma vector store and open-source large language models (LLMs).
  • Performed extensive evaluation of synthetic data generated by LLMs.
Technologies: Natural Language Processing (NLP), Software Design, Large Language Models (LLMs), Data Science, LangChain, Software Development, Retrieval-augmented Generation (RAG), Vector Stores, Linux, PyCharm, Jira, Slack, Pandas, Scikit-learn, Python, Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Reinforcement Learning, Open-source LLMs, Prompt Engineering, ChatGPT, OpenAI, Artificial Intelligence (AI), Natural Language Toolkit (NLTK), Hugging Face Transformers, Pinecone, Algorithms, Model Tuning, API Integration, Make, AI Model Training, Deep Learning, OpenAI API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, LlamaIndex, ChatGPT Prompts, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering

Senior Machine Learning Engineer

2020 - 2023
Tigera
  • Created an anomaly detection model framework for the Calico Enterprise and Calico Cloud products. It included productizing ML models into the Calico Kubernetes clusters.
  • Developed classification models based on CatBoost and tokenizers with novel data preprocessing.
  • Built time-series models based on GluonTS neural networks, Isolation Forest, local outlier factor (LOF), and ensemble clustering models.
Technologies: NumPy, MXNet, Natural Language Processing (NLP), SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Elasticsearch, Data Science, GluonTS, Pandas, Docker, Poetry, Linux, PyCharm, Jira, Slack, Anomaly Detection, Linear Algebra, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Supervised Learning, Open-source LLMs, Algorithms, Clustering, Clustering Algorithms, Model Tuning, API Integration, Make, AI Model Training, Deep Learning, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering

Senior Machine Learning Engineer

2018 - 2020
SkyHive
  • Engaged as the first data scientist at SkyHive. Initiated data science and machine learning projects and created and owned the entire machine learning technology stack, from envisioning to production.
  • Developed production services and applications. Utilized word2vec, fastText, and embeddings from language models (ELMo) for classification and text similarity. Established workflows for data labeling, model evaluations, and regression testing.
  • Performed labeling and evaluation of training datasets with Amazon Mechanical Turk (MTurk).
  • Implemented REST services and deployed them with Azure DevOps pipelines and Kubernetes in Azure, Google Cloud, and AWS. Reviewed code and hired for the ML team.
Technologies: NumPy, Statistics, Keras, SpaCy, Natural Language Processing (NLP), Information Retrieval, SQL, Scikit-learn, Google Cloud Platform (GCP), Kubernetes, REST APIs, Amazon Web Services (AWS), Cloud Computing, Python, PyTorch, GitHub, Azure DevOps, Data Science, MongoDB, Pandas, Docker, Linux, PyCharm, Linear Algebra, FFT, Software Development, Named-entity Recognition (NER), Machine Learning, Embedding Models, Generative Artificial Intelligence (GenAI), Supervised Learning, Azure, Natural Language Toolkit (NLTK), Algorithms, Clustering, Model Tuning, AWS Lambda, API Integration, Make, AI Model Training, Computer Vision, Deep Learning, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Data Pipelines, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering

BizTalk Developer

2005 - 2006
Visiphor Corporation (former Sunaptic Solutions)
  • Developed complex XML transformations using Extensible Stylesheet Language Transformations (XSLT) and XML Schema Definition (XSD).
  • Built SQL queries and stored procedures that are used in the BizTalk adapters.
  • Designed message orchestrations to transfer messages between systems.
Technologies: Software Development, SQL, XSLT, .NET, C#, BizTalk Server, XML, XSD, Azure, Natural Language Toolkit (NLTK), Algorithms, API Integration, Document Processing, Proof of Concept (POC), Data Pipelines, Data Preprocessing, Architecture, Feature Engineering

Experience

Contributor Work in a LangChain Project

https://github.com/langchain-ai/langchain
LangChain is a framework for developing applications powered by large language models (LLMs). It has 93,000 stars on GitHub and is a standard framework for AI-related applications, simplifying the entire application lifecycle. I am a content contributor, ranked in the top 10 among 3,000 contributors: python.langchain.com/v0.2/docs/people/

Density Prediction API

https://github.com/leo-gan/density_prediction
This project provides a FastAPI-based service for density prediction using a transformer model. Accurate density estimations of the thermosphere are essential for all spacecraft operations in low-earth orbit. Density estimation is a part of the space weather prediction process.

DGA_detection

https://github.com/leo-gan/DGA_detection
This project covers model training for DGA anomaly detection and includes the best-performing models.

Domain Generation Algorithms (DGA) (see Wikipedia) are algorithms seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers. The large number of potential rendezvous points makes it difficult for law enforcement to effectively shut down botnets since infected computers will attempt to contact some of these domain names every day to receive updates or commands. The use of public-key cryptography in malware code makes it unfeasible for law enforcement and other actors to mimic commands from the malware controllers, as some worms will automatically reject any updates not signed by the malware controllers.
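
As a rough illustration of why algorithmically generated domains can be told apart from human-chosen ones, the sketch below computes a few simple lexical features of a domain name. This is not the method used in the DGA_detection repository. The function names (`shannon_entropy`, `domain_features`) and the choice of features are assumptions made for this example only.

```python
import math
from collections import Counter


def shannon_entropy(label: str) -> float:
    """Return the Shannon entropy of a string, in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict:
    """Compute hypothetical lexical features for the leftmost domain label."""
    # Only the leftmost label is inspected, e.g. "example" from "example.com".
    label = domain.lower().split(".")[0]
    size = len(label)
    digits = sum(ch.isdigit() for ch in label)
    vowels = sum(ch in "aeiou" for ch in label)
    return {
        "length": size,
        "entropy": shannon_entropy(label),
        "digit_ratio": digits / size if size else 0.0,
        "vowel_ratio": vowels / size if size else 0.0,
    }


if __name__ == "__main__":
    for name in ("google.com", "xq7k2vz9p.net"):
        print(name, domain_features(name))
```

Random-looking labels tend to show higher character entropy and fewer vowels than dictionary-based names. In practice, features like these would be one input to a trained classifier rather than a detector on their own.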

Education

1995 - 2001

Master's Degree in Electronic Engineering (Signal Processing)

Samara State Aerospace University - Samara, Russia

Certifications

MARCH 2024 - PRESENT

Vector Databases: from Embeddings to Applications

DeepLearning.AI

MARCH 2024 - PRESENT

LangChain for LLM Application Development

DeepLearning.AI

JANUARY 2024 - PRESENT

LangChain Chat with Your Data

DeepLearning.AI

DECEMBER 2023 - PRESENT

Large Language Models with Semantic Search

DeepLearning.AI

DECEMBER 2023 - PRESENT

How Diffusion Models Work

DeepLearning.AI

JUNE 2017 - PRESENT

Data Manipulation at Scale: Systems and Algorithms

University of Washington

DECEMBER 2016 - PRESENT

Neural Networks for Machine Learning

University of Toronto

MAY 2016 - PRESENT

Machine Learning

Stanford University

Skills

Libraries/APIs

NumPy, Scikit-learn, Pandas, SpaCy, Natural Language Toolkit (NLTK), Hugging Face Transformers, OpenAI API, REST APIs, PyTorch, Keras

Tools

PyCharm, Jira, GitHub, Named-entity Recognition (NER), ChatGPT, Make, Slack

Platforms

Azure, Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes, Docker, AWS Lambda

Languages

SQL, Python, C, XSLT, C#, XML, XSD

Paradigms

Anomaly Detection, Azure DevOps, MapReduce

Storage

Data Pipelines, Elasticsearch, MongoDB

Frameworks

.NET, MXNet, LlamaIndex

Other

Natural Language Processing (NLP), Data Science, Large Language Models (LLMs), LangChain, Artificial Intelligence (AI), Machine Learning, API Integration, Deep Learning, FFT, Algorithms, Software Development, Retrieval-augmented Generation (RAG), Embedding Models, Open-source LLMs, AI Agents, Generative Artificial Intelligence (GenAI), Supervised Learning, Prompt Engineering, OpenAI, Clustering, Clustering Algorithms, Model Tuning, AI Model Training, OpenAI GPT-4 API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, ChatGPT Prompts, ChatGPT API, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Statistics, Linear Algebra, Radio, BizTalk Server, Cloud Computing, GluonTS, Poetry, Information Retrieval, Software Design, Vector Stores, Ruff, Deep Neural Networks (DNNs), Neural Networks, Evaluation, Weaviate, Signal Processing, Digital Signal Processing, Electronics, Mathematics, Mathematical Analysis, Reinforcement Learning, Pinecone, Computer Vision, Gradient Boosting, FastAPI, Data Processing, Transformers
