Leonid Ganeline
Verified Expert in Engineering
Natural Language Processing (NLP) Developer
Vancouver, BC, Canada
Toptal member since September 18, 2024
Leonid is a machine learning and data science engineer proficient in data exploration, experimentation, model training, and fine-tuning using Python, SQL, and Cloud ML. With experience in natural language processing (NLP), anomaly detection, and expertise in building ML teams, Leonid is ready for his next challenge.
Experience
- Natural Language Processing (NLP) - 8 years
- Python - 8 years
- Data Science - 8 years
- Machine Learning - 8 years
- Pandas - 6 years
- Named-entity Recognition (NER) - 3 years
- Anomaly Detection - 3 years
- LangChain - 2 years
Preferred Environment
Linux, PyCharm, Jira, Slack, GitHub, Python
The most amazing...
...thing I've done is become one of the top 10 contributors to the LangChain package.
Work Experience
Senior Machine Learning Engineer
Stealth Startup
- Created a chatbot based on retrieval-augmented generation (RAG) that uses private and public data in various formats.
- Productized the chatbot using the Chroma vector store and open-source large language models (LLMs).
- Performed extensive evaluation of synthetic data generated by LLMs.
Senior Machine Learning Engineer
Tigera
- Created an anomaly detection model framework for the Calico Enterprise and Calico Cloud products, including deploying ML models to production in Calico Kubernetes clusters.
- Developed classification models based on CatBoost and tokenizers, with novel data preprocessing.
- Built time-series models based on GluonTS neural networks, as well as Isolation Forest, local outlier factor (LOF), and ensemble clustering models.
Senior Machine Learning Engineer
SkyHive
- Joined SkyHive as its first data scientist. Initiated data science and machine learning projects and created and owned the entire machine learning technology stack, from concept to production.
- Developed production services and applications. Utilized word2vec, fastText, and embeddings from language models (ELMo) for classification and text similarity. Established workflows for data labeling, model evaluations, and regression testing.
- Labeled and evaluated training datasets using Amazon Mechanical Turk (MTurk).
- Implemented REST services and deployed them with Azure DevOps pipelines and Kubernetes in Azure, Google Cloud, and AWS. Reviewed code and hired engineers for the ML team.
BizTalk Developer
Visiphor Corporation (formerly Sunaptic Solutions)
- Developed complex XML transformations using Extensible Stylesheet Language Transformations (XSLT) and XML Schema Definition (XSD).
- Built SQL queries and stored procedures used by the BizTalk adapters.
- Designed message orchestrations to transfer messages between systems.
Experience
Contributor Work in a LangChain Project
https://github.com/langchain-ai/langchain
Density Prediction API
https://github.com/leo-gan/density_prediction
DGA_detection
https://github.com/leo-gan/DGA_detection
Domain generation algorithms (DGAs) (see Wikipedia) are algorithms found in various malware families. They periodically generate a large number of domain names that infected machines can use as rendezvous points with their command-and-control servers. The sheer number of potential rendezvous points makes it difficult for law enforcement to shut down botnets effectively, because infected computers attempt to contact some of these domains every day to receive updates or commands. In addition, the use of public-key cryptography in malware code makes it infeasible for law enforcement and other actors to impersonate the malware controllers, since some worms automatically reject any update not signed by those controllers.
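To illustrate the rendezvous mechanism described above, here is a minimal Python sketch of a toy, date-seeded generator. It is not taken from the DGA_detection repository and does not reproduce any real malware family. The function name `toy_dga`, the SHA-256 seeding scheme, the 12-letter labels, and the `.com` suffix are all illustrative assumptions. Because the output depends only on the date, an infected host and its controller can each compute the same list independently, which is why defenders must anticipate many candidate domains per day.

```python
import hashlib
from datetime import date


def toy_dga(day: date, count: int = 10, length: int = 12, tld: str = ".com") -> list[str]:
    """Derive `count` pseudo-random domain names from a date seed.

    Hypothetical toy example for illustration only; `length` must be <= 32
    because a SHA-256 digest has 32 bytes.
    """
    domains = []
    for i in range(count):
        # Hash the date plus an index so each position yields a different name.
        digest = hashlib.sha256(f"{day.isoformat()}-{i}".encode()).digest()
        # Map each of the first `length` bytes onto a lowercase letter a-z.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Both "sides" calling this with the same date get the identical list.
    for name in toy_dga(date(2024, 9, 18), count=3):
        print(name)
```

Running the script prints three 12-letter `.com` names for 2024-09-18. Calling `toy_dga` with the next day's date produces a different list, mimicking the daily rotation that makes blocklists hard to maintain.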
Education
Master's Degree in Electronic Engineering (Signal Processing)
Samara State Aerospace University - Samara, Russia
Certifications
Vector Databases: from Embeddings to Applications
DeepLearning.AI
LangChain for LLM Application Development
DeepLearning.AI
LangChain Chat with Your Data
DeepLearning.AI
Large Language Models with Semantic Search
DeepLearning.AI
How Diffusion Models Work
DeepLearning.AI
Data Manipulation at Scale: Systems and Algorithms
University of Washington
Neural Networks for Machine Learning
University of Toronto
Machine Learning
Stanford University
Skills
Libraries/APIs
NumPy, Scikit-learn, Pandas, SpaCy, Natural Language Toolkit (NLTK), Hugging Face Transformers, OpenAI API, REST APIs, PyTorch, Keras
Tools
PyCharm, Jira, GitHub, Named-entity Recognition (NER), ChatGPT, Make, Slack
Platforms
Azure, Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Kubernetes, Docker, AWS Lambda
Languages
SQL, Python, C, XSLT, C#, XML, XSD
Paradigms
Anomaly Detection, Azure DevOps, MapReduce
Storage
Data Pipelines, Elasticsearch, MongoDB
Frameworks
.NET, MXNet, LlamaIndex
Other
Natural Language Processing (NLP), Data Science, Large Language Models (LLMs), LangChain, Artificial Intelligence (AI), Machine Learning, API Integration, Deep Learning, FFT, Algorithms, Software Development, Retrieval-augmented Generation (RAG), Embedding Models, Open-source LLMs, AI Agents, Generative Artificial Intelligence (GenAI), Supervised Learning, Prompt Engineering, OpenAI, Clustering, Clustering Algorithms, Model Tuning, AI Model Training, OpenAI GPT-4 API, Document Processing, Minimum Viable Product (MVP), Proof of Concept (POC), Hugging Face, ChatGPT Prompts, ChatGPT API, Data Preprocessing, Technical Leadership, Architecture, Feature Engineering, Statistics, Linear Algebra, Radio, BizTalk Server, Cloud Computing, GluonTS, Poetry, Information Retrieval, Software Design, Vector Stores, Ruff, Deep Neural Networks (DNNs), Neural Networks, Evaluation, Weaviate, Signal Processing, Digital Signal Processing, Electronics, Mathematics, Mathematical Analysis, Reinforcement Learning, Pinecone, Computer Vision, Gradient Boosting, FastAPI, Data Processing, Transformers