Nilan Saha, Developer in Vancouver, BC, Canada

Nilan Saha

Verified Expert in Engineering

NLP and Machine Learning Developer

Vancouver, BC, Canada
Toptal Member Since
February 17, 2022

Nilan is a natural language processing and machine learning expert with a bachelor's degree in computer science and a master's degree in data science specializing in computational linguistics. Nilan has vast experience building large-scale recommendation systems, personalization technology, and NLP algorithms. A Kaggle expert, he has well-cited publications in this domain.






Preferred Environment

MacOS, Slack, Zoom, Visual Studio Code (VS Code), Jupyter Notebook

The most amazing...

...thing I've ever built is Convex, an NLP library for part-of-speech (POS) tagging using character and word-level embedding neural nets.

Work Experience

Machine Learning Engineer

2020 - PRESENT
  • Built and deployed multiple personalization ML pipelines to power the main user feed for the app. The pipelines were built using Python, Kafka, AWS S3, scikit-learn, and NLTK and deployed to AWS.
  • Designed, architected, and built an end-to-end search solution using Python and Elasticsearch. The analytics pipeline, which informs data-driven decisions and improves the relevance ranking of search results, was built using Kafka, Amazon Pinpoint, and AWS S3.
  • Designed, architected, and built an end-to-end autocomplete solution using Python and Elasticsearch. The analytics pipeline that drives data-driven decisions and relevance ranking of search results was built using Kafka, Amazon Pinpoint, and AWS S3.
Technologies: Scikit-learn, Python, Go, Elasticsearch, Amazon DynamoDB, Neo4j, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT)

Technical Reviewer

2020 - 2020
Packt Publishing
  • Collaborated with the team to test out all code samples and make sure it was easy for users to replicate the projects from the book. The code consisted of neural networks built in PyTorch and various other pre-processing utilities in NLTK.
  • Worked with the editing team to review all the book chapters and make necessary corrections, technical and otherwise.
  • Suggested various improvements in terms of the book content.
Technologies: Technical Writing, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), PyTorch

Machine Learning Engineer

2020 - 2020
  • Developed a deep neural net with multiple heads using ELMo embeddings to identify phrases that could be used to generate quizzes and achieve other downstream tasks. The model was built using Python, PyTorch, and Flair and deployed to AWS.
  • Developed a pipeline using spaCy and Python to extract triplets from textual data, build relations from them, and represent the result as a knowledge graph.
  • Led initiatives to build a dataset to train models based on implicit user feedback.
Technologies: Python, SpaCy, PyTorch, Deep Learning

Data Scientist

2018 - 2018
  • Developed a heuristic algorithm using named-entity recognition (NER), spaCy, and the Natural Language Toolkit (NLTK) to identify cooking ingredients from recipe instruction data.
  • Created an algorithm to use identified ingredients to generate optimal cooking instructions to reduce friction for end-users.
  • Bundled the model into an API and worked with the in-house tech team to integrate it into the entire stack.
Technologies: Natural Language Toolkit (NLTK), SpaCy, Python, Amazon Web Services (AWS)

Convex

Convex is a lightweight NLP library that supports POS tagging using deep learning models coded in PyTorch. It uses both custom character-level and word-level embeddings to power the neural net and provide superior performance.

Multi-label Classifier for Toxic Comments

I built this multi-label classification model for toxic comments using BERT from the Hugging Face Transformers library, PyTorch, and a binary cross-entropy loss. It achieved an F1-score of 90.5%.
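
As a minimal illustration of why binary cross-entropy suits multi-label classification, the sketch below scores each label with an independent sigmoid, so one comment can carry several labels at once. It is plain Python, not the BERT and PyTorch model described above; the label names, logits, and the 0.5 threshold are hypothetical values chosen for the example.

```python
import math

# Hypothetical label set; the project's actual labels are not stated here.
LABELS = ["toxic", "insult", "threat"]


def sigmoid(x):
    """Map a raw score (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_loss(logits, targets):
    """Mean binary cross-entropy, treating every label as its own yes/no task."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(logits)


def predict(logits, threshold=0.5):
    """Return every label whose probability clears the (assumed) threshold."""
    return [label for label, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Made-up model outputs for one comment that is both toxic and insulting.
logits = [2.0, 0.5, -3.0]
targets = [1, 1, 0]

loss = bce_loss(logits, targets)
predicted = predict(logits)
print(predicted, round(loss, 4))
```

Because each label gets its own sigmoid rather than sharing a softmax, the predicted probabilities do not have to sum to one, which is what allows zero, one, or several labels per comment.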

Categorical Embedding Encoder
I developed a framework using Python and Keras. It can be used to train and represent categorical variables using vector embeddings for tabular datasets, which improves classical machine learning models.
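
The idea of representing categorical variables as vector embeddings can be sketched in plain Python: each category maps to a row in a lookup table, and that dense row replaces a one-hot column as the model input. This is only an illustration, not the framework's API; the `EmbeddingTable` class, the dimension of 3, and the random initialization are assumptions, and in practice the vectors would be learned (for example, by a Keras embedding layer) rather than left random.

```python
import random


class EmbeddingTable:
    """Hypothetical lookup table holding one dense vector per category."""

    def __init__(self, categories, dim=3, seed=0):
        rng = random.Random(seed)
        # Row 0 is reserved for categories not seen when the table was built.
        self.index = {c: i + 1 for i, c in enumerate(sorted(set(categories)))}
        self.vectors = [
            [rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(len(self.index) + 1)
        ]

    def encode(self, value):
        """Return the vector for a category, or the fallback row if unknown."""
        return self.vectors[self.index.get(value, 0)]


# A made-up categorical column from a tabular dataset.
column = ["red", "green", "blue", "green"]
table = EmbeddingTable(column, dim=3)
encoded = [table.encode(v) for v in column]
print(encoded[0])
```

The encoded rows can then be concatenated with a table's numeric columns and passed to a classical model in place of one-hot features.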
2019 - 2020

Master's Degree in Data Science and Computational Linguistics

University of British Columbia - Vancouver, Canada

2015 - 2019

Bachelor's Degree in Computer Science

Institute of Engineering and Management - Kolkata, India


SpaCy, PyTorch, Scikit-learn, Natural Language Toolkit (NLTK), Keras


Slack, Zoom


Python, Go, GraphQL




Data Science


Databases, Elasticsearch, Amazon DynamoDB, Neo4j


Software Design Patterns, Apache Kafka, MacOS, Visual Studio Code (VS Code), Jupyter Notebook, Amazon Web Services (AWS)


Natural Language Processing (NLP), Machine Learning, Generative Pre-trained Transformers (GPT), Software, Software Development, Computer Science, Deep Learning, Data Structures, Technical Writing, Computational Linguistics
