Nilan Saha, NLP and Machine Learning Developer in Vancouver, BC, Canada

Member since February 9, 2022
Nilan is a natural language processing and machine learning expert with a bachelor's degree in computer science and a master's degree in data science specializing in computational linguistics. Nilan has vast experience building large-scale recommendation systems, personalization technology, and NLP algorithms. A Kaggle expert, he has well-cited publications in this domain.

Portfolio

  • AAQUA
    Scikit-learn, Python, Go, Elasticsearch, AWS DynamoDB, Neo4j...
  • Packt Publishing
    Technical Writing, Natural Language Processing (NLP), PyTorch
  • Knowt
    Python, SpaCy, PyTorch, Deep Learning

Location

Vancouver, BC, Canada

Availability

Part-time

Preferred Environment

MacOS, Slack, Zoom, Visual Studio Code, Jupyter Notebook

The most amazing thing I've ever built is Convex, an NLP library for part-of-speech (POS) tagging using character-level and word-level embedding neural nets.

Employment

  • Machine Learning Engineer

    2020 - PRESENT
    AAQUA
    • Built and deployed multiple personalization ML pipelines to power the main user feed for the app. The pipelines were built using Python, Kafka, AWS S3, scikit-learn, and NLTK and deployed to AWS.
    • Designed, architected, and built an end-to-end search solution using Python and Elasticsearch. The analytics pipeline, which informs data-driven decisions and improves the relevance ranking of search results, was built using Kafka, Amazon Pinpoint, and AWS S3.
    • Designed, architected, and built an end-to-end autocomplete solution using Python and Elasticsearch. The analytics pipeline that drives data-driven decisions and relevance ranking of search results was built using Kafka, Amazon Pinpoint, and AWS S3.
    Technologies: Scikit-learn, Python, Go, Elasticsearch, AWS DynamoDB, Neo4j, Natural Language Processing (NLP)
  • Technical Reviewer

    2020 - 2020
    Packt Publishing
    • Collaborated with the team to test all code samples and make sure readers could easily replicate the projects from the book. The code consisted of neural networks built in PyTorch and various pre-processing utilities in NLTK.
    • Worked with the editing team to review all the book chapters and make necessary corrections, technical and otherwise.
    • Suggested various improvements in terms of the book content.
    Technologies: Technical Writing, Natural Language Processing (NLP), PyTorch
  • Machine Learning Engineer

    2020 - 2020
    Knowt
    • Developed a deep neural net with multiple heads using ELMo embeddings to identify phrases that could be used to generate quizzes and achieve other downstream tasks. The model was built using Python, PyTorch, and Flair and deployed to AWS.
    • Developed a pipeline using spaCy and Python to extract triplets from textual data, build relations from them, and represent them as a knowledge graph.
    • Led initiatives to build a dataset to train models based on implicit user feedback.
    Technologies: Python, SpaCy, PyTorch, Deep Learning
  • Data Scientist

    2018 - 2018
    Cookt
    • Developed a heuristic algorithm using named-entity recognition (NER), spaCy, and the Natural Language Toolkit (NLTK) to identify cooking ingredients from recipe instruction data.
    • Created an algorithm to use identified ingredients to generate optimal cooking instructions to reduce friction for end-users.
    • Bundled the model into an API and worked with the in-house tech team to integrate it into the entire stack.
    Technologies: NLTK, SpaCy, Python, AWS

Experience

  • Convex
    https://github.com/nilansaha/convex

    Convex is a lightweight NLP library that supports POS tagging using deep learning models coded in PyTorch. It uses both custom character-level and word-level embeddings to power the neural net and provide superior performance.

  • Multi-label Classifier for Toxic Comments

    I built this multi-label classification model for toxic comments using BERT from the Hugging Face transformers library, PyTorch, and binary cross-entropy loss. It achieved an F1-score of 90.5%.

  • Categorical Embedding Encoder
    https://github.com/nilansaha/CategoricalEmbeddingEncoder

    I developed a framework using Python and Keras. It trains vector embeddings to represent the categorical variables in tabular datasets, which can improve the performance of classical machine learning models.

Skills

  • Languages

    Python, Go, GraphQL
  • Libraries/APIs

    SpaCy, PyTorch, Scikit-learn, NLTK, Keras
  • Other

    Natural Language Processing (NLP), Machine Learning, Software, Software Development, Computer Science, Deep Learning, Data Structures, Technical Writing, Computational Linguistics, AWS
  • Frameworks

    Flask
  • Paradigms

    Data Science
  • Platforms

    Software Design Patterns, Apache Kafka, MacOS, Visual Studio Code, Jupyter Notebook
  • Tools

    Slack, Zoom
  • Storage

    Databases, Elasticsearch, AWS DynamoDB, Neo4j

Education

  • Master's Degree in Data Science and Computational Linguistics
    2019 - 2020
    University of British Columbia - Vancouver, Canada
  • Bachelor's Degree in Computer Science
    2015 - 2019
    Institute of Engineering and Management - Kolkata, India
