Shyam Mukherjee, Developer in Allahabad, Uttar Pradesh, India

Verified Expert in Engineering

Bio

Shyam is a senior data scientist with expertise in scalable AI/ML solutions, specializing in recommender systems, NLP, and real-time bidding. Skilled in Python, SQL, scikit-learn, and cloud platforms such as AWS and Google Cloud's Vertex AI, he drives customer engagement and reduces costs through impactful projects. Shyam earned a gold medal in his Master of Technology program, has achieved top Kaggle rankings, and is recognized for delivering data-driven business outcomes efficiently.

Portfolio

Redaptive
Python, SQL, Recommendation Systems, Machine Learning, Deep Learning, AutoML...
Blinkit
Python, SQL, Recommendation Systems, Natural Language Processing (NLP)...
Tokopedia
Natural Language Processing (NLP), Python, SQL, Vertex AI, BigQuery, Ads...

Experience

  • Python - 6 years
  • Machine Learning - 6 years
  • Natural Language Processing (NLP) - 6 years
  • Deep Learning - 5 years
  • Recommendation Systems - 2 years
  • LlamaIndex - 1 year
  • Generative Artificial Intelligence (GenAI) - 1 year
  • LangChain - 1 year

Availability

Part-time

Preferred Environment

Slack, macOS, Windows, Microsoft Teams

The most amazing project I've led was the development of a recommendation system at Blinkit, which boosted average order value (AOV) by 10% and increased engagement among 500,000 daily users.

Work Experience

Senior Data Scientist

2023 - 2024
Redaptive
  • Spearheaded the development of a scalable generative AI system that automatically summarizes sustainability reports using NLP.
  • Built an interactive chat app that integrates retrieval-augmented generation (RAG) with LLMs, reducing manual work by 80% and saving 300 person-hours per month.
  • Built an ML pipeline to estimate project costs using ensemble models, with AWS Lambda performing real-time validation of vendor costs. The pipeline flagged discrepancies and savings in 10% of estimates, resulting in more competitive project bids.
  • Implemented an interactive Plotly Dash app to verify newly installed smart meters using meter timestream data.
Technologies: Python, SQL, Recommendation Systems, Machine Learning, Deep Learning, AutoML, AWS Lambda, Amazon SageMaker, Vertex AI, FastAPI, Generative Artificial Intelligence (GenAI), LangChain, LlamaIndex, Slack, Artificial Intelligence (AI), Retrieval-augmented Generation (RAG), Large Language Models (LLMs)

Data Science Lead

2022 - 2023
Blinkit
  • Introduced scalable product recommendations throughout the app to raise average order value (AOV) through upselling and cross-selling for 500,000 daily active users, resulting in a 10% AOV increase in a 50/50 A/B test.
  • Deployed a gradient boosting-based ETA model for the delivery service that improved delivery-time accuracy by factoring in geospatial polygons and distance, increasing customer satisfaction and improving the conversion rate by 4%.
  • Spearheaded the nationwide launch of a periodic recommendations product.
Technologies: Python, SQL, Recommendation Systems, Natural Language Processing (NLP), Machine Learning, Deep Learning, Product Ownership, Stakeholder Management, Large Language Models (LLMs)

Senior Data Scientist

2021 - 2022
Tokopedia
  • Developed and implemented a scalable real-time bidding (RTB) system for sponsored ads that uses ranking algorithms and cloud infrastructure to optimize ad placements, increasing ad revenue by 2%.
  • Engineered a keyword targeting tool that improved search hit rates, increasing product listing visibility and ad performance. This raised ad revenue and drove higher engagement and profitability for the platform.
  • Contributed to the development of an active learning pipeline for NLP tasks.
Technologies: Natural Language Processing (NLP), Python, SQL, Vertex AI, BigQuery, Ads, Ranking and Recommendations, Machine Learning, Deep Learning, AutoML

Projects

Effortless SQL

Benchmarked multiple text-to-SQL models across various datasets to identify the most effective options for accurate query generation.

I refined prompts to make the generated SQL queries more precise and efficient, which improved data retrieval. I also developed reusable query templates by abstracting literals out of queries and substituting new values into them, which streamlined the process and significantly reduced the number of API calls, lowering LLM usage costs. The project also included fine-tuning LLMs on domain-specific data to improve their contextual understanding and query accuracy, resulting in a more efficient and intelligent database querying system.
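The literal-abstraction idea can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes literals appear as single-quoted strings or bare numbers, and `abstract_literals`, `TemplateCache`, and `fake_llm` are hypothetical names. Questions that differ only in their literals map to the same key, so the LLM is called once per question shape and later answers are filled in from the cached SQL template.

```python
import re
from typing import Callable, Dict, List, Tuple

# Assumption: literals are single-quoted strings or bare numbers.
LITERAL_RE = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")
PLACEHOLDER_RE = re.compile(r"__LIT(\d+)__")


def abstract_literals(text: str) -> Tuple[str, List[str]]:
    """Replace each literal with a numbered placeholder (__LIT0__, __LIT1__, ...)."""
    literals: List[str] = []

    def _sub(match: "re.Match[str]") -> str:
        literals.append(match.group(0))
        return f"__LIT{len(literals) - 1}__"

    return LITERAL_RE.sub(_sub, text), literals


def sql_to_template(sql: str, literals: List[str]) -> str:
    """Swap SQL literals that came from the question for their placeholders."""
    index = {lit: i for i, lit in enumerate(literals)}
    return LITERAL_RE.sub(
        lambda m: f"__LIT{index[m.group(0)]}__" if m.group(0) in index else m.group(0),
        sql,
    )


def fill_template(template: str, literals: List[str]) -> str:
    # Single regex pass, so a substituted literal is never re-substituted.
    # A production system should bind values as query parameters instead.
    return PLACEHOLDER_RE.sub(lambda m: literals[int(m.group(1))], template)


class TemplateCache:
    def __init__(self, generate_sql: Callable[[str], str]) -> None:
        self.generate_sql = generate_sql  # wraps the (expensive) LLM call
        self.templates: Dict[str, str] = {}
        self.llm_calls = 0

    def query(self, question: str) -> str:
        key, literals = abstract_literals(question)
        if key in self.templates:  # same question shape: reuse, no API call
            return fill_template(self.templates[key], literals)
        self.llm_calls += 1
        sql = self.generate_sql(question)
        if len(set(literals)) == len(literals):  # skip ambiguous duplicate literals
            self.templates[key] = sql_to_template(sql, literals)
        return sql


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for a text-to-SQL LLM call."""
    city = re.search(r"'[^']*'", question).group(0)
    amount = re.search(r"\b\d+\b", question).group(0)
    return f"SELECT * FROM orders WHERE city = {city} AND amount > {amount}"


cache = TemplateCache(fake_llm)
print(cache.query("orders in 'Delhi' above 100"))
print(cache.query("orders in 'Mumbai' above 250"))  # filled from the cached template
print(cache.llm_calls)
```

Mapping SQL literals back to the question's literals token by token (rather than with plain string replacement) avoids corrupting values such as `100` when a shorter literal like `10` is also present.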

Domain-specific Crawler

Developed a specialized web crawler for domain-specific content retrieval by integrating Apache Nutch, Solr, and Lucene. The system identified the domain of each crawled document, ensuring targeted and relevant data collection, and used Wikipedia category graphs to tag document topics, improving the precision of content classification and retrieval. The project involved configuring and tuning Apache Nutch for scalable crawling, indexing the content with Solr for efficient search and retrieval, and leveraging Lucene's indexing and search capabilities. This approach enabled the extraction of high-quality, domain-specific information for data analysis and research.
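The description does not say exactly how the category graph was used. One plausible reading is to walk a page's Wikipedia categories up through their parent categories until a chosen top-level domain category is reached. The sketch below assumes the graph is already loaded as a child-to-parents dictionary; `tag_topic`, the toy graph, and the `max_depth` cap are illustrative assumptions, not the project's code.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set


def tag_topic(
    page_categories: Iterable[str],
    parents: Dict[str, List[str]],
    domains: Set[str],
    max_depth: int = 5,
) -> Optional[str]:
    """Breadth-first walk up the category graph; return the nearest domain category."""
    start = list(page_categories)
    queue = deque((cat, 0) for cat in start)
    seen = set(start)
    while queue:
        cat, depth = queue.popleft()
        if cat in domains:
            return cat  # BFS pops categories in order of hop count, so this is nearest
        if depth == max_depth:
            continue  # cap the walk; real category graphs are large and contain cycles
        for parent in parents.get(cat, []):
            if parent not in seen:  # the seen set also protects against cycles
                seen.add(parent)
                queue.append((parent, depth + 1))
    return None


# Toy, hand-written slice of a category graph (child -> parent categories).
parents = {
    "Machine learning": ["Artificial intelligence", "Statistics"],
    "Artificial intelligence": ["Computer science"],
    "Statistics": ["Mathematics"],
}
domains = {"Computer science", "Mathematics"}

print(tag_topic(["Machine learning"], parents, domains))
print(tag_topic(["Statistics"], parents, domains))
print(tag_topic(["Cooking"], parents, domains))
```

Breadth-first search is used so that when a page can reach several domains, the one with the fewest hops wins; ties go to whichever category was queued first.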

Skills

Tools

Slack, Microsoft Teams, AutoML, Amazon SageMaker, BigQuery

Frameworks

LlamaIndex

Languages

Python, SQL

Platforms

macOS, Windows, AWS Lambda, Vertex AI

Other

Recommendation Systems, Machine Learning, Generative Artificial Intelligence (GenAI), LangChain, Natural Language Processing (NLP), Retrieval-augmented Generation (RAG), Deep Learning, FastAPI, Artificial Intelligence (AI), Large Language Models (LLMs), Product Ownership, Stakeholder Management, Ads, Ranking and Recommendations