Vishnu Chevli, Developer in Surat, Gujarat, India

Vishnu Chevli

Verified Expert in Engineering

Data Science Developer

Location
Surat, Gujarat, India
Toptal Member Since
February 4, 2022

Vishnu, a data scientist at Reddit with 14+ years of experience, has specialized in data science and machine learning for 11+ years. He excels in providing data-driven solutions to diverse stakeholders and has a rich professional background, having worked with renowned companies such as KPIT, General Mills, Cognizant, and Wipro. Vishnu seeks to contribute his expertise to projects encompassing data science, machine learning, analytics, data mining, operational optimization, and logistics design.

Portfolio

Reddit, Inc.
SQL, Data Science, ETL, Data Analysis, Python, Data Visualization, GitHub...
Venus Jewel
Python, Machine Learning, R, RStudio Shiny, Tableau, Predictive Modeling...
KPIT
Python, R, Data Science, Data Analytics, Tableau, Machine Learning...

Experience

Availability

Part-time

Preferred Environment

Python, R, Tableau, RStudio, Tableau Server, SQL, Jupyter Notebook, Microsoft Excel, Microsoft Power BI

The most amazing...

...projects I've developed are machine learning models which have automated tasks of skilled labor, saving 14 full-time resources and four days of processing time.

Work Experience

Data Scientist

2022 - PRESENT
Reddit, Inc.
  • Developed dynamic Mode Analytics dashboards for engineering and product teams, delivering real-time insights, expediting decision-making, and enhancing operational efficiency.
  • Analyzed user behavior patterns, deriving actionable insights for product enhancement. Identified key improvement areas and new development opportunities through a detailed examination of user usage patterns.
  • Led data-centric A/B testing, ensuring statistical rigor and robust methodologies. Collaborated cross-functionally to prepare, support, and approve tests, influencing data-driven product decisions.
  • Investigated and provided insights on incidents, ensuring prompt resolution. Applied advanced analytics to detect pattern shifts, contributing to proactive incident prevention strategies.
  • Designed and implemented fact tables on BigQuery, optimizing storage for analytical use. Orchestrated Apache Airflow DAGs, automating workflows for enhanced efficiency and reliability in data processing.
  • Led optimization for complex queries, achieving substantial improvements in space and time performance. Implemented strategies for enhanced database efficiency and reduced latency.
Technologies: SQL, Data Science, ETL, Data Analysis, Python, Data Visualization, GitHub, Apache Airflow, A/B Testing, Analytics, BigQuery, Google BigQuery, Mode Analytics, Machine Learning, PostgreSQL, Time Series, Big Data, CSV File Processing, Cohort Analysis, Microsoft Excel, SQL, Plotly, Data Analytics

Principal Data Scientist

2017 - 2022
Venus Jewel
  • Constructed predictive machine learning models in Python using scikit-learn for automating rough and polished diamond grading.
  • Leveraged data modeling and statistical analysis to identify data-driven solutions within the organization.
  • Created reports, dashboards, and decision engines using RStudio Shiny and Tableau to cater to various stakeholders, including an automated MS suggestion system for grading.
  • Developed and deployed analytical models utilizing statistical and machine-learning techniques in R and Python.
  • Provided mathematical modeling expertise for diamond pricing, considering factors such as demand and supply.
  • Designed a customized machine learning class architecture that combines algorithms like XGBoost, gradient boosting, random forests, and support vector machines, optimizing model predictability.
Technologies: Python, Machine Learning, R, RStudio Shiny, Tableau, Predictive Modeling, Pandas, NumPy, SQL, Scikit-learn, Scikit-image, ETL, ETL Tools, Data Analysis, Data Analytics, Complex Data Analysis, Linear Programming, Computer Vision, PyTorch, Analytics, Reporting, Leadership, Dashboards, Team Leadership, Data Visualization, Supervised Learning, Supervised Machine Learning, Feature Analysis, Git, GitLab, Data Modeling, Jupyter, Deep Neural Networks, Image Recognition, Tableau Desktop Pro, CSS, XGBoost, Support Vector Regression, Support Vector Machines (SVM), Random Forests, Random Forest Regression, Gradient Boosting, Gradient Boosted Trees, CatBoost, K-means Clustering, K-nearest Neighbors (KNN), Clustering, Clustering Algorithms, Dimensionality Reduction, Matplotlib, Keras, TensorFlow, Computer Vision Algorithms, Power Pivot, Microsoft Power BI, Pivot Tables, Power Query, Oracle Database, Time Series, Microsoft SQL Server, CSV, Data Reporting, Artificial Intelligence (AI), Data Engineering, Data-informed Recommendations, Real-time Data, Statistical Analysis, Data Cleaning, Data Management, Decision Modeling, Exploratory Data Analysis, ChatGPT, Excel 365, Programming, Integration, OpenCV, Causal Inference, Unstructured Data Analysis, Data Gathering, Office 365, B2B, GitHub, PostgreSQL, CSV File Processing, Data Matching, Technical Leadership, Unsupervised Learning, Microsoft Excel, SQL, Spreadsheets, Plotly

Data Scientist

2013 - 2017
KPIT
  • Developed a predictive model in Python using scikit-learn based on supervised and semi-supervised learning for anomaly detection and prediction in engine failures.
  • Provided city traffic and community planning analysis based on cell phone data in Python using NLTK and clustering methods.
  • Designed a business model for a hybrid and electric vehicle charging station based on telematics data using R and R Shiny.
  • Optimized territory unit planning for smart meter data collection from geographical data using distance- and density-based clustering and optimization techniques.
  • Performed vehicle and crew scheduling for a state transportation corporation using resource optimization techniques in Python.
  • Analyzed vehicle driving patterns from telematics data and prepared a driver scorecard using R and R Shiny.
  • Delivered anomaly detection in utility (gas and electricity) meter data using a rule-based engine and semi-supervised modeling in Python.
  • Handled data crunching, scraping, and processing for a client in the engineering domain. Processed over 1TB of data to generate business insights using complex statistical analysis.
Technologies: Python, R, Data Science, Data Analytics, Tableau, Machine Learning, Predictive Modeling, Optimization, Pandas, NumPy, Scikit-learn, Scikit-image, SQL, MySQL, MongoDB, Data Engineering, Data Pipelines, Linear Programming, Mixed-integer Linear Programming, Linear Optimization, Generalized Linear Model (GLM), Gusek, Team Leadership, Dashboards, Data Visualization, Natural Language Toolkit (NLTK), Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Supervised Learning, Supervised Machine Learning, Clustering, Git, GitLab, Geospatial Data, Geospatial Analytics, Data Modeling, Business Analysis, XML, Operations Research, XGBoost, Support Vector Regression, Support Vector Machines (SVM), Random Forests, Random Forest Regression, K-means Clustering, K-nearest Neighbors (KNN), Gradient Boosting, Gradient Boosted Trees, Matplotlib, Dimensionality Reduction, Clustering Algorithms, Statistical Forecasting, Statistical Modeling, Multivariate Statistical Modeling, Forecasting, P&L Forecasting, Trend Forecasting, APIs, Time Series Analysis, Tableau Server, Qlik Sense, QlikView, CSV, Data Manipulation, Data Reporting, Data Cleaning, Data Management, Scheduling, Decision Modeling, Exploratory Data Analysis, Excel 365, Causal Inference, Unstructured Data Analysis, Large Data Sets, Spreadsheets, Data Gathering, Office 365, B2B, GitHub, Time Series, CSV File Processing, Algorithms, Technical Leadership, Unsupervised Learning, Microsoft Excel, SQL, Spreadsheets, Plotly

Programmer Analyst

2010 - 2011
Cognizant
  • Worked as a module lead for a 2-member team to develop middleware technologies. Acted as a subject matter expert (SME) for end-to-end middleware technology applications. Automated the security and traffic system in the security domain.
  • Contributed to the traffic control software for Ikusi (Spain). Designed the integration driver module for the Remote Control Unit—a device to control end devices (traffic activity tracker, variable message panels, weather stations, cameras, etc.).
  • Developed the driver integration module for the security system in VC++. Designed the integration driver module to control the CCTV server.
Technologies: XML, Programming

Software Engineer

2008 - 2010
Wipro
  • Worked as an application designer collaborating with client business heads to gauge customer needs. Also served as the single point of contact for handling client escalations and support queries.
  • Enhanced a simulator for an electronic chip-making machine in the embedded domain using Microsoft Foundation Classes (MFC).
  • Contributed to the development of a C#/.NET simulator that integrates with the back end developed for a chip-making machine. Implemented XML-based communication between the front end and back end.
Technologies: Programming

Enhanced Search Functionality with an LLM-based ML Algorithm for Precise Product Matching

1. LLM and Language Encoders:
• Utilized context-aware sentence encoding for dynamic dataset adaptability.
• Implemented cosine similarity for precise matching.

Benefits: Improved adaptability, semantic understanding, and matching efficiency.

2. Internal UI for Model Validation:
• Developed a Streamlit-based web app for input and display of matched products.
• Integrated feedback mechanism for continuous model improvement.

Benefits: Transparency, user engagement, and agile model enhancement.

3. Cloud Implementation of REST APIs:
• Designed Flask APIs for secure login and on-demand queries.
• Enabled online accessibility and flexible system integration.

Benefits: Online access, secure authentication, and flexibility.

4. Vector Database Implementation:
• Transitioned to vector-based storage in PostgreSQL for memory efficiency and scalability.
• Utilized database facilities for streamlined updates and optimized query performance.

Benefits: Memory optimization, streamlined updates, and enhanced scalability.
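
The cosine-similarity matching mentioned in item 1 can be illustrated with a minimal Python sketch. The embedding vectors, product names, and the helper names cosine_similarity and best_match are hypothetical and chosen only for illustration; a real system would obtain its vectors from a sentence encoder, which is not shown here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vec, catalog):
    """Return (name, score) of the catalog entry most similar to the query.

    catalog maps a product name to its embedding vector.
    """
    scored = ((name, cosine_similarity(query_vec, vec)) for name, vec in catalog.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional embeddings standing in for real encoder output.
catalog = {
    "red running shoe": [0.9, 0.1, 0.0],
    "blue denim jacket": [0.1, 0.9, 0.1],
    "steel water bottle": [0.0, 0.2, 0.9],
}

name, score = best_match([0.8, 0.2, 0.1], catalog)
print(name, round(score, 3))
```

Because cosine similarity ignores vector length, two descriptions of different lengths can still score as close matches if their embeddings point in a similar direction.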

Data-driven Insights: Tableau Visualization and User-friendly Business Survey Results

I prepared a ready-to-use visualization template using Tableau and presented user-friendly business survey results. The template was designed to allow users to easily explore and analyze the survey data, leveraging the powerful features of Tableau for data visualization.

In addition to creating the visualization template, I managed a detailed document outlining the process of regenerating the dashboard. This document captures all the steps involved in creating and updating the dashboard, including data sources, data transformations, and any calculations or visual elements used. The documentation serves as a comprehensive guide for future reference, ensuring that the dashboard can be easily replicated or modified as needed.

By combining my expertise in Tableau with the business survey results, I delivered a robust and user-friendly solution for visualizing and understanding the data. The detailed documentation I provided ensures that the dashboard can be maintained and improved over time, enabling stakeholders to make informed decisions based on the survey findings.

Transforming Surveys: Leveraging Telematics for Enhanced Planning and Visualization

The project aimed to replace a large manual survey of 10,000 family members with a sample roughly 100 times bigger, drawn from telematics and mobile device data.

PROJECT DELIVERABLES
• Identifying points of interest like residential, commercial, and industrial spots.
• Planning of new roads and traffic signals for the given territory.
• Planning of means of transportation and stations to serve population needs.
• Planning of commercial points like shopping complexes, billboards, and service centers.
• Optimizing current transportation services crew and vehicles.
• Visualizing and creating dashboards on Shiny and Tableau.

Rain Gauge Total Prediction: Python Model Soars to 3rd Place on Kaggle

PROBLEM
The challenge was to generate a probabilistic distribution of the hourly rain gauge total using the provided polarimetric data for various variables over a span of 15 days.

SOLUTION
To address this, a Python-based predictive model was developed. This model demonstrated exceptional accuracy and achieved the 3rd position on Kaggle, showcasing its effectiveness in solving the problem at hand.

Microsoft Malware Classification Challenge | Malware Classification: Accurate Family Detection

https://github.com/vrajs5/Microsoft-Malware-Classification-Challenge
PROBLEM
The task was to classify a collection of known malware files into nine distinct families. The uncompressed data amounted to 0.5TB (500GB).

SOLUTION
To address this challenge, an advanced approach was employed. Byte-wise frequency counts were meticulously calculated from the malware files, capturing the occurrences of each byte across the dataset. This process involved analyzing the binary representation of the files and deriving statistical information from the byte-level patterns.

Based on these byte-wise frequency counts, a sophisticated model was developed using advanced machine learning techniques. The model utilized this comprehensive statistical representation to accurately classify and differentiate the malware files into their respective families. The approach showcased the ability to effectively analyze and classify large-scale malware datasets, contributing to the broader domain of cybersecurity.
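
As a rough illustration of the byte-wise frequency counts described above, the following Python sketch turns a file's raw bytes into a 256-entry feature vector. It assumes the raw bytes are already available in memory; the function name byte_frequency_features and the sample bytes are hypothetical, and the sketch does not reproduce the actual feature pipeline or model used in the challenge.

```python
from collections import Counter


def byte_frequency_features(data: bytes, normalize: bool = True) -> list:
    """Return a 256-entry vector with the count (or share) of each byte value."""
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    vector = [counts.get(value, 0) for value in range(256)]
    if normalize and data:
        total = len(data)
        vector = [count / total for count in vector]
    return vector


# Hypothetical sample: six bytes, three of which are 0x00.
sample = bytes([0x00, 0x00, 0xFF, 0x4D, 0x5A, 0x00])
raw_counts = byte_frequency_features(sample, normalize=False)
shares = byte_frequency_features(sample)
```

Normalizing by file length keeps large and small files comparable; whether counts or shares work better depends on the classifier.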

Telematic Fingerprinting: Accurately Identifying Drivers Through Predictive Modeling

PROBLEM
The challenge was to develop a telematic fingerprint that could accurately identify instances when a specific driver conducted a trip.

SOLUTION
To address this problem, a predictive model was created, leveraging advanced techniques in machine learning and data analysis. The model was designed to establish a unique signature for each driver based on their individual driving behavior. By analyzing various telematic data such as speed, acceleration, braking patterns, and other driving characteristics, the model could effectively differentiate and identify the driving style of a particular driver. This solution enabled the precise identification of driver-specific trips, contributing to enhanced monitoring and analysis in the field of telematics.
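
One way to picture the per-trip inputs to such a model is a small feature-extraction step over a speed trace. The sketch below is illustrative only: the function name trip_features, the one-second sampling interval, and the hard-braking threshold are all assumptions, not details of the original solution.

```python
import statistics


def trip_features(speeds_mps, dt_s=1.0, hard_brake_mps2=-3.0):
    """Summarize one trip's speed trace (meters per second) into simple features.

    dt_s and hard_brake_mps2 are assumed values chosen for illustration.
    """
    # Acceleration as the finite difference between consecutive speed samples.
    accels = [(b - a) / dt_s for a, b in zip(speeds_mps, speeds_mps[1:])]
    return {
        "mean_speed": statistics.fmean(speeds_mps),
        "max_speed": max(speeds_mps),
        "mean_abs_accel": statistics.fmean([abs(a) for a in accels]),
        "hard_brake_count": sum(1 for a in accels if a <= hard_brake_mps2),
    }


# Hypothetical speed trace sampled once per second.
trip = [0.0, 2.0, 5.0, 9.0, 12.0, 8.0, 8.0, 3.0]
features = trip_features(trip)
```

Feature dictionaries like this, computed per trip, could then be fed to any supervised or outlier-detection model that compares a trip against a driver's typical behavior.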

Efficient Milk Run Routing: Excel-based Tool for Cost Reduction and Delivery Optimization

PROBLEM
The challenge involved designing a generic Excel-based tool capable of replicating the milk run routing, a transportation model that utilizes mixed-integer linear programming.

SOLUTION
To address this problem, a comprehensive solution was developed. The tool incorporated advanced algorithms and optimization techniques to optimize the routing process and achieve the following deliverables:

1. Reduction in transportation costs by determining the most efficient routes for delivering goods.

2. Improvement in promise delivery by enhancing the predictability and reliability of goods reaching customers within specified timeframes.

3. Decrease in truckload transportation by optimizing the allocation and utilization of available resources, resulting in more efficient transportation operations.

By leveraging this Excel-based tool, businesses could benefit from cost savings, improved delivery performance, and optimized resource allocation in their milk run routing processes.
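
To make the routing objective concrete, here is a small Python sketch that finds the cheapest single-vehicle milk run by exhaustive enumeration. This is a deliberately simpler technique than the mixed-integer linear program the tool replicates, and it only scales to a handful of stops. The distance table, stop names, and function names are hypothetical.

```python
from itertools import permutations


def route_cost(route, dist):
    """Total distance along consecutive legs of a route."""
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def best_milk_run(depot, stops, dist):
    """Try every visiting order and keep the cheapest depot-to-depot loop."""
    best_route, best_cost = None, float("inf")
    for order in permutations(stops):
        route = (depot, *order, depot)
        cost = route_cost(route, dist)
        if cost < best_cost:
            best_route, best_cost = route, cost
    return best_route, best_cost


# Hypothetical symmetric distances between a depot D and three suppliers.
dist = {
    "D": {"D": 0, "A": 4, "B": 6, "C": 9},
    "A": {"D": 4, "A": 0, "B": 3, "C": 7},
    "B": {"D": 6, "A": 3, "B": 0, "C": 2},
    "C": {"D": 9, "A": 7, "B": 2, "C": 0},
}

route, cost = best_milk_run("D", ["A", "B", "C"], dist)
print(route, cost)
```

A MILP formulation expresses the same objective with binary leg-selection variables and adds constraints such as truck capacity and delivery windows, which brute-force enumeration cannot handle at realistic sizes.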

Driving Efficiency: Optimizing Routes, Vehicles, and Crew for Fleet Management Success

SOLUTION
I led the development and implementation of diverse optimization techniques for a fleet management organization. Solutions delivered included:

1. Route Optimization: Employed advanced techniques to optimize routes for scheduled trips, ensuring efficient and cost-effective transportation services for corporate clients.

2. Vehicle Optimization: Utilized sophisticated models to optimize vehicle allocation for inter-state and inter-city transportation, maximizing resource utilization and minimizing costs.

3. Intracity Shuttle Optimization: Developed strategies to optimize the deployment of vehicles for intracity shuttle services, improving service reliability and reducing operational expenses.

4. Crew Optimization: Designed an advanced solution for crew allocation, considering labor laws and operational demands, resulting in improved productivity and compliance.

These solutions revolutionized the organization's operations, enhancing route planning, vehicle allocation, and crew management. The outcomes included improved efficiency, cost reduction, and better compliance with labor regulations.

Virtual Stock Trader: A Dynamic Stock Trading Game with Interactive Panels and Visualizations

I conceived and developed a virtual stock trading game with the following features:

• Admin panel
• Broker panel
• Player dashboards
• Technological implementations

To bring this game to life, I employed HTML, PHP, and CSS to craft the user interface. I designed the database using MySQL and employed PHP-Flash plugins to visualize various values.

Insightful Data Pipeline: Python-based Scraping and Summarization with XML Parsing and CSV Output

PROBLEM
There was a need to prepare a data pipeline for scraping business-relevant insights from various sources.

SOLUTION
I developed a Python-based module that efficiently parses chunks of XML files and extracts the necessary information, which is then summarized and organized in CSV files.

TECHNOLOGIES
The solution used Python for programming, XML parsing to extract data, CSV output for structuring the results, and libraries such as Pandas and NumPy for statistical analysis and data scraping.
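
A minimal sketch of the parse-and-summarize step, using only the Python standard library, might look like the following. The XML layout, tag names, and the function name xml_records_to_csv are hypothetical, and writing the output as comma-separated text is an assumption made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag, fields):
    """Stream through XML and write one delimited row per record element."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    source = io.BytesIO(xml_text.encode("utf-8"))
    # iterparse yields each element once its closing tag has been read,
    # so large files can be processed without loading the whole tree.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            writer.writerow({f: (elem.findtext(f) or "").strip() for f in fields})
            elem.clear()  # release the children of a record that is already written
    return out.getvalue()


# Hypothetical input with two records.
sample = """<orders>
  <order><id>1</id><customer>Acme</customer><total>120.50</total></order>
  <order><id>2</id><customer>Globex</customer><total>75.00</total></order>
</orders>"""

csv_text = xml_records_to_csv(sample, "order", ["id", "customer", "total"])
print(csv_text)
```

The resulting text can be written to disk or loaded into Pandas for further summarization.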

Dynamic Financial and Operational Analysis: Hospital Dashboards for Informed Decision-making

I worked on the hospital's financial analysis dashboards:
• Developed dynamic reports at the cost-profit level for various stakeholders.
• Analyzed segments and departments, including OPD, operatives, pharmacy, and pathology.
• Conducted specialty analysis, including cardiology, orthopedics, gynecology, and more.
• Examined business segments, such as cashless, cash paid, and corporate.

I focused on the hospital's operational dashboards:
• Monitored segment KPIs like occupancy rate, expenses, and length of stay.
• Conducted material consumption analysis for IPD and pathology.
• Created staff schedules and presence dashboards.
• Performed customer feedback and drill-down analysis.

DATA MANAGEMENT
• Utilized secured data connections over SQL and NoSQL databases to access the required data for the analysis.

VISUALIZATION TOOLS USAGE
• Leveraged Tableau and QlikView's powerful features to create interactive and visually appealing dashboards for financial analysis, operational metrics, and stakeholder reports.
• These visualization tools added an extra layer of insight and interactivity to the project, enabling stakeholders to make data-driven decisions effectively.

Revolutionizing Healthcare Staffing Predictions: Python, Snowflake-Snowpark, and Streamlit

I utilized Python and Snowflake-Snowpark to create predictive models addressing staffing challenges in the healthcare industry. These models were designed to forecast staffing needs accurately. Additionally, I developed several user-defined functions (UDF) and user-defined table functions (UDTF) that serve as prediction generators for end users. To enhance usability, I also created a Streamlit-based application that leverages the UDTF capabilities and presents the results of the predictive models in a language that is easily understandable by business professionals.

Data Analyst Work

I utilized a combination of Excel and Python for project execution, covering the following steps:

1. Normalized data based on business requirements.
2. Conducted data sanity checks.
3. Prepared a financial decline curve.
4. Documented intermediate steps for client debugging.

Languages

Python, R, SQL, Python 3, XML, SQL DDL, SQL DML, Data Manipulation Language (DML), CSS, Markdown, Snowflake

Libraries/APIs

XGBoost, Scikit-learn, Pandas, NumPy, PyTorch, Natural Language Toolkit (NLTK), Matplotlib, Keras, OpenCV, SpaCy, CatBoost, TensorFlow, REST APIs

Tools

Tableau, Microsoft Excel, Microsoft Power BI, Tableau Desktop Pro, Scikit-image, Excel 2013, Jupyter, Qlik Sense, Power Pivot, Spreadsheets, GitHub, BigQuery, Plotly, Git, GitLab, Power Query, Apache Airflow

Paradigms

Data Science, ETL, Linear Programming, Data-informed Visual Design, B2B

Storage

PostgreSQL, MySQL, Data Pipelines, Microsoft SQL Server, Data Definition Languages (DDL), NoSQL, MongoDB, Redshift

Other

Optimization, Statistics, Machine Learning, Predictive Modeling, Data Analytics, Feature Analysis, Data Analysis, Mixed-integer Linear Programming, Linear Optimization, Computer Vision, Analytics, Reporting, Dashboards, Data Visualization, Supervised Learning, Supervised Machine Learning, Natural Language Processing (NLP), Data Modeling, Operations Research, Time Series, Statistical Forecasting, Statistical Modeling, Random Forest Regression, K-means Clustering, Clustering Algorithms, Random Forests, Multivariate Statistical Modeling, CSV, A/B Testing, GPT, Generative Pre-trained Transformers (GPT), Statistical Analysis, Data Cleaning, Regression, Classification Algorithms, Data Scientist, Classification, Predictive Analytics, Data Transformation, Decision Tree Classification, Data-driven Dashboards, Exploratory Data Analysis, Causal Inference, Unstructured Data Analysis, CSV File Processing, Unsupervised Learning, Microsoft Excel, SQL, ETL Tools, Data Engineering, Complex Data Analysis, Generalized Linear Model (GLM), Gusek, Leadership, Team Leadership, Clustering, Statistical Data Analysis, Deep Learning, Neural Networks, Forecasting, Trend Forecasting, Time Series Analysis, Support Vector Machines (SVM), Gradient Boosting, Dimensionality Reduction, Support Vector Regression, Gradient Boosted Trees, K-nearest Neighbors (KNN), Computer Vision Algorithms, PuLP, Integer Programming, Pivot Tables, Artificial Intelligence (AI), Data Manipulation, Data Reporting, Real-time Data, Data Management, Decision Trees, Text Classification, Mode Analytics, Google BigQuery, Scheduling, Decision Modeling, Excel 365, Programming, Integration, Large Data Sets, Data Gathering, Office 365, Logistics, Big Data, Algorithms, Technical Leadership, Spreadsheets, Tableau Server, Geospatial Data, Geospatial Analytics, Business Analysis, Convolutional Neural Networks (CNN), Text Analytics, Semantic Analysis, Topic Modeling, Pyomo, Deep Neural Networks, Image Recognition, APIs, P&L Forecasting, Web Scraping, Data Scraping, Amazon Redshift, Recommendation Systems, Data-informed Recommendations, Digital Twin, User-defined Functions (UDF), DAX, ChatGPT, Supply Chain Optimization, Inventory Management, Demand Planning, Data Matching, Financial Modeling, Large Language Models (LLMs), Embeddings from Language Models (ELMo), Vector Data, pgvector, Cohort Analysis

Frameworks

RStudio Shiny, Streamlit, Flask

Platforms

RStudio, Jupyter Notebook, QlikView, Oracle Database, Amazon Web Services (AWS), Amazon EC2

Education

2011 - 2013

Postgraduate Diploma in Industrial Engineering in Operations and Supply Chain Management

Indian Institute of Management Mumbai (formerly known as NITIE Mumbai) - Mumbai, Maharashtra, India

2004 - 2008

Bachelor of Technology Degree in Computer Engineering

Sardar Vallabhbhai National Institute of Technology - Surat, Gujarat, India

Certifications

MARCH 2024 - PRESENT

Python for Time Series Data Analysis

Udemy

JANUARY 2024 - PRESENT

The Complete Google BigQuery Masterclass: Beginner to Expert

Udemy

MAY 2023 - PRESENT

Power BI Masterclass from Scratch

Udemy

DECEMBER 2022 - PRESENT

Digital Twins: Enhancing Model-based Design with AR, VR and MR

University of Oxford

JULY 2022 - PRESENT

Tableau 20 Advanced Training | Master Tableau in Data Science

Udemy

JUNE 2022 - PRESENT

Complete Course on Product A/B Testing

Udemy

JUNE 2022 - PRESENT

Tableau 2020 A-Z | Hands-on Tableau Training for Data Science

Udemy

JUNE 2022 - PRESENT

AWS Redshift | A Comprehensive Guide

Udemy

MARCH 2022 - PRESENT

Optimization with Python | Solve Operations Research Problems

Udemy

MARCH 2022 - PRESENT

Natural Language Processing (NLP) with Python

Udemy

MARCH 2022 - PRESENT

PyTorch for Deep Learning and Computer Vision

Udemy

DECEMBER 2014 - PRESENT

Data Analysis and Statistical Inference

Duke University via Coursera.org

DECEMBER 2014 - PRESENT

Data Science Specialization

Johns Hopkins University via Coursera.org

DECEMBER 2012 - DECEMBER 2015

Six Sigma Green Belt

RABQSA
