Kalpit Desai, Ph.D., Sensor Data Pattern Recognition Developer in Bengaluru, Karnataka, India

Member since October 12, 2018
Kalpit is a developer with a Ph.D. and over 12 years of experience in machine learning and AI, working with both large corporations and startups. He’s a practiced hand with Python, R, and MATLAB and devises data strategies that mine business value with deep learning technologies. He also specializes in computer vision, time-series analytics, dynamic system modeling, text mining, and industrial process optimization.

Preferred Environment

Linux, PyCharm, TensorFlow, Keras, PyTorch

The most amazing...

...algorithm I have built applied computer vision to estimate the proportion of liquid, semisolid, and gas in a mixture flowing through a pipe in real time.


  • Chief Data Scientist

    2018 - PRESENT
    Datakalp, LLP
    • Built a news article classification and recommendation algorithm based on deep neural network (DNN) and vector-space models.
    • Designed and implemented a machine learning algorithm that takes a document as the input, and retrieves duplicate documents as well as related (but not duplicate) documents from a given corpus.
    • Developed image segmentation algorithms using Deep U-Net and ResNet blocks that identify areas of mineral deposits from a given seismographic radar image.
    • Designed the deep learning architecture for optimizing product yield without any hardware change for a manufacturing unit.
    • Wrote a Mask R-CNN-based machine learning algorithm for analyzing and detecting various attributes of people in a video.
    • Built a deep learning-based algorithm, using Python, TensorFlow, and OpenCV, to extract information from scanned invoices without templates or proprietary third-party tools.
    • Developed an algorithm, using Python and TensorFlow, that tells whether one should see a doctor based on a picture of one's skin taken with a commodity mobile phone camera.
    Technologies: Deep Learning, Signal Processing, Python, R, PyTorch, TensorFlow, Keras, C, C++
  • Director of Data Science

    2016 - 2018
    Tally Analytics, Pvt Ltd | ClustrData
    • Architected a scalable data-mining system for building a product catalog for small businesses using text mining, named-entity recognition, and deep-graph reasoning.
    • Planned and coordinated the work of a cross-functional team building a unique suite of machine-learning based products for small businesses.
    • Developed algorithms to detect and auto-suggest corrections of typographical mistakes in manually entered data for various processes (billing, tax return filing, and so on).
    • Guided the development of a deep-learning model to automatically mine aspect-based sentiment from reviews. For example, the aspects could be the ambiance, food taste, and price in restaurant reviews, or the story, direction, and acting for a movie.
    • Prototyped a text-mining system to automatically categorize the support requests coming from a wide variety of customers and either route them to appropriate support personnel or send a canned response.
    Technologies: Python, Docker, R
  • Manager and Lead—Data Science

    2014 - 2016
    Bidgely Technologies, Pvt Ltd
    • Guided and supervised the development of the company's proprietary disaggregation technology. Disaggregation takes a whole-house energy consumption trace (with granularity ranging from one second to one hour) and, without appliance-level sub-meters, determines how much energy each appliance consumes in a given home.
    • Invented algorithms for detecting events in a home based on whole-house energy consumption.
    • Guided the development of algorithms to estimate the solar-power generation for a given home based on weather data at its geographic location and a satellite image of that home.
    Technologies: MATLAB, Python, R, Signal Processing, Pattern Recognition, Machine Learning
  • Lead Scientist

    2010 - 2014
    GE Global Research
    • Invented a novel algorithm for vocabulary compression that increased the efficiency of the downstream processing of the text corpora by machine learning algorithms; also patented.
    • Led the development of a novel machine learning algorithm that makes alarms and beeps in an intensive care unit (ICU) more relevant by learning from past data; partially published and patented.
    • Built a suite of algorithms for dynamically optimizing the control logic of a wind-turbine farm so that the power generated is maximized, the wear and tear are minimized, and the noise produced is well within the regulated upper limit; partially patented.
    • Constructed algorithm prototypes to identify various objects in a CT scan and measure their shape and size. The prototypes employed deep learning-based image segmentation techniques.
    • Developed scalable algorithms for summarizing a large corpus of unstructured text documents with heavy domain-specific jargon. For example, one such corpus was a set of around a million email chains capturing emails between the customer, field engineer, subject matter experts, and so on during the maintenance, service, and repair events on a big steam turbine.
    Technologies: R, Python, MATLAB, C, C++, Statistical Signal Processing, Machine Learning, Pattern Recognition
  • Research Scientist

    2007 - 2010
    Zargis Medical
    • Built the company's proprietary algorithms to detect heart sounds from a stethoscope (audio waveform) and aid doctors in their diagnoses.
    • Built Fourier transform- and wavelet transform-based algorithms for preprocessing the acoustic waveform from a stethoscope and transforming the signal into a representation that is suitable for machine learning.
    • Built machine learning algorithms to differentiate the subtle third heart sound (S3) from other murmur-related sounds.
    • Built a deep time-delay neural network toolkit for extracting events from multiple time-series data.
    • Translated core algorithms from MATLAB to production-grade C++.
    Technologies: MATLAB, Python, Signal Processing, Pattern Recognition, Machine Learning
  • Graduate Research Assistant

    2002 - 2007
    Computer Science Department of University of North Carolina at Chapel Hill
    • Assisted in the development of a new microscope for use by physicists. Used linear algebra, signal processing, mathematical modeling, and control theory to develop a novel high-resolution laser-interferometry-based tracking system.
    • Developed a mathematical model of a 3D magnetic force exertion system to determine the directions of strong forces and validated the model using actual data.
    Technologies: MATLAB, C, C++


  • Toolkit for Building Time-delayed Neural Networks (Octave, MATLAB) (Development)

    I wrote a complete, self-contained toolkit for constructing, configuring, training and applying time-delayed neural networks on time-series data.

    Over the years, the toolkit has been a powerful resource for event detection from a variety of temporal signals, including phonocardiograph, seismograph, power consumption, and patient-monitoring devices in an ICU. I recently released the toolkit publicly on GitHub.

  • Workshop on Topic Modeling (Other amazing things)

    I conducted a full-day workshop on topic modeling from an arbitrary corpus of text for attendees with five to ten years of experience in machine learning. The workshop covered both the theory (available at the link), in sufficient detail to include the mathematical formulations, and the practice, applying the techniques learned to two distinct corpora.

  • Optimal Control of Wind Turbine Farms (Development)

    I developed novel algorithms that optimize the control logic of various wind turbines operating in a wind turbine farm. The approach included building a mathematical model of the dynamics of wind due to wake effects and also modeling the behavior of each turbine by predicting power that would be generated by the turbine as a function of the wind speed, wind direction, air temperature, air humidity, blade angles, rotor speed, yaw, pitch, roll, and so on. The algorithms were also patented.

  • Event Detection in a Home (Development)

    I built novel algorithms for detecting events in a home from energy profile data and energy waveforms for the home. The algorithms leverage pattern recognition, statistics, and machine learning. The algorithms are also patented.

  • Hemodynamic Impact-based Prioritization of Ventricular Tachycardia Alarms (Development)

    Ventricular tachycardia (V-tach) is a very serious condition that occurs when the ventricles are driven at high rates. However, almost half of the V-tach alarms declared through the processing of patterns observed in electrocardiography are not clinically actionable. The focus of this project was to provide guidance on whether a technically correct V-tach alarm is clinically actionable by determining its “hemodynamic impact.” A predictive, supervised machine-learning approach based on conditional inference trees was employed to determine the hemodynamic impact of a V-tach alarm.

  • Patented the Algorithm for Hemodynamic Impact-based Prioritization of Ventricular Tachycardia Alarms (Development)

    The algorithm is also patented and here's the link.

  • The Design and Architecture of a Scalable Machine Learning Pipeline to Build a Product Catalog (Development)

    The goal of this project was to build a system that can create and update a dynamic catalog of products based on information coming from a diverse variety of sources and in multiple languages (e.g., product masters coming from distributors, data from small business transactions, product reviews, and so on). The system heavily leveraged machine learning algorithms for deciphering a noisy textual mention of a product and anchoring it onto a known entity in the catalog. The system also leveraged a good dose of data engineering to ensure scalability and robustness.

  • Award-winning Algorithm for the Wikipedia Challenge (Development)

    This was a real-world prediction problem posed as the IEEE International Conference on Data Mining (ICDM) 2011 contest. The goal was to predict how many edits a Wikipedia editor will make in the next six months, based on past data.

    Our ensemble model won an honorable mention in the contest. The code is written in R and Python.

  • Related Paper to the Algorithm for the Wikipedia Challenge (Development)

    Here is the related paper explaining the mathematics behind the approach for my algorithm for the Wikipedia Challenge.

  • Video-based Defect Detection on Automobile Silencers (Development)

    The client was an automobile silencer manufacturer. I built a novel algorithm to detect whether a newly manufactured silencer had any defect, based on a multi-camera video feed of the silencer. The algorithm was deployed on the NVIDIA Jetson Nano edge computing platform.

    Technologies: Computer Vision, Object Detection, Nvidia Jetson, TensorRT, Python, C++


  • Paradigms

    Data Science
  • Other

    Time Series Analysis, Pattern Recognition, Text Mining, Signal Processing, Machine Learning, Sensor Data Pattern Recognition, Leadership, OCR, System Architecture, Tech Roadmapping, Algorithms, Computer Vision, Deep Neural Networks, Statistical Learning, Linear Algebra, Image Processing, Statistical Modeling, Data Modeling, Gated Recurrent Unit (GRU), Recurrent Neural Networks, Artificial Intelligence (AI), Dynamic Systems Modeling, Programming, Applied Mathematics, Deep Learning, Nvidia Jetson, Recommendation Systems, Natural Language Processing (NLP), Natural Language Understanding, Natural Language Queries, Data Structures, GraphDB, Data Engineering, Reinforcement Learning
  • Languages

    C, R, Python, C++
  • Libraries/APIs

    Scikit-learn, TensorFlow, Keras, OpenCV, NLTK, Spark ML, PyTorch
  • Platforms

    Linux, Raspberry Pi
  • Storage

    Redis, MongoDB, Cassandra


  • Ph.D. degree in Biomedical Engineering and Computer Science
    2002 - 2007
    University of North Carolina at Chapel Hill - Chapel Hill, NC, USA
  • Bachelor of Engineering degree in Electrical Engineering
    1998 - 2002
    Gujarat University, LD College of Engineering - Ahmedabad, Gujarat, India


  • Machine Learning
    Stanford University via Coursera
  • Certified Associate in Project Management
    JANUARY 2007 - JANUARY 2012
    PMI | Project Management Institute
