Sunil Kumar, Developer in Zürich, Switzerland

Sunil Kumar

Verified Expert in Engineering

Senior Data Scientist and Developer

Zürich, Switzerland
Toptal Member Since
December 20, 2021

Sunil's expertise lies in developing novel statistical and mathematical models for quantitative and qualitative data assessment. His primary strengths are ML, AI, big data, systems biology, and high-performance computing. He has co-authored 5+ papers in sleep science and helped develop solutions for wireless data transfer, DB management, and monitoring software lifecycle processes. Sunil is adept at overseeing research to ensure rapid development and deployment of innovative product lines.






Preferred Environment

MacOS, Unix, Python 3, Teams, Slack, Amazon Web Services (AWS), Azure

The most amazing...

...statistical tools I've built use advanced machine learning for sleep disorder assessment.

Work Experience

Senior Data Scientist

2018 - PRESENT
Sleepiz AG
  • Co-authored 5+ conference papers and posters in the area of sleep science.
  • Demonstrated the importance of cardiorespiratory features for sleep staging and sleep disorder diagnosis.
  • Obtained ethics approval from the Swiss Association of Research Ethics Committees (Swissethics) on sleep research.
Technologies: Python 3, Cloud Computing, Statistics, Docker, Data Science, Statistical Data Analysis, Data Integration, English, Project Design & Management, NumPy, Pandas, PyTorch, Keras, TensorFlow, Plotly, Dash, PyQt, SciPy, Scikit-learn, Anaconda, Ubuntu Linux, Azure Machine Learning, SQL, Cassandra

Doctoral Student, Contractor

2014 - 2016
IBM Research
  • Developed inference algorithms for protein networks using cytometry data.
  • Created inference algorithms for protein networks using mass spectrometry data.
  • Built probabilistic graphical models using moment dynamics.
Technologies: Artificial Intelligence (AI), Python 3, MATLAB, Slack, Mathematics, Statistics, Data Science, English, NumPy, Pandas, SciPy, Scikit-learn, Ubuntu Linux

Research Manager

2011 - 2012
IMRB International, Kantar
  • Developed a market simulation application using Visual Basic for Applications (VBA).
  • Created a user segmentation model based on consumers' product usage and behavior.
  • Gained corporate-level presentation and project management skills.
Technologies: Statistics, Bayesian Statistics, Bayesian Inference & Modeling, Data Inference, Causal Inference, Visual Basic for Applications (VBA), Excel VBA, IBM SPSS Statistics, Statistical Data Analysis, English

AI-based Sleep Disorder Assessment
In my current role as a senior data scientist at Sleepiz AG, I develop novel statistical models for sleep assessment. I have co-authored 5+ conference papers and posters in sleep science. In particular, our work has demonstrated the importance of cardiorespiratory features for sleep staging and sleep disorder diagnosis. As an integral part of the software development team at Sleepiz, I am responsible for providing efficient solutions for wireless data transfer, database management, and monitoring the software lifecycle process to ensure quality standards.

Stabilized Reconstruction of Signaling Networks from Single-cell Cue-response Data

Inferring cell-signaling networks from high-throughput data is a challenging problem in systems biology. Recent advances in cytometric technology enable us to measure the abundance of many proteins at the single-cell level across time. Traditional network reconstruction approaches usually consider each time point separately, resulting in inferred networks that vary enormously across time. To account for the possibly time-invariant physical couplings within the signaling network, we extend the traditional graphical lasso with an additional regularizer that penalizes network variations over time. ROC evaluation of the method on in silico data showed higher reconstruction accuracy than standard graphical lasso. We also tested our approach on single-cell mass cytometry data of IFNγ-stimulated THP1 cells with 26 phospho-proteins simultaneously measured. Our approach recapitulated known signaling relationships, such as connections within the JAK/STAT pathway, and was further validated in characterizing perturbed signaling networks with PI3K, MEK1/2, and AMPK inhibitors.
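
As a rough illustration of the idea, a time-regularized graphical lasso can be written as the objective below. This is a sketch under assumptions: the description above does not state the exact form of the temporal regularizer, so the ℓ1 penalty on consecutive differences (a fused-lasso-style choice) and the symbols S_t (empirical covariance at time point t), Θ_t (precision matrix at time point t), λ_1 (sparsity weight), and λ_2 (temporal-smoothness weight) are illustrative rather than the published formulation.

```latex
\min_{\Theta_1,\dots,\Theta_T \succ 0}\;
\sum_{t=1}^{T}\Big[\operatorname{tr}(S_t\Theta_t)-\log\det\Theta_t
  +\lambda_1\lVert\Theta_t\rVert_1\Big]
\;+\;\lambda_2\sum_{t=2}^{T}\lVert\Theta_t-\Theta_{t-1}\rVert_1
```

With λ_2 = 0 the problem decouples into an independent standard graphical lasso at each time point; increasing λ_2 pulls the networks at neighboring time points toward each other.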

Inferring Gene Expression Networks with Hubs Using Degree-weighted Lasso Approach
Genome-scale gene networks contain regulatory genes called hubs that have many interaction partners. These genes usually play an essential role in gene regulation and cellular processes. Despite recent advancements in high-throughput technology, inferring gene networks with hub genes from high-dimensional data remains a challenging problem. To address this, we propose DW-Lasso, a degree-weighted Lasso method that infers gene networks with hubs efficiently in the low-sample-size setting. In a simulation study, we demonstrate good predictive performance of the proposed method in comparison to traditional Lasso-type methods in inferring hub and scale-free graphs. We show the effectiveness of our method in an application to microarray data of Escherichia coli and RNA sequencing data of Kidney Clear Cell Carcinoma from The Cancer Genome Atlas datasets.
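
The intuition behind a degree-weighted penalty can be sketched in a few lines of Python. This is a minimal illustration, not the DW-Lasso algorithm itself: the weight formula `1 / (1 + degree)`, the function names, and the toy numbers are all assumptions made for the example. It relies on the fact that, for an orthonormal design, a weighted Lasso reduces to soft-thresholding each least-squares coefficient at `lam * weight`.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def degree_weights(degrees):
    """Hypothetical weighting: a higher estimated degree gives a smaller penalty."""
    return [1.0 / (1.0 + d) for d in degrees]


def weighted_lasso_orthonormal(ols_coefs, weights, lam):
    """Weighted Lasso solution when the design matrix is orthonormal."""
    return [soft_threshold(b, lam * w) for b, w in zip(ols_coefs, weights)]


# Three candidate edges with equally strong raw signals, but the
# candidate regulators have different (assumed) degrees.
ols = [0.4, 0.4, -0.4]
degrees = [0, 3, 9]
w = degree_weights(degrees)
shrunk = weighted_lasso_orthonormal(ols, w, lam=0.5)
print(shrunk)
```

In this toy setting, the edge attached to the degree-0 gene is removed entirely, while the edges attached to the higher-degree genes survive with only mild shrinkage, which is the behavior a hub-aware penalty is meant to encourage.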

Hybrid Approach for Improved Content-based Image Retrieval Using Segmentation
The objective of Content-based Image Retrieval (CBIR) methods is to extract, from large (image) databases, a specified number of images similar in visual and semantic content to a so-called query image.

To bridge the semantic gap that exists between the representation of an image by low-level features, namely color, shape, texture, and its high-level semantic content as perceived by humans, CBIR systems typically make use of the relevance feedback (RF) mechanism. RF iteratively incorporates user-given inputs regarding the relevance of retrieved images to improve retrieval efficiency.

In this work, an attempt has been made to improve retrieval accuracy by enhancing a CBIR system based on color features alone, through implicit incorporation of shape information obtained through prior segmentation of the images. Novel schemes for feature reweighting and for initializing the relevant set have also been proposed to boost the performance of RF-based CBIR. At the same time, new measures for evaluating retrieval accuracy have been suggested to overcome the limitations of existing measures in the RF context.
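
To make the relevance feedback loop more concrete, the sketch below shows one classic form of feature reweighting, in which features that vary little across the images a user marked as relevant receive larger weights. It is not the novel scheme proposed in this work; the inverse-standard-deviation rule, the function names, and the toy feature vectors are assumptions chosen for illustration.

```python
import statistics


def reweight(relevant, eps=1e-9):
    """Weight each feature by the inverse of its spread over relevant images."""
    n_features = len(relevant[0])
    raw = []
    for j in range(n_features):
        spread = statistics.pstdev([img[j] for img in relevant])
        raw.append(1.0 / (spread + eps))
    total = sum(raw)
    return [r / total for r in raw]  # normalize so the weights sum to 1


def weighted_distance(query, image, weights):
    """Weighted L1 distance between two feature vectors."""
    return sum(w * abs(q - x) for w, q, x in zip(weights, query, image))


# Toy feedback round: the user marked three images as relevant.
# Feature 0 is consistent across them, feature 1 is not.
relevant = [[0.50, 0.10], [0.52, 0.90], [0.48, 0.50]]
weights = reweight(relevant)

query = [0.50, 0.50]
image_a = [0.51, 0.90]  # close on the consistent feature
image_b = [0.80, 0.50]  # close on the inconsistent feature
d_a = weighted_distance(query, image_a, weights)
d_b = weighted_distance(query, image_b, weights)
print(weights, d_a, d_b)
```

Under an unweighted L1 distance, `image_b` would rank ahead of `image_a`; after reweighting, `image_a` ranks first because it agrees with the query on the feature the user's feedback suggests is important.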

Feature Selection for Sleep Staging Using Cardiorespiratory and Movement Signals
EEG-based sleep staging is commonly conducted in a clinical setting, which may disturb patients’ sleep habits and impair study results. A non-invasive method of sleep staging through cardiorespiratory signals and body movement allows us to classify the stages of awake, light, deep, and REM sleep using random forest (RF) with good clinical accuracy. The aim is to improve this accuracy by tuning the RF hyperparameters. The hyperparameters were tuned over the splitting criteria Gini and entropy, maximal tree depth (up to fully grown), number of trees (up to 1,000), and maximal number of features considered at each split (p, √p, or log p). Selecting only the most essential features may increase accuracy further by reducing noisy inputs while decreasing computation time. Cardiorespiratory features proved more relevant than movement features, indicating that the latter may be omitted without risking a meaningful decrease in scoring accuracy.
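
The hyperparameter search described above can be pictured as enumerating a grid of candidate configurations. The sketch below builds such a grid using only the Python standard library. The specific depth and tree-count values, the feature count `p=64`, the base-2 logarithm for "log p", and all function names are assumptions for illustration; only the two splitting criteria, the 1,000-tree upper bound, the fully grown option, and the three feature-count rules come from the description.

```python
import itertools
import math


def max_features_options(p):
    """Candidate numbers of features per split: p, sqrt(p), and log2(p)."""
    return {
        "p": p,
        "sqrt_p": max(1, round(math.sqrt(p))),
        "log_p": max(1, round(math.log2(p))),  # base 2 is an assumption
    }


def build_grid(p, depths=(5, 10, None), n_trees=(100, 500, 1000)):
    """Enumerate every combination; depth None stands for a fully grown tree."""
    criteria = ("gini", "entropy")
    features = max_features_options(p)
    grid = []
    for criterion, depth, trees, (name, k) in itertools.product(
        criteria, depths, n_trees, features.items()
    ):
        grid.append(
            {
                "criterion": criterion,
                "max_depth": depth,
                "n_trees": trees,
                "max_features_rule": name,
                "max_features": k,
            }
        )
    return grid


grid = build_grid(p=64)
print(len(grid), grid[0])
```

Each dictionary in `grid` would then be used to train and cross-validate one random forest, with the best-scoring configuration retained.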


Data Science


Teams, Statistics, Machine Learning, English, Statistical Data Analysis, Deep Learning, Time Series Analysis, Machine Learning Operations (MLOps), Dash, Cloud Computing, Project Design & Management, Mathematics, Cell Biology, Signal Processing, Team Management, Offshore Team Management, Artificial Intelligence (AI), Stochastic Modeling, Probability Theory, Probabilistic Graphical Models, Multivariate Statistical Modeling, Optimization, Bayesian Statistics, Statistical Modeling, Causal Inference, Bayesian Inference & Modeling, Data Inference, Software Development Lifecycle (SDLC), Medical Devices, Information Security, Privacy, Data Privacy, Records Management, Pattern Recognition, Random Forests, Hyperparameters, Linear Algebra, Numerical Analysis, Physics, Hypothesis Testing, Discrete Multivariate Modeling


Python 3, SQL, Python, Visual Basic for Applications (VBA), Excel VBA, R


NumPy, Pandas, PyTorch, Keras, TensorFlow, PyQt, SciPy, Scikit-learn


Slack, Plotly, Azure Machine Learning, MATLAB, Git, IBM SPSS Statistics


Docker, Anaconda, Ubuntu Linux, MacOS, Unix, Amazon Web Services (AWS), Azure


Data Integration, Cassandra

2012 - 2017

Ph.D. in Biomedical Science

Swiss Federal Institute of Technology (ETH) - Zurich, Switzerland

2009 - 2011

Master's Degree in Statistics

Indian Statistical Institute - Kolkata, India

2006 - 2009

Bachelor's Degree in Statistics

Indian Statistical Institute - Kolkata, India


Information Security and Management Refresher

National Institutes of Health (NIH)


Software Lifecycle (EN 62304) Training

Medidee Services SA


Active Medical Devices–Standard Requirements Training

Medidee Services SA