Sunil Kumar

Verified Expert in Engineering

Senior Data Scientist and Developer

Location

Zürich, Switzerland

Toptal Member Since

December 20, 2021

Sunil's expertise lies in developing novel statistical and mathematical models for quantitative and qualitative data assessment. His primary strengths are ML, AI, big data, systems biology, and high-performance computing. He has co-authored 5+ papers in sleep science and helped develop solutions for wireless data transfer, database management, and monitoring software lifecycle processes. Sunil is adept at overseeing research to ensure rapid development and deployment of innovative product lines.

Machine Learning, Statistical Data Analysis, Time Series Analysis, Deep Learning, Ubuntu Linux, Python 3, NumPy, Pandas, SciPy, Anaconda, Keras, TensorFlow, PyQt, Docker, PyTorch, Bayesian Statistics, Statistics

Portfolio

Sleepiz AG

Python 3, Cloud Computing, Statistics, Docker, Data Science...

IBM Research

Artificial Intelligence (AI), Python 3, MATLAB, Slack, Mathematics, Statistics...

IMRB International, Kantar

Statistics, Bayesian Statistics, Bayesian Inference & Modeling, Data Inference...

Experience

Statistics - 15 years, MacOS - 12 years, Python 3 - 9 years, Machine Learning - 8 years, Time Series Analysis - 8 years, Deep Learning - 4 years, Docker - 2 years, Machine Learning Operations (MLOps) - 2 years

Availability

Part-time

Preferred Environment

MacOS, Unix, Python 3, Teams, Slack, Amazon Web Services (AWS), Azure

The most amazing...

...statistical tools I've built use advanced machine learning for sleep disorder assessment.

Work Experience

Senior Data Scientist

2018 - PRESENT

Sleepiz AG

Co-authored 5+ conference papers and posters in the area of sleep science.
Demonstrated the importance of cardiorespiratory features for sleep staging and sleep disorder diagnosis.
Obtained ethics approval from the Swiss Association of Research Ethics Committees (Swissethics) on sleep research.

Technologies: Python 3, Cloud Computing, Statistics, Docker, Data Science, Statistical Data Analysis, Data Integration, English, Project Design & Management, NumPy, Pandas, PyTorch, Keras, TensorFlow, Plotly, Dash, PyQt, SciPy, Scikit-learn, Anaconda, Ubuntu Linux, Azure Machine Learning, SQL, Cassandra

Doctoral Student, Contractor

2014 - 2016

IBM Research

Developed inference algorithms for protein networks using cytometry data.
Created inference algorithms for protein networks using mass spectrometry data.
Built probabilistic graphical models using moment dynamics.

Technologies: Artificial Intelligence (AI), Python 3, MATLAB, Slack, Mathematics, Statistics, Data Science, English, NumPy, Pandas, SciPy, Scikit-learn, Ubuntu Linux

Research Manager

2011 - 2012

IMRB International, Kantar

Developed a market simulation application using Visual Basic for Applications (VBA).
Created a user segmentation model based on consumers' product usage and behavior.
Gained corporate-level presentation and project management skills.

Technologies: Statistics, Bayesian Statistics, Bayesian Inference & Modeling, Data Inference, Causal Inference, Visual Basic for Applications (VBA), Excel VBA, IBM SPSS Statistics, Statistical Data Analysis, English

Experience

AI-based Sleep Disorder Assessment

http://www.sleepiz.com

In my current role as a senior data scientist at Sleepiz AG, I develop novel statistical models for sleep assessment. I have co-authored 5+ conference papers and posters in sleep science. Our work has demonstrated, in particular, the importance of cardiorespiratory features for sleep staging and sleep disorder diagnosis. As an integral part of the software development team at Sleepiz, I am responsible for providing efficient solutions for wireless data transfer, database management, and monitoring the software lifecycle process to ensure quality standards.

Stabilized Reconstruction of Signaling Networks from Single-cell Cue-response Data

Inferring cell-signaling networks from high-throughput data is a challenging problem in systems biology. Recent advances in cytometric technology enable us to measure the abundance of many proteins at the single-cell level across time. Traditional network reconstruction approaches usually consider each time point separately, resulting in inferred networks that vary enormously across time. To account for the possibly time-invariant physical couplings within the signaling network, we extend the traditional graphical lasso with an additional regularizer that penalizes network variations over time. ROC evaluation of the method on in silico data showed higher reconstruction accuracy than standard graphical lasso. We also tested our approach on single-cell mass cytometry data of IFNγ-stimulated THP1 cells with 26 phospho-proteins simultaneously measured. Our approach recapitulated known signaling relationships, such as connections within the JAK/STAT pathway, and was further validated in characterizing perturbed signaling networks with PI3K, MEK1/2, and AMPK inhibitors.
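
As a rough illustration of the idea described above, one plausible form of such an objective is sketched below. The notation is an assumption introduced here, not taken from the project description: Θ_t is the precision (inverse covariance) matrix at time point t, S_t is the empirical covariance at that time point, λ1 controls sparsity, and λ2 controls how strongly changes between consecutive time points are penalized. The choice of an ℓ1 norm on the temporal difference is likewise an assumption; the description only states that network variation over time is penalized.

```latex
\min_{\Theta_1,\dots,\Theta_T \succ 0}\;
\sum_{t=1}^{T} \Big[ \operatorname{tr}(S_t \Theta_t) - \log\det \Theta_t
  + \lambda_1 \lVert \Theta_t \rVert_1 \Big]
\;+\; \lambda_2 \sum_{t=2}^{T} \lVert \Theta_t - \Theta_{t-1} \rVert_1
```

Setting λ2 = 0 recovers a separate standard graphical lasso problem for each time point, which is the baseline the description compares against.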

Inferring Gene Expression Networks with Hubs Using Degree-weighted Lasso Approach

https://academic.oup.com/bioinformatics/article/35/6/987/5085370

Genome-scale gene networks contain regulatory genes called hubs that have many interaction partners. These genes usually play an essential role in gene regulation and cellular processes. Despite recent advancements in high-throughput technology, inferring gene networks with hub genes from high-dimensional data remains a challenging problem. To address this, we propose DW-Lasso, a degree-weighted Lasso method which infers gene networks with hubs efficiently under the low sample size setting. In a simulation study, we demonstrate good predictive performance of the proposed method in comparison to traditional Lasso-type methods in inferring hub and scale-free graphs. We show the effectiveness of our method in an application to microarray data of Escherichia coli and RNA sequencing data of Kidney Clear Cell Carcinoma from The Cancer Genome Atlas datasets.

Hybrid Approach for Improved Content-based Image Retrieval Using Segmentation

https://arxiv.org/abs/1502.03215

The objective of Content-based Image Retrieval (CBIR) methods is to extract, from large (image) databases, a specified number of images similar in visual and semantic content to a so-called query image.

To bridge the semantic gap that exists between the representation of an image by low-level features, namely color, shape, texture, and its high-level semantic content as perceived by humans, CBIR systems typically make use of the relevance feedback (RF) mechanism. RF iteratively incorporates user-given inputs regarding the relevance of retrieved images to improve retrieval efficiency.

In this work, an attempt has been made to improve retrieval accuracy by enhancing a CBIR system based on color features alone, through implicit incorporation of shape information obtained through prior segmentation of the images. Novel schemes for feature reweighting and initialization of the relevant set for improved relevance feedback have also been proposed for boosting the performance of RF-based CBIR. At the same time, new measures for evaluating retrieval accuracy have been suggested to overcome the limitations of existing measures in the RF context.

Feature Selection for Sleep Staging Using Cardiorespiratory and Movement Signals

https://openres.ersjournals.com/content/5/suppl_3/P40

EEG-based sleep staging is commonly conducted in a clinical setting, which may disturb patients’ sleep habits and impair study results. A non-invasive method of sleep staging through cardiorespiratory signals and body movement allows us to classify the stages of awake, light, deep, and REM sleep using random forest (RF) with good clinical accuracy. The aim is to improve this accuracy by tuning the RF hyperparameters. The hyperparameters were tuned over the splitting criteria Gini and entropy, maximal tree depth (up to fully grown), number of trees (up to 1,000), and maximal number of features considered at each split (p, √p, or log p, where p is the number of features). Selecting only the most essential features may increase accuracy further by reducing noisy inputs while decreasing computation time. Cardiorespiratory features proved more relevant than movement features, indicating that the latter may be omitted without risking a meaningful decrease in scoring accuracy.
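
The tuning procedure described above can be sketched with scikit-learn's grid search. This is a minimal illustration, not the study's actual pipeline: the data are synthetic (a four-class problem standing in for the awake, light, deep, and REM stages), the grid is deliberately small so it runs quickly, and scikit-learn's `"log2"` option stands in for the log p setting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for cardiorespiratory and movement features (assumption):
# 12 features, 4 classes playing the role of the four sleep stages.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6,
    n_classes=4, random_state=0,
)

# Hyperparameter grid mirroring the description, scaled down for speed.
param_grid = {
    "criterion": ["gini", "entropy"],       # splitting criteria
    "max_depth": [5, None],                 # None means fully grown trees
    "n_estimators": [50, 100],              # the study went up to 1,000
    "max_features": [None, "sqrt", "log2"], # p, sqrt(p), log2(p)
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

# Rank features by impurity-based importance in the best model.
importances = search.best_estimator_.feature_importances_
ranking = np.argsort(importances)[::-1]

print("Best hyperparameters:", search.best_params_)
print("Features ranked by importance:", ranking)
```

With real data, the importance ranking is what would indicate whether a feature group (for example, movement features) contributes little and could be dropped; impurity-based importances are only one of several ways to measure this.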

Skills

Paradigms

Data Science

Other

Teams, Statistics, Machine Learning, English, Statistical Data Analysis, Deep Learning, Time Series Analysis, Machine Learning Operations (MLOps), Dash, Cloud Computing, Project Design & Management, Mathematics, Cell Biology, Signal Processing, Team Management, Offshore Team Management, Artificial Intelligence (AI), Stochastic Modeling, Probability Theory, Probabilistic Graphical Models, Multivariate Statistical Modeling, Optimization, Bayesian Statistics, Statistical Modeling, Causal Inference, Bayesian Inference & Modeling, Data Inference, Software Development Lifecycle (SDLC), Medical Devices, Information Security, Privacy, Data Privacy, Records Management, Pattern Recognition, Random Forests, Hyperparameters, Linear Algebra, Numerical Analysis, Physics, Hypothesis Testing, Discrete Multivariate Modeling

Languages

Python 3, SQL, Python, Visual Basic for Applications (VBA), Excel VBA, R

Libraries/APIs

NumPy, Pandas, PyTorch, Keras, TensorFlow, PyQt, SciPy, Scikit-learn

Tools

Slack, Plotly, Azure Machine Learning, MATLAB, Git, IBM SPSS Statistics

Platforms

Docker, Anaconda, Ubuntu Linux, MacOS, Unix, Amazon Web Services (AWS), Azure

Storage

Data Integration, Cassandra

Education

2012 - 2017

Ph.D. in Biomedical Science

Swiss Federal Institute of Technology (ETH) - Zurich, Switzerland

2009 - 2011

Master's Degree in Statistics

Indian Statistical Institute - Kolkata, India

2006 - 2009

Bachelor's Degree in Statistics

Indian Statistical Institute - Kolkata, India

Certifications

SEPTEMBER 2018 - PRESENT

Information Security and Management Refresher

National Institutes of Health (NIH)

JUNE 2018 - PRESENT

Software Lifecycle (EN 62304) Training

Medidee Services SA

MARCH 2018 - PRESENT

Active Medical Devices–Standard Requirements Training

Medidee Services SA
