Saikat Banerjee, Developer in Jersey City, NJ, United States

Saikat Banerjee

Verified Expert in Engineering

Predictive Modeling Developer

Jersey City, NJ, United States

Toptal member since July 29, 2022

Bio

Saikat is a computational genomics scientist at the New York Genome Center. Previously, he was a postdoctoral researcher at the University of Chicago and at the Max Planck Institute (MPI) in Germany. He holds a PhD in computational biophysics and a master's degree in chemistry. He is an expert in Bayesian methods, machine learning, biostatistics, and statistical genetics. During his PhD, Saikat co-founded a marketing management company. He enjoys solving problems and creating value, and he often learns new skills to a professional level.

Portfolio

New York Genome Center
Machine Learning, Natural Language Processing (NLP), Bayesian Statistics...
The University of Chicago
Statistical Methods, Bayesian Statistics, Linear Regression...
Max Planck Society
Bayesian Statistics, Statistical Methods, Linear Regression...

Experience

Availability

Part-time

Preferred Environment

Ubuntu, Python, C++

The most amazing...

...method I've developed helped scientists uncover the transcriptional regulatory network of the human genome.

Work Experience

Staff Scientist

2023 - PRESENT
New York Genome Center
  • Developed a low-rank matrix approximation algorithm using convex optimization.
  • Performed biobank-scale data analysis to infer shared and distinct genetic components of heterogeneous complex diseases.
  • Presented our discoveries at multiple international conferences.
Technologies: Machine Learning, Natural Language Processing (NLP), Bayesian Statistics, Numerical Optimization, PyTorch
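The profile does not describe this algorithm. As a hedged illustration only, one standard convex formulation of low-rank approximation replaces the non-convex rank constraint with the nuclear norm (the sum of singular values). The penalized problem then has a closed-form solution: soft-threshold the singular values of the data matrix. The function name `singular_value_threshold` and the penalty weight `lam` are hypothetical choices for this sketch, not the method developed at the New York Genome Center.

```python
import numpy as np


def singular_value_threshold(Y: np.ndarray, lam: float) -> np.ndarray:
    """Solve min_X 0.5 * ||Y - X||_F^2 + lam * ||X||_* in closed form.

    Illustrative sketch: the nuclear norm ||X||_* (sum of singular values) is a
    convex surrogate for rank, and its proximal operator soft-thresholds the
    singular values of Y.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)  # shrink each singular value toward zero
    return (U * s_shrunk) @ Vt  # scale the columns of U, then recombine


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical example: rank-2 signal plus small Gaussian noise
    signal = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 30))
    Y = signal + 0.1 * rng.normal(size=(50, 30))
    X_hat = singular_value_threshold(Y, lam=2.0)
    print("rank of estimate:", np.linalg.matrix_rank(X_hat))
```

Because the problem is convex, this closed-form step is also the proximal operator used inside iterative solvers for related problems, such as matrix completion with missing entries.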

Postdoctoral Scientist

2020 - 2023
The University of Chicago
  • Led multiple projects on Bayesian statistics with international collaborations and challenging deadlines.
  • Developed machine learning algorithms for sparse multiple regression.
  • Introduced gradient descent optimization for variational inference.
Technologies: Statistical Methods, Bayesian Statistics, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning, Generalized Linear Model (GLM)
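The profile does not specify the model or objective behind this work. The sketch below is a hypothetical illustration of the general idea: it fits a mean-field Gaussian variational approximation to a Bayesian linear regression (a model assumed here for simplicity, not taken from the original projects) by plain gradient ascent on the evidence lower bound (ELBO), using closed-form gradients. The function `fit_mean_field_vi`, its step sizes, and the prior and noise variances are illustrative assumptions.

```python
import numpy as np


def fit_mean_field_vi(X, y, noise_var=1.0, prior_var=1.0, n_iter=5000, lr_rho=0.05):
    """Mean-field Gaussian VI for Bayesian linear regression by gradient ascent.

    Assumed model (illustration only): y ~ N(X w, noise_var * I), w ~ N(0, prior_var * I).
    Variational family: q(w) = N(m, diag(exp(2 * rho))).
    Returns the variational means m and variances s2.
    """
    n, p = X.shape
    m = np.zeros(p)
    rho = np.zeros(p)  # rho = log of the variational standard deviation
    XtX = X.T @ X
    Xty = X.T @ y
    col_sq = np.diag(XtX)  # ||x_j||^2 for each predictor column
    # Step size 1/L for the means, where L is the ELBO's curvature in m
    L = np.linalg.eigvalsh(XtX / noise_var + np.eye(p) / prior_var).max()
    lr_m = 1.0 / L
    for _ in range(n_iter):
        s2 = np.exp(2.0 * rho)
        # dELBO/dm: data-fit gradient minus prior shrinkage
        grad_m = (Xty - XtX @ m) / noise_var - m / prior_var
        # dELBO/drho: entropy term (+1) minus expected-fit and KL penalty terms
        grad_rho = 1.0 - s2 * (col_sq / noise_var + 1.0 / prior_var)
        m += lr_m * grad_m
        rho += lr_rho * grad_rho
    return m, np.exp(2.0 * rho)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(20, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=2.0, size=20)
    m, s2 = fit_mean_field_vi(X, y, noise_var=4.0, prior_var=1.0)
    print("variational means:", m)
    print("variational variances:", s2)
```

For this conjugate model the fixed point can be checked analytically. With A = XᵀX / noise_var + I / prior_var, the variational mean equals the exact posterior mean, while each variational variance equals 1 / A_jj. That is smaller than the exact marginal posterior variance whenever the columns of X are not orthogonal, which is the familiar tendency of mean-field VI to underestimate uncertainty.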

Postdoctoral Scientist

2015 - 2020
Max Planck Society
  • Developed statistical methods to understand disease mechanisms from large-scale biomedical data.
  • Collaborated with medical doctors, leading to two peer-reviewed publications.
  • Presented our work at the 2019 International Society for Computational Biology conference and 2020 e:Med. Invited to give a guest lecture at the University of Göttingen.
  • Supervised a master's thesis and mentored three internship students.
Technologies: Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning

Trans-eQTL Discovery from GTEx Data

https://doi.org/10.1186/s13059-021-02361-8
Genetic variants that regulate distant target genes are called trans-acting expression quantitative trait loci (trans-eQTLs). Many genetic variants are believed to affect disease risk through trans-eQTL effects, so discovering trans-eQTLs and understanding their mechanisms is crucial for linking genetic variants to disease phenotypes. Identifying trans-eQTLs is challenging because of their small effect sizes, their tissue specificity, and a severe multiple-testing burden.

Our goal was to develop a reliable method for identifying trans-eQTLs. We proposed a new model and released it as open-source software. Applied to eQTL data from the Genotype-Tissue Expression (GTEx) project, our method performed significantly better than state-of-the-art methods.
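To make the multiple-testing burden concrete, the sketch below computes a Bonferroni per-test significance threshold when every variant is tested against every gene, and compares it with a cis-style setting where each gene is tested only against nearby variants. The variant and gene counts and the number of nearby variants per gene are hypothetical round numbers, not figures from the GTEx analysis, and Bonferroni correction is used only as the simplest example of family-wise error control.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that controls the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical round numbers chosen for illustration only
N_VARIANTS = 1_000_000
N_GENES = 20_000
CIS_VARIANTS_PER_GENE = 5_000  # assumed number of variants in a window around each gene

# trans: every variant is paired with every gene
trans_tests = N_VARIANTS * N_GENES
# cis: each gene is paired only with the variants near it
cis_tests = N_GENES * CIS_VARIANTS_PER_GENE

if __name__ == "__main__":
    print(f"trans tests: {trans_tests:.1e}, threshold: {bonferroni_threshold(trans_tests):.1e}")
    print(f"cis tests:   {cis_tests:.1e}, threshold: {bonferroni_threshold(cis_tests):.1e}")
```

With these assumed numbers, trans testing involves 200 times more tests than cis testing, so the per-test threshold is 200 times stricter, a bar that variants with small effect sizes rarely clear.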

Bayesian Multiple Logistic Regression

https://doi.org/10.1371/journal.pgen.1007856
Logistic regression is the method of choice for analyzing binary outcomes. Multiple logistic regression includes many predictor variables in a single logistic model. Bayesian multiple logistic regression offers several benefits, including variable selection, prediction, easier interpretation of results, and the ability to leverage prior information. However, it requires either costly and technically challenging Markov chain Monte Carlo (MCMC) sampling or approximations that significantly reduce the flexibility of the logistic model.

We proposed a method using the point-normal prior for faster and more accurate Bayesian multiple logistic regression and developed open-source software for it. Applied to human genetics data, our method outperformed state-of-the-art methods in variable selection and prediction for high-dimensional sparse multiple logistic regression problems (p >> n).
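The original text does not spell out the model. As a hedged sketch of the setting it describes, the code below simulates data from a sparse logistic model whose effect sizes follow a point-normal prior: each effect is exactly zero with probability 1 - pi and is drawn from N(0, sigma^2) otherwise. It also evaluates the logistic log-likelihood that any inference method for this model has to work with. It does not implement the published inference algorithm; the function names, the genotype-like 0/1/2 coding of predictors, and the values of `pi` and `sigma` are illustrative assumptions.

```python
import numpy as np


def sample_point_normal(p, pi, sigma, rng):
    """Draw p effects: 0 with probability 1 - pi, otherwise N(0, sigma^2)."""
    included = rng.random(p) < pi
    beta = np.where(included, rng.normal(0.0, sigma, size=p), 0.0)
    return beta, included


def simulate_binary_outcomes(n, p, pi=0.01, sigma=0.5, intercept=0.0, seed=0):
    """Simulate predictors and binary outcomes from a sparse logistic model."""
    rng = np.random.default_rng(seed)
    # Hypothetical genotype-like predictors coded as 0/1/2 copies of an allele
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    beta, included = sample_point_normal(p, pi, sigma, rng)
    prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))  # logistic link
    y = rng.binomial(1, prob)
    return X, y, beta, included


def logistic_log_likelihood(beta, X, y, intercept=0.0):
    """Bernoulli log-likelihood; logaddexp(0, eta) = log(1 + exp(eta)) computed stably."""
    eta = intercept + X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


if __name__ == "__main__":
    X, y, beta, included = simulate_binary_outcomes(n=200, p=1000)
    print("nonzero effects:", int(included.sum()), "of", beta.size)
    print("log-likelihood at true beta:", logistic_log_likelihood(beta, X, y))
```

The demo uses far more predictors than samples, the high-dimensional regime in which most coefficients are zero and the prior's point mass at zero is what lets the posterior select variables.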

Education

2010 - 2015

PhD in Computational Biophysics

Indian Institute of Science - Bangalore, India

2007 - 2010

Master's Degree in Chemistry

Indian Institute of Science - Bangalore, India

Libraries/APIs

NumPy, SciPy, Scikit-learn, Matplotlib, MPI, OpenMP, PyTorch

Tools

Jupyter, Shell, Adobe Illustrator, GitHub, Adobe Photoshop, Hugo

Languages

Python, Bash, HTML, PHP, CSS, Fortran, C++

Platforms

Ubuntu, Linux, Debian

Paradigms

Parallel Programming

Storage

MySQL, JSON

Other

Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Biostatistics, Predictive Modeling, Machine Learning, Research, Mechanics, Generalized Linear Model (GLM), Mixed-effects Models, Biophysics, Data Analysis, Data Science, Computational Biological Physics, Natural Language Processing (NLP), Numerical Optimization
