Saikat Banerjee, Developer in Chicago, IL, United States

Saikat Banerjee

Verified Expert in Engineering

Linear Regression Developer

Location
Chicago, IL, United States
Toptal Member Since
July 29, 2022

Saikat is a postdoctoral scientist at the University of Chicago with a PhD in computational biophysics and a master's degree in chemistry. He is an expert in biostatistics, statistical genetics, Bayesian methods, and machine learning. As a graphic design and web development freelancer, Saikat co-founded a marketing management company. He enjoys solving problems, creating value, and learning new expert-level skills.

Availability

Part-time

Preferred Environment

Ubuntu, Python, C++

The most amazing...

...method I've developed helped scientists to discover the network of human genome transcriptional regulation.

Work Experience

Postdoctoral Scientist

2020 - PRESENT
The University of Chicago
  • Led multiple projects on Bayesian statistics with international collaborations and challenging deadlines.
  • Developed machine learning algorithms for sparse multiple regression.
  • Introduced a gradient descent technique for variational inference.
Technologies: Statistical Methods, Bayesian Statistics, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning

Postdoctoral Scientist

2015 - 2020
Max Planck Society
  • Developed statistical methods to understand disease mechanisms from large-scale biomedical data.
  • Collaborated with medical doctors, leading to two peer-reviewed publications.
  • Presented our work at the 2019 International Society for Computational Biology conference and 2020 e:Med; invited to hold a visiting lecture at the University of Göttingen.
  • Supervised a master's thesis and mentored three internship students.
Technologies: Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning

Trans-eQTL Discovery from GTEx Data

https://doi.org/10.1186/s13059-021-02361-8
Genetic variants that regulate distant target genes are called trans-acting expression quantitative trait loci (trans-eQTLs). Many genetic variants are believed to mediate disease risk via trans-eQTLs, so discovering trans-eQTLs and understanding their mechanisms is crucial to revealing how genetic variants are linked to disease phenotypes. Identifying trans-eQTLs is challenging because of small effect sizes, tissue specificity, and a severe multiple-testing burden.

Our goal was to develop a reliable method for identifying trans-eQTLs. We proposed a new model and created open-source software. Applying our method to the eQTL data from the Genotype-Tissue Expression Project (GTEx) showed that it performs significantly better than state-of-the-art methods.

Bayesian Multiple Logistic Regression

https://doi.org/10.1371/journal.pgen.1007856
Logistic regression is the method of choice for analyzing binary outcomes. Multiple logistic regression uses numerous variables in a single logistic model. Bayesian multiple logistic regression offers several benefits, including variable selection, prediction, easier interpretation of results, and the ability to leverage prior information. However, it requires either costly and technically challenging Markov chain Monte Carlo (MCMC) sampling or approximations that significantly reduce the logistic model's flexibility.

We proposed a methodology using the point-normal prior for faster and more accurate Bayesian multiple logistic regression and developed open-source software for the project. Applying our method to human genetics data, we showed that it outperforms state-of-the-art methods in variable selection and prediction for sparse, high-dimensional multiple logistic regression problems (p >> n problems).

Languages

Python, Bash, HTML, PHP, CSS, Fortran, C++, Hugo, CSS3

Libraries/APIs

NumPy, SciPy, Scikit-learn, Matplotlib, MPI, OpenMP

Tools

Jupyter, Shell, Adobe Illustrator, GitHub, Adobe Photoshop

Platforms

Ubuntu, Linux, Debian

Other

Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Biostatistics, Predictive Modeling, Machine Learning, Research, Mechanics, Generalized Linear Model (GLM), Mixed-effects Models, Biophysics, Data Analysis, Computational Biological Physics

Paradigms

Data Science, Parallel Programming

Storage

MySQL, JSON

2010 - 2015

PhD in Computational Biophysics

Indian Institute of Science - Bangalore, India

2007 - 2010

Master's Degree in Chemistry

Indian Institute of Science - Bangalore, India
