Saikat Banerjee, Linear Regression Developer in Chicago, IL, United States

Member since July 29, 2022
Saikat is a postdoctoral scientist at the University of Chicago with a PhD in computational biophysics and a master's degree in chemistry. He is an expert in biostatistics, statistical genetics, Bayesian methods, and machine learning. As a graphic design and web development freelancer, Saikat co-founded a marketing management company. He enjoys solving problems, creating value, and learning new expert-level skills.

Experience

  • Python 10 years
  • Machine Learning 7 years
  • Predictive Modeling 7 years
  • Statistical Methods 7 years
  • Linear Regression 7 years
  • Bayesian Statistics 7 years
  • Logistic Regression 7 years
  • Biostatistics 7 years

Location

Chicago, IL, United States

Availability

Part-time

Preferred Environment

Ubuntu, Python, C++

The most amazing...

...method I've developed helped scientists to discover the network of human genome transcriptional regulation.

Employment

  • Postdoctoral Scientist

    2020 - PRESENT
    The University of Chicago
    • Led multiple projects on Bayesian statistics with international collaborations and challenging deadlines.
    • Developed machine learning algorithms for sparse multiple regression.
    • Introduced a gradient descent technique for variational inference.
    Technologies: Statistical Methods, Bayesian Statistics, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning
  • Postdoctoral Scientist

    2015 - 2020
    Max Planck Society
    • Developed statistical methods to understand disease mechanisms from large-scale biomedical data.
    • Collaborated with medical doctors, leading to two peer-reviewed publications.
    • Presented our work at the 2019 International Society for Computational Biology conference and 2020 e:Med; invited to hold a visiting lecture at the University of Göttingen.
    • Supervised a master's thesis and mentored three internship students.
    Technologies: Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Predictive Modeling, Machine Learning

Experience

  • Trans-eQTL Discovery from GTEx Data
    https://doi.org/10.1186/s13059-021-02361-8

    Genetic variants that regulate distant target genes are called trans-acting expression quantitative trait loci (trans-eQTLs). Many genetic variants are believed to mediate disease risk via trans-eQTLs, so discovering trans-eQTLs and understanding their mechanisms is crucial to revealing how genetic variants are linked to disease phenotypes. Identifying trans-eQTLs is challenging because of their small effect sizes, tissue specificity, and a severe multiple-testing burden.

    Our goal was to develop a reliable method for identifying trans-eQTLs. We proposed a new model and created open-source software. Applied to the eQTL data from the Genotype-Tissue Expression (GTEx) Project, our method performed significantly better than state-of-the-art methods.

  • Bayesian Multiple Logistic Regression
    https://doi.org/10.1371/journal.pgen.1007856

    Logistic regression is the method of choice for analyzing binary outcomes. Multiple logistic regression models the outcome jointly on many predictor variables. Bayesian multiple logistic regression offers several benefits, including variable selection, prediction, easier interpretation of results, and the ability to leverage prior information. However, it requires either costly and technically challenging Markov chain Monte Carlo (MCMC) sampling or approximations that significantly reduce the logistic model's flexibility.

    We proposed a methodology that uses the point-normal prior for faster and more accurate Bayesian multiple logistic regression and developed open-source software for the project. Applying our method to human genetics data, we showed that it outperforms state-of-the-art methods in variable selection and prediction for sparse, high-dimensional multiple logistic regression problems (p >> n problems).
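    As a rough illustration of the setting described above, and not the project's actual software, the sketch below draws sparse effect sizes from a point-normal prior (each effect is exactly zero with probability 1 - pi and normally distributed otherwise) and simulates binary outcomes from a logistic model. All function names, parameter values, and the simulated data are hypothetical and chosen only for illustration.

```python
import math
import random


def sample_point_normal(p, pi, sigma, rng):
    """Draw p effect sizes from a point-normal prior (illustrative helper).

    Each effect is nonzero with probability pi, in which case it is drawn
    from N(0, sigma^2); otherwise it is exactly 0.
    """
    return [rng.gauss(0.0, sigma) if rng.random() < pi else 0.0 for _ in range(p)]


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate_outcomes(X, beta, rng):
    """Simulate y_i ~ Bernoulli(logistic(x_i . beta)) for each row x_i of X."""
    ys = []
    for row in X:
        z = sum(x * b for x, b in zip(row, beta))
        ys.append(1 if rng.random() < logistic(z) else 0)
    return ys


# Hypothetical high-dimensional setup: many more predictors than samples.
rng = random.Random(0)
n, p = 20, 200
beta = sample_point_normal(p, pi=0.05, sigma=1.0, rng=rng)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = simulate_outcomes(X, beta, rng)

n_nonzero = sum(b != 0.0 for b in beta)
print(n_nonzero, "of", p, "effects are nonzero")
```

    The sketch covers only the generative side (prior and likelihood). Inferring which effects are nonzero from X and y is the hard part that the project addresses, and it is not attempted here.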

Skills

  • Languages

    Python, Bash, HTML, PHP, CSS, Fortran, C++, Hugo, CSS3
  • Libraries/APIs

    NumPy, SciPy, Scikit-learn, Matplotlib, MPI, OpenMP
  • Tools

    Jupyter, Shell, Adobe Illustrator, GitHub, Adobe Photoshop
  • Platforms

    Ubuntu, Linux, Debian
  • Other

    Bayesian Statistics, Statistical Methods, Linear Regression, Logistic Regression, Biostatistics, Predictive Modeling, Machine Learning, Research, Mechanics, Generalized Linear Model (GLM), Mixed-effects Models, Biophysics, Physical Chemistry, Data Analysis, Computational Biological Physics
  • Paradigms

    Data Science, Parallel Programming
  • Storage

    MySQL, JSON

Education

  • PhD in Computational Biophysics
    2010 - 2015
    Indian Institute of Science - Bangalore, India
  • Master's Degree in Chemistry
    2007 - 2010
    Indian Institute of Science - Bangalore, India
