Thomas Debray, Developer in Utrecht, Netherlands

Thomas Debray

Verified Expert in Engineering

Data Scientist and Developer

Utrecht, Netherlands
Toptal Member Since
March 18, 2022

Thomas has 17 years of experience in risk modeling and causal inference and has managed over €1 million in research funds as a scientist. Since 2019, he has worked as an independent contractor for various global pharmaceutical companies and CROs. His goal is to improve data-driven decision making by adopting state-of-the-art analysis methods and delivering scientific scrutiny in a timely fashion.






Preferred Environment

R, PHP, Statistics, Machine Learning, Risk Models, Causal Inference

The most amazing... I've developed are statistical methods, software, and guidelines that major scientific journals and international communities have endorsed.

Work Experience

Senior Statistician

2022 - PRESENT
Smart Data Analysis and Statistics
  • Provided statistical support during the design and analysis of Phase IV trials, post-authorization safety studies, historical control studies, and pooled studies (e.g., meta-analysis).
  • Built a Shiny app to facilitate blinded sample size re-estimation (BSSR) in bio-equivalence studies with multiple primary endpoints and >2 treatment arms.
  • Developed an R package for precision medicine. This package is hosted on CRAN and implements a doubly robust precision medicine approach to fit, cross-validate, and visualize prediction models for the conditional average treatment effect.
  • Developed, evaluated, and implemented risk prediction models using R and Python.
  • Set up advanced simulation studies using Google Cloud Platform (GCP) and Amazon Web Services (AWS).
  • Managed several data scientists and statisticians to develop training materials on biostatistics and machine learning.
  • Developed and maintained the company's main website using PHP and MySQL and integrated various libraries and APIs such as Bootstrap, Carousel, Google Charts, and Calendly.
  • Edited a handbook that provides guidance on conducting comparative effectiveness research and personalized medicine using real-world data.
Technologies: Bayesian Inference & Modeling, Biostatistics, Meta-analysis, PHP, MySQL, HTML, JavaScript, Machine Learning, CSS, Bioinformatics, Python, Jupyter Notebook, Risk Models, Data Science, Amazon Web Services (AWS), SQL, Linux Mint, Linux, Database Design, Graphical User Interface (GUI), Statistical Data Analysis, Predictive Modeling, Data Analytics, Database Analytics, Google Ads

Contract Senior Biostatistician

2021 - PRESENT
Undisclosed Pharmaceutical Company
  • Developed a study protocol to create a synthetic control arm for a noninterventional cohort study.
  • Reviewed statistical analysis plans for conducting a systematic literature review and network meta-analysis of randomized trials and real-world evidence.
  • Critically reviewed available data sources and assessed their utility for generating a synthetic control arm.
Technologies: R, RStudio, Microsoft PowerPoint, Causal Inference, Clinical Trials, Epidemiology, Biostatistics, Statistical Analysis, Health Economics & Outcomes Research (HEOR), Literature Review, Database Design, Statistical Data Analysis, Data Analytics

Associate Editor

2016 - PRESENT
BioMed Central Ltd
  • Managed article submissions and editorial peer review for the open-access journal BMC Diagnostic and Prognostic Research.
  • Provided feedback to manuscript authors about the required revisions.
  • Invited domain experts to provide critical reviews of submitted manuscripts.
Technologies: Publishing, Biostatistics, Machine Learning, Literature Review, Causal Inference, Risk Models

Contract Senior Data Scientist

2021 - 2022
  • Assisted in the implementation of machine learning methods for credit risk modeling.
  • Planned the development of a software platform for automating micro-loans.
  • Critically reviewed quotes for the development of the software platform.
Technologies: R, Machine Learning, Fintech

Assistant Professor

2013 - 2022
University Medical Center Utrecht
  • Developed statistical methodology and guidelines for conducting risk prediction and causal inference. Key topics: regression, meta-analysis, multiple imputation, multilevel modeling, Bayesian inference, propensity score analysis, machine learning.
  • Created master-of-science courses, workshops, online training modules, and a wiki for education and information provision to international students and staff.
  • Built an open-source R software package and maintained its updates and bug fixes via the Comprehensive R Archive Network (CRAN).
  • Set up advanced simulation studies using Amazon Web Services (AWS) and Google Cloud Platform (GCP) to evaluate and compare the performance of analytical approaches.
  • Developed and validated prediction models using penalized regression, multilevel regression, random forests, XGBoost, neural networks, and support vector machines.
  • Acted as a principal investigator for various international projects funded by the European Commission and World Health Organization. Applied for national and international research grants.
  • Managed an international team of master-of-science students, Ph.D. candidates, and post-docs and supervised their daily activities.
  • Provided critical reviews and analytical support in epidemiological studies.
  • Set up new collaborations with international organizations, including universities, healthcare agencies, and pharmaceutical companies.
  • Published around 100 peer-reviewed scientific manuscripts.
Technologies: Epidemiology, Biostatistics, Machine Learning, Meta-analysis, Training & Training Content Development, R, RStudio, Bayesian Inference & Modeling, Clinical Trials, Causal Inference, GitHub, Subversion (SVN), Wikis, Data Visualization, Data Analysis, JAGS, WinBUGS, Eclipse IDE, Data Science, SQL, Amazon Web Services (AWS), Google Cloud Platform (GCP), Linux Mint, Linux, Database Design, Graphical User Interface (GUI), Research, XGBoost, Regression, Statistical Data Analysis, Predictive Modeling, Data Analytics, Education

Scientific Consultant

2021 - 2021
Undisclosed Health Technology Assessments (HTA) Agency
  • Reviewed the validity of health economic models to assess the cost-effectiveness of a new therapy.
  • Evaluated the Java source code of a discrete event simulation model to identify computational and coding errors.
  • Verified the consistency between the technical report and the parameters and outputs of the discrete event simulation model.
  • Provided scientific advice on improving the transparency and usability of the discrete event simulation model.
  • Participated in teleconferences to discuss the technical report, disease and clinical area, and appropriateness of the health economic model.
  • Reviewed the draft advice report from the client and addressed their queries via mail.
Technologies: Java, Biostatistics, Markov Model, Markov Chain Monte Carlo (MCMC) Algorithms, Monte Carlo Simulations, Health Economics & Outcomes Research (HEOR), Statistical Data Analysis, Data Analytics

Contract Senior Biostatistician

2020 - 2021
Undisclosed Nonprofit Association
  • Reviewed the study protocol for a systematic literature review and provided feedback on the required analysis steps.
  • Conducted a multilevel meta-analysis of published evidence obtained through a literature review.
  • Assisted in drafting the final report, preparing a scientific publication, and addressing reviewer comments relating to methodological and statistical inquiries.
  • Developed R code to recover missing information from published reports and conducted a meta-analysis.
Technologies: Literature Review, Meta-analysis, R, RStudio, Biostatistics, Database Design, Statistical Data Analysis, Data Analytics

Contract Senior Biostatistician

2019 - 2021
Undisclosed Pharmaceutical Company
  • Developed, evaluated, and implemented statistical methods for systematic review and meta-analysis of real-world evidence studies, causal inference, and imputation of missing data.
  • Participated as a domain expert in advisory panels and discussed existing approaches for developing and validating risk prediction models using data from multiple sources.
  • Provided critical input on statistical analysis plans, study designs, statistical approaches, and the interpretation of results, and supported the drafting of reports and manuscripts.
  • Evaluated the performance of advanced data analysis methods using extensive simulation studies in R and JAGS on the Google Cloud Platform (GCP).
  • Prepared a critical overview of existing statistical methods for synthesizing RCTs and observational data and assessed their strengths and weaknesses.
  • Developed a statistical framework for estimating individualized treatment effects and conducted simulation studies to evaluate the accuracy of these estimates.
  • Managed several independent consultants to coordinate research and development activities.
  • Developed R code for various advanced statistical methods and maintained updates via Git.
Technologies: R, RStudio, Biostatistics, Statistics, Meta-analysis, Bayesian Inference & Modeling, Causal Inference, Epidemiology, Clinical Trials, GitHub, Risk Models, Markov Chain Monte Carlo (MCMC) Algorithms, Literature Review, Google Cloud Platform (GCP), Database Design, Statistical Data Analysis, Predictive Modeling, Data Analytics

Contract Senior Data Scientist

2020 - 2020
Infodation B.V.
  • Reviewed an R Shiny application to facilitate project planning and management.
  • Identified and fixed software bugs using Git version control.
  • Drafted a technical report with key recommendations for improving the R Shiny application and its long-term sustainability.
  • Managed feedback and input from one independent consultant who maintained the R Shiny software.
Technologies: R, RStudio Shiny, MySQL, JavaScript, Git, Data Science, Database Design, Graphical User Interface (GUI), Data Analytics, Database Analytics

Software Developer Consultant

2007 - 2010
Source NV-SA
  • Maintained the front end and back end of Source NV-SA, which was acquired by Tech Data Corporation in 2010.
  • Developed and implemented new modules for the content management system (CMS) of the company's main website.
  • Developed a web-based Java tool to support customers in identifying an appropriate backup and sizing solution.
Technologies: Java, C#, SQL, HTML, JavaScript, ASP.NET, Database Design, Graphical User Interface (GUI)

Metamisc | An R Package for Conducting Meta-analysis in Risk Prediction
The open-source and open-access R package Metamisc facilitates frequentist and Bayesian meta-analysis of diagnosis and prognosis research studies.

I was the lead developer and incorporated functions to conduct a multivariate meta-analysis to summarize estimates of prediction model performance (doi:10.1177/0962280218785504) and to evaluate the presence of publication bias (doi:10.1002/jrsm.1266).

The R package was initially developed to facilitate the education of master's degree-level and Ph.D. students and is now mainly used by researchers embarking on a systematic literature review.

In 2022, the R package was implemented as a formal extension module for the JASP software.

Improving the Generalizability of Risk Models

I secured over €1 million in funding as a scientist to lead and conduct this innovative methodological research. This enabled me to develop, evaluate, and implement new statistical methods for risk prediction. These methods have been published in major scientific journals and help improve the generalizability of risk models across multiple settings and populations.

Education

2010 - 2013

Master of Science Degree in Epidemiology

Utrecht University - Utrecht, The Netherlands

2009 - 2013

PhD in Epidemiology and Biostatistics

Utrecht University - Utrecht, The Netherlands

2007 - 2009

Master of Science Degree in Artificial Intelligence

Maastricht University - Maastricht, The Netherlands

2004 - 2007

Master of Science Degree in Computer Science

Hogeschool Gent - Gent, Belgium




LaTeX, GitHub, Eclipse IDE, Microsoft PowerPoint, Git, Subversion (SVN)


R, PHP, Java, HTML, JavaScript, SQL, Python, COBOL, C#, CSS


RStudio, Windows, Ubuntu, Linux Mint, Linux, MacOS, Jupyter Notebook, Fedora, Google Cloud Platform (GCP), Amazon Web Services (AWS)


Data Science, Database Design


RStudio Shiny, ASP.NET



Industry Expertise



JAGS, WinBUGS, Training & Training Content Development, Statistics, Machine Learning, Bayesian Inference & Modeling, Biostatistics, Regression, Epidemiology, Causal Inference, Meta-analysis, Risk Models, Monte Carlo Simulations, Health Economics & Outcomes Research (HEOR), Literature Review, Statistical Data Analysis, Predictive Modeling, Data Analytics, Scientific Data Analysis, Wikis, Clinical Trials, Graphical User Interface (GUI), Database Analytics, Data Mining, Image Processing, Information Retrieval, Signal Processing, Statistical Methods, Statistical Analysis, Markov Model, Markov Chain Monte Carlo (MCMC) Algorithms, Publishing, Data Visualization, Data Analysis, Programming, Big Data, Research, Predictive Analytics, Education, Google Ads, Fintech
