Rajat Singhania, Developer in Toronto, ON, Canada

Rajat Singhania

Verified Expert in Engineering

Bioinformatics Engineer and Developer

Toronto, ON, Canada

Toptal member since July 20, 2023

Bio

Rajat is an experienced bioinformatician specializing in next-generation sequencing (NGS) and data analysis. Proficient in NGS data processing from demultiplexing to mutation calling, he excels in pipeline tool selection and optimization. Rajat is an expert in using machine learning methodologies for cancer-type classification. His work with LifeLabs Genetics benefited hundreds of patients, and he co-first authored a Nature paper that launched Adela, a startup focusing on early cancer detection.

Portfolio

Adela Bio
R, Next-generation Sequencing, Machine Learning, Bash, Git, Docker, Nextflow...
LifeLabs Genetics
R, Bash, Alissa Interpret, GATK, Integrative Genomics Viewer (IGV), BWA...
Princess Margaret Cancer Center, University Health Network
R, Bash, Integrative Genomics Viewer (IGV), BEDtools, Bowtie 2...

Experience

  • R - 15 years
  • Bioinformatics - 12 years
  • BEDtools - 10 years
  • Bash - 10 years
  • Linux HPC - 5 years
  • SAMtools - 5 years
  • Machine Learning - 4 years
  • Next-generation Sequencing - 3 years

Availability

Part-time

Preferred Environment

Linux, R, Bash

The most amazing...

...thing I've launched is an NGS variant calling pipeline for LifeLabs Genetics, a leading diagnostics company in Canada, that has benefited hundreds of patients.

Work Experience

Bioinformatics Scientist

2021 - 2023
Adela Bio
  • Developed and implemented a custom pipeline to process raw NGS data from samples run through the novel cell-free methylated DNA immunoprecipitation-sequencing (cfMeDIP-seq) assay to create the company's ctDNA-based oncology liquid biopsy product.
  • Selected the platform to run the pipeline on from among various vendors based on business and technical needs, saving the company around $100,000 in licensing costs per year.
  • Conducted extensive performance evaluation of the tools available for each pipeline component to choose the best ones, decreasing run time per sample by around 50%.
  • Optimized the AWS clusters for each pipeline component to reduce cost while maintaining reasonable run time, reducing processing cost per sample by more than 50%.
  • Documented the quality control (QC) metrics output by the pipeline in detail and selected the thresholds for each.
  • Developed and implemented the tools and workflows for ML-based cancer classification calls using the downstream outputs of the processing pipeline.
  • Conducted various custom analyses for the wet lab and troubleshot errors introduced during wet lab work.
  • Received Adela's CEO Award for outstanding performance and contributions.
Technologies: R, Next-generation Sequencing, Machine Learning, Bash, Git, Docker, Nextflow, Bcl2fastq, SAMtools, BamTools, BCFtools, BEDOPS, BEDtools, DESeq2, MACS2, DNA Sequencing, MEDIPS, Fastp, BWA, Bowtie 2, UMI-tools, Integrative Genomics Viewer (IGV), FastQC, MultiQC, Preseq, Megalodon, Databricks, Illumina BaseSpace, Python, K-means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), T-distributed Stochastic Neighbor Embedding (t-SNE), ComBat-seq, Jira, Bioinformatics, UCSC Genome Browser, Statistics, Genomics, Biotechnology, Dragen

Bioinformatics Programmer

2018 - 2021
LifeLabs Genetics
  • Built an NGS-based DNA mutation and copy number variation (CNV) detection pipeline from upstream steps requiring raw data analyses to downstream steps, including variant calling and helping clinical scientists do variant filtering and interpretation.
  • Acted as a key player in helping the company launch its hereditary diagnostics product covering panels from cancers to rare diseases, and participated in validation and QC of the bioinformatics components.
  • Ensured the product met the regulatory requirements of the Clinical Laboratory Improvement Amendments (CLIA) and the College of American Pathologists (CAP).
Technologies: R, Bash, Alissa Interpret, GATK, Integrative Genomics Viewer (IGV), BWA, SAMtools, BEDtools, Alamut Visual Plus, Human Gene Mutation Database (HGMD), Linux HPC, Git, Linux, Bioinformatics, Bcl2fastq, Next-generation Sequencing, Variant Calling, Python, HPCC Systems, Genomics, Biotechnology

De Carvalho Lab Scientific Associate and Postdoctoral Fellow

2012 - 2018
Princess Margaret Cancer Center, University Health Network
  • Co-first authored a Nature paper demonstrating the potential of the novel cfMeDIP-seq assay combined with ML to detect and classify tumors using DNA methylation marks from circulating cell-free tumor DNA.
  • Conducted computational analysis of epigenetic mechanisms in cancer, in particular DNA methylation.
  • Used various bioinformatics tools and methods to process NGS data from standard assays such as RNA-Seq, ChIP-Seq, and ATAC-Seq and interpret results.
  • Provided customized analyses as needed by experimental colleagues and collaborators.
Technologies: R, Bash, Integrative Genomics Viewer (IGV), BEDtools, Bowtie 2, Machine Learning, MEDIPS, DESeq2, The Cancer Genome Atlas (TCGA), Principal Component Analysis (PCA), T-distributed Stochastic Neighbor Embedding (t-SNE), Linux HPC, Linux, Bioinformatics, Next-generation Sequencing, UCSC Genome Browser, Uniform Manifold Approximation and Projection (UMAP), Hierarchical Clustering, K-means Clustering, FastQC, MultiQC, MACS2, HPCC Systems, Statistics, Genomics, Biotechnology, Sun Grid Engine

LifeLabs Genetics | Hereditary Diagnostics Product

I was a key player in helping the company launch its hereditary diagnostics product covering panels from cancers to rare diseases. I built an NGS-based DNA mutation and CNV detection pipeline from upstream steps requiring raw data analyses to downstream steps, including variant calling. I also helped clinical scientists with variant filtering and interpretation. I was heavily involved in the validation and QC of all the pipeline components and in ensuring that all CAP and CLIA regulatory requirements were met. This project also involved building and bioinformatically validating a custom panel of genes covering the various disease conditions.

Various bioinformatics tools were used, including GATK, BWA, SAMtools, BEDtools, and IGV, along with the HGMD and Alamut databases. The primary programming language was R, supplemented by Bash scripting, with Git for version control. The project also involved communicating with various internal and external stakeholders.

This product impacted the care of hundreds of patients referred by physicians and hospitals. The bioinformatics pipeline at the core of this product identified actionable variants and their associated diseases for personalized care.

Validation of a Novel Liquid Biopsy Approach

https://pubmed.ncbi.nlm.nih.gov/30429608/
During my postdoc at the prestigious Princess Margaret Cancer Center, I was a co-first author of a Nature paper that showed the utility of the novel cfMeDIP-seq assay developed by Professor Daniel De Carvalho in detecting and classifying cancer from a patient's cell-free DNA that can be obtained via a simple blood draw. I showed the proof of principle of this liquid biopsy approach by:

• Developing the bioinformatics pipeline that processed the raw NGS data from samples run with this assay.
• Creating a machine learning approach that showed the high sensitivity and specificity of the assay, using cross-validation to choose the best technique among a generalized linear model (GLM), a support vector machine (SVM), and a random forest.
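
The model-selection step described in the last bullet can be illustrated with a minimal sketch. This is not the code from the paper: it uses only the Python standard library, synthetic two-feature data, and two toy stand-in classifiers (a nearest-centroid model and a majority-class baseline) in place of the GLM, SVM, and random forest. All function names, the choice of five folds, and the use of balanced accuracy (the mean of sensitivity and specificity) as the selection score are assumptions made for illustration.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def fit_centroid(X, y):
    """Toy stand-in model: assign each sample to the nearest class centroid."""
    dims = len(X[0])
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(r[d] for r in rows) for d in range(dims)]

    def predict(x):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, v))
                for c, v in centroids.items()}
        return min(dist, key=dist.get)

    return predict


def fit_majority(X, y):
    """Toy baseline: always predict the most common training label."""
    label = 1 if sum(y) * 2 >= len(y) else 0
    return lambda x: label


def cross_validate(fit, X, y, k=5, seed=0):
    """Mean balanced accuracy of a model over k held-out folds."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [model(X[i]) for i in fold]
        sens, spec = sensitivity_specificity([y[i] for i in fold], preds)
        scores.append((sens + spec) / 2)
    return mean(scores)


def select_best(models, X, y, k=5):
    """Cross-validate every candidate and return the best name plus all scores."""
    results = {name: cross_validate(fit, X, y, k) for name, fit in models.items()}
    return max(results, key=results.get), results


# Synthetic data (an assumption): two well-separated Gaussian classes.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

best, results = select_best({"centroid": fit_centroid, "majority": fit_majority}, X, y)
print(best, results)
```

Each candidate is trained only on the folds it is not scored on, so the comparison reflects performance on unseen samples. In practice the toy models would be replaced by real implementations of the candidate techniques.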

A variety of bioinformatics tools were used for this project, including BEDtools, Bowtie 2, IGV, MEDIPS, and DESeq2. The main programming language was R, supplemented by Bash scripting. Data analyses included dimension reduction techniques such as PCA and t-SNE. Jobs were scheduled on a Sun Grid Engine HPC cluster.

This Nature paper formed the basis for the launch of the startup Adela, where I made further contributions that earned me the CEO Award in 2022.

Education

2005 - 2011

PhD in Genetics, Bioinformatics, and Computational Biology

Virginia Polytechnic Institute - Blacksburg, VA, United States

2001 - 2005

Bachelor's Degree in Computer Science

Virginia Polytechnic Institute - Blacksburg, VA, United States

Certifications

JUNE 2018 - PRESENT

Bioinformatics of Genomic Medicine

Canadian Bioinformatics Workshop

JUNE 2014 - PRESENT

Informatics on High-throughput Sequencing Data

Canadian Bioinformatics Workshop

JUNE 2014 - PRESENT

Machine Learning

Stanford University | via Coursera

Tools

SAMtools, Integrative Genomics Viewer (IGV), FastQC, Git, MEDIPS, MultiQC, Jira, Sun Grid Engine, MATLAB, BamTools

Industry Expertise

Bioinformatics

Languages

R, Bash, C++, Perl, Python

Platforms

Linux, Docker, Databricks, Alissa Interpret, Alamut Visual Plus

Other

BEDtools, HPCC Systems, Biological Systems Modeling, Computer Science, Machine Learning, DESeq2, Fastp, Bowtie 2, UMI-tools, Preseq, Illumina BaseSpace, K-means Clustering, Hierarchical Clustering, UCSC Genome Browser, Next-generation Sequencing, Linux HPC, Variant Calling, Statistics, Genomics, Biotechnology, Dragen, Nextflow, Bcl2fastq, BCFtools, BEDOPS, MACS2, DNA Sequencing, BWA, Megalodon, Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), T-distributed Stochastic Neighbor Embedding (t-SNE), ComBat-seq, GATK, Human Gene Mutation Database (HGMD), The Cancer Genome Atlas (TCGA)
