Rajat Singhania
Verified Expert in Engineering
Bioinformatics Engineer and Developer
Toronto, ON, Canada
Toptal member since July 20, 2023
Rajat is an experienced bioinformatician specializing in next-generation sequencing (NGS) and data analysis. Proficient in NGS data processing from demultiplexing to mutation calling, he excels in pipeline tool selection and optimization. Rajat is an expert in using machine learning methodologies for cancer-type classification. His work with LifeLabs Genetics benefited hundreds of patients, and he co-first authored a Nature paper that launched Adela, a startup focusing on early cancer detection.
Experience
- R - 15 years
- Bioinformatics - 12 years
- BEDtools - 10 years
- Bash - 10 years
- Linux HPC - 5 years
- SAMtools - 5 years
- Machine Learning - 4 years
- Next-generation Sequencing - 3 years
Preferred Environment
Linux, R, Bash
The most amazing...
...thing I've launched is an NGS variant calling pipeline for LifeLabs Genetics, a leading diagnostics company in Canada, that has benefited hundreds of patients.
Work Experience
Bioinformatics Scientist
Adela Bio
- Developed and implemented a custom pipeline to process raw NGS data from samples run through the novel cell-free methylated DNA immunoprecipitation-sequencing (cfMeDIP-seq) assay to create the company's ctDNA-based oncology liquid biopsy product.
- Evaluated platforms from various vendors and selected the one best suited to run the pipeline given business and technical needs, saving the company around $100,000 in licensing costs per year.
- Conducted extensive performance evaluation of the tools available for each pipeline component to choose the best ones, decreasing run time per sample by around 50%.
- Optimized the AWS clusters for each pipeline component to reduce cost while maintaining reasonable run time, reducing processing cost per sample by more than 50%.
- Documented in detail the quality control (QC) metrics output by the pipeline and selected thresholds for them.
- Developed and implemented the tools and workflows for ML-based cancer classification calls using the downstream outputs of the processing pipeline.
- Conducted various custom analyses for the wet lab and troubleshot errors introduced during wet-lab work.
- Received Adela's CEO Award for outstanding performance and contributions.
Bioinformatics Programmer
LifeLabs Genetics
- Built an NGS-based DNA mutation and copy number variation (CNV) detection pipeline from upstream steps requiring raw data analyses to downstream steps, including variant calling and helping clinical scientists do variant filtering and interpretation.
- Acted as a key player in helping the company launch its hereditary diagnostics product covering panels from cancers to rare diseases, and participated in validation and QC of the bioinformatics components.
- Ensured the product met the regulatory requirements of the Clinical Laboratory Improvement Amendments (CLIA) and the College of American Pathologists (CAP).
De Carvalho Lab Scientific Associate and Postdoctoral Fellow
Princess Margaret Cancer Center, University Health Network
- Co-first authored a Nature paper demonstrating the potential of the novel cfMeDIP-seq assay combined with ML to detect and classify tumors using DNA methylation marks from circulating cell-free tumor DNA.
- Conducted computational analysis of epigenetic mechanisms in cancer, in particular DNA methylation.
- Used various bioinformatics tools and methods to process NGS data from standard assays such as RNA-Seq, ChIP-Seq, and ATAC-Seq and interpret results.
- Provided customized analyses as needed by experimental colleagues and collaborators.
Experience
LifeLabs Genetics | Hereditary Diagnostics Product
Various bioinformatics tools were used, including GATK, BWA, SAMtools, BEDtools, and IGV, along with the HGMD and Alamut databases. The primary programming language was R, supplemented by Bash scripting, and Git was used for version control. The project also involved communicating with various internal and external stakeholders.
This product impacted the care of hundreds of patients from referring physicians and hospitals. With the bioinformatics pipeline that formed the essence of this product, actionable variants and the associated disease were identified for personalized care.
Validation of a Novel Liquid Biopsy Approach
https://pubmed.ncbi.nlm.nih.gov/30429608/
• Developing the bioinformatics pipeline that processed the raw NGS data from samples run with this assay.
• Creating a machine learning approach by conducting cross-validation and choosing the best technique among a generalized linear model (GLM), support vector machine (SVM), and random forest that showed the high sensitivity and specificity of the assay.
A variety of bioinformatics tools were used in this project, including BEDtools, Bowtie 2, IGV, MEDIPS, and DESeq2. The main programming language was R, supplemented by Bash scripting. Data analyses included dimension reduction techniques such as PCA and t-SNE. A Sun Grid Engine HPC cluster for PBS job scheduling was used.
This Nature paper formed the basis for the launch of the startup Adela, where I made further contributions that earned me the CEO Award in 2022.
Education
PhD in Genetics, Bioinformatics, and Computational Biology
Virginia Polytechnic Institute - Blacksburg, VA, United States
Bachelor's Degree in Computer Science
Virginia Polytechnic Institute - Blacksburg, VA, United States
Certifications
Bioinformatics of Genomic Medicine
Canadian Bioinformatics Workshop
Informatics on High-throughput Sequencing Data
Canadian Bioinformatics Workshop
Machine Learning
Stanford University | via Coursera
Skills
Tools
SAMtools, Integrative Genomics Viewer (IGV), FastQC, Git, MEDIPS, MultiQC, Jira, Sun Grid Engine, MATLAB, BamTools
Industry Expertise
Bioinformatics
Languages
R, Bash, C++, Perl, Python
Platforms
Linux, Docker, Databricks, Alissa Interpret, Alamut Visual Plus
Other
BEDtools, HPCC Systems, Biological Systems Modeling, Computer Science, Machine Learning, DESeq2, Fastp, Bowtie 2, UMI-tools, Preseq, Illumina BaseSpace, K-means Clustering, Hierarchical Clustering, UCSC Genome Browser, Next-generation Sequencing, Linux HPC, Variant Calling, Statistics, Genomics, Biotechnology, Dragen, Nextflow, Bcl2fastq, BCFtools, BEDOPS, MACS2, DNA Sequencing, BWA, Megalodon, Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP), T-distributed Stochastic Neighbor Embedding (t-SNE), ComBat-seq, GATK, Human Gene Mutation Database (HGMD), The Cancer Genome Atlas (TCGA)