Yilong Li

Verified Expert in Engineering

Data Scientist and Developer

Location

Cambridge, MA, United States

Toptal Member Since

August 12, 2021

Yilong is a seasoned data scientist specializing in oncology and cancer genomics research. He completed his bioinformatics PhD at the University of Cambridge and has several publications in top scientific journals such as Nature, Science, and Cell. He then worked in various industry R&D roles, focusing on genomics algorithm development, bioinformatics data analysis, and machine learning. Yilong follows best programming practices in his coding and data science projects.

Data Analysis, Statistical Analysis, Statistical Data Analysis, Machine Learning, Algorithms, R, Linux, Python, Perl, Bioinformatics, Genomic Data, Computational Biology

Portfolio

AbbVie Inc.

Single-cell RNA Sequencing, CRISPR/Cas9, Transcriptomics Technologies...

Totient

Biology, CRISPR/Cas9, Machine Learning, Genomics, Python...

Seven Bridges Genomics

Genomics, Python, R, Algorithms, Bioinformatics, Computational Biology...

Experience

R - 12 years, Genomics - 12 years, Transcriptomics Technologies - 12 years, Oncology & Cancer Treatment - 12 years, Computational Biology - 12 years, Bioinformatics - 12 years, CRISPR/Cas9 - 8 years, Python - 5 years

Availability

Part-time

Preferred Environment

Python, R, Linux

The most amazing...

...study I've published, as part of a large international cancer genome sequencing project, involved analyzing several terabytes of cancer genome sequencing data.

Work Experience

Senior Scientist | Bioinformatics

2020 - 2021

AbbVie Inc.

Studied cell type changes in immunological diseases using single-cell RNA sequencing.
Analyzed in vivo genome-wide CRISPR/Cas9 knock-out screening data.
Explored differential gene expression data in clinical datasets.

Technologies: Single-cell RNA Sequencing, CRISPR/Cas9, Transcriptomics Technologies, Machine Learning, Computational Biology, Data Science, Statistical Analysis, Statistical Data Analysis, Statistical Modeling

VP | Platform and Collaborations

2017 - 2021

Totient

Used large-scale genomics to identify two new target genes that were entered into drug development programs.
Designed and led the development of a cloud-based data infrastructure for harmonizing and storing genomic data.
Led the development of machine learning algorithms to deconvolute gene expression data from bulk tissue into its constituent cell types.

Technologies: Biology, CRISPR/Cas9, Machine Learning, Genomics, Python, Oncology & Cancer Treatment, R, Computational Biology, Data Science, Data Analysis, Statistical Data Analysis, Statistical Analysis, Statistical Modeling

Principal Scientist | R&D

2016 - 2017

Seven Bridges Genomics

Developed novel bioinformatics algorithms for identifying different genomic patterns (see projects under the Experience section).
Created a suite of quality control algorithms for production-grade analysis of whole-genome sequencing data.
Built an algorithm for memoizing scientific data analysis workflows (see "Detection of Insufficient Homology Regions in a Reference Sequence" project under the Experience section).

Technologies: Genomics, Python, R, Algorithms, Bioinformatics, Computational Biology, Data Science, Data Analysis, Statistical Data Analysis, Statistical Modeling, Statistical Analysis

Research Assistant | Bioinformatics

2010 - 2011

University of Helsinki

Developed an early somatic exome sequencing pipeline for analyzing cancer samples.
Performed somatic structural variation analysis using cancer whole-genome sequencing data during my master's degree project.
Analyzed gene expression microarray and somatic copy number data in cancer samples.

Technologies: Biology, Genomics, Linux, R, Perl, Computational Biology

Experience

Algorithm for Detecting Repeated Genomic Regions

https://patents.google.com/patent/US20190214110A1/

I developed an algorithm for detecting regions in the human genome that are repeated and thus error-prone. I conceived the method and implemented it in Python.

A patent for the method has been filed (see the link mentioned above).

Detection of Insufficient Homology Regions in a Reference Sequence

https://patents.google.com/patent/US10545792B2/

An algorithm for memoizing scientific data analysis workflows.

I designed an algorithm that uses a Merkle tree-like data structure to track the provenance of a workflow's intermediate files and final results. The memoization algorithm allows interrupted workflows to be restarted rapidly. Furthermore, because each object hash uniquely identifies its data, intermediate and final data files can be stored platform-wide, so redundant computational steps are avoided in a completely transparent fashion.
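The idea described above can be illustrated with a minimal Python sketch. This is not the patented implementation; it only assumes that each step's key is a hash of the step's name, its parameters, and the keys of its inputs, so that a key encodes the full provenance of a result. All names here (`data_key`, `step_key`, `MemoizedWorkflow`, `run_pipeline`, and the toy steps) are hypothetical, and the shared store is a plain dictionary standing in for platform-wide storage.

```python
import hashlib
import json


def data_key(raw):
    """Leaf key: hash of the raw input data itself."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def step_key(name, params, input_keys):
    """Node key: hash of the step name, its parameters, and its input keys.

    Because input keys are themselves hashes of upstream steps, any change
    upstream changes every key downstream (the Merkle-like property).
    Simplification: the step is identified by name only, not by its code.
    """
    payload = json.dumps(
        {"step": name, "params": params, "inputs": list(input_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MemoizedWorkflow:
    """Runs steps, skipping any whose key is already in the shared cache."""

    def __init__(self, cache):
        self.cache = cache      # hypothetical stand-in for a shared store
        self.executed = []      # names of steps actually computed this run

    def run_step(self, name, func, params, inputs):
        # inputs is a list of (key, value) pairs from upstream steps or data
        key = step_key(name, params, [k for k, _ in inputs])
        if key not in self.cache:
            self.cache[key] = func(*[v for _, v in inputs], **params)
            self.executed.append(name)
        return key, self.cache[key]


# Two toy steps standing in for real analysis tools.
def count_bases(seq, base):
    return seq.count(base)


def double(n):
    return 2 * n


def run_pipeline(cache, base, seq="AACGT"):
    """Run a two-step workflow; return the result and the steps computed."""
    wf = MemoizedWorkflow(cache)
    source = (data_key(seq), seq)
    counted = wf.run_step("count_bases", count_bases, {"base": base}, [source])
    _, result = wf.run_step("double", double, {}, [counted])
    return result, wf.executed
```

Calling `run_pipeline` twice with the same cache and the same `base` computes both steps the first time and none the second, which is how an interrupted workflow would resume. Changing `base` changes the key of `count_bases` and therefore of `double`, so both steps are recomputed.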

Education

2011 - 2015

Ph.D. in Bioinformatics and Cancer Genomics (Conferred by the University of Cambridge)

The Wellcome Sanger Institute - Cambridge, UK

2010 - 2011

Master's Degree in Bioinformatics

University of Helsinki - Helsinki, Finland

Skills

Paradigms

Data Science

Industry Expertise

Bioinformatics

Platforms

Linux

Languages

Python, R, Perl

Other

Biology, Transcriptomics Technologies, Genomics, Computational Biology, Molecular Biology, Data Analysis, Single-cell RNA Sequencing, CRISPR/Cas9, Machine Learning, Algorithms, Oncology & Cancer Treatment, Workflow, Statistical Analysis, Statistical Data Analysis, Statistical Modeling
