Yilong Li, Data Scientist and Developer in Cambridge, MA, United States
Member since July 29, 2021
Yilong is a seasoned data scientist specializing in oncology and cancer genomics research. He completed his PhD in bioinformatics at the University of Cambridge and has published in top scientific journals such as Nature, Science, and Cell. He then worked in several industry R&D roles, focusing on genomics algorithm development, bioinformatics data analysis, and machine learning. Yilong follows best programming practices in his coding and data science projects.

Portfolio

  • AbbVie Inc.
    Single-cell RNA-seq, CRISPR/Cas9, Transcriptomics, Machine Learning...
  • Totient
    Biology, CRISPR/Cas9, Machine Learning, Genomics, Python...
  • Seven Bridges Genomics
    Genomics, Python, R, Algorithms, Bioinformatics, Computational Biology...

Experience

  • R 12 years
  • Genomics 12 years
  • Transcriptomics 12 years
  • Oncology & Cancer Treatment 12 years
  • Computational Biology 12 years
  • Bioinformatics 12 years
  • CRISPR/Cas9 8 years
  • Python 5 years

Location

Cambridge, MA, United States

Availability

Part-time

Preferred Environment

Python, R, Linux

The most amazing...

...study I've published as part of a large international cancer genome sequencing project involved analyzing several terabytes of cancer genome sequencing data.

Employment

  • Senior Scientist | Bioinformatics

    2020 - 2021
    AbbVie Inc.
    • Studied cell type changes in immunological diseases using single-cell RNA sequencing.
    • Analyzed in vivo genome-wide CRISPR/Cas9 knock-out screening data.
    • Explored differential gene expression data in clinical datasets.
    Technologies: Single-cell RNA-seq, CRISPR/Cas9, Transcriptomics, Machine Learning, Computational Biology, Data Science, Statistical Analysis, Statistical Data Analysis, Statistical Modeling
  • VP | Platform and Collaborations

    2017 - 2021
    Totient
    • Used large-scale genomics to identify two new target genes that were entered into drug development programs.
    • Designed and led the development of a cloud-based data infrastructure for harmonizing and storing genomic data.
    • Led the development of machine learning algorithms to deconvolute gene expression data from bulk tissue into its constituent cell types.
    Technologies: Biology, CRISPR/Cas9, Machine Learning, Genomics, Python, Oncology & Cancer Treatment, R, Computational Biology, Data Science, Data Analysis, Statistical Data Analysis, Statistical Analysis, Statistical Modeling
  • Principal Scientist | R&D

    2016 - 2017
    Seven Bridges Genomics
    • Developed novel bioinformatics algorithms for identifying different genomic patterns (see projects under the Experience section).
    • Created a suite of quality control algorithms for production-grade analysis of whole-genome sequencing data.
    • Built an algorithm for memoizing scientific data analysis workflows (see "Detection of Insufficient Homology Regions in a Reference Sequence" project under the Experience section).
    Technologies: Genomics, Python, R, Algorithms, Bioinformatics, Computational Biology, Data Science, Data Analysis, Statistical Data Analysis, Statistical Modeling, Statistical Analysis
  • Research Assistant | Bioinformatics

    2010 - 2011
    University of Helsinki
    • Developed an early somatic exome sequencing pipeline for analyzing cancer samples.
    • Performed somatic structural variation analysis using cancer whole-genome sequencing data during my master's degree project.
    • Analyzed gene expression microarray and somatic copy number data in cancer samples.
    Technologies: Biology, Genomics, Linux, R, Perl, Computational Biology

Experience

  • Algorithm for Detecting Repeated Genomic Regions
    https://patents.google.com/patent/US20190214110A1/

    I developed an algorithm for detecting regions in the human genome that are repeated and thus error-prone. I conceived the method and implemented it in Python.

    A patent for the method has been filed (see the link mentioned above).

  • Detection of Insufficient Homology Regions in a Reference Sequence
    https://patents.google.com/patent/US10545792B2/

    An algorithm for memoizing scientific data analysis workflows.

    I designed an algorithm that uses a Merkle tree-like data structure to track the provenance of a workflow's intermediate files and final results. The memoization algorithm allows interrupted workflows to be restarted rapidly. Furthermore, because each object hash uniquely identifies its data, intermediate and final files can be stored platform-wide, so redundant computational steps are avoided in a completely transparent fashion.
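
    The idea can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `data_key`, `step_key`, and `MemoizedRunner` are hypothetical, and an in-memory dictionary stands in for platform-wide storage. Each step's key is a hash of its tool name, its parameters, and the keys of its inputs, so a key encodes the full provenance of a result, much like a node in a Merkle tree.

```python
import hashlib
import json


def data_key(content: bytes) -> str:
    """Leaf key: hash of the raw input bytes."""
    return hashlib.sha256(content).hexdigest()


def step_key(tool, params, input_keys):
    """Node key: hash of the tool name, its parameters, and its input keys."""
    payload = json.dumps(
        {"tool": tool, "params": params, "inputs": list(input_keys)},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class MemoizedRunner:
    """Runs workflow steps, reusing any result whose key is already stored."""

    def __init__(self):
        self.store = {}     # key -> result (hypothetical stand-in for shared storage)
        self.executed = []  # tools that were actually computed, in order

    def run(self, tool, func, params, inputs=()):
        # `inputs` is a sequence of (key, value) pairs from earlier steps or raw data.
        key = step_key(tool, params, [k for k, _ in inputs])
        if key not in self.store:
            self.store[key] = func(*[v for _, v in inputs], **params)
            self.executed.append(tool)
        return key, self.store[key]
```

    Calling `run` a second time with the same tool, parameters, and inputs returns the stored result without executing the function again, which is how a restarted workflow could skip the steps it had already completed. Changing any upstream input changes every downstream key, so stale results are never reused.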

Skills

  • Paradigms

    Data Science
  • Industry Expertise

    Bioinformatics
  • Other

    Biology, Transcriptomics, Genomics, Computational Biology, Molecular Biology, Data Analysis, Single-cell RNA-seq, CRISPR/Cas9, Machine Learning, Algorithms, Oncology & Cancer Treatment, Workflow, Statistical Analysis, Statistical Data Analysis, Statistical Modeling
  • Languages

    Python, R, Perl
  • Platforms

    Linux

Education

  • Ph.D. in Bioinformatics and Cancer Genomics (Conferred by the University of Cambridge)
    2011 - 2015
    The Wellcome Sanger Institute - Cambridge, UK
  • Master's Degree in Bioinformatics
    2010 - 2011
    University of Helsinki - Helsinki, Finland
