
Charles Yee

Verified Expert in Product Management

Product Manager

Location

Nashville, TN, United States

Toptal Member Since

August 26, 2020

Charles is an industry leader in healthcare NLP with over a decade of experience as a data science technical manager. As a player-coach, Charles has led multimillion-dollar projects at Fortune 500 health-tech companies such as UnitedHealth Group, Philips, and AstraZeneca. Charles also holds a PhD in computational linguistics. With over a dozen publications and patents in his field, he is an authority on machine learning for named entity extraction and classification.

Artificial Intelligence (AI), Data Analytics, Software as a Service (SaaS), Project Management, Product Management, Healthcare Product Manager, SaaS Product Management, B2B Product Management, Healthcare, Agile, Agile Project Management, Jira, Agile Product Management, Product Owner, Google Cloud Platform (GCP)

Project Highlights

LSTM-enabled Clinical Trial Matching for Precision Medicine

Led a team of eight cross-functional specialists to develop and deploy clinical trial matching for cancer patients, a US $6 million project within Philips' IntelliSpace Genomics product, at the nation's tier-one cancer centers.

Expertise

Data Science, Deep Learning, GPT, Generative Pre-trained Transformers (GPT), Healthcare IT, Machine Learning, Natural Language Processing (NLP), Python 3

Work Experience

AI Machine Learning Director

2023 - PRESENT

Inovalon

Collaborated with the CTO on the company data analytics business strategy.
Managed team members across business units, including payer, provider, and pharmacy.
Consolidated data from multiple company databases and built deep learning models to serve multiple business use cases.

Principal Scientist

2021 - 2023

AstraZeneca

Spearheaded R&D in COVID-19 vaccine adverse event detection and prevention.
Oversaw AZD1222's pharmacovigilance reporting, particularly reports related to coagulopathy and thrombocytopenia. Results were reported directly to corporate executives and regulatory agencies.
Implemented microservices to identify vaccine production lots associated with unexpected outcomes.
Developed automated entity-detection tools for drug-induced liver injury (DILI), used by the IQ DILI Consortium.

Data Science Consultant

2021 - 2022

InsideTracker

Preprocessed InsideTracker's Fitbit dataset, grouping records by user, date, and workout activity type.
Created correlation models and implemented an anomaly detection module to flag abnormal resting heart rates and sleep patterns.
Used the correlations to show users which habit changes, such as diet and exercise, would have the most impact on improving their health.
Developed anomaly detection to alert users to live events affecting their well-being.
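The profile does not say which anomaly detection method was used. As one minimal sketch, assuming a list of daily resting heart rate values and a trailing-window z-score rule (the `flag_anomalies` function, window size, threshold, and sample data are illustrative choices, not the project's), abnormal readings could be flagged like this:

```python
from statistics import mean, stdev


def flag_anomalies(values, window=7, threshold=2.5):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values.

    Hypothetical helper: a simple z-score rule, not the original system.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a stdev")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]  # trailing window only, no look-ahead
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Illustrative daily resting heart rates (bpm) with one spike on day 8.
resting_hr = [58, 60, 59, 61, 58, 60, 59, 60, 74, 59]
print(flag_anomalies(resting_hr))  # [8]
```

Because the spike itself enters later windows, it inflates the standard deviation for the following days; a more robust variant could use the median and median absolute deviation instead.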

Director of Data Science

2019 - 2020

iQuartic

Responsible for $4 million in revenue by managing and expanding the company's front- and back-end microservice architecture for health insurance risk adjustment, which streamlined the daily tagging of 10,000 pages of electronic medical records with ICD-10 codes.
Led and supervised seven contractors and direct reports and served as the product owner for all NLP and machine learning-related company offerings, including EMR optical character recognition, handwriting detection, and NLP disease term extraction.
Spearheaded code reviews, oversaw the evaluation and hyperparameter tuning of deep learning models, and orchestrated CI/CD, end-to-end user acceptance testing (UAT), and operational acceptance testing (OAT).
Recruited, tech-screened, and facilitated onboarding for full-stack developers, NLP engineers, data scientists, and DevOps engineers.

Senior Biomedical Informatics Scientist | Project Leader

2016 - 2019

Philips

Led a research team at Philips Healthcare and delivered all of Philips’ oncology informatics and NLP solutions. Responsible for IntelliSpace clinical trial matching SaaS, grossing $6 million in annual revenue.
Served as the tech lead for the NLP algorithm of a prototype clinical trial matching tool. Provided direction and hands-on expertise on key features such as clinical phenotype named entity recognition using a long short-term memory (LSTM) neural network.
Set up research exhibits at partner hospitals (MD Anderson, Dana-Farber, and Westchester Medical Center), benchmarking product performance and usability with clinicians.
Provided business development insights to internal ventures and hospital customers, drawing on technical expertise.
Guided the company's oncology solution vision and strategy, capitalizing on cutting-edge deep learning/neural network methods.

Co-founder | CTO

2016 - 2018

Twyla

Co-founded a Series B startup that delivered chatbot AI architecture design (a finite-state automaton with hybrid transition models), combining rule-based pattern recognition with machine learning that approximates semantic similarity to historical chat logs.
Implemented linguistic preprocessing and textual feature selection and extraction using scikit-learn, and built regression, gradient boosting, and random forest modules for customer chat intent detection.
Provided big data analysis (n-grams, TF-IDF, cosine similarity, Word2Vec) to enterprise clients, including T-Mobile, HTC, Heineken, and Cebu Pacific, which yielded insights into their customer behavior, product issues, and marketability.
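As an illustration of the TF-IDF and cosine similarity techniques named above, here is a minimal, dependency-free sketch. It assumes whitespace-tokenized chat messages; the IDF uses the same smoothed formula as scikit-learn's `TfidfVectorizer` default (`smooth_idf=True`), vector normalization is left to the cosine step, and the helper names and sample messages are hypothetical:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts of term -> weight) for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical customer chat messages.
messages = [
    "my phone battery drains fast",
    "battery drains quickly after update",
    "how do i change my flight date",
]
vectors = tfidf_vectors(messages)
# The two battery complaints score higher than the unrelated flight question.
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))  # True
```

In production, scikit-learn's `TfidfVectorizer` with n-gram ranges would typically replace these helpers; the hand-rolled version just makes the arithmetic visible.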

NLP Engineer

2014 - 2016

UnitedHealth Group

Designed ontology-consistent feature structures and a syntax-semantics interface to capture and harvest new concepts from unstructured data, including physician notes, claims, EMRs, and EHRs, covering over 130 million American patients.
Specialized in extracting concepts such as genetic mutations, chromosomal structural rearrangements, multiple myeloma, cancer staging, and tumor size. Other topics included neurostimulators, pain scores, tumors, and pain locations.
Preprocessed, trained, and conducted diagnostics on Support Vector Machine (SVM) classification solutions for linguistic issues in EMRs, physician notes, and prescriptions.
Worked on classifying the rationale for drug changes (such as cost, side effects, and efficacy), as well as on sentence boundary versus abbreviation recognition, distinguishing general statements from patient-specific data, and more.

Project History

LSTM-enabled Clinical Trial Matching for Precision Medicine

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568095/

Clinical trial matching requires extracting the relevant eligibility criteria from more than 40,000 clinical trial protocols and then matching them to a given cancer patient's profile. The relevant criteria include cancer type, staging, genetic mutations, patient demographics, and comorbidities. This makes it an extremely complex NLP problem at big data scale.

With my team, I evolved a naive Elasticsearch-based approach into a pipeline combining named entity recognition (NER) with logical satisfiability theory. We successfully trained a long short-term memory (LSTM) neural network with a conditional random field (CRF) output layer, using word embeddings derived from clinical domain corpora.
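The matching half of such a pipeline can be pictured as checking whether a patient's extracted facts satisfy each trial's eligibility constraints. This is a greatly simplified sketch: it assumes the NER step has already produced a structured patient profile, treats each criterion as a boolean predicate, and checks that all inclusion criteria hold and no exclusion criterion does. It does not use a SAT solver, handle missing data, or model the LSTM-CRF extraction; the `Trial` class, field names, and trial IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# A criterion is a predicate over a patient profile produced by NER (assumed format).
Criterion = Callable[[dict], bool]


@dataclass
class Trial:
    trial_id: str
    inclusion: list[Criterion] = field(default_factory=list)
    exclusion: list[Criterion] = field(default_factory=list)

    def matches(self, patient: dict) -> bool:
        """Eligible iff every inclusion criterion holds and no exclusion criterion does."""
        return all(c(patient) for c in self.inclusion) and not any(
            c(patient) for c in self.exclusion
        )


# Hypothetical patient profile, as if extracted from clinical notes.
patient = {
    "cancer_type": "NSCLC",
    "stage": 4,
    "mutations": {"EGFR L858R"},
    "age": 63,
    "comorbidities": {"hypertension"},
}

trials = [
    Trial("TRIAL-A",
          inclusion=[lambda p: p["cancer_type"] == "NSCLC",
                     lambda p: p["stage"] >= 3,
                     lambda p: "EGFR L858R" in p["mutations"]],
          exclusion=[lambda p: "interstitial lung disease" in p["comorbidities"]]),
    Trial("TRIAL-B",
          inclusion=[lambda p: p["cancer_type"] == "NSCLC",
                     lambda p: "ALK fusion" in p["mutations"]]),
    Trial("TRIAL-C",
          inclusion=[lambda p: p["age"] >= 18],
          exclusion=[lambda p: "hypertension" in p["comorbidities"]]),
]

eligible = [t.trial_id for t in trials if t.matches(patient)]
print(eligible)  # ['TRIAL-A']
```

TRIAL-B fails on the missing ALK fusion, and TRIAL-C is ruled out by its hypertension exclusion.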

As a result of our work, my team achieved more than 95% accuracy in automated clinical trial matching, as validated by pathologists and oncologists.

The project saved clinicians substantial time. It is now a commercial success, having replaced IBM Watson, and is deployed as part of the Philips IntelliSpace Genomics solution at the nation's top cancer institutions, such as Dana-Farber, MD Anderson, and Boston Children's Hospital.

Insurance Billing Code Extraction through Hybrid NLP Approaches

ICD-10 code extraction is an essential component of insurance risk adjustment in the US private insurance industry. We built a system that processes more than 10,000 electronic health records (EHRs) daily while classifying over 70,000 distinct ICD-10 codes.

Charles led a team of seven to build a three-layered hybrid NLP microservice infrastructure for ICD-10 extraction. First, all PDF scans of EHRs are converted to text with Tesseract OCR, and the EHR content is stored as JSON in MongoDB. The text then passes through a rule-based, hashmap-like system in which every clinical term, acronym, and synonym is mapped to its corresponding ICD-10 code. On its own, this layer has an extremely high recall (>98%) but also a high false-positive rate. A Transformer layer using ClinicalBERT and BioBERT is therefore applied on top of the rule-based output to perform binary classification, eliminating false positives and yielding accurate ICD-10 codes that are context-sensitive and clinically relevant.
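The second and third layers follow a common recall-then-precision pattern: a dictionary lookup proposes every plausible code, and a classifier rejects hits that are wrong in context. This minimal sketch rests on loud assumptions: the lexicon and its ICD-10 mappings are illustrative, OCR and MongoDB storage are omitted, and a simple negation-cue rule stands in for the ClinicalBERT/BioBERT binary classifier, which it does not replicate. All function names are hypothetical.

```python
import re
from typing import Callable

# Illustrative lexicon: surface forms (terms, acronyms, synonyms) -> ICD-10 code.
LEXICON = {
    "type 2 diabetes": "E11.9",
    "t2dm": "E11.9",
    "hypertension": "I10",
    "htn": "I10",
    "copd": "J44.9",
}

# Stand-in for the Transformer filter: a crude negation-cue detector.
NEGATION_RE = re.compile(r"\b(no|denies|negative for|ruled out)\b")


def lookup_candidates(text, lexicon=LEXICON):
    """Rule-based layer: return every lexicon hit with its sentence (high recall)."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for term, code in lexicon.items():
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                candidates.append((term, code, sentence))
    return candidates


def negation_filter(term, sentence):
    """Keep a hit unless its sentence contains a negation cue."""
    return NEGATION_RE.search(sentence.lower()) is None


def extract_icd10(text, keep: Callable[[str, str], bool] = negation_filter):
    """Run the lookup, then drop candidates the filter rejects."""
    return sorted({code for term, code, sentence in lookup_candidates(text)
                   if keep(term, sentence)})


note = "Patient has T2DM, well controlled. Denies hypertension. History of COPD."
print(extract_icd10(note))  # ['E11.9', 'J44.9']
```

The `keep` parameter is where a trained binary classifier would plug in; swapping the rule for a model call leaves the high-recall lookup untouched.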

A Method and Apparatus for Genome Spelling Correction and Acronym Standardization

Developed genomic biomarker spelling correction for clinical trial entity detection, used commercially in the IntelliSpace Precision Medicine (ISPM) platform at cancer hospitals across the USA.

Various embodiments relate to a method and non-transitory computer-readable medium for genome spelling correction.

The method includes performing pre-processing on a sentence, followed by these steps:
1. Storing a first adjacent word to an unknown word and a second adjacent word to the unknown word
2. Generating a plurality of candidate words for the unknown word
3. Forming a plurality of trigrams, each consisting of the first adjacent word, one of the candidate words, and the second adjacent word
4. Searching a trigram table for each of the plurality of trigrams
5. Outputting the candidate word from the trigram with the highest trigram count in the trigram table
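The five steps above can be sketched in Python. The candidate generator (every string within one edit of the unknown word, filtered by a known vocabulary) and the toy corpus are assumptions for illustration; the patent summary does not specify how candidates are generated or how the trigram table is built. Function names are hypothetical.

```python
from collections import Counter

# Characters used for replacements/insertions; digits matter for gene names.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def edits1(word):
    """All strings one edit away: deletion, transposition, replacement, insertion."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def build_trigram_table(sentences):
    """Count every (w1, w2, w3) trigram in a lowercased, whitespace-tokenized corpus."""
    table = Counter()
    for s in sentences:
        tokens = s.lower().split()
        table.update(zip(tokens, tokens[1:], tokens[2:]))
    return table


def correct(tokens, i, vocabulary, trigram_table):
    """Correct tokens[i] (assumed to have neighbours on both sides)."""
    word = tokens[i]
    if word in vocabulary:
        return word  # not an unknown word
    left, right = tokens[i - 1], tokens[i + 1]              # step 1: store neighbours
    candidates = edits1(word) & vocabulary                   # step 2: candidate words
    trigrams = [(left, c, right) for c in candidates]        # step 3: form trigrams
    scored = [(trigram_table[t], t[1]) for t in trigrams]    # step 4: look up counts
    if not scored:
        return word
    best_count, best_word = max(scored)                      # step 5: highest count
    return best_word if best_count > 0 else word


corpus = [
    "germline brca1 mutation detected",
    "no germline brca1 mutation",
    "somatic brca2 variant of unknown significance",
]
vocabulary = {t for s in corpus for t in s.split()}
table = build_trigram_table(corpus)

# "brca3" is one edit from both brca1 and brca2; context picks brca1.
print(correct("germline brca3 mutation".split(), 1, vocabulary, table))  # brca1
```

Ties on trigram count are broken by `max` on the candidate string, an arbitrary but deterministic choice; a real system might fall back to unigram frequency instead.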

Education

2004 - 2010

Ph.D. in Natural Language Processing

Universität Stuttgart - Stuttgart, Germany

2001 - 2003

Master's Degree in Natural Language Processing

King's College London - London, United Kingdom
