Charles Yee
Verified Expert in Product Management
Product Manager
Charles is an industry leader in healthcare NLP with over a decade of experience as a data-science technical manager. As a player-coach, Charles has led multimillion-dollar projects in health-tech Fortune 500s, such as United Health, Philips, and AstraZeneca. Charles also has a PhD in computational linguistics. With over a dozen publications and patents in his field, Charles is an authority in machine learning for named entity extraction and classification.
Work Experience
AI Machine Learning Director
Inovalon
- Collaborated with the CTO on the company data analytics business strategy.
- Managed team members from various services, including payer, provider, and pharmacy business units.
- Consolidated data from multiple company databases and built deep learning models to serve multiple business use cases.
Principal Scientist
AstraZeneca
- Spearheaded R&D in COVID-19 vaccine adverse event detection and prevention.
- Oversaw AZD1222's pharmacovigilance reporting, particularly reports related to coagulopathy and thrombocytopenia. Results directly reached corporate executives and regulatory agencies.
- Implemented microservices to discover and detect vaccine production lots related to unexpected outcomes.
- Developed automation tools for detecting entities relevant to drug-induced liver injury, used by the IQ DILI Consortium.
Data Science Consultant
Insidetracker
- Preprocessed Insidetracker's Fitbit dataset, grouping records by user, date, and workout activity type.
- Created correlation models and implemented an anomaly detection module to flag abnormal resting heart rates and sleep patterns.
- Used the correlations to show users which habit changes, such as diet and exercise, can have the most impact on improving their health.
- Developed anomaly detection to alert users to live events affecting their well-being.
Director of Data Science
iQuartic
- Responsible for $4 million in revenue by managing and expanding the company's front- and back-end microservice architecture for health insurance risk adjustment, streamlining the daily tagging of 10,000 pages of electronic medical records with ICD-10 codes.
- Led and supervised seven contractors and direct reports and served as the product owner for all NLP and machine learning-related company offerings including EMR optical character recognition, handwriting detection, and NLP disease term extraction.
- Spearheaded code reviews, oversaw the evaluation and hyperparameter tuning of deep learning models, and orchestrated CI/CD, end-to-end user acceptance testing (UAT), and operational acceptance testing (OAT).
- Recruited, tech-screened, and facilitated onboarding for full-stack developers, NLP engineers, data scientists, and DevOps.
Senior Biomedical Informatics Scientist | Project Leader
Philips
- Led a research team at Philips Healthcare and delivered all of Philips’ oncology informatics and NLP solutions. Responsible for IntelliSpace clinical trial matching SaaS, grossing $6 million in annual revenue.
- Served as the tech lead for the NLP algorithm of a prototype clinical trial matching tool. Provided direction and hands-on expertise for key features such as clinical phenotype named entity recognition using a long short-term memory (LSTM) neural network.
- Set up research exhibits at partner hospitals (MD Anderson, Dana-Farber, and Westchester Medical Center) to benchmark product performance and usability with clinicians.
- Provided business development insights by leveraging technological know-how to internal ventures and hospital customers.
- Guided the company's oncology solution vision and strategy, capitalizing on cutting-edge deep learning/neural network methods.
Co-founder | CTO
Twyla
- Co-founded a Series-B startup that delivered chatbot AI architecture design (finite-state automaton with hybrid transition models) via both rule-based pattern recognition and ML that approximates semantic similarity with historical chat logs.
- Implemented linguistic pre-processing, textual feature selection, and feature extraction using scikit-learn, and built regression, gradient boosting, and random forest modules for customer chat intent detection.
- Provided big data analysis (n-grams, TF-IDF, cosine similarity, Word2Vec) to enterprise clients including T-Mobile, HTC, Heineken, and Cebu Pacific, which yielded insights on their customer behavior, product issues, and marketability.
NLP Engineer
United Health Group
- Designed ontology-consistent feature structures and a syntax-semantics interface to capture and harvest new concepts from unstructured data, including physician notes, claims, EMR, and EHR, spanning over 130 million American patients.
- Specialized in extracting concepts such as genetic mutations, chromosomal structural rearrangements, multiple myeloma, cancer staging, and tumor sizes. Other topics included neurostimulators, various pain scores, tumors, and pain locations.
- Preprocessed, trained, and conducted diagnostics on Support Vector Machine (SVM) classification solutions for linguistic issues related to EMR and physician notes and prescriptions.
- Worked on drug change action rationale, such as cost, side effects, efficacy, sentence boundary versus abbreviation recognition, generalities versus patient-centric data, and more.
Project History
LSTM-enabled Clinical Trial Matching for Precision Medicine
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568095/
Led a team of eight cross-functional specialists to develop and deploy Philips' IntelliSpace Genomics product, including a US $6 million clinical trial matching project for cancer patients, at the nation's tier-one cancer centers.
With my team, I evolved a naive Elasticsearch approach into a pipeline using a hybrid of named entity recognition (NER) and logical satisfiability theory. We trained a long short-term memory (LSTM) neural network with a conditional random field (CRF) output layer, using word embeddings derived from clinical domain-informed corpora.
As a result of our work, my team achieved more than 95% accuracy in automated clinical trial matching, as validated by pathologists and oncologists.
The project saved clinicians considerable time. It is now a commercial success, replacing IBM Watson and deployed as part of the Philips IntelliSpace Genomics solution at the nation's top cancer institutions such as Dana-Farber, MD Anderson, and Boston Children's Hospital.
Insurance Billing Code Extraction through Hybrid NLP Approaches
ICD-10 code extraction is an essential component of insurance risk adjustment in the United States private insurance industry. We built a system that processes more than 10,000 electronic health records (EHRs) daily while classifying over 70,000 different ICD codes.
A Method and Apparatus for Genome Spelling Correction and Acronym Standardization
Developed a genomic biomarker spelling correction method used for entity detection in clinical trial text and deployed commercially in the IntelliSpace Precision Medicine (ISPM) platform at various cancer hospital sites around the USA.
The method included the steps of performing pre-processing on a sentence:
1. Storing a first adjacent word to an unknown word and a second adjacent word to the unknown word
2. Generating a plurality of candidate words for the unknown word
3. Forming a plurality of trigrams with the first adjacent word to the unknown word and the second adjacent word to the unknown word and each of the plurality of candidate words
4. Searching a trigram table for each of the plurality of trigrams
5. Outputting the candidate word from the trigram with the highest trigram count in the trigram table
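The steps above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the candidate generator (all strings one character edit away that appear in a known vocabulary), the sentence boundary markers, and the names `edits1`, `correct_unknown`, `vocabulary`, and `trigram_counts` are all assumptions made for the example. The trigram table is modeled as a plain dictionary keyed by (left word, candidate, right word).

```python
def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz0123456789"):
    """Return every string one edit (delete, transpose, replace, insert) from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in alphabet]
    inserts = [l + c + r for l, r in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def correct_unknown(tokens, index, vocabulary, trigram_counts):
    """Pick the candidate whose (left, candidate, right) trigram is most frequent."""
    # Step 1: store the two words adjacent to the unknown word.
    # "<s>" and "</s>" are assumed boundary markers for sentence edges.
    left = tokens[index - 1] if index > 0 else "<s>"
    right = tokens[index + 1] if index + 1 < len(tokens) else "</s>"

    # Step 2: generate candidate words (assumption: edit distance 1, in vocabulary).
    candidates = sorted(edits1(tokens[index]) & vocabulary)

    # Steps 3-5: form one trigram per candidate, look it up in the trigram
    # table, and keep the candidate with the highest count.
    best, best_count = tokens[index], 0
    for candidate in candidates:
        count = trigram_counts.get((left, candidate, right), 0)
        if count > best_count:
            best, best_count = candidate, count
    return best  # falls back to the original token if no trigram matches


# Hypothetical toy data, for illustration only.
vocabulary = {"braf", "kras", "egfr"}
trigram_counts = {
    ("a", "braf", "mutation"): 12,
    ("a", "kras", "mutation"): 7,
}
print(correct_unknown(["a", "brav", "mutation"], 1, vocabulary, trigram_counts))  # braf
```

Leaving the token unchanged when no trigram is found is a design choice for this sketch; it avoids replacing a rare but valid biomarker name with a more common one on weak evidence.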
Education
Ph.D. in Natural Language Processing
Universität Stuttgart - Stuttgart, Germany
Master's Degree in Natural Language Processing
King's College London - London, United Kingdom