Charles Yee, Product Manager in Nashville, TN, United States

Charles Yee

Verified Expert in Product Management

Bio

Charles is an industry leader in healthcare NLP with over a decade of experience as a data-science technical manager. As a player-coach, Charles has led multimillion-dollar projects in health-tech Fortune 500s, such as United Health, Philips, and AstraZeneca. Charles also has a PhD in computational linguistics. With over a dozen publications and patents in his field, Charles is an authority in machine learning for named entity extraction and classification.

Project Highlights

LSTM-enabled Clinical Trial Matching for Precision Medicine
Led a team of eight cross-functional specialists to develop and deploy Philips' IntelliSpace Genomics product, including a US $6 million clinical trial matching project for cancer patients, at the nation's tier-one cancer centers.

Work Experience

AI Machine Learning Director

2023 - PRESENT
Inovalon
  • Provided strategic leadership for the company's 80+ product portfolio. Responsibilities included prototyping and integrating machine learning solutions into various business use cases across provider, payer, and pharmacy business units.
  • Implemented a domain-specific Q&A chatbot to assist customers in installing and deploying Inovalon’s flagship pharmacy SaaS product. The chatbot relies on retrieval-augmented generation (RAG), prompt engineering, AWS Bedrock (LLM), and Kendra.
  • Built XGBoost models that predict patients’ medication adherence and flag those not taking their medications on time for an intervention plan, with 95% accuracy across hundreds of thousands of patients.
  • Built optimization software to reduce the number of false positives from AWS Comprehend Medical output. This saves the medical coding team significant time and improves the accuracy of insurance risk adjustment.
  • Trained gradient boosting models to minimize the number of “Member not found” errors during insurance eligibility verification. This saves hospital customers money, as payers financially penalize large volumes of queries for non-existent members.

Principal Scientist

2021 - 2023
AstraZeneca
  • Spearheaded R&D in COVID-19 vaccine adverse event detection and prevention.
  • Oversaw AZD1222's pharmacovigilance reporting, particularly reports related to coagulopathy and thrombocytopenia. Results directly reached corporate executives and regulatory agencies.
  • Implemented microservices to discover and detect vaccine production lots related to unexpected outcomes.
  • Developed automation tools for relevant entity detection of drug-induced liver injury used by IQ DILI Consortium.

Data Science Consultant

2021 - 2022
Insidetracker
  • Preprocessed Insidetracker's Fitbit dataset, grouping records by user, date, and workout activity type.
  • Created correlation models and implemented an anomaly detection module to flag abnormal resting heart rates and sleep patterns.
  • Used the correlations to show users which changes in their habits, such as diet and exercise, would have the greatest impact on improving their health.
  • Developed anomaly detection to alert users to live events affecting their well-being.

Director of Data Science

2019 - 2020
iQuartic
  • Responsible for $4 million in revenue by managing and expanding the company's front- and back-end microservice architecture for health insurance risk adjustment, which streamlined the daily tagging of 10,000 pages of electronic medical records with ICD10 codes.
  • Led and supervised seven contractors and direct reports and served as the product owner for all NLP and machine learning-related company offerings, including EMR optical character recognition, handwriting detection, and NLP disease term extraction.
  • Spearheaded code reviews, oversaw the evaluation and hyperparameter tuning of deep learning models, and orchestrated CI/CD, end-to-end user acceptance testing (UAT), and operational acceptance testing (OAT).
  • Recruited, tech-screened, and facilitated onboarding for full-stack developers, NLP engineers, data scientists, and DevOps.

Senior Biomedical Informatics Scientist | Project Leader

2016 - 2019
Philips
  • Led a research team at Philips Healthcare and delivered all of Philips’ oncology informatics and NLP solutions. Responsible for IntelliSpace clinical trial matching SaaS, grossing $6 million in annual revenue.
  • Served as the tech lead for the NLP algorithm behind a prototype clinical trial matching tool. Provided direction and hands-on expertise for key features such as clinical phenotype named entity recognition using a long short-term memory neural network.
  • Set up research exhibits at partner hospitals (MD Anderson, Dana-Farber, and Westchester Medical Center) and benchmarked product performance and usability with clinicians.
  • Provided business development insights by leveraging technological know-how to internal ventures and hospital customers.
  • Guided the company's oncology solution vision and strategy, capitalizing on cutting-edge deep learning/neural network methods.

Co-founder | CTO

2016 - 2018
Twyla
  • Co-founded a Series-B startup that delivered chatbot AI architecture design (finite-state automaton with hybrid transition models) via both rule-based pattern recognition and ML that approximates semantic similarity with historical chat logs.
  • Implemented linguistic pre-processing and textual feature selection and extraction using Scikit-learn, and built regression, gradient boosting, and random forest modules for customer chat intention detection.
  • Provided big data analysis (NGram, TF-IDF, cosine, Word2Vec) to enterprise clients, including T-Mobile, HTC, Heineken, and Cebu Pacific, which yielded insights on their customer behavior, product issues, and marketability.

NLP Engineer

2014 - 2016
United Health Group
  • Designed ontology-consistent feature structures and a syntax-semantic interface to capture and harvest new concepts from unstructured data, including physician notes, claims, EMR, and EHR, spanning over 130 million American patients.
  • Specialized in extracting concepts such as genetic mutations, chromosomal structural rearrangements, multiple myeloma, cancer staging, and tumor sizes. Other topics included neurostimulators, various pain scores, tumors, and pain locations.
  • Preprocessed, trained, and conducted diagnostics on Support Vector Machine (SVM) classification solutions for linguistic issues related to EMR and physician notes and prescriptions.
  • Worked on drug change action rationale, such as cost, side effects, efficacy, sentence boundary versus abbreviation recognition, generalities versus patient-centric data, and more.

LSTM-enabled Clinical Trial Matching for Precision Medicine

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568095/

Led a team of eight cross-functional specialists to develop and deploy Philips' IntelliSpace Genomics product, including a US $6 million clinical trial matching project for cancer patients, at the nation's tier-one cancer centers.

The problem of clinical trial matching is to extract the relevant eligibility criteria from more than 40,000 clinical trial protocols and then match them to the given cancer patient's profile. The relevant matching criteria include cancer type, staging, genetic mutations, the patient's demographics, and comorbidities. This is an extremely complex NLP problem applied across big data.

With my team, I evolved a naive Elasticsearch approach into a pipeline using a hybrid of named entity recognition (NER) and logical satisfiability theory. We successfully trained a long short-term memory neural network (LSTM) with a conditional random field (CRF) output layer, using word embeddings trained on clinical domain-informed corpora.

As a result, my team achieved more than 95% accuracy in automated clinical trial matching, as validated by pathologists and oncologists.

The project saved clinicians considerable time. It is now a commercial success, replacing IBM Watson and deployed as part of the Philips IntelliSpace Genomics solution at the nation's top cancer institutions, such as Dana-Farber, MD Anderson, and Boston Children's Hospital.
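
The matching step described above can be illustrated with a minimal sketch. It assumes the NER stage has already turned each protocol's eligibility text into structured criteria, and it reduces the satisfiability check to a plain conjunction of inclusion and exclusion tests. The `Patient` and `Trial` classes, their field names, and the trial IDs are hypothetical and are not taken from the Philips product.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Patient:
    # Hypothetical patient profile; fields mirror the criteria named in the text.
    cancer_type: str
    stage: int
    mutations: FrozenSet[str]
    age: int
    comorbidities: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Trial:
    # Hypothetical structured criteria, assumed to be the output of an NER stage.
    trial_id: str
    cancer_type: str
    min_stage: int
    required_mutations: FrozenSet[str]
    min_age: int
    excluded_comorbidities: FrozenSet[str] = frozenset()


def is_eligible(patient: Patient, trial: Trial) -> bool:
    """True when every inclusion criterion holds and no exclusion criterion does."""
    return (
        patient.cancer_type == trial.cancer_type
        and patient.stage >= trial.min_stage
        and trial.required_mutations <= patient.mutations  # subset test
        and patient.age >= trial.min_age
        and not (trial.excluded_comorbidities & patient.comorbidities)
    )


def match_trials(patient: Patient, trials: List[Trial]) -> List[str]:
    """Return the IDs of all trials whose criteria the patient satisfies."""
    return [t.trial_id for t in trials if is_eligible(patient, t)]


# Illustrative data only.
PATIENT = Patient("melanoma", 3, frozenset({"BRAF V600E"}), 54, frozenset({"hepatitis b"}))
TRIALS = [
    Trial("TRIAL-A", "melanoma", 3, frozenset({"BRAF V600E"}), 18, frozenset({"hiv"})),
    Trial("TRIAL-B", "melanoma", 3, frozenset({"BRAF V600E"}), 18, frozenset({"hepatitis b"})),
]
MATCHES = match_trials(PATIENT, TRIALS)
```

Here `MATCHES` contains only `TRIAL-A`, because the second trial excludes a comorbidity the patient has. Real protocols mix nested AND/OR/NOT conditions, which is where a fuller satisfiability formulation would be needed.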

Insurance Billing Code Extraction through Hybrid NLP Approaches

ICD10 code extraction is an essential component of insurance risk adjustment in the United States private insurance industry. We built a system that processes more than 10,000 electronic health records (EHRs) daily while classifying over 70,000 different ICD codes.

Charles led a team of seven to build a three-layered hybrid NLP ICD10 extraction microservice infrastructure. First, all PDF scans of EHRs are converted to text with Tesseract OCR, and the extracted content is stored in JSON format in MongoDB. The text then goes through a rule-based, hashmap-like system that converts every clinical term, acronym, and synonym to its corresponding ICD10 code. On its own, this layer has extremely high recall (>98%) but also a high false-positive rate. A Transformer layer using ClinicalBERT and BioBERT is then applied on top of the rule-based output to perform binary classification, eliminating false positives and yielding highly accurate ICD10 codes that are context-sensitive and clinically relevant.
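
The second and third layers can be sketched roughly as follows, assuming OCR has already produced plain text. The term dictionary is a tiny illustrative sample, and `placeholder_context_classifier` is a hypothetical keyword-based stand-in for the Transformer classifier, used only so the example runs without a trained model. None of the names below come from the actual system.

```python
import re
from typing import Callable, List, NamedTuple


class Candidate(NamedTuple):
    code: str      # ICD10 code proposed by the dictionary layer
    term: str      # surface term that triggered the proposal
    sentence: str  # sentence context handed to the classifier layer


# Illustrative sample; the real system covers tens of thousands of codes.
TERM_TO_ICD10 = {
    "type 2 diabetes": "E11.9",
    "t2dm": "E11.9",
    "hypertension": "I10",
    "htn": "I10",
}

# Assumption: simple cue words stand in for a learned notion of context.
NEGATION_CUES = re.compile(r"\b(no|denies|ruled out|family history of)\b")


def rule_based_candidates(text: str) -> List[Candidate]:
    """High-recall layer: propose a code for every dictionary term found."""
    candidates = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for term, code in TERM_TO_ICD10.items():
            if re.search(rf"\b{re.escape(term)}\b", lowered):
                candidates.append(Candidate(code, term, sentence))
    return candidates


def placeholder_context_classifier(candidate: Candidate) -> bool:
    """Stand-in for the binary Transformer classifier: keep or reject a candidate."""
    return NEGATION_CUES.search(candidate.sentence.lower()) is None


def extract_codes(
    text: str,
    keep: Callable[[Candidate], bool] = placeholder_context_classifier,
) -> List[str]:
    """Run the dictionary layer, then filter false positives with the classifier."""
    return sorted({c.code for c in rule_based_candidates(text) if keep(c)})


NOTE = "Patient has T2DM on metformin. Denies hypertension."
CODES = extract_codes(NOTE)
```

For the sample note, the dictionary layer proposes both `E11.9` and `I10`, and the filter drops `I10` because the hypertension mention is negated. Passing a different `keep` callable is one way a trained classifier could be swapped in.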

A Method and Apparatus for Genome Spelling Correction and Acronym Standardization

Developed a genomic biomarker spelling correction method used for entity detection in clinical trials and deployed commercially in the IntelliSpace Precision Medicine (ISPM) platform at various cancer hospital sites around the USA.

Various embodiments relate to a method and non-transitory computer-readable medium for genome spelling correction.

The method includes pre-processing a sentence and then performing the following steps:
1. Storing a first adjacent word to an unknown word and a second adjacent word to the unknown word
2. Generating a plurality of candidate words for the unknown word
3. Forming a plurality of trigrams with the first adjacent word to the unknown word and the second adjacent word to the unknown word and each of the plurality of candidate words
4. Searching a trigram table for each of the plurality of trigrams
5. Outputting the candidate word from the trigram with the highest trigram count in the trigram table
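
The steps above can be sketched in a few lines. This is a minimal illustration and not the patented implementation: candidate generation by single-character edits is an assumption (the source does not say how candidates are produced), and the vocabulary and trigram counts are made-up sample data.

```python
from collections import Counter
from typing import List, Mapping, Set, Tuple

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def edits1(word: str) -> Set[str]:
    """All strings one deletion, transposition, replacement, or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in ALPHABET]
    inserts = [l + c + r for l, r in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_unknown(
    tokens: List[str],
    index: int,
    vocabulary: Set[str],
    trigram_counts: Mapping[Tuple[str, str, str], int],
) -> str:
    """Correct the unknown word at tokens[index] using its two neighbors."""
    # Step 1: store the adjacent words (boundary markers are an assumption).
    left = tokens[index - 1] if index > 0 else "<s>"
    right = tokens[index + 1] if index + 1 < len(tokens) else "</s>"
    # Step 2: generate candidate words that exist in the vocabulary.
    candidates = edits1(tokens[index]) & vocabulary
    if not candidates:
        return tokens[index]
    # Steps 3-5: form a trigram per candidate, look up its count, keep the best.
    # Sorting first makes tie-breaking deterministic.
    return max(
        sorted(candidates),
        key=lambda c: trigram_counts.get((left, c, right), 0),
    )


# Illustrative sample data only.
VOCABULARY = {"braf", "kras", "egfr", "brca1"}
TRIGRAMS = Counter({("with", "braf", "v600e"): 12, ("with", "kras", "g12c"): 9})
CORRECTED = correct_unknown(["tumor", "with", "kraf", "v600e"], 2, VOCABULARY, TRIGRAMS)
```

In this example both `braf` and `kras` are one edit away from `kraf`, and the surrounding words decide in favor of `braf`.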

Education

2004 - 2010

Ph.D. in Natural Language Processing

Universität Stuttgart - Stuttgart, Germany

2001 - 2003

Master's Degree in Natural Language Processing

King's College London - London, United Kingdom

Tools

Jira, PyCharm, Slack, Jenkins, IntelliJ IDEA, Flask, ETL, Apache Maven, Gradle, R, AWS CLI, AWS SDK

Paradigms

Agile, Agile Product Management, Agile Project Management, DevOps

Industry Expertise

Healthcare, Pharmaceuticals

Platforms

Jupyter Notebook, Azure, Google Cloud Platform (GCP)

Other

TensorFlow, Scikit-learn, Keras, Machine Learning, Deep Learning, Natural Language Processing (NLP), Python 3, Java, Project Management, Product Management, Product Ownership, Healthcare IT, Healthcare Product Manager, Public Speaking, Software as a Service (SaaS), Deep Neural Networks, Business Strategy, Scaled Agile Framework (SAFe), Data Science, Technology, Technical Product Management, Artificial Intelligence (AI), Product Roadmaps, Product Strategy, Python, Data Modeling, SaaS Product Management, B2B Product Management, Data Engineering, Data Analytics, Data Analysis, Generative Pre-trained Transformers (GPT), Release Management, User Acceptance Testing (UAT), Pandas, APIs, Proof of Concept (POC), Prototyping, Learning Management Systems (LMS), Amazon S3 (AWS S3), RESTful Development, Analytics, Amplitude, Business Intelligence (BI), Telehealth, Data Visualization, MongoDB, Kubernetes, SQL, AWS DevOps, GraphQL, Spring Boot, Java 8, Product Leadership, Technical Direction, Product Owner, REST APIs, Amazon Web Services (AWS), Transformers, BERT, JSON, LSTM, Apache Kafka, Amazon Kinesis, Statistics, Chatbots, Computer Science, Software Engineering, Next.js, Direct to Consumer (D2C), Mobile UI, Mobile UX, PyTorch, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), Claude, OpenAI GPT-4 API
