Charles Yee
Verified Expert in Product Management
Product Manager
Nashville, TN, United States
Toptal member since August 26, 2020
Charles is an industry leader in healthcare NLP with over a decade of experience as a data-science technical manager. As a player-coach, Charles has led multimillion-dollar projects in health-tech Fortune 500s, such as United Health, Philips, and AstraZeneca. Charles also has a PhD in computational linguistics. With over a dozen publications and patents in his field, Charles is an authority in machine learning for named entity extraction and classification.
Work Experience
AI Machine Learning Director
Inovalon
- Provided strategic leadership for the company's portfolio of 80+ products. Responsibilities included prototyping and integrating machine learning solutions into business use cases across the provider, payer, and pharmacy business units.
- Implemented a domain-specific Q&A chatbot to assist customers in installing and deploying Inovalon’s flagship pharmacy SaaS product. The chatbot relies on retrieval-augmented generation (RAG), prompt engineering, AWS Bedrock (LLM), and Kendra.
- Built XGBoost models that predict patients’ medication adherence and flag those not taking their medications (on time) for an intervention plan, achieving 95% accuracy across hundreds of thousands of patients.
- Built optimization software to reduce the number of false positives from AWS Comprehend Medical output. This saves the medical coding team significant time and improves the accuracy of insurance risk adjustment.
- Trained gradient boosting models to minimize the number of “Member not found” errors during insurance eligibility verification. This saves hospital customers money because payers financially penalize large volumes of queries for non-existent members.
Principal Scientist
AstraZeneca
- Spearheaded R&D in COVID-19 vaccine adverse event detection and prevention.
- Oversaw AZD1222's pharmacovigilance reporting, particularly reports related to coagulopathy and thrombocytopenia. Results directly reached corporate executives and regulatory agencies.
- Implemented microservices to discover and detect vaccine production lots related to unexpected outcomes.
- Developed automation tools for detecting entities relevant to drug-induced liver injury, used by the IQ DILI Consortium.
Data Science Consultant
Insidetracker
- Preprocessed Insidetracker's Fitbit dataset, grouping records by user, date, and workout activity type.
- Created correlation models and implemented an anomaly detection module to flag abnormal resting heart rates and sleep patterns.
- Used the correlations to show users which changes in their habits, such as diet and exercise, can have the most impact on improving their health.
- Developed anomaly detection to alert users to live events affecting their well-being.
Director of Data Science
iQuartic
- Responsible for $4 million in revenue by managing and expanding the company's front- and back-end microservice architecture for health insurance risk adjustment, which streamlined the daily tagging of 10,000 pages of electronic medical records with ICD-10 codes.
- Led and supervised seven contractors and direct reports and served as the product owner for all NLP and machine learning-related company offerings including EMR optical character recognition, handwriting detection, and NLP disease term extraction.
- Spearheaded code reviews, oversaw the evaluation and hyperparameter tuning of deep learning models, and orchestrated CI/CD, end-to-end user acceptance testing (UAT), and operational acceptance testing (OAT).
- Recruited, tech-screened, and facilitated onboarding for full-stack developers, NLP engineers, data scientists, and DevOps.
Senior Biomedical Informatics Scientist | Project Leader
Philips
- Led a research team at Philips Healthcare and delivered all of Philips’ oncology informatics and NLP solutions. Responsible for IntelliSpace clinical trial matching SaaS, grossing $6 million in annual revenue.
- Served as the tech lead for the NLP algorithm behind a prototype clinical trial matching tool. Provided direction and hands-on expertise on key features such as clinical phenotype named entity recognition using a long short-term memory (LSTM) neural network.
- Set up research exhibits at partner hospitals (MD Anderson, Dana-Farber, and Westchester Medical Center) and benchmarked product performance and usability with clinicians.
- Provided business development insights by applying technological know-how to internal ventures and hospital customers.
- Guided the company's oncology solution vision and strategy, capitalizing on cutting-edge deep learning/neural network methods.
Co-founder | CTO
Twyla
- Co-founded a Series-B startup that delivered chatbot AI architecture design (finite-state automaton with hybrid transition models) via both rule-based pattern recognition and ML that approximates semantic similarity with historical chat logs.
- Implemented linguistic pre-processing, textual feature selection, and feature extraction using Scikit-learn, and built regression, gradient boosting, and random forest modules for customer chat intent detection.
- Provided big data analysis (n-grams, TF-IDF, cosine similarity, Word2Vec) to enterprise clients including T-Mobile, HTC, Heineken, and Cebu Pacific, which yielded insights on their customer behavior, product issues, and marketability.
NLP Engineer
United Health Group
- Designed ontology-consistent feature structures and a syntax-semantics interface to capture and harvest new concepts from unstructured data, including physician notes, claims, EMR, and EHR, spanning over 130 million American patients.
- Specialized in extracting concepts such as genetic mutations, chromosomal structural rearrangements, multiple myeloma, cancer staging, and tumor sizes. Other topics included neurostimulators, various pain scores, tumors, and pain locations.
- Preprocessed, trained, and conducted diagnostics on Support Vector Machine (SVM) classification solutions for linguistic issues related to EMR and physician notes and prescriptions.
- Worked on drug change action rationale, such as cost, side effects, efficacy, sentence boundary versus abbreviation recognition, generalities versus patient-centric data, and more.
Project History
LSTM-enabled Clinical Trial Matching for Precision Medicine
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6568095/
Led a team of eight cross-functional specialists to develop and deploy Philips' IntelliSpace Genomics product, including a US $6 million project, clinical trial matching for cancer patients, at the nation's tier-one cancer centers.
With my team, I evolved a naive Elasticsearch approach into a pipeline using a hybrid of named entity recognition (NER) and logical satisfiability theory. We trained a long short-term memory (LSTM) neural network with a conditional random field (CRF) output layer, using word embeddings built from clinical domain-informed corpora.
As a result, my team achieved more than 95% accuracy in automated clinical trial matching, as validated by pathologists and oncologists.
The project saved clinicians considerable time. It is now a commercial success, replacing IBM Watson and deployed as part of the Philips IntelliSpace Genomics solution at the nation's top cancer institutions, such as Dana-Farber, MD Anderson, and Boston Children's Hospital.
Insurance Billing Code Extraction through Hybrid NLP Approaches
ICD-10 code extraction is an essential component of insurance risk adjustment in the United States private insurance industry. We built a system that processes more than 10,000 electronic health records (EHRs) daily while classifying over 70,000 different ICD codes.
A Method and Apparatus for Genome Spelling Correction and Acronym Standardization
Developed a genomic biomarker spelling correction method used for entity detection in clinical trials and deployed commercially in the IntelliSpace Precision Medicine (ISPM) platform at various cancer hospital sites around the USA.
The method performs the following pre-processing steps on a sentence:
1. Storing a first adjacent word to an unknown word and a second adjacent word to the unknown word
2. Generating a plurality of candidate words for the unknown word
3. Forming a plurality of trigrams with the first adjacent word to the unknown word and the second adjacent word to the unknown word and each of the plurality of candidate words
4. Searching a trigram table for each of the plurality of trigrams
5. Outputting the candidate word from the trigram with the highest trigram count in the trigram table
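The five steps above can be illustrated with a minimal Python sketch. This is not the patented implementation: the candidate generator (vocabulary words within one character edit), the function names, the sentence-boundary markers, and the toy vocabulary and trigram counts are all assumptions made for illustration.

```python
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz0123456789"


def generate_candidates(word, vocabulary):
    """Step 2: candidate words. Assumption: known words within one character edit."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    edits = set(deletes + transposes + replaces + inserts)
    return sorted(edits & vocabulary)


def correct_unknown(tokens, index, vocabulary, trigram_counts):
    """Replace tokens[index] with the candidate whose trigram is most frequent."""
    # Step 1: store the two words adjacent to the unknown word.
    # "<s>" and "</s>" are hypothetical boundary markers for edge positions.
    left = tokens[index - 1] if index > 0 else "<s>"
    right = tokens[index + 1] if index + 1 < len(tokens) else "</s>"

    candidates = generate_candidates(tokens[index], vocabulary)
    if not candidates:
        return tokens[index]

    # Steps 3 and 4: form one trigram per candidate and look up its count.
    # A Counter returns 0 for trigrams missing from the table.
    def trigram_count(candidate):
        return trigram_counts[(left, candidate, right)]

    # Step 5: output the candidate from the highest-count trigram.
    best = max(candidates, key=trigram_count)
    return best if trigram_count(best) > 0 else tokens[index]


# Toy data, invented for this example only.
VOCABULARY = {"an", "alk", "all", "rearrangement"}
TRIGRAM_TABLE = Counter({
    ("an", "alk", "rearrangement"): 12,
    ("an", "all", "rearrangement"): 1,
})

corrected = correct_unknown(["an", "alc", "rearrangement"], 1, VOCABULARY, TRIGRAM_TABLE)
print(corrected)
```

In this toy case, both "alk" and "all" are one edit away from the unknown token "alc", and the surrounding words decide between them. When no candidate trigram appears in the table, the sketch leaves the token unchanged.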
Education
Ph.D. in Natural Language Processing
Universität Stuttgart - Stuttgart, Germany
Master's Degree in Natural Language Processing
King's College London - London, United Kingdom
Skills
Tools
Jira, PyCharm, Slack, Jenkins, IntelliJ IDEA, Flask, ETL, Apache Maven, Gradle, R, AWS CLI, AWS SDK
Paradigms
Agile, Agile Product Management, Agile Project Management, DevOps
Industry Expertise
Healthcare, Pharmaceuticals
Platforms
Jupyter Notebook, Azure, Google Cloud Platform (GCP)
Other
TensorFlow, Scikit-learn, Keras, Machine Learning, Deep Learning, Natural Language Processing (NLP), Python 3, Java, Project Management, Product Management, Product Ownership, Healthcare IT, Healthcare Product Manager, Public Speaking, Software as a Service (SaaS), Deep Neural Networks, Business Strategy, Scaled Agile Framework (SAFe), Data Science, Technology, Technical Product Management, Artificial Intelligence (AI), Product Roadmaps, Product Strategy, Python, Data Modeling, SaaS Product Management, B2B Product Management, Data Engineering, Data Analytics, Data Analysis, Generative Pre-trained Transformers (GPT), Release Management, User Acceptance Testing (UAT), Pandas, APIs, Proof of Concept (POC), Prototyping, Learning Management Systems (LMS), Amazon S3 (AWS S3), RESTful Development, Analytics, Amplitude, Business Intelligence (BI), Telehealth, Data Visualization, MongoDB, Kubernetes, SQL, AWS DevOps, GraphQL, Spring Boot, Java 8, Product Leadership, Technical Direction, Product Owner, REST APIs, Amazon Web Services (AWS), Transformers, BERT, JSON, LSTM, Apache Kafka, Amazon Kinesis, Statistics, Chatbots, Computer Science, Software Engineering, Next.js, Direct to Consumer (D2C), Mobile UI, Mobile UX, PyTorch, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), Claude, OpenAI GPT-4 API