Yihua Liu, Data Science Developer in Orlando, FL, United States

Member since July 28, 2021
Yihua is a lead data scientist with nearly a decade of experience across various companies and teams. With several industry journal publications, speaking engagements, and extensive client-facing experience, he enjoys sharing and discussing his work with audiences of all backgrounds, including C-suite executives and non-technical stakeholders.


  • Deep Labs
    Artificial Intelligence (AI), Machine Learning, Python, SQL
  • SimIS
    SQL, Machine Learning, Artificial Intelligence (AI), Python
  • Vonly
    SQL, Python, Machine Learning, Artificial Intelligence (AI)






Preferred Environment

Python 3, SQL, Machine Learning, Analytics, Artificial Intelligence (AI), Tableau

The most amazing and highest-impact project I've worked on is Covered California, the state of California's health insurance marketplace.


  • Lead Data Scientist

    2021 - PRESENT
    Deep Labs
    • Developed novel approaches to persona-based artificial intelligence, reducing fraud and identity theft.
    • Processed behavioral and contextual signals in real time to assess risk on events and transactions.
    • Tuned risk-based decision models to enhance the user experience for identity verification and authentication.
    Technologies: Artificial Intelligence (AI), Machine Learning, Python, SQL
  • Senior Data Scientist

    2018 - 2021
    • Improved learner behavior prediction accuracy from a 21% baseline (recommended next action) to 66% on unseen test data via a long short-term memory recurrent neural network (LSTM RNN) model.
    • Predicted course completion with a Matthews correlation coefficient of 0.51 using Experience API (xAPI) student log data and built the corresponding explanatory model via factor analysis.
    • Co-authored an e-learning metadata analytics strategy distributed across the Department of Defense and chaired the stakeholder working group on its adoption and implementation.
    Technologies: SQL, Machine Learning, Artificial Intelligence (AI), Python
  • Data Analyst

    2017 - 2018
    • Designed a proprietary machine learning model to predict sales by assessing digital media storefront placement quality, achieving 91% correlation on unseen test data.
    • Composed Python and SQL scripts to automatically extract and transform model features from multiple databases to deliver real-time updates to users.
    • Crafted and executed test cases to ensure data integrity of MySQL databases comprising billions of records.
    Technologies: SQL, Python, Machine Learning, Artificial Intelligence (AI)
  • Data Analyst

    2013 - 2016
    Berida, Inc.
    • Oversaw an online storefront A/B testing campaign that boosted monthly revenue by 13%.
    • Refined the market segmentation strategy for business-to-business expansion.
    • Led client operations for the highest-grossing account in the company's history.
    Technologies: SQL, Python, Tableau, Machine Learning, Artificial Intelligence (AI)
  • Business Analyst

    2012 - 2013
    • Eliminated test case redundancies via Excel data analysis, cutting testing time by nearly 15%.
    • Led deliverable review sessions with high-level stakeholders across multiple teams to ensure business requirement compliance.
    • Performed ad hoc defect analysis to facilitate efficient prioritization of cross-functional effort.
    Technologies: Tableau, SQL


  • Educational Outcomes Prediction

    This project examines educational data—including student demographic information and academic records, school attributes, and teacher data—from kindergarten through third grade for a diverse cohort of students.

    In the exploratory phase, we find methods to reduce the minority achievement gap and improve all students' outcomes. Next, we attempt to predict future test scores via several regression models. Finally, we predict whether students will graduate from high school and whether they will take a college entrance examination—SAT or ACT.

    Although these events occur nearly a decade after third grade for most students, the models performed relatively well, with ROC AUC (area under the receiver operating characteristic curve) scores between 0.7 and 0.8 on unseen test data.


  • Languages

    SQL, Python 3, Python
  • Other

    Machine Learning, Analytics, Artificial Intelligence (AI), Mathematics, Applied Mathematics, Statistics, Big Data
  • Tools



  • Master's Degree in Statistics
    2015 - 2016
    University of Central Florida - Orlando, FL
  • Bachelor's Degree in Mathematics
    2008 - 2011
    University of California, Berkeley - Berkeley, CA
