Liang Kuang, Data Scientist and Developer in Germantown, MD, United States

Member since July 17, 2020
Liang has a Ph.D. and a strong background in numerical computation, machine learning, deep learning, neural networks, big data mining, visualization, and multiple programming languages. He developed the Consolidated Audit Trail (CAT), the largest financial regulatory database in the world, handling up to 400 billion records per trade day. He brings deep technical insight into designing algorithms and end-to-end analytic platforms, including data lakes and predictive ML/AI models for system optimization.

Preferred Environment

Amazon Web Services (AWS), Natural Language Processing (NLP), Scikit-learn, PyTorch, Keras, Docker, SQL, ETL, Spark, Scala, Python

The most amazing...

...projects I built were a large-scale operational hurricane forecast and warning system for NOAA and a graph-based fraud detection analytics solution for FINRA.


  • Big Data, AI Developer

    2019 - PRESENT
    Financial Industry Regulatory Authority
    • Developed the Consolidated Audit Trail (CAT), the largest financial regulatory database in the world, handling up to 400 billion records per trade day.
    • Developed and implemented a graph-based algorithm to link all market events and track their life cycles on the scale of billions of records using Spark and AWS.
    • Created an end-to-end graph-based analytic solution for recommendation and fraud detection and an end-to-end people analytics recommendation system using machine learning.
    Technologies: Amazon Web Services (AWS), TensorFlow, PyTorch, Scikit-learn, Python, PySpark, Scala
  • Senior Data Scientist

    2018 - 2019
    • Built a state-of-the-art end-to-end machine learning solution for the second-largest insurance company, serving 17 million customers.
    • Delivered an end-to-end machine learning tracking and verification pipeline using blockchain for better machine learning model lifecycle management.
    • Oversaw model deployment and designed an integrated pipeline for continuously monitoring model performance and online learning.
    Technologies: Azure, PySpark, Deep Learning, XGBoost, Scikit-learn, Python
  • Data Scientist

    2017 - 2018
    IHS Markit
    • Drove cultural change in engineering, encouraging the advanced analytics team to experiment with and adopt more efficient analysis methodologies and tools.
    • Collaborated with the energy and maritime team to develop creative analytic solutions to their unique business challenges.
    • Streamlined the data mining process and standardized all methodologies for sharing and validating analysis. Automated the daily data analysis pipeline, SQL search, and R code review with web-based applications.
    • Designed and experimented with various popular machine learning models for predicting oil prices and major financial events using ARIMA, VAR, state-space models, regression, neural networks, random forest, elastic neural net, RBM, and other similar methods.
    • Translated billions of maritime trip records into valuable business insights through pattern recognition and modeling on AWS.
    • Provided in-team technical assistance and knowledge-sharing on best machine learning and coding practices.
    Technologies: D3.js, Plotly, Dash, Machine Learning, Python, R
  • Operational Storm Surge Model Developer

    2015 - 2017
    NOAA: National Oceanic & Atmospheric Administration
    • Built a national hurricane database and performed category analysis.
    • Developed and maintained risk scoring for regions with different levels of flooding risk.
    • Designed, developed, implemented, and validated a deterministic and ensemble storm surge model for the North Atlantic Ocean.
    • Developed statistical metrics and visualizations in Python for evaluating model performance.
    • Designed an algorithm to deploy an operational storm surge model on Unix cloud clusters, coded in Perl and shell scripts.
    • Delivered a Python-based open-source library for automatically generating model grids, pre-processing inputs, and post-analyzing model results.
    • Developed signal processing algorithms for short- and long-term water level time series using statistical methods, including Fourier transform, PCA, multivariate dimensional analysis, and regression analysis.
    Technologies: Microsoft SQL Server, Linux, Fortran, MATLAB, Python, Microsoft HPC
  • Numerical Modeler and Data Scientist

    2012 - 2015
    Environmental Resource Management
    • Developed and quantitatively validated coupled four-dimensional numerical coastal ocean and water quality models for global oceans.
    • Designed algorithms for four-dimensional fluid dynamics models and deployed them for various water bodies, from ponds and rivers to ocean waters.
    • Worked on international projects for the oil & gas, mining, and hydropower industries, where my role was to use various hydrodynamic models, environmental models, and data analytics tools to assess each project's impact on the receiving environment.
    • Deployed a four-dimensional operational hydrodynamic modeling system for the Bohai Sea using Java, JavaScript, PHP, HTML5, SQL, and Amazon EC2.
    Technologies: Fortran, Microsoft SQL Server, Python, MATLAB


  • 1000 Faces

    This project is a people analytics recommendation system that uses machine learning to recommend and build more strongly bonded teams within organizations. It's an end-to-end ML recommendation system with the front end and back end running on the AWS cloud.

  • Web-based Application for Auto-configuring Spark Jobs

    A web-based application that recommends the best spark-submit configuration. When running Spark jobs on EMR, we often ask what the best values are for --num-executors, --executor-cores, and --executor-memory. These three numbers play a big role in job performance.

  • Machine Learning Pipeline: Challenges and Verification with Blockchain

    A typical machine learning development process goes through two stages: training and production. Because the two stages are often spread across different platforms, there is room for errors such as mismatched config files or trained model objects, or corrupted test data. Corruption can also happen during file transmission. This code demonstrates the typical model training --> production framework, shows where it can go wrong, and, most importantly, shows how to fix the problem with a blockchain process. I have built a blockchain class to track the process and included a demo.

  • ADCIRC: Python-based Library for Ocean Model Pre-post Processing

    A Python-based library for streamlining pre- and post-processing for the ADCIRC model, including grid generation, boundary initialization, and post-analysis. It includes methods for processing temporal and spatial data.

  • Smart Ocean Platform

    This is a data platform for managing ever-growing scientific observation data and enabling users to access analytics and develop machine learning models directly on the platform. The technology stack includes Java, JavaScript, D3.js, and MySQL, to name a few.
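
To illustrate the Spark auto-configuration project listed above, here is a minimal sketch of how the three spark-submit values might be derived. It is not the project's actual code. It assumes a widely used rule of thumb: reserve one core and 1 GB per node for the operating system and daemons, use about five cores per executor, leave one executor slot for the YARN application master, and set aside roughly 10% of executor memory as overhead. The names `recommend_config` and `SparkSubmitConfig` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SparkSubmitConfig:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int

    def as_flags(self) -> str:
        """Render the values as spark-submit command-line flags."""
        return (
            f"--num-executors {self.num_executors} "
            f"--executor-cores {self.executor_cores} "
            f"--executor-memory {self.executor_memory_gb}g"
        )


def recommend_config(
    nodes: int,
    cores_per_node: int,
    memory_gb_per_node: int,
    cores_per_executor: int = 5,      # assumption: rule-of-thumb value
    overhead_fraction: float = 0.10,  # assumption: ~10% memory overhead
) -> SparkSubmitConfig:
    """Rule-of-thumb sizing for a homogeneous cluster (hypothetical helper)."""
    # Reserve one core and 1 GB per node for the OS and cluster daemons.
    usable_cores = cores_per_node - 1
    usable_memory_gb = memory_gb_per_node - 1
    if nodes < 1 or usable_cores < 1 or usable_memory_gb < 1:
        raise ValueError("cluster is too small to size executors")

    executor_cores = min(cores_per_executor, usable_cores)
    executors_per_node = usable_cores // executor_cores

    # Leave one executor slot for the YARN application master.
    num_executors = max(1, nodes * executors_per_node - 1)

    # Each container holds heap plus overhead, so heap = container / (1 + overhead).
    container_gb = usable_memory_gb / executors_per_node
    executor_memory_gb = max(1, int(container_gb / (1 + overhead_fraction)))

    return SparkSubmitConfig(num_executors, executor_cores, executor_memory_gb)


if __name__ == "__main__":
    # Example cluster: 10 worker nodes, each with 16 cores and 64 GB of memory.
    print(recommend_config(nodes=10, cores_per_node=16, memory_gb_per_node=64).as_flags())
```

A heuristic like this only gives a starting point; the best values still depend on the workload, so they are worth tuning against real job runs.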

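The blockchain verification idea from the machine learning pipeline project above can be sketched as a simple hash chain. This is an illustrative assumption about the approach, not the project's actual blockchain class: each block stores SHA-256 digests of the artifacts recorded at a stage plus the digest of the previous block, so a changed config file, model object, or test set shows up as a digest mismatch. The names `Block`, `ArtifactChain`, `record`, and `mismatches` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Block:
    index: int
    stage: str
    artifact_digests: Dict[str, str]
    previous_hash: str

    def digest(self) -> str:
        """Hash the block contents, including the link to the previous block."""
        payload = json.dumps(
            {
                "index": self.index,
                "stage": self.stage,
                "artifact_digests": self.artifact_digests,
                "previous_hash": self.previous_hash,
            },
            sort_keys=True,
        ).encode("utf-8")
        return sha256_hex(payload)


class ArtifactChain:
    """Hypothetical hash chain that tracks ML artifacts across stages."""

    def __init__(self) -> None:
        self.blocks: List[Block] = [Block(0, "genesis", {}, "0" * 64)]

    def record(self, stage: str, artifacts: Dict[str, bytes]) -> Block:
        """Append a block holding the digest of every artifact at this stage."""
        digests = {name: sha256_hex(data) for name, data in artifacts.items()}
        block = Block(len(self.blocks), stage, digests, self.blocks[-1].digest())
        self.blocks.append(block)
        return block

    def is_valid(self) -> bool:
        """Check that every block still points at its predecessor's digest."""
        return all(
            cur.previous_hash == prev.digest()
            for prev, cur in zip(self.blocks, self.blocks[1:])
        )

    def mismatches(self, stage: str, artifacts: Dict[str, bytes]) -> List[str]:
        """Return the names of artifacts that differ from what the stage recorded."""
        block = next((b for b in reversed(self.blocks) if b.stage == stage), None)
        if block is None:
            raise KeyError(f"no block recorded for stage {stage!r}")
        return sorted(
            name
            for name, data in artifacts.items()
            if block.artifact_digests.get(name) != sha256_hex(data)
        )


if __name__ == "__main__":
    chain = ArtifactChain()
    chain.record("training", {"config.yaml": b"lr: 0.01\n", "model.pkl": b"model-bytes"})
    # In production, the config arrives altered while the model is intact.
    received = {"config.yaml": b"lr: 0.10\n", "model.pkl": b"model-bytes"}
    print(chain.is_valid(), chain.mismatches("training", received))
```

In this sketch the chain is held in memory; a real pipeline would need to persist it somewhere both the training and production environments can read.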

  • Paradigms

    Data Science, ETL
  • Other

    Machine Learning, Computational Fluid Dynamics (CFD), Forecasting, Computational Physics, Fluid Dynamics, Operations Research, Data Engineering, Neural Networks, Analytics, Unix Shell Scripting, AWS, Natural Language Processing (NLP), Deep Learning, Dash
  • Languages

    Python 3, Scala, SQL, Python, R, Fortran
  • Libraries/APIs

    Spark ML, Keras, PyTorch, Scikit-learn, PySpark, TensorFlow, XGBoost, D3.js, Microsoft HPC
  • Tools

    Spark SQL, Plotly, MATLAB
  • Platforms

    Docker, Azure, Linux, Amazon Web Services (AWS)
  • Storage

    Microsoft SQL Server


  • Doctor of Philosophy Degree (Ph.D.) in Ocean Engineering
    2008 - 2012
    Stevens Institute of Technology - Hoboken, New Jersey, USA


  • Application Security and Secure Coding Training
    MAY 2020 - MAY 2021
  • Strategic Thinking
    MARCH 2019 - PRESENT
  • Hadoop: Data Analysis
  • Neural Network for Machine Learning
  • Big Data Analysis with Apache Spark
  • Machine Learning
    MAY 2015 - PRESENT
    Stanford University
