Liang Kuang, Developer in Germantown, MD, United States

Liang Kuang

Verified Expert in Engineering

Data Scientist and Developer

Location
Germantown, MD, United States
Toptal Member Since
July 17, 2020

Liang has a Ph.D. and a strong background in numerical computation, machine learning, deep learning, neural networks, big data mining, visualization, and multiple programming languages. He developed the Consolidated Audit Trail (CAT), the largest financial regulatory database in the world, which handles up to 400 billion records per trade day. He brings deep technical insight into designing algorithms and end-to-end analytic platforms, including data lakes and predictive ML/AI models for system optimization.

Portfolio

Financial Industry Regulatory Authority
Amazon Web Services (AWS), TensorFlow, PyTorch, Scikit-learn, Python, PySpark...
GEICO
Azure, PySpark, Deep Learning, XGBoost, Scikit-learn, Python
IHS Markit
D3.js, Plotly, Dash, Machine Learning, Python, R

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Scikit-learn, PyTorch, Keras, Docker, SQL, ETL, Spark, Scala, Python

The most amazing...

...projects I've built are a large-scale operational hurricane forecast warning system for NOAA and a graph-based fraud detection analytics solution for FINRA.

Work Experience

Big Data, AI Developer

2019 - PRESENT
Financial Industry Regulatory Authority
  • Developed the Consolidated Audit Trail (CAT), the largest financial regulatory database in the world, handling up to 400 billion records per trade day.
  • Developed and implemented a graph-based algorithm to link all market events and track their life cycles on the scale of billions of records using Spark and AWS.
  • Created an end-to-end graph-based analytic solution for recommendation and fraud detection and an end-to-end people analytics recommendation system using machine learning.
Technologies: Amazon Web Services (AWS), TensorFlow, PyTorch, Scikit-learn, Python, PySpark, Scala

Senior Data Scientist

2018 - 2019
GEICO
  • Built a state-of-the-art end-to-end machine learning solution for the second-largest insurance company, covering 17 million customers.
  • Delivered an end-to-end machine learning tracking and verification pipeline using blockchain for better machine learning model lifecycle management.
  • Oversaw model deployment and designed an integrated pipeline for continuously monitoring model performance and online learning.
Technologies: Azure, PySpark, Deep Learning, XGBoost, Scikit-learn, Python

Data Scientist

2017 - 2018
IHS Markit
  • Drove cultural change in engineering so that the advanced analytics team could experiment with and adopt more efficient analysis methodologies and tools.
  • Collaborated with the energy and maritime teams to develop creative analytic solutions to their unique business challenges.
  • Streamlined the data mining process and standardized all methodologies for sharing and validating analysis. Automated the daily data analysis pipeline, SQL search, and R code review with web-based applications.
  • Designed and experimented with various popular machine learning models for predicting oil prices and major financial events using ARIMA, VAR, state-space models, regression, neural networks, random forests, elastic net, RBM, and other similar methods.
  • Translated billions of maritime trip records into valuable business insights through pattern recognition and modeling on AWS.
  • Provided in-team technical assistance and knowledge-sharing on best machine learning and coding practices.
Technologies: D3.js, Plotly, Dash, Machine Learning, Python, R

Operational Storm Surge Model Developer

2015 - 2017
NOAA: National Oceanic & Atmospheric Administration
  • Built a national hurricane database and performed category analysis.
  • Developed and maintained risk scoring for regions with different levels of flooding risk.
  • Designed, developed, implemented, and validated a deterministic and ensemble storm surge model for the North Atlantic Ocean.
  • Developed statistical metrics and visualizations in Python for evaluating model performance.
  • Designed an algorithm to deploy an operational storm surge model on Unix cloud clusters and coded it in Perl and shell scripts.
  • Delivered a Python-based open-source library for automatically generating model grids, pre-processing, and post-analyzing model results.
  • Developed signal processing algorithms for short- and long-term water level time series using statistical methods such as Fourier transforms, PCA, multivariate dimensional analysis, and regression analysis.
Technologies: Microsoft SQL Server, Linux, Fortran, MATLAB, Python, Microsoft HPC

Numerical Modeler and Data Scientist

2012 - 2015
Environmental Resource Management
  • Developed and quantitatively validated coupled four-dimensional numerical coastal ocean and water quality models for global oceans.
  • Designed algorithms for four-dimensional fluid dynamics models and deployed them for various water bodies, from ponds and rivers to ocean waters.
  • Worked on international projects for the oil and gas, mining, and hydropower industries, where my role was to use hydrodynamic models, environmental models, and data analytics tools to assess each project's impact on the receiving environment.
  • Deployed a sophisticated four-dimensional operational hydrodynamic modeling system for the Bohai Sea (www.euler-tech.com) using Java, JavaScript, PHP, HTML5, SQL, and Amazon EC2.
Technologies: Fortran, Microsoft SQL Server, Python, MATLAB

1000 Faces

https://github.com/eulertech/1000Faces
This project is a people analytics recommendation system that uses machine learning to recommend and build more strongly bonded teams within organizations. It's an end-to-end ML recommendation system with the front end and back end running on the AWS cloud.

Web-based Application for Auto-configuring Spark Jobs

https://github.com/eulertech/spark-submitAutoConfig
A web-based application that suggests the best spark-submit configuration. When running Spark jobs on EMR, we often ask what the best recommended values are for --num-executors, --executor-cores, and --executor-memory. These three numbers play a big role in job performance.
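As a rough illustration of the sizing question this tool addresses, the sketch below applies a commonly cited rule of thumb: reserve one core and 1 GB per node for the OS, use about five cores per executor, leave one executor for the YARN application master, and hold back about 10% of memory for overhead. It is not taken from the repository. The function names, parameters, and default values are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkSubmitConfig:
    num_executors: int
    executor_cores: int
    executor_memory_gb: int


def suggest_config(nodes, cores_per_node, memory_gb_per_node,
                   cores_per_executor=5, overhead_fraction=0.10):
    """Heuristic sizing only (hypothetical helper, not the repository's logic)."""
    # Reserve one core and 1 GB per node for the OS and cluster daemons.
    usable_cores = cores_per_node - 1
    usable_memory_gb = memory_gb_per_node - 1

    # Pack executors of roughly `cores_per_executor` cores onto each node.
    executors_per_node = max(1, usable_cores // cores_per_executor)
    executor_cores = min(cores_per_executor, max(1, usable_cores))

    # Leave one executor slot for the YARN application master.
    num_executors = max(1, executors_per_node * nodes - 1)

    # Split node memory across executors, holding back a share for overhead.
    memory_per_executor = usable_memory_gb / executors_per_node
    executor_memory_gb = max(1, int(memory_per_executor * (1 - overhead_fraction)))

    return SparkSubmitConfig(num_executors, executor_cores, executor_memory_gb)


def to_flags(config):
    """Render the suggestion as spark-submit command-line flags."""
    return (f"--num-executors {config.num_executors} "
            f"--executor-cores {config.executor_cores} "
            f"--executor-memory {config.executor_memory_gb}G")


if __name__ == "__main__":
    # Example: a cluster of 10 worker nodes with 16 cores and 64 GB each.
    print(to_flags(suggest_config(nodes=10, cores_per_node=16, memory_gb_per_node=64)))
```

The right values depend on the workload and instance type, so numbers like these are best treated as a starting point for tuning.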

Machine Learning Pipeline: Challenges and Verification with Blockchain

https://github.com/eulertech/machine_learning_blockchain_verification_framework
A typical machine learning development process goes through two stages: training and production. Because the two stages span various platforms, there is room for errors such as mismatched config files, mismatched trained model objects, or test data corruption. These errors can also occur during file transmission. This code demonstrates the typical training-to-production framework and where it can go wrong. Most importantly, it shows how to fix the problem with a blockchain process. I have built a blockchain class to track the process and included a demo.
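The general idea can be sketched with a minimal hash chain: each block records the SHA-256 digest of one artifact plus the hash of the previous block, so a swapped config file or a corrupted model object no longer matches what was recorded at training time. This is an illustrative sketch under those assumptions, not the class from the repository. The `ArtifactChain` name, its methods, and the sample payloads are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ArtifactChain:
    """Append-only hash chain of artifact digests (hypothetical example class)."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(block):
        # Hash every field except the stored hash itself, in a stable key order.
        body = {k: v for k, v in block.items() if k != "hash"}
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def add(self, name, payload: bytes):
        """Record an artifact (config, model object, test data) at training time."""
        block = {
            "index": len(self.blocks),
            "name": name,
            "artifact_digest": sha256_hex(payload),
            "prev_hash": self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH,
        }
        block["hash"] = self._block_hash(block)
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Check that no block was altered and that the links are intact."""
        prev_hash = GENESIS_HASH
        for block in self.blocks:
            if block["prev_hash"] != prev_hash or block["hash"] != self._block_hash(block):
                return False
            prev_hash = block["hash"]
        return True

    def matches(self, name, payload: bytes):
        """Check a production-side artifact against what was recorded."""
        digest = sha256_hex(payload)
        return any(b["name"] == name and b["artifact_digest"] == digest
                   for b in self.blocks)


if __name__ == "__main__":
    chain = ArtifactChain()
    chain.add("config", b"learning_rate=0.01")
    chain.add("model", b"serialized-model-bytes")
    print(chain.is_valid())                                  # chain untouched
    print(chain.matches("model", b"serialized-model-bytes"))  # same bytes
    print(chain.matches("model", b"corrupted-model-bytes"))   # altered bytes
```

In production, the deployment step would call `matches` on each received file before loading it, and `is_valid` to confirm that the record itself was not edited.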

ADCIRC: Python-based Library for Ocean Model Pre-post Processing

https://github.com/eulertech/ADCIRC
A Python-based library for streamlining pre- and post-processing for the ADCIRC model, including grid generation, boundary initialization, and post-analysis. It includes methods for processing temporal and spatial data.

Smart Ocean Platform

This is a data platform for managing ever-growing scientific observation data and enabling users to access analytics and develop machine learning models directly on the platform. The technology stack includes Java, JavaScript, D3.js, and MySQL, to name a few.

Paradigms

Data Science, ETL

Other

Machine Learning, Computational Fluid Dynamics (CFD), Forecasting, Computational Physics, Fluid Dynamics, Operations Research, Data Engineering, Neural Networks, Analytics, Unix Shell Scripting, Natural Language Processing (NLP), Deep Learning, Dash, Generative Pre-trained Transformers (GPT)

Languages

Python 3, Scala, SQL, Python, R, Fortran

Frameworks

Spark

Libraries/APIs

Spark ML, Keras, PyTorch, Scikit-learn, PySpark, TensorFlow, XGBoost, D3.js, Microsoft HPC

Tools

Spark SQL, Plotly, MATLAB

Platforms

Docker, Azure, Linux, Amazon Web Services (AWS)

Industry Expertise

Cybersecurity

Storage

Microsoft SQL Server

2008 - 2012

Doctor of Philosophy Degree (Ph.D.) in Ocean Engineering

Stevens Institute of Technology - Hoboken, New Jersey, USA

MAY 2020 - MAY 2021

Application Security and Secure Coding Training

CODEBASHING, LTD.

MARCH 2019 - PRESENT

Strategic Thinking

LinkedIn

JANUARY 2018 - PRESENT

Hadoop: Data Analysis

LinkedIn

FEBRUARY 2017 - PRESENT

Neural Network for Machine Learning

Coursera

JANUARY 2017 - PRESENT

Big Data Analysis with Apache Spark

edX

MAY 2015 - PRESENT

Machine Learning

Stanford University
