Big Data, AI Developer | 2019 - PRESENT | Financial Industry Regulatory Authority
Technologies: Amazon Web Services (AWS), TensorFlow, PyTorch, Scikit-learn, Python, PySpark, Scala
- Developed the Consolidated Audit Trail (CAT), the largest financial regulatory database in the world, handling up to 400 billion records per trading day.
- Developed and implemented a graph-based algorithm to link all market events and track their life cycles at the scale of billions of records using Spark and AWS.
- Created an end-to-end graph-based analytics solution for recommendation and fraud detection, and an end-to-end people analytics recommendation system using machine learning.
Senior Data Scientist | 2018 - 2019 | GEICO
Technologies: Azure, PySpark, Deep Learning, XGBoost, Scikit-learn, Python
- Built a state-of-the-art, end-to-end machine learning solution serving 17 million customers of the second-largest insurance company.
- Delivered an end-to-end, blockchain-based tracking and verification pipeline to improve machine learning model lifecycle management.
- Oversaw model deployment and designed an integrated pipeline for continuously monitoring model performance and online learning.
Data Scientist | 2017 - 2018 | IHS Markit
Technologies: D3.js, Plotly, Dash, Machine Learning, Python, R
- Drove cultural change in engineering, encouraging the advanced analytics team to experiment with and adopt more efficient analysis methodologies and tools.
- Collaborated with the energy and maritime team to develop creative analytic solutions to their unique business challenges.
- Streamlined the data mining process and standardized all methodologies for sharing and validating analysis. Automated the daily data analysis pipeline, SQL search, and R code review with web-based applications.
- Designed and experimented with a range of machine learning models for predicting oil prices and major financial events, using ARIMA, VAR, state-space models, regression, neural networks, random forests, elastic net, RBM, and similar methods.
- Translated billions of maritime trip records into business insights through pattern recognition and modeling on AWS.
- Provided in-team technical assistance and knowledge sharing on machine learning and coding best practices.
Operational Storm Surge Model Developer | 2015 - 2017 | NOAA: National Oceanic & Atmospheric Administration
Technologies: Microsoft SQL Server, Linux, Fortran, MATLAB, Python, Microsoft HPC
- Built a national hurricane database and performed category analysis.
- Developed and maintained risk scoring for regions with different levels of flooding risk.
- Designed, developed, implemented, and validated a deterministic and ensemble storm surge model for the North Atlantic Ocean.
- Developed statistical metrics and visualizations in Python for evaluating model performance.
- Designed an algorithm to deploy an operational storm surge model on Unix cloud clusters, implemented in Perl and shell scripts.
- Delivered a Python-based open-source library for automatically generating model grids, pre-processing, and post-analyzing model results.
- Developed signal processing algorithms for short- and long-term water level time series using statistical methods including Fourier transforms, PCA, multivariate dimensional analysis, and regression analysis.
Numerical Modeler and Data Scientist | 2012 - 2015 | Environmental Resource Management
Technologies: Fortran, Microsoft SQL Server, Python, MATLAB
- Developed and quantitatively validated coupled four-dimensional numerical coastal ocean and water quality models for global oceans.
- Designed algorithms for four-dimensional fluid dynamics models and deployed them for various water bodies, from ponds and rivers to ocean waters.
- Worked on international projects for the oil and gas, mining, and hydropower industries, using hydrodynamic models, environmental models, and data analytics tools to assess project impacts on the receiving environment.