Shing Chan, Developer in Asuncion, Paraguay

Shing Chan

Verified Expert in Engineering

Machine Learning Developer

Asuncion, Paraguay

Toptal member since October 27, 2021

Bio

Shing is a researcher/developer with extensive experience building ML systems across various industries: healthcare (risk scores, sensor analytics, epidemiology), marketing (CLV, churn models), finance (index replication, trading systems), sports analytics (NBA/NFL), geophysics (well placement, reservoir modeling), and aeronautics (GPU fluid sims). He holds a PhD (2018) in physics-informed ML and generative AI for oil and gas engineering and is currently at Oxford, advancing AI health analytics.

Portfolio

IDX Digital Assets
Amazon Web Services (AWS), Data Science, Machine Learning, Trading...
University of Oxford
Machine Learning, Deep Learning, Time Series Analysis, Wearables, Bash...
EMoodie
Data Scientist

Experience

  • Python - 10 years
  • Keras - 9 years
  • Physics Simulations - 9 years
  • PyTorch - 9 years
  • Machine Learning - 9 years
  • Deep Learning - 9 years
  • Generative Adversarial Networks (GANs) - 8 years
  • Time Series Analysis - 5 years

Availability

Part-time

Preferred Environment

Linux, Git, PyTorch, Keras, Bash, Fortran, Vim Text Editor, Amazon Web Services (AWS), Python

The most amazing thing I've developed is a generative AI method for geomodelling, published in CompGeosci 2019.

Work Experience

Researcher

2022 - PRESENT
IDX Digital Assets
  • Developed a replication model for the Refinitiv Venture Capital Index.
  • Built a medium/low-frequency trading system for digital assets based on features derived from the price series and macroeconomic factors.
  • Implemented indicators for trend and risk management to reduce drawdown.
Technologies: Amazon Web Services (AWS), Data Science, Machine Learning, Trading, Scikit-learn, XGBoost, PyTorch

Researcher

2019 - PRESENT
University of Oxford
  • Created PyPI packages for wearable sensor analytics, used by hundreds of researchers as well as pharmaceutical companies (GlaxoSmithKline, Novo Nordisk, Johnson & Johnson).
  • Researched the added value of alternative data (wearable sensors) on existing clinical risk models (https://qrisk.org/).
  • Researched the use of deep learning methods to extract behavioral insights from wearable sensor data.
  • Co-developed the first tera-scale foundation model for accelerometer data (trained on 700,000 person-days of sensor data via self-supervised learning).
  • Built pipelines for large-scale multi-node, multi-GPU training of deep learning models (over 20 terabytes of data).
  • Built comprehensive evaluation pipelines for accelerometer-based activity recognition, comparing different models (CNNs, LSTMs, HMMs, tree models) on various open datasets.
  • Applied time-to-event methods (Cox regression, survival forests) to model time to hospitalization or death based on patient characteristics and alternative data such as wearable sensor data, achieving a C-index of 0.6-0.7.
Technologies: Machine Learning, Deep Learning, Time Series Analysis, Wearables, Bash, Fitness Trackers, Risk Models, Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Signal Processing, Python, TensorFlow, Keras, Git, Linux, Natural Language Processing (NLP), Scientific Computing, Health, Time Series, R, Data Analytics, Predictive Modeling, SQL, PostgreSQL, Data Mining, Artificial Intelligence (AI), Algorithms, Data Engineering, Electronic Health Records (EHR)

Senior Data Scientist

2022 - 2024
EMoodie
  • Prototyped mental health monitoring using speech-based emotion recognition and change-point detection.
  • Built pipelines for noise reduction, audio enhancement, and anomaly detection.
  • Co-wrote government grant proposals, successfully securing £500,000 in funding.
Technologies: Data Scientist

Researcher

2023 - 2023
subconscious.ai
  • Developed methodologies based on large language models for generating synthetic respondents for survey simulations, with an emphasis on realistic demographic profiles.
  • Built LLM-based tools to format and summarize academic papers on conjoint analysis and survey-based market research.
  • Conducted prompt engineering experiments to replicate findings from conjoint analysis and market research studies.
Technologies: WandB, OpenAI, Large Language Models (LLMs), Monte Carlo Simulations, Hugging Face

Machine Learning Engineer

2022 - 2023
Yofi
  • Adapted the Buy Till You Die model for customer churn and lifetime value, simplifying the Beta-Geometric/NBD formulation for efficient implementation in SQL for real-time eCommerce applications.
  • Enhanced a bot detection model by integrating features derived from telemetry and sensor data, reducing bad actors and low-value eCommerce customers.
  • Developed predictive models and built pipelines for model training.
Technologies: Machine Learning, Artificial Intelligence (AI), Customer Lifetime Value (CLV), Churn Analysis, Predictive Modeling, Forecasting, Anomaly Detection, Amazon Web Services (AWS), Elasticsearch, MongoDB, Shopify, AWS CloudFormation

Machine Learning Expert

2018 - 2019
KEG Systems LLC
  • Researched novel features (e.g., player-player, player-team, team-team interaction features) to predict game outcomes for sports betting (e.g., money line, over-under, and spread), emphasizing calibration to inform bet sizing and risk management.
  • Created reproducible pipelines for daily retraining, including feature selection, fine-tuning, and pruning.
  • Oversaw deployment and decision-making, betting with real money and tweaking metamodels based on feedback.
Technologies: Machine Learning, Data Science, Trading, TensorFlow, PyTorch, Keras, Git, Linux, Signal Processing, Python, Natural Language Processing (NLP), Sports, Scientific Computing, PostgreSQL, Time Series, Algorithmic Trading, Gambling, Data Analytics, Predictive Modeling, SQL, Data Mining, Artificial Intelligence (AI), Algorithms

PhD Candidate

2015 - 2018
Heriot-Watt University
  • Developed a physics-informed machine learning model to speed up computationally expensive Monte Carlo fluid simulations.
  • Developed a novel framework for geological reconstruction based on generative models (e.g., GANs, VAEs) to enhance geological realism for improved accuracy of oil production forecasts in Bayesian history matching.
  • Created Python packages for subsurface fluid simulations.
Technologies: Computational Fluid Dynamics (CFD), Machine Learning, Computer Vision, Generative Adversarial Networks (GANs), Variational Autoencoders, Physics Simulations, TensorFlow, PyTorch, Git, Linux, Python, Natural Language Processing (NLP), Scientific Computing, Time Series, Data Analytics, Predictive Modeling, Artificial Intelligence (AI), Algorithms

Engineering Intern

2014 - 2014
FAdeA
  • Assisted in the maintenance and repair of aircraft components.
  • Assessed the capabilities of aircraft repair stations, making sure tools and procedures were in order according to technical manuals.
  • Issued reports documenting deviations from technical manuals, including changes in procedures and the use of original equipment manufacturer (OEM) or refurbished parts.
Technologies: Aerospace & Defense, Aircraft & Airlines, Engineering, Scientific Computing, Time Series, Data Analytics, Predictive Modeling, Artificial Intelligence (AI)

Research and Development Intern

2013 - 2014
Instituto Universitario Aeronáutico
  • Contributed to in-house software for viscous flow simulation, extending it with the arbitrary Lagrangian-Eulerian formulation on unstructured grids.
  • Identified bottlenecks in the simulation software and parallelized them with OpenMP where possible.
  • Ported code sections to CUDA Fortran to enable GPU acceleration, resulting in a speed-up of more than 10 times.
Technologies: Computational Fluid Dynamics (CFD), Aerodynamics, Numerical Simulations, Fortran, GPU Computing, NVIDIA CUDA, OpenMP, Physics Simulations, Git, Linux, Python, MATLAB, Scientific Computing, Time Series, Data Analytics, Predictive Modeling, Artificial Intelligence (AI), Algorithms

Projects

Package for Processing and Analysis of Wearables' Data for Health Analytics

https://github.com/activityMonitoring/biobankAccelerometerAnalysis
This Python package extracts a wide range of clinically relevant statistics on activity and sleep patterns from wearable activity tracker data. I contributed to all aspects of the software, from signal processing, feature engineering, and machine learning to maintenance and packaging.

Numerical Optimization with Natural Evolution Strategies

https://github.com/chanshing/xnes
This is a simple-to-use Python module for optimization via natural evolution strategies. It is ideal for hard optimization problems involving highly non-linear, non-convex objective functions or when gradients are unavailable or difficult to compute.

Synthesis of Geological Images

https://github.com/chanshing/geocondition
This tool is used to generate geological images in a conditional manner from a generative neural network. It is useful for subsurface reconstruction to obtain more realistic geomodels, thus optimizing oil and gas exploration and extraction pipelines.

Physics-informed Machine Learning for Accelerated Simulations

https://www.sciencedirect.com/science/article/abs/pii/S0021999117307933?utm_medium=email
I developed a hybrid method that combines machine learning and physics to speed up fluid simulations. I trained a convolutional neural network to learn fluid dynamics from ground-truth snapshots, with conservation laws used to further correct its errors. The framework works under a multiscale finite volume formulation.

Education

2015 - 2018

PhD in Petroleum Engineering

Heriot-Watt University - Edinburgh, United Kingdom

2007 - 2014

Engineer's Degree in Aerospace Engineering

Instituto Universitario Aeronáutico - Cordoba, Argentina

Certifications

SEPTEMBER 2016 - PRESENT

Financial Markets

Yale University | via Coursera

APRIL 2014 - PRESENT

Heterogeneous Parallel Programming

University of Illinois | via Coursera

APRIL 2014 - PRESENT

Programming Mobile Applications for Android Handheld Systems

University of Maryland | via Coursera

Skills

Libraries/APIs

PyTorch, Keras, Scikit-learn, TensorFlow, OpenMP, XGBoost

Tools

Git, MATLAB, Vim Text Editor, AWS CloudFormation

Languages

Python, Java, Fortran, Bash, C, SQL, R

Platforms

Linux, NVIDIA CUDA, Android, Amazon Web Services (AWS), Shopify

Paradigms

Parallel Programming, Anomaly Detection

Storage

PostgreSQL, Elasticsearch, MongoDB

Other

Physics Simulations, Machine Learning, Deep Learning, Time Series Analysis, Generative Adversarial Networks (GANs), Artificial Intelligence (AI), Time Series, Predictive Modeling, Data Analytics, Algorithmic Trading, Algorithms, Signal Processing, Data Mining, Natural Language Processing (NLP), Numerical Methods, Numerical Analysis, Physics, GPU Computing, Computational Fluid Dynamics (CFD), Computer Vision, Data Science, Scientific Computing, Finance, Aerodynamics, Numerical Simulations, Optimization, Convolutional Neural Networks (CNNs), Wearables, Fitness Trackers, Risk Models, Variational Autoencoders, Recurrent Neural Networks (RNNs), Health, Aerospace & Defense, Aircraft & Airlines, Engineering, Trading, Gambling, Sports, Data Scientist, WandB, OpenAI, Large Language Models (LLMs), Monte Carlo Simulations, Hugging Face, Customer Lifetime Value (CLV), Churn Analysis, Forecasting, Data Engineering, Electronic Health Records (EHR)
