Jason Li, Developer in Melbourne, Australia

Jason Li

Verified Expert in Engineering

Machine Learning Developer

Location
Melbourne, Australia
Toptal Member Since
April 14, 2022

Jason is a data analyst and researcher interested in finding patterns in novel data sets. He specializes in using traditional statistical methods and contemporary machine learning (ML) to extract knowledge from data. His experience ranges from ultra-high-dimensional data and big data to time-series and non-numerical data. Jason is a problem-solver with a proven ability to deliver in different team settings and to communicate with clients of diverse technical backgrounds.

Portfolio

Peter MacCallum Cancer Centre
R, RStudio, High-performance Computing, Cloud Computing, Python...
Online Freelance Agency
Data Analytics, Automation, Exploratory Data Analysis, Machine Learning...
Futures Trading Management Firm
Machine Learning, Data Science, Artificial Intelligence (AI), Deep Learning...


Availability

Part-time

Preferred Environment

MacOS, PyCharm, RStudio

The most amazing...

...data I’ve dealt with is DNA sequencing data, for which I developed statistical and novel analytical methods to predict cancer-causing DNA mutations.

Work Experience

Head of Bioinformatics Core Facility and Senior Research Fellow

2013 - PRESENT
Peter MacCallum Cancer Centre
  • Introduced Databricks and Spark to several AI projects, including NLP, computer vision, and multiomics projects, and supported their use. Secured $50,000 in credits for proof-of-concept Databricks development.
  • Analyzed complex genomic and clinical data using ML, deep learning, statistical tests, correlation analyses, hypothesis testing, and survival analyses.
  • Researched and developed analytical methods for previously unseen data types derived from novel biotechnologies.
  • Provided data analytics and statistical consultation to researchers from diverse disciplinary backgrounds.
  • Collaborated with a team of more than five software engineers and bioinformaticians to provide ongoing services to research and clinical laboratories.
  • Established collaborations with computational scientists from external medical research and academic institutes on research engineering projects.
  • Developed novel methods for the analysis of exome sequencing data.
  • Used high-performance and cloud computing to reduce turnaround time for large-scale data analysis.
  • Trained multiple early-career data scientists through undergraduate and postgraduate research placement programs.
  • Developed visualizations for different data types using R plots, interactive HTML, and Microsoft Power BI.
Technologies: R, RStudio, High-performance Computing, Cloud Computing, Python, Statistical Methods, Machine Learning, Genomics, RNA Sequencing, Exploratory Data Analysis, Data Visualization, Data Science, Artificial Intelligence (AI), Scikit-learn, PyTorch, Deep Learning, Data Modeling, Data Analytics, Data Reporting, Microsoft Power BI, Data Analysis, Data Pipelines, Data Engineering, Algorithms, Mathematics, Data, Mathematical Analysis, Spark, PySpark, SQL, Spark SQL, Databricks, Azure Databricks, Natural Language Processing (NLP), OpenAI GPT-4 API, Streaming, Databases, Predictive Analytics, Predictive Modeling, Computer Vision, Oncology & Cancer Treatment, TensorFlow, Google Cloud Platform (GCP), Time Series Analysis, Time Series, Bioinformatics

Data Scientist Freelancer

2020 - 2023
Online Freelance Agency
  • Created a Django-based platform that provides visualization of marketing data for a canned cocktail business.
  • Automated data streaming from the Google Analytics Data API to analyze advertisement referrals in real time and gauge ad effectiveness.
  • Worked closely with clients to build marketing data visualizations that the operations and sales teams can use routinely.
Technologies: Data Analytics, Automation, Exploratory Data Analysis, Machine Learning, Tableau, Google Analytics, Google Analytics API, Web App Development, Django, Google Cloud Platform (GCP), API Integration, Time Series Analysis, Time Series

ML Developer

2022
Futures Trading Management Firm
  • Created data pipelines to automate the entire workflow of the algorithmic trading system, from data download and ingestion to model updating and parameter visualization to automated trading.
  • Analyzed trading strategies against different cohorts of historical data using custom metrics and plots.
  • Developed a production system to handle real-time data and trading via the API of the broker's software, CQG Client.
Technologies: Machine Learning, Data Science, Artificial Intelligence (AI), Deep Learning, Big Data, Data Visualization, Data Modeling, Data Analysis, Data Analytics, Data Pipelines, Financial Forecasting, Data Engineering, Statistical Analysis, Real-time Data, Real-time Streaming, Predictive Analytics, Predictive Modeling, Prediction Markets, Stock Analysis, Stock Trading, Visualization, Trading, Bots, Futures & Options, Finance, Quantitative Finance, Algorithmic Trading, API Integration, Crypto, Time Series Analysis, Time Series, High-frequency Trading (HFT)

Technical Lead (Partnership)

2017 - 2022
Algorithmic Futures Trading
  • Researched and developed trading strategies for index futures, currency, and commodity futures.
  • Developed frameworks in Python and R to facilitate strategy development, parameter optimization, and backtesting.
  • Created new metrics that, for our purposes, quantify performance consistency better than the Sharpe and Sortino ratios.
  • Performed exploratory data analysis and generated different styles of plots, such as heatmaps, correlation and 3D scatter plots, histograms, and customized OHLC charts, to help assess ideas and surface new observations.
  • Implemented a fully automated trading program in Java.
  • Used AWS cloud architecture and later migrated to Google Cloud Platform for production, live trading, and parameter optimization.
Technologies: Backtesting Trading Strategies, Automation, Interactive Brokers API, Java, Data Science, Finance APIs, Financial Markets, Financial Forecasting, Data Pipelines, Statistical Analysis, Prediction Markets, Stock Trading, Stock Analysis, Visualization, Trading, Bots, Futures & Options, Finance, Quantitative Research, Quantitative Finance, Algorithmic Trading, API Integration, Time Series Analysis, Time Series, High-frequency Trading (HFT)
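The Sharpe and Sortino ratios mentioned above are the standard baselines for risk-adjusted performance. As a reference point only, the sketch below computes both from a list of periodic returns in plain Python. It assumes daily returns, 252 trading periods per year, and a zero per-period risk-free rate and target by default (all illustrative assumptions); the function names are hypothetical, and this is not the custom consistency metric described in the profile, which is not specified here.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return divided by its sample standard deviation.

    `risk_free` is a per-period rate (assumed 0.0 here). Needs at least two returns.
    """
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return float("nan")  # undefined when returns have no variability
    return mean(excess) / sd * math.sqrt(periods_per_year)


def sortino_ratio(returns, periods_per_year=252, target=0.0):
    """Annualized Sortino ratio: mean excess return divided by downside deviation.

    Downside deviation is the root mean square of shortfalls below `target`,
    averaged over all periods (periods at or above target contribute zero).
    """
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(0.0, x) ** 2 for x in excess) / len(excess))
    if downside == 0:
        return float("inf")  # no losing periods relative to the target
    return mean(excess) / downside * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    daily = [0.004, -0.002, 0.003, 0.001, -0.004, 0.002, 0.005, -0.001]
    print(f"Sharpe:  {sharpe_ratio(daily):.2f}")
    print(f"Sortino: {sortino_ratio(daily):.2f}")
```

The two ratios differ only in the denominator: Sortino penalizes downside deviation rather than all volatility, so a strategy with large upside swings scores higher on Sortino than on Sharpe. Neither directly measures how consistently returns accrue over time, which is the gap the custom metrics above were meant to address.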

Data Analyst and Bioinformatician

2007 - 2013
Peter MacCallum Cancer Centre
  • Analyzed data derived from biological experiments using both custom and published methods.
  • Developed and implemented new algorithms to identify peaks in signal data efficiently.
  • Managed the logistics of very large data files on Linux systems.
  • Implemented workflow systems to manage analysis pipelines with many third-party components and long compute times.
Technologies: High-performance Computing, Big Data, Statistical Analysis, Genomics, R, Python, Linux, Mathematics, Data Modeling, Predictive Analytics, Predictive Modeling, Time Series Analysis, Time Series, Bioinformatics
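Peak identification in signal data is commonly framed as finding local maxima that rise above their neighbors and clear a minimum height. The sketch below is a generic, minimal linear-time illustration in plain Python, assuming a 1-D list of numbers and a hypothetical `min_height` threshold; it is not the algorithm referred to above, which the profile does not describe.

```python
def find_peaks(signal, min_height=0.0):
    """Return indices of local maxima in a 1-D signal with value >= min_height.

    A flat-topped peak (plateau) is reported once, at the middle of the plateau.
    The first and last samples are never reported. Runs in O(n) time.
    """
    peaks = []
    n = len(signal)
    i = 1
    while i < n - 1:
        if signal[i] > signal[i - 1]:
            # Rising edge: walk across any plateau of equal values.
            j = i
            while j + 1 < n and signal[j + 1] == signal[i]:
                j += 1
            # It is a peak only if the plateau is followed by a strict drop.
            if j + 1 < n and signal[j + 1] < signal[i] and signal[i] >= min_height:
                peaks.append((i + j) // 2)
            i = j + 1
        else:
            i += 1
    return peaks


if __name__ == "__main__":
    trace = [0, 1, 3, 1, 0, 2, 2, 2, 0, 5]  # hypothetical signal
    print(find_peaks(trace))                  # all local maxima
    print(find_peaks(trace, min_height=2.5))  # only peaks at least 2.5 high
```

For real data, `scipy.signal.find_peaks` offers the same basic idea with additional filters such as `height`, `distance`, and `prominence`, which matter for noisy traces where many small local maxima are not meaningful peaks.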

PhD Candidate

2004 - 2009
University of Melbourne
  • Developed a new training algorithm for support vector machines (SVMs), among the most reliable machine learning methods for discriminative classification. The work was published at ieeexplore.ieee.org/document/4524109.
  • Devised a novel overlapping subspace clustering algorithm, allowing detection of clusters that are related only within a subset of dimensions. It was implemented in C# and Java. This work was published at bmcecolevol.biomedcentral.com/articles/10.1186/1471-2148-8-116.
  • Developed a Windows desktop application using .NET and C# to allow researchers to interact with genome rearrangements derived from machine learning results.
Technologies: C#, Java, Support Vector Machines (SVM), Machine Learning, Neural Networks, Clustering, Supervised Learning, Unsupervised Learning, Data Modeling, Mathematics, Statistical Analysis, Time Series Analysis, Time Series, Genomics, Bioinformatics

Automated Index Futures Trading

I developed a fully automated system to trade S&P 500 E-mini futures (ES) through Interactive Brokers. To identify low-risk, short-term trading strategies for index futures, I analyzed all historical OHLCV data for ES since its inception date.

I developed new indicators and a metric that measures performance consistency more usefully than the Sharpe ratio. The framework I created allows new strategies from ongoing research to be deployed quickly into the production system.

This project involved Python, R, Java, a headless Ubuntu setup, and cloud computing for completely hands-off automated trading (in regions where Interactive Brokers requires two-step authentication, the system still needs a weekly manual authentication). While trading can arguably be good or evil, the data science behind it is nothing short of fascinating.

Big Data R&D in Genomics

https://scholar.google.com.au/citations?user=zSI2d_sAAAAJ
I have published 100+ papers on deciphering big genomics data, a number of them as first or senior author. A full list of the papers is on Google Scholar. They cover various topics in genomics, including the development of deep learning methods to perform segmentation on CT scans, the use of statistical methods to predict DNA mutations, and the development of pipeline frameworks to automate big data analysis.

Full-stack Django Bootstrap4 Web App for Interactive Genomics

https://biscut.seqliner.org
Designed and developed a web-based platform housing a number of genomics tools for researchers with no programming skills. The tools are accessible through UIs and forms, and compute-heavy jobs run in the background as Celery tasks. Wrappers for published methods are continually added as new tools. The site serves 500+ internal researchers but is not open to the public.

Databricks and NLP | Public Grant Application Text Data Processing

https://github.com/jtjli/nih_reporter
A collaborative project applying analytics to public grant data. My role is to develop pipelines and OpenAI-based tools on Databricks, demonstrating the use of streaming live tables and the power of Spark. Using BLOOM and the OpenAI APIs to fine-tune ChatGPT models, I'm creating a tool that writes scientific grant applications from prompts containing only keywords.

Languages

Python, R, SQL, Java, JavaScript, Python 3, C#

Paradigms

Data Science, Automation, Quantitative Research, High-performance Computing

Platforms

MacOS, RStudio, Linux, Docker, Amazon Web Services (AWS), Databricks, Azure, Google Cloud Platform (GCP)

Industry Expertise

Bioinformatics, High-frequency Trading (HFT)

Other

Statistics, Genomics, Machine Learning, Statistical Methods, RNA Sequencing, Exploratory Data Analysis, Research, Statistical Analysis, Artificial Intelligence (AI), Data Modeling, Data Analytics, Data Analysis, Algorithms, Mathematics, Mathematical Analysis, Streaming, Real-time Data, Real-time Streaming, Predictive Analytics, Predictive Modeling, Computer Vision, Trading, Bots, Algorithmic Trading, API Integration, Crypto, Time Series Analysis, Time Series, Backtesting Trading Strategies, Deep Learning, Data Engineering, Futures & Options, Finance, Quantitative Finance, Engineering, Cloud Computing, Data Visualization, Full-stack, Support Vector Machines (SVM), Neural Networks, Clustering, Supervised Learning, Unsupervised Learning, Big Data, Finance APIs, Data Reporting, Financial Markets, Financial Forecasting, Data, Web App Development, Azure Databricks, Natural Language Processing (NLP), OpenAI GPT-4 API, Delta Lake, Oncology & Cancer Treatment, Prediction Markets, Stock Analysis, Stock Trading, Visualization

Libraries/APIs

Interactive Brokers API, jQuery, Scikit-learn, PyTorch, Google Analytics API, PySpark, TensorFlow

Tools

PyCharm, Docker Compose, Celery, Microsoft Power BI, Tableau, Google Analytics, Spark SQL

Frameworks

Django, Data Lakehouse, Spark

Storage

Elasticsearch, Data Pipelines, Databases

Education

2004 - 2009

Ph.D. Degree in Engineering (Bioinformatics)

University of Melbourne - Melbourne, Australia

1999 - 2003

Bachelor's Degree in Computer Science and Mechatronics Engineering

University of Melbourne - Melbourne, Australia

Certifications

March 2023 - March 2024

Academy Accreditation - Platform Administrator

Databricks
