Jason Li, Machine Learning Developer in Melbourne, Victoria, Australia

Member since April 5, 2022
Jason is a data analyst and researcher interested in finding patterns in novel data sets. He specializes in using traditional statistical methods and contemporary machine learning (ML) to extract knowledge from data. His experience ranges from ultra-high-dimensional and big data to time-series and non-numerical data. Jason is a problem-solver with a proven ability to deliver in different team settings and to communicate with clients from diverse technical backgrounds.

Portfolio

Location

Melbourne, Victoria, Australia

Availability

Part-time

Preferred Environment

MacOS, PyCharm, RStudio

The most amazing...

...data that I’ve dealt with is DNA sequencing data, for which I developed statistical and novel analytical methods to predict cancer-causing DNA mutations.

Employment

  • Technical Lead

    2017 - PRESENT
    Algorithmic Futures Trading (Partnership)
    • Researched and developed trading strategies for index, currency, and commodity futures.
    • Developed frameworks in Python and R to facilitate strategy development, parameter optimization, and backtesting.
    • Developed new metrics to quantify performance consistency that are more useful for our purpose than the Sharpe and Sortino ratios.
    • Performed exploratory data analysis and generated different styles of plots such as heatmaps, correlation plots, 3D scatter plots, histograms, and customized OHLC charts to help assess ideas and enable new observations to be made.
    • Implemented a fully-automated Java program to make trades.
    • Used AWS cloud architecture, and later migrated to Google Cloud Platform, for production, live trading, and parameter optimization.
    Technologies: Backtesting Trading Strategies, Automation, Interactive Brokers API, Java, Data Science, Finance APIs
  • Head of Bioinformatics Core Facility and Senior Research Fellow

    2013 - PRESENT
    Peter MacCallum Cancer Centre
    • Analyzed complex genomic and clinical data using ML, deep learning, statistical tests, correlation analyses, hypothesis testing, and survival analyses.
    • Researched and developed analytical methods for previously unseen data types derived from novel biotechnologies.
    • Provided data analytics and statistical consultation to researchers from diverse disciplinary backgrounds.
    • Worked with a team of more than five software engineers and bioinformaticians to provide ongoing services to research and clinical laboratories.
    • Established collaborations with computational scientists from external medical research and academic institutes on research engineering projects.
    • Developed novel methods for the analysis of exome sequencing data.
    • Utilized high-performance computing and cloud computing to optimize turnaround time for extensive data analysis.
    • Trained multiple young data scientists through undergraduate and postgraduate research placement programs.
    Technologies: R, RStudio, High-performance Computing, Cloud Computing, Python, Statistical Methods, Machine Learning, Genomics, RNA Sequencing, Exploratory Data Analysis, Data Visualization, Data Science, Artificial Intelligence (AI), Scikit-learn, PyTorch, Deep Learning
  • Data Analyst and Bioinformatician

    2007 - 2013
    Peter MacCallum Cancer Centre
    • Analyzed data derived from biological experiments using custom methods and published methods.
    • Developed and implemented new algorithms to identify peaks in signal data efficiently.
    • Managed the logistics of huge data files on Linux systems.
    • Implemented workflow systems to manage analysis pipelines that have many third-party components and require many compute hours.
    Technologies: High-performance Computing, Big Data, Statistical Analysis, Genomics, R, Python, Linux
  • PhD Candidate

    2004 - 2009
    University of Melbourne
    • Developed a new training algorithm for support vector machines (SVMs), one of the most reliable machine learning methods in discriminative classification. The work was published at ieeexplore.ieee.org/document/4524109.
    • Devised a novel overlapping subspace clustering algorithm, allowing the detection of sub-dimensional related clusters. It was implemented in C# and Java. This work was published at bmcecolevol.biomedcentral.com/articles/10.1186/1471-2148-8-116.
    • Developed a Windows desktop application using .NET and C# to allow researchers to interact with genome rearrangements derived from machine learning results.
    Technologies: C#, Java, Support Vector Machines (SVM), Machine Learning, Neural Networks, Clustering, Supervised Learning, Unsupervised Learning

Experience

  • Automated Index Futures Trading

    Developed a fully automated system to trade S&P 500 E-mini futures (ES) using the service by Interactive Brokers. I analyzed all historical OHLCV data of ES since the inception date to identify low-risk short-term trading strategies for index futures. I developed new indicators and a metric that is more useful than the Sharpe ratio for measuring performance consistency. The framework I created has allowed ongoing research of new strategies to be quickly deployed into the production system. This project involved using Python, R, Java, a headless setup on Ubuntu, and cloud computing for completely hands-off automated trading (for regions where Interactive Brokers requires two-step authentication, my system still requires a weekly manual authentication). While trading can arguably be good or evil, data science is truly nothing but fascinating.
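
    For context, the Sharpe and Sortino ratios that the custom metric is compared against are conventionally computed as sketched below. This is only an illustration of the two standard baseline ratios, not the consistency metric described above, whose definition is not given here. The function names, the zero risk-free/target rate, and the 252-periods-per-year annualization are assumptions made for the example.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio: mean excess return over its sample std dev."""
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)  # sample standard deviation; needs at least 2 returns
    if sd == 0:
        raise ValueError("returns have zero volatility")
    return mean(excess) / sd * math.sqrt(periods_per_year)


def sortino_ratio(returns, target=0.0, periods_per_year=252):
    """Annualized Sortino ratio: like Sharpe, but penalizes only downside."""
    excess = [r - target for r in returns]
    # Downside deviation: root mean square of the below-target returns,
    # averaged over all periods (above-target periods contribute zero).
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        raise ValueError("no returns below the target")
    return mean(excess) / downside * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Hypothetical per-period returns, for illustration only.
    sample = [0.01, -0.02, 0.03, 0.00]
    print(sharpe_ratio(sample), sortino_ratio(sample))
```

    Both ratios summarize a whole return series in a single number, which is one reason a separate measure of consistency over time can be useful alongside them.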

  • Big Data R&D in Genomics
    https://scholar.google.com.au/citations?user=zSI2d_sAAAAJ

    I have published 100+ papers on deciphering big data in genomics, a number of them as first or senior author. An extensive list of the papers is on Google Scholar. It covers various topics in genomics, including the development of deep learning methods to perform segmentation on CT scans, the use of statistical methods to predict DNA mutations, and the development of pipeline frameworks to facilitate the automation of big data analysis.

  • Full-stack Django Bootstrap4 Web App for Interactive Genomics
    https://biscut.seqliner.org

    Designed and developed a web-based platform housing a number of genomics tools for researchers with no programming skills. The tools are accessible via UIs and forms, and compute-heavy jobs run in the background as Celery tasks. Wrappers for published methods are continually added as new tools. The site serves 500+ internal researchers but is not open to the public.

Skills

  • Languages

    Python, R, Java, JavaScript, Python 3, C#
  • Paradigms

    Data Science, Automation, High-performance Computing
  • Platforms

    MacOS, RStudio, Linux, Docker
  • Industry Expertise

    Bioinformatics
  • Other

    Genomics, Machine Learning, Statistical Methods, RNA Sequencing, Exploratory Data Analysis, Artificial Intelligence (AI), Backtesting Trading Strategies, Deep Learning, Statistics, Engineering, Cloud Computing, Data Visualization, Research, Full-stack, AWS, Support Vector Machines (SVM), Neural Networks, Clustering, Supervised Learning, Unsupervised Learning, Big Data, Statistical Analysis, Finance APIs
  • Libraries/APIs

    Interactive Brokers API, jQuery, Scikit-learn, PyTorch
  • Tools

    PyCharm, Docker Compose, Celery
  • Frameworks

    Django
  • Storage

    Elasticsearch

Education

  • Ph.D. Degree in Engineering (Bioinformatics)
    2004 - 2009
    University of Melbourne - Melbourne, Australia
  • Bachelor's Degree in Computer Science and Mechatronics Engineering
    1999 - 2003
    University of Melbourne - Melbourne, Australia
