Jason Li
Verified Expert in Engineering
Machine Learning Developer
Melbourne, Australia
Toptal member since April 14, 2022
Jason is a data analyst and researcher interested in finding patterns in novel data sets. He specializes in using traditional statistical methods and contemporary machine learning (ML) to extract knowledge from data. His experience ranges from ultra-high-dimensional and big data to time-series and non-numerical data. Jason is a problem-solver with a proven ability to deliver in different team settings and to communicate with clients from diverse technical backgrounds.
Experience
- Machine Learning - 18 years
- Statistics - 15 years
- Statistical Methods - 15 years
- R - 15 years
- Exploratory Data Analysis - 15 years
- Data Science - 15 years
- Mathematical Analysis - 15 years
- Data Analysis - 15 years
Preferred Environment
MacOS, PyCharm, RStudio
The most amazing...
...data that I’ve dealt with is DNA sequencing data, for which I developed statistical and novel data analytical methods to predict cancer-causing DNA mutations.
Work Experience
Head of Bioinformatics Core Facility and Senior Research Fellow
Peter MacCallum Cancer Centre
- Introduced Databricks and Spark to several AI projects, including NLP, computer vision, and multiomics-related projects, and supported their use. Secured $50,000 in credits for proof-of-concept Databricks development.
- Analyzed complex genomic and clinical data using ML, deep learning, statistical tests, correlation analyses, hypothesis testing, and survival analyses.
- Researched and developed analytical methods for previously unseen data types derived from novel biotechnologies.
- Provided data analytics and statistical consultation to researchers from diverse disciplinary backgrounds.
- Collaborated with a team of more than five software engineers and bioinformaticians to provide ongoing services to research and clinical laboratories.
- Established collaborations with computational scientists from external medical research and academic institutes on research engineering projects.
- Developed novel methods for the analysis of exome sequencing data.
- Utilized high-performance computing and cloud computing to optimize turnaround time for extensive data analysis.
- Trained multiple young data scientists through undergraduate and postgraduate research placement programs.
- Developed data visualization of different data types using R plots, interactive HTML, and Microsoft Power BI.
Data Scientist Freelancer
Online Freelance Agency
- Created a Django-based platform that provides visualization of marketing data for a canned cocktail business.
- Automated data streaming from Google Analytics Data API to perform real-time analysis against advertisement referrals to gauge ad effectiveness.
- Worked closely with clients to generate a visualization of marketing data that can be used routinely by the operations and sales teams.
ML Developer
Futures Trading Management Firm
- Created data pipelines to automate the entire workflow of the algorithmic trading system, from data download and ingestion to model updating and parameter visualization to automated trading.
- Analyzed trading strategies against different cohorts of historical data using custom metrics and plots.
- Developed a production system to handle real-time data and trading via the API of the broker's software—CQG Client.
Technical Lead (Partnership)
Algorithmic Futures Trading
- Researched and developed trading strategies for index futures, currency, and commodity futures.
- Developed frameworks in Python and R to facilitate strategic development, parameter optimization, and backtesting.
- Created new metrics that are more useful than Sharpe and Sortino ratios to quantify performance consistency for our purpose.
- Performed exploratory data analysis and generated different styles of plots, such as heatmaps, correlation and 3D scatter plots, histograms, and customized OHLC charts to help assess ideas and enable new observations to be made.
- Implemented a fully automated Java program to make trades.
- Used AWS cloud architecture and later migrated to Google Cloud Platform for production, live trading, and parameter optimization.
Data Analyst and Bioinformatician
Peter MacCallum Cancer Centre
- Analyzed data derived from biological experiments using custom methods and published methods.
- Developed and implemented new algorithms to identify peaks in signal data efficiently.
- Managed the logistics of huge data files on Linux systems.
- Implemented workflow systems to manage analysis pipelines that have many third-party components and require many compute hours.
PhD Candidate
University of Melbourne
- Developed a new training algorithm for support vector machines (SVMs), one of the most reliable machine learning methods for discriminative classification. The work was published at ieeexplore.ieee.org/document/4524109.
- Devised a novel overlapping subspace clustering algorithm, allowing the detection of sub-dimensional related clusters. It was implemented in C# and Java. This work was published at bmcecolevol.biomedcentral.com/articles/10.1186/1471-2148-8-116.
- Developed a Windows desktop application using .NET and C# to allow researchers to interact with genome rearrangements derived from machine learning results.
Experience
Automated Index Futures Trading
I developed new indicators and a more useful metric than the Sharpe ratio to measure performance consistency. The framework I created allows ongoing research into new strategies to be deployed quickly into the production system.
This project involved using Python, R, Java, a headless Ubuntu setup, and cloud computing for complete hands-off automated trading (for regions where Interactive Brokers requires two-step authentication, my system still requires a weekly manual authentication). While trading can arguably be good or evil, data science is truly nothing but fascinating.
Big Data R&D in Genomics
https://scholar.google.com.au/citations?user=zSI2d_sAAAAJ
Full-stack Django Bootstrap4 Web App for Interactive Genomics
https://biscut.seqliner.org
Databricks and NLP | Public Grant Application Text Data Processing
https://github.com/jtjli/nih_reporter
Education
Ph.D. Degree in Engineering (Bioinformatics)
University of Melbourne - Melbourne, Australia
Bachelor's Degree in Computer Science and Mechatronics Engineering
University of Melbourne - Melbourne, Australia
Certifications
Academy Accreditation - Platform Administrator
Databricks
Skills
Libraries/APIs
PyTorch, Pandas, Interactive Brokers API, jQuery, Scikit-learn, Google Analytics API, PySpark, TensorFlow
Tools
PyCharm, Docker Compose, Celery, Microsoft Power BI, Tableau, Google Analytics, Spark SQL
Languages
Python, R, SQL, Java, JavaScript, Python 3, C#
Paradigms
Business Intelligence (BI), Automation, Quantitative Research, High-performance Computing (HPC)
Platforms
MacOS, RStudio, Linux, Docker, Amazon Web Services (AWS), Databricks, Azure, Google Cloud Platform (GCP)
Industry Expertise
Bioinformatics, High-frequency Trading (HFT)
Frameworks
Django, Data Lakehouse, Spark
Storage
Elasticsearch, Data Pipelines, Databases
Other
Statistics, Genomics, Machine Learning, Statistical Methods, RNA Sequencing, Exploratory Data Analysis, Data Science, Research, Statistical Analysis, Artificial Intelligence (AI), Data Modeling, Data Analytics, Data Analysis, Algorithms, Mathematics, Mathematical Analysis, Streaming, Real-time Data, Real-time Streaming, Predictive Analytics, Predictive Modeling, Computer Vision, Trading, Bots, Algorithmic Trading, API Integration, Crypto, Time Series Analysis, Time Series, APIs, Quantitative Finance, Financial Modeling, Quantitative Analysis, Quantitative Modeling, DNA Sequencing, Medical Imaging, Backtesting Trading Strategies, Deep Learning, Data Engineering, Futures & Options, Finance, Forex Trading, Distributed Systems, Engineering, Cloud Computing, Data Visualization, Full-stack, Support Vector Machines (SVM), Neural Networks, Clustering, Supervised Learning, Unsupervised Learning, Big Data, Finance APIs, Data Reporting, Financial Markets, Financial Forecasting, Data, Web App Development, Azure Databricks, Natural Language Processing (NLP), OpenAI GPT-4 API, Delta Lake, Oncology & Cancer Treatment, Prediction Markets, Stock Analysis, Stock Trading, Visualization