Sheng Han Lim, Developer in George Town Penang, Malaysia

Sheng Han Lim

Verified Expert in Engineering

Artificial Intelligence Developer

Location
George Town Penang, Malaysia
Toptal Member Since
July 29, 2022

Sheng Han is a senior machine learning (ML) engineer with 10 years of experience in the research, development, and application of various ML, AI, and industry 4.0 solutions. He has turned big data into valuable actions, with a demonstrated history of driving business efficiencies and cost reductions. He is proficient in computer vision, NLP, anomaly detection, and prediction tasks. Sheng Han passionately believes in AI and data science and their potential to positively change human life.

Portfolio

Malaysian Government Consultant
OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3)...
Intel
Computer Vision, Data Science, Data Visualization, Deep Learning, Gensim, Keras...
Invantest
Computer Vision, Django, Data Science, Deep Learning, Python, Vue, REST APIs...

Experience

Availability

Part-time

Preferred Environment

Pandas, NumPy, Scikit-learn, OpenCV, Keras, TensorFlow, Django, Natural Language Toolkit (NLTK), Gensim, Python

The most amazing...

...project I've completed as a data scientist, modeler, and back-end developer is COVIDNOW, Malaysia's official government portal for COVID-19 insights.

Work Experience

AI Consultant

2022 - PRESENT
Malaysian Government Consultant
  • Built an OpenAI GPT-3.5 Turbo-powered AI chatbot that gives analytical responses to natural language queries over the open datasets of the national open data portal. Used a LangChain, OpenAI embeddings, pgvector, and LlamaIndex pipeline.
  • Built an AI assistant for public API documentation for developers to set up and use the public API easily. Used OpenAI API and embeddings stored in pgvector for document retrieval.
  • Built a data pipeline to serve a real-time transport API conforming to the GTFS and GTFS Realtime specs from IoT databases. The real-time pipelines are managed by a Dagster-powered orchestrator and served through a Django-powered HTTP REST API.
Technologies: OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Generative Pre-trained Transformers (GPT), LangChain, Django, Data Pipelines, Data Engineering, Natural Language Processing (NLP), Software Development, REST APIs, Machine Learning Operations (MLOps), OpenAI, Chatbots, Full-stack Development, Minimum Viable Product (MVP), Amazon Web Services (AWS), SQL, Amazon EC2, Amazon S3 (AWS S3), Data Analysis, Weaviate, Large Language Models (LLMs), Back-end Development, ChatGPT, HTML Integration, OpenAI GPT-4 API, Interactive JavaScript, Chatbot, Notion, Retrieval Augmented Generation (RAG), Web Scraping, Language Models, Predictive Modeling, Prompt Engineering
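
To illustrate the embedding-based document retrieval mentioned in the bullets above, here is a minimal sketch; it is not the production code. It assumes toy three-dimensional vectors in place of real OpenAI embeddings and an in-memory list in place of pgvector. The names `cosine_similarity`, `retrieve`, and `DOCS` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_vec, store, k=2):
    """Return the texts of the k stored documents most similar to the query."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical documents with made-up embeddings (assumption: a real system
# would get these vectors from an embedding model and keep them in a vector store).
DOCS = [
    ("How to request an API key", [0.9, 0.1, 0.0]),
    ("Rate limits for the public API", [0.7, 0.3, 0.1]),
    ("Changelog for the data catalogue", [0.0, 0.2, 0.9]),
]

top_docs = retrieve([1.0, 0.0, 0.0], DOCS, k=2)
```

In a retrieval-augmented setup, the retrieved texts would then be placed into the prompt sent to the language model.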

Senior Machine Learning Engineer

2021 - PRESENT
Intel
  • Developed an NLP-powered engine to classify root causes on millions of machine text log messages to improve tool utilization and save labor hours at Intel fabs. Built a Python text data processing pipeline to ingest data from an Elasticsearch API.
  • Built a CV anomaly detection solution, using object detection and deep learning models, to improve tool availability and reduce maintenance-related quality issues at Intel fabs. Led improvements in image-capturing hardware and image-processing pipelines.
  • Led team efforts in defining and enabling an ML operations framework and platform for faster and highly scalable deployments and proliferation of AI/ML solutions across Intel products and factories worldwide.
Technologies: Computer Vision, Data Science, Data Visualization, Deep Learning, Gensim, Keras, Natural Language Toolkit (NLTK), Scikit-learn, OpenCV, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Machine Learning Operations (MLOps), Machine Learning, TensorFlow, Software Design, Software Development, Matplotlib, Git, Jupyter Notebook, Image Processing, Parallel Programming, Docker, Windows, APIs, Object-oriented Programming (OOP), Convolutional Neural Networks, Artificial Intelligence (AI), Data Pipelines, Data Engineering, Data Aggregation, JupyterLab, XGBoost, Random Forests, REST APIs, Full-stack Development, Image Analysis, SQL, Data Analysis, Image Recognition, Computer Vision Algorithms, Back-end Development, ChatGPT, Chatbot, Notion, Retrieval Augmented Generation (RAG), Azure, Language Models, Predictive Modeling, Prompt Engineering

Senior Software Engineer

2017 - 2021
Invantest
  • Oversaw efforts to maintain and raise technical competency and to standardize software development quality and data science practices across the company.
  • Architected and led the development of the AI Vision platform, providing an end-to-end machine learning pipeline for image-based projects with image annotation, image segmentation, classification model training, model performance tuning, and export.
  • Contributed as the key developer in designing and building an automated defect classification platform using Mask R-CNN deep learning to detect and classify semiconductor fabrication and assembly defects.
Technologies: Computer Vision, Django, Data Science, Deep Learning, Python, Vue, REST APIs, Redis, Memcached, Linux, Linux CentOS 7, Tcl, Shell, Bash, PostgreSQL, OpenCV, Scikit-learn, Machine Learning, Tornado, TensorFlow, Software Design, Software Development, Matplotlib, Git, Jupyter Notebook, Image Processing, Parallel Programming, Docker, Windows, APIs, Object-oriented Programming (OOP), Optimization, XGBoost, Back-end, Convolutional Neural Networks, Artificial Intelligence (AI), Data Engineering, Data Pipelines, Data Aggregation, R, Random Forests, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), SQL, Amazon EC2, Data Analysis, Image Recognition, Computer Vision Algorithms, Back-end Development, HTML Integration, Interactive JavaScript, Web Scraping, Predictive Modeling

Advanced Analytics Software Engineer

2013 - 2017
Intel
  • Co-developed a rapid big data visual analytics system using Python and HDF5 technologies to perform parallel analytics and visualization on large manufacturing data in seconds and, ultimately, to turn information into cost-saving opportunities.
  • Co-created a novel method for learning from and predicting on highly imbalanced datasets to enable smoother factory operations. The methodology was recognized as an Intel Distinguished Invention.
  • Achieved a 13-fold improvement in the execution time of dataset cleansing ahead of data mining applications by fully converting a density clustering algorithm from MATLAB to C#.
Technologies: Data Science, Python, C#.NET, Scikit-learn, MATLAB, Pandas, NumPy, REST APIs, Machine Learning, Tornado, Software Development, Matplotlib, Git, Jupyter Notebook, Parallel Programming, Windows, APIs, Object-oriented Programming (OOP), Back-end, Artificial Intelligence (AI), Data Engineering, Data Aggregation, Microservices, Data Pipelines, JupyterLab, XGBoost, Random Forests, Full-stack Development, SQL, Data Analysis, Back-end Development, Predictive Modeling

eCommerce OpenAI GPT-3.5 Chatbot

https://mcourt.eu/
An OpenAI GPT-3.5-powered conversational retail assistant for product recommendation. As the machine learning engineer and back-end developer, I built vector pipelines to ingest the product database of an eCommerce website, along with a streaming back-end chat API deployed on AWS. I built it entirely on a Python FastAPI stack with a vector DB and PostgreSQL.

Design and Development of a Government COVID-19 Data Portal

https://covidnow.moh.gov.my/
COVIDNOW is the official government portal for data and insights on COVID-19 from the Malaysian Ministry of Health, built to keep the public promptly and well informed about the state of the pandemic.

I designed and built the back end and data pipelines, data storytelling, visualization of key COVID-19 indicators, and modeling of vaccination projections.

The project also spotlighted public-private partnerships in technology toward an all-of-society approach to the Malaysian Government's crisis handling. It boosted the culture shift in open data initiatives and data democratization.

Full Stack, Modeling, and Projection for Malaysia's Vaccination Tracker

https://vax.tehcpeng.net/
MY Vax Tracker is Malaysia's vaccination progress tracker, which I built in Next.js and Tailwind CSS.

This was a personal project to demonstrate full-stack software development and data science in the form of modeling and projection of vaccination targets. A Python loader on the back end pulls, aggregates, and models open vaccination data from the Malaysian Ministry of Health on GitHub.
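
As a rough illustration of what projecting a vaccination target can look like, here is a minimal sketch; it is not the tracker's actual model. It assumes a simple linear extrapolation from the average of the most recent daily doses, and the function name `project_target_date`, the 7-day window, and the sample figures are all hypothetical.

```python
import math
from datetime import date, timedelta


def project_target_date(daily_doses, last_date, target_total, window=7):
    """Estimate the date on which cumulative doses reach target_total.

    Assumption: the recent average daily rate stays constant going forward.
    Returns None when no projection is possible (no data or zero rate).
    """
    if not daily_doses:
        return None
    total = sum(daily_doses)
    if total >= target_total:
        return last_date  # target already reached
    recent = daily_doses[-window:]
    rate = sum(recent) / len(recent)
    if rate <= 0:
        return None
    days_needed = math.ceil((target_total - total) / rate)
    return last_date + timedelta(days=days_needed)


# Made-up figures: 10 days at 100 doses per day, with a target of 1,500 doses.
projected = project_target_date([100] * 10, date(2021, 7, 1), 1500)
```

With these sample numbers, 500 doses remain at 100 per day, so the projection lands five days after the last data point.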

Semiconductor Defect Detection and Classification

An end-to-end machine vision platform for semiconductor defect detection and classification. I was the architect and lead developer for the platform, which supports image dataset annotation and labeling, model training, tuning, and deployment. A Mask R-CNN architecture was used to train the image segmentation model.

Languages

Python, C#.NET, Tcl, Bash, JavaScript, R, SQL

Libraries/APIs

Pandas, NumPy, Scikit-learn, OpenCV, REST APIs, Matplotlib, XGBoost, Keras, TensorFlow, Natural Language Toolkit (NLTK), Vue, PyTorch, React

Paradigms

Data Science, Parallel Programming, Object-oriented Programming (OOP), Microservices

Storage

Data Pipelines, PostgreSQL, Redis, Memcached, Amazon S3 (AWS S3)

Other

Tornado, Machine Learning, Data Aggregation, Back-end Development, Computer Vision, Natural Language Processing (NLP), Deep Learning, Data Visualization, Software Development, Image Processing, Artificial Intelligence (AI), Data Engineering, JupyterLab, Random Forests, Chatbots, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), Data Analysis, Image Recognition, Computer Vision Algorithms, ChatGPT, HTML Integration, Retrieval Augmented Generation (RAG), Language Models, Predictive Modeling, Prompt Engineering, Software Design, Machine Learning Operations (MLOps), APIs, Optimization, Back-end, Convolutional Neural Networks, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), LangChain, OpenAI, Weaviate, Large Language Models (LLMs), OpenAI GPT-4 API, Interactive JavaScript, Chatbot, Web Scraping

Frameworks

Django, Next.js, Flask, Tailwind CSS, LlamaIndex

Tools

Git, Notion, Gensim, MATLAB, Shell

Platforms

Linux, Jupyter Notebook, Linux CentOS 7, Docker, Windows, Amazon Web Services (AWS), Amazon EC2, Azure

Education

2007 - 2013

Bachelor's Degree in Computer Science

Swinburne University of Technology - Melbourne, Australia

Certifications

MAY 2022 - PRESENT

Machine Learning Engineering for Production (MLOps) Specialization

Coursera

FEBRUARY 2022 - PRESENT

Deep Learning Nanodegree

Udacity

APRIL 2017 - PRESENT

Data Science Foundations - Level 1

IBM
