Sheng Han Lim, Developer in George Town Penang, Malaysia

Sheng Han Lim

Verified Expert in Engineering

Bio

Sheng Han is a senior machine learning (ML) engineer with 10 years of experience in the research, development, and application of various ML, AI, and generative AI solutions. He has turned big data into valuable actions and has a demonstrated history of driving business efficiencies and cost reductions. Sheng is proficient in generative AI, prompt engineering, computer vision, natural language processing (NLP), anomaly detection, and prediction tasks.

Portfolio

Malaysian Government Consultant
OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3)...
Intel
Computer Vision, Data Science, Data Visualization, Deep Learning, Gensim, Keras...
Asym Labs
Artificial Intelligence (AI), Machine Learning, Large Language Models (LLMs)...

Experience

  • Artificial Intelligence (AI) - 10 years
  • Data Science - 10 years
  • Machine Learning - 9 years
  • OpenAI - 2 years
  • Prompt Engineering - 2 years
  • Large Language Models (LLMs) - 2 years
  • Retrieval-augmented Generation (RAG) - 1 year
  • LangChain - 1 year

Availability

Part-time

Preferred Environment

Scikit-learn, Django, Python, LangChain, Amazon Web Services (AWS), Google Cloud Platform (GCP), OpenAI

The most amazing project I've completed as a data scientist, modeler, and back-end developer is COVIDNOW, Malaysia's official government portal for COVID-19 insights.

Work Experience

AI Consultant

2022 - PRESENT
Malaysian Government Consultant
  • Developed an OpenAI GPT-4-powered AI chatbot to provide analytical responses to natural language queries on top of open datasets for the national open data portal. Used LangChain, LangGraph, OpenAI embeddings, pgvector, and a LlamaIndex pipeline.
  • Built an AI assistant for public API documentation so developers could easily set up and use the public API. Used OpenAI API and embeddings stored in pgvector for document retrieval. Developed prompts with advanced prompt engineering techniques.
  • Contributed to front-end development of a census dashboard built in Next.js and TypeScript.
Technologies: OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Generative Pre-trained Transformers (GPT), LangChain, Django, Data Pipelines, Data Engineering, Natural Language Processing (NLP), Software Development, REST APIs, Machine Learning Operations (MLOps), OpenAI, Chatbots, Full-stack Development, Minimum Viable Product (MVP), Amazon Web Services (AWS), SQL, Amazon EC2, Amazon S3 (AWS S3), Data Analysis, Weaviate, Large Language Models (LLMs), Back-end Development, ChatGPT, HTML Integration, OpenAI GPT-4 API, Interactive JavaScript, Notion, Retrieval-augmented Generation (RAG), Web Scraping, Language Models, Predictive Modeling, Prompt Engineering, Generative Artificial Intelligence (GenAI), AI Chatbots, API Integration, Node.js, Make, Vector Databases, Pgvector, OpenAI API, Document Processing, Optical Character Recognition (OCR), Gemini, Team Leadership

Senior AI/ML Engineer

2021 - PRESENT
Intel
  • Developed an NLP-powered engine to classify root causes on millions of machine text log messages to improve tool utilization and save labor hours at Intel fabs. Built a Python text data processing pipeline to ingest data from an Elasticsearch API.
  • Built a computer vision (CV) anomaly detection solution, using object detection and deep learning models, to improve tool availability and reduce maintenance-related quality issues at Intel fabs. Led improvements in image-capturing hardware and image-processing pipelines.
  • Led team efforts in defining and enabling an ML operations framework and platform for faster and highly scalable deployments and proliferation of AI/ML solutions across Intel products and factories worldwide.
Technologies: Computer Vision, Data Science, Data Visualization, Deep Learning, Gensim, Keras, Natural Language Toolkit (NLTK), Scikit-learn, OpenCV, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Machine Learning Operations (MLOps), Machine Learning, TensorFlow, Software Design, Software Development, Matplotlib, Git, Jupyter Notebook, Image Processing, Parallel Programming, Docker, Windows, APIs, Object-oriented Programming (OOP), Convolutional Neural Networks (CNNs), Artificial Intelligence (AI), Data Pipelines, Data Engineering, Data Aggregation, JupyterLab, XGBoost, Random Forests, REST APIs, Full-stack Development, Image Analysis, SQL, Data Analysis, Image Recognition, Computer Vision Algorithms, Back-end Development, ChatGPT, Chatbots, Notion, Retrieval-augmented Generation (RAG), Azure, Language Models, Predictive Modeling, Prompt Engineering, Generative Artificial Intelligence (GenAI), AI Chatbots, API Integration, Make, Sentiment Analysis, Vector Databases, Pgvector, Clustering Algorithms, DBSCAN, K-means Clustering, Clustering, Algorithms, OpenAI API, Optical Character Recognition (OCR), Team Leadership

AI Engineer (via Toptal)

2024 - 2024
Asym Labs
  • Developed and engineered prompts for the modification and synthesis of clauses in legal documents based on user questionnaires and inputs.
  • Built a framework and engineered prompts for a legal assistant bot that extracts relevant legal entities and information from user correspondence and documents and suggests relevant legal contracts.
  • Engineered prompts for a legal assistant bot to gather information, perform tasks, and interact with a legal platform's API.
Technologies: Artificial Intelligence (AI), Machine Learning, Large Language Models (LLMs), Prompt Engineering, AI Prompts, OpenAI GPT-4 API, OpenAI, LangChain, OpenAI GPT-3 API, Legal Technology (Legaltech), Generative Artificial Intelligence (GenAI), API Integration, Vector Databases, Pgvector, OpenAI API, Document Processing, Optical Character Recognition (OCR), Gemini

Senior Software Engineer

2017 - 2021
Invantest
  • Oversaw efforts to maintain and raise technical competency and standardization in software development quality and data science across the company.
  • Architected and led the development of the AI Vision platform, providing an end-to-end machine learning pipeline for image-based projects with image annotation, image segmentation, classification model training, model performance tuning, and export.
  • Contributed as the key developer in designing and building an automated defect classification platform using Mask R-CNN deep learning to detect and classify semiconductor fabrication and assembly defects.
Technologies: Computer Vision, Django, Data Science, Deep Learning, Python, Vue, REST APIs, Redis, Memcached, Linux, Linux CentOS 7, Tcl, Shell, Bash, PostgreSQL, OpenCV, Scikit-learn, Machine Learning, Tornado, TensorFlow, Software Design, Software Development, Matplotlib, Git, Jupyter Notebook, Image Processing, Parallel Programming, Docker, Windows, APIs, Object-oriented Programming (OOP), Optimization, XGBoost, Back-end, Convolutional Neural Networks (CNNs), Artificial Intelligence (AI), Data Engineering, Data Pipelines, Data Aggregation, R, Random Forests, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), SQL, Amazon EC2, Data Analysis, Image Recognition, Computer Vision Algorithms, Back-end Development, HTML Integration, Interactive JavaScript, Web Scraping, Predictive Modeling, API Integration, Make, Clustering Algorithms, DBSCAN, K-means Clustering, Clustering, Algorithms, Java, Team Leadership

Advanced Analytics Software Engineer

2013 - 2017
Intel
  • Co-developed a rapid big data visual analytics system using Python and HDF5 technologies to perform parallel analytics and visualization on large manufacturing data in seconds and, ultimately, to turn information into cost-saving opportunities.
  • Co-created a novel method for learning from and predicting on highly imbalanced datasets to enable smoother factory operations. The methodology was recognized as an Intel Distinguished Invention.
  • Achieved a 13-fold improvement in execution time when cleansing datasets before data mining by fully porting a density clustering algorithm from MATLAB to C#.
Technologies: Data Science, Python, C#.NET, Scikit-learn, MATLAB, Pandas, NumPy, REST APIs, Machine Learning, Tornado, Software Development, Matplotlib, Git, Jupyter Notebook, Parallel Programming, Windows, APIs, Object-oriented Programming (OOP), Back-end, Artificial Intelligence (AI), Data Engineering, Data Aggregation, Microservices, Data Pipelines, JupyterLab, XGBoost, Random Forests, Full-stack Development, SQL, Data Analysis, Back-end Development, Predictive Modeling, Clustering Algorithms, DBSCAN, K-means Clustering, Clustering, Algorithms, Java

eCommerce OpenAI GPT-3.5 Chatbot

https://mcourt.eu/
An OpenAI GPT-3.5-powered conversational retail assistant for product recommendation. As the AI/ML engineer and back-end developer, I built vector pipelines to ingest the product database of an eCommerce website and a streaming back-end chat API deployed on AWS. I built it entirely on a Python FastAPI stack with a vector database and PostgreSQL.

Design and Development of a Government COVID-19 Data Portal

https://covidnow.moh.gov.my/
COVIDNOW is the Malaysian Ministry of Health's official government portal for COVID-19 data and insights, built to keep the public promptly and well informed about the state of the pandemic.

I designed and built the back end and data pipelines, data storytelling, visualization of key COVID-19 indicators, and modeling of vaccination projections.

The project also spotlighted public-private partnerships in technology toward an all-of-society approach to the Malaysian Government's crisis handling. It boosted the culture shift in open data initiatives and data democratization.

Full Stack, Modeling, and Projection for Malaysia's Vaccination Tracker

https://vax.tehcpeng.net/
MY Vax Tracker is Malaysia's vaccination progress tracker, which I built in Next.js and Tailwind CSS.

This was a personal project to demonstrate full-stack software development and data science in the form of modeling and projection of vaccination targets. A Python loader on the back end pulls, aggregates, and models open vaccination data published on GitHub by the Malaysian Ministry of Health.

Semiconductor Defect Detection and Classification

An end-to-end machine vision platform for semiconductor defect detection and classification. As the architect and lead developer, I built the platform to support image dataset annotation and labeling, model training, tuning, and deployment. The Mask R-CNN architecture was used to train an image segmentation model.

2007 - 2013

Bachelor's Degree in Computer Science

Swinburne University of Technology - Melbourne, Australia

SEPTEMBER 2024 - SEPTEMBER 2026

Google Cloud Certified Professional Machine Learning Engineer Certification

Google Cloud

MAY 2022 - PRESENT

Machine Learning Engineering for Production (MLOps) Specialization

Coursera

FEBRUARY 2022 - PRESENT

Deep Learning Nanodegree

Udacity

APRIL 2017 - PRESENT

Data Science Foundations - Level 1

IBM

Libraries/APIs

Pandas, NumPy, Scikit-learn, OpenCV, REST APIs, Matplotlib, XGBoost, Google Cloud API, OpenAI API, Keras, TensorFlow, Natural Language Toolkit (NLTK), Vue, PyTorch, React, Node.js

Tools

Git, ChatGPT, Notion, Gensim, MATLAB, Shell, BigQuery, AI Prompts, Make

Languages

Python, C#.NET, Tcl, Bash, JavaScript, R, SQL, Java

Storage

Data Pipelines, PostgreSQL, Redis, Memcached, Amazon S3 (AWS S3), Google Cloud Storage, Google Cloud

Frameworks

Django, Next.js, Flask, Tailwind CSS, LlamaIndex

Paradigms

Parallel Programming, Object-oriented Programming (OOP), Microservices

Platforms

Linux, Jupyter Notebook, Google Cloud Platform (GCP), Vertex AI, Cloud Run, Linux CentOS 7, Docker, Windows, Amazon Web Services (AWS), Amazon EC2, Azure

Other

Tornado, Data Science, Machine Learning, Natural Language Processing (NLP), Artificial Intelligence (AI), Data Aggregation, Large Language Models (LLMs), Back-end Development, Retrieval-augmented Generation (RAG), API Integration, Computer Vision, Deep Learning, Data Visualization, Software Development, Image Processing, Data Engineering, JupyterLab, Random Forests, Chatbots, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), Data Analysis, Image Recognition, Computer Vision Algorithms, HTML Integration, Language Models, Predictive Modeling, Prompt Engineering, Gemini API, Gemini, Google Cloud ML, Generative Artificial Intelligence (GenAI), AI Chatbots, Sentiment Analysis, Vector Databases, Pgvector, Clustering Algorithms, DBSCAN, K-means Clustering, Clustering, Algorithms, Document Processing, Optical Character Recognition (OCR), Team Leadership, Software Design, Machine Learning Operations (MLOps), APIs, Optimization, Back-end, Convolutional Neural Networks (CNNs), Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), LangChain, OpenAI, Weaviate, OpenAI GPT-4 API, Interactive JavaScript, Web Scraping, Google BigQuery, Google Cloud Build, Legal Technology (Legaltech)
