Sheng Han Lim
Verified Expert in Engineering
Artificial Intelligence Developer
Sheng Han is a senior machine learning (ML) engineer with 10 years of experience in the research, development, and application of various ML, AI, and industry 4.0 solutions. He has turned big data into valuable actions, with a demonstrated history of driving business efficiencies and cost reductions. He is proficient in computer vision, NLP, anomaly detection, and prediction tasks. Sheng Han passionately believes in AI and data science and their potential to positively change human life.
Pandas, NumPy, Scikit-learn, OpenCV, Keras, TensorFlow, Django, Natural Language Toolkit (NLTK), Gensim, Python
The most amazing project I've completed as a data scientist, modeler, and back-end developer is COVIDNOW, Malaysia's official government portal for COVID-19 insights.
Malaysian Government Consultant
- Built an AI chatbot powered by OpenAI GPT-3.5 Turbo that gives analytical responses to natural language queries over the open datasets of the national open data portal. The pipeline used LangChain, OpenAI embeddings, pgvector, and LlamaIndex.
- Built an AI assistant over the public API documentation to help developers set up and use the API easily. Used the OpenAI API with embeddings stored in pgvector for document retrieval.
- Built a data pipeline that serves a real-time transport API conforming to the GTFS and GTFS Realtime specifications, sourced from IoT databases. The real-time pipelines are managed by a Dagster-powered orchestrator and served through an HTTP REST API built with Django.
Senior Machine Learning Engineer
- Developed an NLP-powered engine that classifies root causes across millions of machine text log messages to improve tool utilization and save labor hours at Intel fabs. Built a Python text-processing pipeline to ingest data from an Elasticsearch API.
- Built a computer vision anomaly detection solution, using object detection and deep learning models, to improve tool availability and reduce maintenance-related quality issues at Intel fabs. Led improvements in image-capturing hardware and image-processing pipelines.
- Led team efforts in defining and enabling an ML operations framework and platform for faster and highly scalable deployments and proliferation of AI/ML solutions across Intel products and factories worldwide.
Senior Software Engineer
- Oversaw efforts to maintain and raise technical competency and to standardize software development quality and data science practices across the company.
- Architected and led the development of the AI Vision platform, providing an end-to-end machine learning pipeline for image-based projects with image annotation, image segmentation, classification model training, model performance tuning, and export.
- Contributed as the key developer in designing and building an automated defect classification platform using Mask R-CNN deep learning to detect and classify semiconductor fabrication and assembly defects.
Advanced Analytics Software Engineer
- Co-developed a rapid big data visual analytics system using Python and HDF5 technologies to perform parallel analytics and visualization on large manufacturing data in seconds and, ultimately, to turn information into cost-saving opportunities.
- Co-created a novel method for learning from and predicting on highly imbalanced datasets to enable smoother factory operations. The methodology was awarded an Intel Distinguished Invention.
- Achieved a 13-fold improvement in execution time for cleansing datasets before data mining by fully porting a density clustering algorithm from MATLAB to C#.
eCommerce OpenAI GPT-3.5 Chatbot
https://mcourt.eu/
Design and Development of a Government COVID-19 Data Portal
https://covidnow.moh.gov.my/
I designed and built the back end and data pipelines, data storytelling, visualization of key COVID-19 indicators, and modeling of vaccination projections.
The project also spotlighted public-private partnerships in technology toward an all-of-society approach to the Malaysian Government's crisis handling. It boosted the culture shift in open data initiatives and data democratization.
Full Stack, Modeling, and Projection for Malaysia's Vaccination Tracker
https://vax.tehcpeng.net/
This was a personal project to demonstrate full-stack software development and data science through the modeling and projection of vaccination targets. A Python loader on the back end pulls, aggregates, and models open vaccination data published by the Malaysian Ministry of Health on GitHub.
Semiconductor Defect Detection and Classification
Pandas, NumPy, Scikit-learn, OpenCV, REST APIs, Matplotlib, XGBoost, Keras, TensorFlow, Natural Language Toolkit (NLTK), Vue, PyTorch, React
Data Science, Parallel Programming, Object-oriented Programming (OOP), Microservices
Data Pipelines, PostgreSQL, Redis, Memcached, Amazon S3 (AWS S3)
Django, Next.js, Flask, Tailwind CSS, LlamaIndex
Git, Notion, Gensim, MATLAB, Shell
Linux, Jupyter Notebook, Linux CentOS 7, Docker, Windows, Amazon Web Services (AWS), Amazon EC2, Azure
Bachelor's Degree in Computer Science
Swinburne University of Technology - Melbourne, Australia
Machine Learning Engineering for Production (MLOps) Specialization
Deep Learning Nanodegree
Data Science Foundations - Level 1