Sheng Han Lim
Verified Expert in Engineering
Artificial Intelligence Developer
Sheng Han is a senior machine learning (ML) engineer with 10 years of experience in the research, development, and application of ML, AI, and Industry 4.0 solutions. He has turned big data into valuable actions, with a demonstrated history of driving business efficiencies and cost reductions. He is proficient in computer vision, NLP, anomaly detection, and prediction tasks. Sheng Han passionately believes in AI and data science and their potential to positively change human life.
Preferred Environment
Pandas, NumPy, Scikit-learn, OpenCV, Keras, TensorFlow, Django, Natural Language Toolkit (NLTK), Gensim, Python
The most amazing...
...project I've completed as a data scientist, modeler, and back-end developer is COVIDNOW, Malaysia's official government portal for COVID-19 insights.
Work Experience
AI Consultant
Malaysian Government Consultant
- Built an AI chatbot powered by OpenAI GPT-3.5 Turbo that gives analytical responses to natural language queries over the open datasets of the national open data portal. The pipeline used LangChain, OpenAI embeddings, pgvector, and LlamaIndex.
- Built an AI assistant over the public API documentation to help developers set up and use the public API easily. Used the OpenAI API, with embeddings stored in pgvector for document retrieval.
- Built a data pipeline that serves a real-time transport API in the GTFS and GTFS Realtime specifications from IoT databases. The real-time pipelines are managed by a Dagster-powered orchestrator and served through an HTTP REST API powered by Django.
Senior Machine Learning Engineer
Intel
- Developed an NLP-powered engine that classifies root causes across millions of machine text log messages to improve tool utilization and save labor hours at Intel fabs. Built a Python text data processing pipeline to ingest data from an Elasticsearch API.
- Built a CV anomaly detection solution, using object detection and deep learning models, to improve tool availability and reduce maintenance-related quality issues at Intel fabs. Led improvements in image-capturing hardware and image-processing pipelines.
- Led team efforts in defining and enabling an ML operations framework and platform for faster and highly scalable deployments and proliferation of AI/ML solutions across Intel products and factories worldwide.
Senior Software Engineer
Invantest
- Oversaw efforts to maintain and raise technical competency and to standardize software development quality and data science practices across the company.
- Architected and led the development of the AI Vision platform, providing an end-to-end machine learning pipeline for image-based projects with image annotation, image segmentation, classification model training, model performance tuning, and export.
- Contributed as the key developer in designing and building an automated defect classification platform using Mask R-CNN deep learning to detect and classify semiconductor fabrication and assembly defects.
Advanced Analytics Software Engineer
Intel
- Co-developed a rapid big data visual analytics system using Python and HDF5 technologies to perform parallel analytics and visualization on large manufacturing data in seconds and, ultimately, to turn information into cost-saving opportunities.
- Co-created a novel method for learning from and predicting on highly imbalanced datasets to enable smoother factory operations. The methodology was awarded an Intel Distinguished Invention.
- Achieved a 13-fold improvement in the execution time of dataset cleansing prior to data mining applications by fully converting a density clustering algorithm from MATLAB to C#.
Experience
eCommerce OpenAI GPT-3.5 Chatbot
https://mcourt.eu/
Design and Development of a Government COVID-19 Data Portal
https://covidnow.moh.gov.my/
I designed and built the back end and data pipelines, data storytelling, visualization of key COVID-19 indicators, and modeling of vaccination projections.
The project also spotlighted public-private partnerships in technology as part of an all-of-society approach to the Malaysian Government's crisis handling. It accelerated the culture shift toward open data initiatives and data democratization.
Full Stack, Modeling, and Projection for Malaysia's Vaccination Tracker
https://vax.tehcpeng.net/
This was a personal project to demonstrate full-stack software development and data science in the form of modeling and projection of vaccination targets. A Python loader on the back end pulls, aggregates, and models open vaccination data from the Malaysian Ministry of Health on GitHub.
Semiconductor Defect Detection and Classification
Education
Bachelor's Degree in Computer Science
Swinburne University of Technology - Melbourne, Australia
Certifications
Machine Learning Engineering for Production (MLOps) Specialization
Coursera
Deep Learning Nanodegree
Udacity
Data Science Foundations - Level 1
IBM
Skills
Libraries/APIs
Pandas, NumPy, Scikit-learn, OpenCV, REST APIs, Matplotlib, XGBoost, Keras, TensorFlow, Natural Language Toolkit (NLTK), Vue, PyTorch, React
Tools
Git, ChatGPT, Notion, Gensim, MATLAB, Shell
Languages
Python, C#.NET, Tcl, Bash, JavaScript, R, SQL
Storage
Data Pipelines, PostgreSQL, Redis, Memcached, Amazon S3 (AWS S3)
Paradigms
Data Science, Parallel Programming, Object-oriented Programming (OOP), Microservices
Platforms
Linux, Jupyter Notebook, Linux CentOS 7, Docker, Windows, Amazon Web Services (AWS), Amazon EC2, Azure
Frameworks
Django, Next.js, Flask, Tailwind CSS, LlamaIndex
Other
Tornado, Machine Learning, Data Aggregation, Back-end Development, Computer Vision, Natural Language Processing (NLP), Deep Learning, Data Visualization, Software Development, Image Processing, Artificial Intelligence (AI), Data Engineering, JupyterLab, Random Forests, Chatbots, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), Data Analysis, Image Recognition, Computer Vision Algorithms, HTML Integration, Retrieval-augmented Generation (RAG), Language Models, Predictive Modeling, Prompt Engineering, Software Design, Machine Learning Operations (MLOps), APIs, Optimization, Back-end, Convolutional Neural Networks (CNN), GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), LangChain, OpenAI, Weaviate, Large Language Models (LLMs), OpenAI GPT-4 API, Interactive JavaScript, Web Scraping