Sheng Han Lim
Verified Expert in Engineering
Artificial Intelligence Developer
George Town, Penang, Malaysia
Toptal member since July 29, 2022
Sheng Han is a senior machine learning (ML) engineer with 10 years of experience in the research, development, and application of various ML, AI, and generative AI solutions. He has turned big data into actionable insights and has a demonstrated history of driving business efficiencies and cost reductions. Sheng Han is proficient in generative AI, prompt engineering, computer vision, natural language processing (NLP), anomaly detection, and prediction tasks.
Experience
- Artificial Intelligence (AI) - 10 years
- Data Science - 10 years
- Machine Learning - 9 years
- OpenAI - 2 years
- Prompt Engineering - 2 years
- Large Language Models (LLMs) - 2 years
- Retrieval-augmented Generation (RAG) - 1 year
- LangChain - 1 year
Preferred Environment
Scikit-learn, Django, Python, LangChain, Amazon Web Services (AWS), Google Cloud Platform (GCP), OpenAI
The most amazing project I've completed as a data scientist, modeler, and back-end developer is COVIDNOW, Malaysia's official government portal for COVID-19 insights.
Work Experience
AI Consultant
Malaysian Government Consultant
- Developed an OpenAI GPT-4-powered AI chatbot to provide analytical responses to natural language queries on top of open datasets for the national open data portal. Used LangChain, LangGraph, OpenAI embeddings, pgvector, and a LlamaIndex pipeline.
- Built an AI assistant for public API documentation so developers could easily set up and use the public API. Used the OpenAI API and embeddings stored in pgvector for document retrieval. Developed prompts with advanced prompt engineering techniques.
- Contributed to the front-end development of a census dashboard built in Next.js and TypeScript.
Senior AI/ML Engineer
Intel
- Developed an NLP-powered engine to classify the root causes behind millions of machine text log messages, improving tool utilization and saving labor hours at Intel fabs. Built a Python text data processing pipeline to ingest data from an Elasticsearch API.
- Built a computer vision (CV) anomaly detection solution using object detection and deep learning models to improve tool availability and reduce maintenance-related quality issues at Intel fabs. Led improvements in image-capturing hardware and image-processing pipelines.
- Led the team's efforts to define and enable an ML operations (MLOps) framework and platform for faster, highly scalable deployment and proliferation of AI/ML solutions across Intel products and factories worldwide.
AI Engineer (via Toptal)
Asym Labs
- Engineered prompts to modify and synthesize clauses in legal documents based on user questionnaires and inputs.
- Built a framework and engineered prompts for a legal assistant bot that extracts relevant legal entities and information from user correspondence and documents and suggests relevant legal contracts.
- Engineered prompts for a legal assistant bot to gather information, perform tasks, and interact with a legal platform's API.
Senior Software Engineer
Invantest
- Oversaw efforts to maintain and upgrade technical competency and to standardize software development quality and data science practices across the company.
- Architected and led the development of the AI Vision platform, an end-to-end machine learning pipeline for image-based projects covering image annotation, image segmentation, classification model training, model performance tuning, and model export.
- Served as the key developer in designing and building an automated defect classification platform that uses Mask R-CNN deep learning models to detect and classify semiconductor fabrication and assembly defects.
Advanced Analytics Software Engineer
Intel
- Co-developed a rapid big data visual analytics system using Python and HDF5 technologies that performs parallel analytics and visualization on large manufacturing datasets in seconds, ultimately turning information into cost-saving opportunities.
- Co-created a novel method for learning from and making predictions on highly imbalanced datasets to enable smoother factory operations. The methodology was recognized as an Intel Distinguished Invention.
- Achieved a 13-fold improvement in the execution time of dataset cleansing before data mining by fully converting a density clustering algorithm's code from MATLAB to C#.
Project Experience
eCommerce OpenAI GPT-3.5 Chatbot
https://mcourt.eu/
Design and Development of a Government COVID-19 Data Portal
https://covidnow.moh.gov.my/
I designed and built the back end and data pipelines and was responsible for data storytelling, visualization of key COVID-19 indicators, and modeling of vaccination projections.
The project also showcased public-private technology partnerships as part of an all-of-society approach to the Malaysian government's crisis response. It accelerated the cultural shift toward open data initiatives and data democratization.
Full Stack, Modeling, and Projection for Malaysia's Vaccination Tracker
https://vax.tehcpeng.net/
This was a personal project to demonstrate full-stack software development and data science through the modeling and projection of vaccination targets. A Python loader on the back end pulls, aggregates, and models open vaccination data published by the Malaysian Ministry of Health on GitHub.
Semiconductor Defect Detection and Classification
Education
Bachelor's Degree in Computer Science
Swinburne University of Technology - Melbourne, Australia
Certifications
Google Cloud Certified Professional Machine Learning Engineer Certification
Google Cloud
Machine Learning Engineering for Production (MLOps) Specialization
Coursera
Deep Learning Nanodegree
Udacity
Data Science Foundations - Level 1
IBM
Skills
Libraries/APIs
Pandas, NumPy, Scikit-learn, OpenCV, REST APIs, Matplotlib, XGBoost, Google Cloud API, OpenAI API, Keras, TensorFlow, Natural Language Toolkit (NLTK), Vue, PyTorch, React, Node.js
Tools
Git, ChatGPT, Notion, Gensim, MATLAB, Shell, BigQuery, AI Prompts, Make
Languages
Python, C#.NET, Tcl, Bash, JavaScript, R, SQL, Java
Storage
Data Pipelines, PostgreSQL, Redis, Memcached, Amazon S3 (AWS S3), Google Cloud Storage, Google Cloud
Frameworks
Django, Next.js, Flask, Tailwind CSS, LlamaIndex
Paradigms
Parallel Programming, Object-oriented Programming (OOP), Microservices
Platforms
Linux, Jupyter Notebook, Google Cloud Platform (GCP), Vertex AI, Cloud Run, Linux CentOS 7, Docker, Windows, Amazon Web Services (AWS), Amazon EC2, Azure
Other
Tornado, Data Science, Machine Learning, Natural Language Processing (NLP), Artificial Intelligence (AI), Data Aggregation, Large Language Models (LLMs), Back-end Development, Retrieval-augmented Generation (RAG), API Integration, Computer Vision, Deep Learning, Data Visualization, Software Development, Image Processing, Data Engineering, JupyterLab, Random Forests, Chatbots, Full-stack Development, Image Analysis, Minimum Viable Product (MVP), Data Analysis, Image Recognition, Computer Vision Algorithms, HTML Integration, Language Models, Predictive Modeling, Prompt Engineering, Gemini API, Gemini, Google Cloud ML, Generative Artificial Intelligence (GenAI), AI Chatbots, Sentiment Analysis, Vector Databases, Pgvector, Clustering Algorithms, DBSCAN, K-means Clustering, Clustering, Algorithms, Document Processing, Optical Character Recognition (OCR), Team Leadership, Software Design, Machine Learning Operations (MLOps), APIs, Optimization, Back-end, Convolutional Neural Networks (CNNs), Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), LangChain, OpenAI, Weaviate, OpenAI GPT-4 API, Interactive JavaScript, Web Scraping, Google BigQuery, Google Cloud Build, Legal Technology (Legaltech)