Shih-hsuan Lee
Verified Expert in Engineering
Machine Learning Developer
Shih-Hsuan is an entrepreneur, data scientist, and top competitor in machine learning competitions. He specializes in analyzing data pipelines and modeling business problems to deliver data projects with business impact. He built a real-time analytics system that monitored national product roll-out and provided decision support. Shih-Hsuan excels at sales forecasting, niche image classification, short text classification, and conditional text generation, along with AI, ML, and statistics.
Preferred Environment
Julia, R, Python, TensorFlow, PyTorch, Linux
The most amazing...
...team I led took two weeks to develop a data analytics pipeline for monitoring a product roll-out across China.
Work Experience
Data Scientist and Founder
Veritable Technology, Co.
- Won seventh place in the third YouTube Video Understanding Challenge and published a paper in its ICCV 2019 workshop.
- Assisted clients that required expertise in data science, machine learning, and artificial intelligence.
- Created open source research projects, indie data products, and public technical notes and tutorials to help democratize AI.
Chief Data Scientist
Baiwang
- Built data pipelines to merge data from different sources across the company into a data warehouse.
- Developed an automatic NLP merchandise classification system, including setting up an annotation procedure, data quality control, and experiment processes.
- Built a real-time analytics system that monitored national product roll-out and provided decision support.
Senior Data Scientist
Yongdata
- Developed a customer churn prediction system for a mobile phone company.
- Developed a sales and inventory monitoring and forecasting system for a smart vending machine company.
- Implemented anomaly detection algorithms in the company's analytics SaaS product.
Software Engineer
Soshio
- Maintained the back end of the company's NLP public opinion analysis product.
- Developed data visualizations for the customer-facing dashboard.
- Maintained the scraping system and merged it with the firehoses from commercial data providers.
Experience
Seventh Place Solution to The Third YouTube-8M Video Understanding Challenge
https://github.com/ceshine/yt8m-2019
Solution: To deal with the limited number of annotated segments, video-level models were pre-trained on the YouTube-8M frame-level features dataset to create meaningful video representations from frames. The weights of the two models were used to build two types of segment classifiers: context-aware and context-agnostic.
Paraphrasing English Sentences
https://github.com/ceshine/finetuning-t5
Self-Supervised Domain Adaptation
https://blog.ceshine.net/post/byol-domain-adaptation/
My preliminary experiments show visible improvements from the self-supervised domain adaptation approach using images from the downstream task. With longer pre-training and bigger unlabelled datasets, we can probably get further improvements.
Forecasting Challenges
https://github.com/ceshine/favorita_sales_forecasting
1. Corporación Favorita Grocery Sales Forecasting: predicting sales for a large grocery chain (placed 20th out of 1,671 teams)
2. Recruit Restaurant Visitor Forecasting: predicting how many future visitors a restaurant will receive (placed 21st out of 1,248 teams)
3. Web Traffic Time Series Forecasting: forecasting future traffic to Wikipedia pages (placed 43rd out of 1,095 teams)
Skills
Languages
Python, SQL, R, Julia, Scala, JavaScript
Libraries/APIs
PyTorch, Pandas, TensorFlow, XGBoost
Paradigms
Data Science, Data-driven Testing
Other
Statistical Modeling, Machine Learning, Deep Learning, Image Classification, Data Analytics, Time Series, Forecasting, Statistical Data Analysis, Data Modeling, Data Analysis, Bayesian Inference & Modeling, Statistics, Time Series Analysis, Gradient Boosting, Data Visualization, Natural Language Processing (NLP), Recommendation Systems, Big Data, Image Recognition, Generative Pre-trained Transformers (GPT), Experimental Design, Stochastic Modeling, Risk Models, Genetic Algorithms, Computer Vision
Frameworks
LightGBM
Platforms
Google Cloud Platform (GCP), Linux, Docker, Amazon Web Services (AWS)
Storage
Data Pipelines, PostgreSQL
Tools
Apache Airflow
Education
Master's Degree in Applied Statistics
Australian National University - Canberra, Australia
Bachelor of Science Degree in Computer Science and Information Engineering
National Taiwan University - Taipei, Taiwan