Yi Sheng Chan

Verified Expert in Engineering

Data Science and Machine Learning Developer

Location

London, United Kingdom

Toptal Member Since

September 22, 2020

Yi currently works at Apple as a software engineer, building a platform and framework for training machine learning models on hundreds of millions of Apple devices in a privacy-preserving way. He has designed and built scalable ML systems and data infrastructure in cloud environments since 2014, and his expertise spans DevOps, ML, data engineering (both batch and streaming), and back-end web services. Yi's strongest skills are Python, Java, Spark, and SQL, coupled with good ML knowledge.

Machine Learning, Big Data, Data Engineering, Distributed Systems, Python, Python 3, SQL, ETL, Amazon Web Services (AWS), Apache Spark, Relational Databases, Redshift, Spark, Data Pipelines, Apache Kafka

Portfolio

Apple

Python, TensorFlow, PyTorch, Kubernetes, Java, Machine Learning, Deep Learning...

WorldRemit

ETL, Data Pipelines, APIs, Data Engineering, Data Science, Big Data...

Dressipi

ETL, Data Science, Apache Spark, Spark, Redshift, Machine Learning...

Experience

Data Engineering - 5 years
Machine Learning - 5 years
Python - 5 years
SQL - 4 years
Amazon Web Services (AWS) - 3 years
Apache Spark - 3 years
Apache Airflow - 2 years
Stream Processing - 2 years

Availability

Part-time

Preferred Environment

Git, DataGrip, IntelliJ, PyCharm, Slack, Linux

The most amazing...

...project I've led, designed, and implemented was an end-to-end ML system that runs in production for a fintech company valued at a few billion dollars.

Work Experience

Senior Software Engineer

2021 - PRESENT

Apple

Designed and maintained a critical client Python library for training ML models on a massive scale.
Built a secure platform for massive-scale data aggregation.
Migrated critical web services for federated learning to run on Docker and Kubernetes.

Technologies: Python, TensorFlow, PyTorch, Kubernetes, Java, Machine Learning, Deep Learning, Docker, Amazon Web Services (AWS), Python 3

Senior Data Engineer

2018 - 2020

WorldRemit

Built a scalable data infrastructure entirely on AWS, including data pipelines, a data warehouse, a data lake, support for spiky usage patterns, monitoring and alerting, and data processing initiatives across batch and streaming datasets.
Led, designed, and implemented an end-to-end machine learning system for internal use to optimize marketing efforts.
Reduced the training time required for a machine learning model by 95%, from 20 hours to one.
Created an exactly-once stream processing pipeline, enabling self-service push notifications for user-defined queries.

Technologies: ETL, Data Pipelines, APIs, Data Engineering, Data Science, Big Data, Amazon API Gateway, Amazon Athena, Amazon Elastic MapReduce (EMR), Spark, Redshift, NoSQL, GraphDB, Docker, Stream Processing, Apache Spark, Apache Airflow, Distributed Systems, Machine Learning, SQL, Amazon Web Services (AWS), Python, Python 3, PostgreSQL

Machine Learning Engineer

2017 - 2018

Dressipi

Optimized performance of a machine learning model training and evaluation process, reducing training time by 50%.
Improved the CTR on a recommendation system by 20% by implementing production-level code.
Provided architectural decision support by building proofs-of-concept and prototypes.

Technologies: ETL, Data Science, Apache Spark, Spark, Redshift, Machine Learning, Recommendation Systems, SQL, Amazon Web Services (AWS), Ruby, Python, Python 3, PostgreSQL

Data Engineer

2016 - 2016

Student.com

Designed and implemented a production-level stream processing pipeline using Scala, Akka, and Spark Streaming.
Implemented a real-time dashboard using Spark Streaming, Kafka, and server-sent events.
Conducted ad hoc data analysis, defined metrics, and produced data visualizations on a monitoring dashboard.

Technologies: ETL, Data Pipelines, APIs, Data Engineering, Spark, Stream Processing, Relational Databases, Docker, Scala, Apache Kafka, SQL, Apache Spark, PostgreSQL

Data Science Software Engineer

2014 - 2016

Etu Corporation

Designed and implemented Lambda architecture for a machine learning system, reducing refresh time from three hours to three minutes.
Initiated, researched, and built a data processing pipeline and NLP-based machine learning models to enhance the recommender system. This improved the CTR by 50%.
Improved the CTR by 30% by designing and implementing a new architecture for ensemble machine learning models.
Implemented and optimized a large-scale, production-level data pipeline with Spark.

Technologies: Data Engineering, Data Science, Spark, Big Data, Stream Processing, SQL, Relational Databases, Hadoop, Scala, Java, Apache Spark, Apache Hive, Machine Learning, Recommendation Systems, Python, Python 3

Experience

Churn Prediction System

An automated, scalable machine learning system that processes hundreds of GBs of raw behavioral data and predicts each user's probability of churning. The system is written in Python and runs as fully automated daily batch jobs on AWS. It covers security compliance, networking, data processing, model training, and model serving via a web API. Each working component (i.e., data store, web service, data pipeline, data quality, and model predictions) is monitored using various metrics and linked to PagerDuty in order to meet the service level agreement for production.

Fraud Detection System

A highly complex and mission-critical system for fighting fraud at a unicorn fintech company. The system sources data from various data providers via APIs; stores the data in GraphDB, Apache Kafka, and a relational database in real time; and marks a transaction as fraud or non-fraud within one second, as required by its service level agreement.

Lambda Architecture on a Recommendation System

Lambda architecture for a recommendation system offered as a service by a SaaS company. The system includes a batch layer, which aggregates data hourly and generates a list of recommendations for each user, and a speed layer, which consumes data in real time and generates recommendations for each user or session. The system is written in Python and Scala, using Spark and Spark Streaming with HDFS, and it uses HBase and Apache Kafka for storing the data and model output.

Skills

Languages

Python, SQL, Python 3, Java, Scala, Ruby

Frameworks

Apache Spark, Spark, Hadoop

Paradigms

Data Science, ETL

Platforms

Amazon Web Services (AWS), Apache Kafka, Docker, Linux, Kubernetes

Storage

Relational Databases, Redshift, Data Pipelines, NoSQL, PostgreSQL, Apache Hive

Other

Stream Processing, Machine Learning, Distributed Systems, Big Data, Data Engineering, Recommendation Systems, GraphDB, Amazon API Gateway, APIs, Deep Learning

Tools

Apache Airflow, Git, Amazon Elastic MapReduce (EMR), Amazon Athena

Libraries/APIs

TensorFlow, PyTorch

Education

2010 - 2014

Master of Science Degree in Finance

National Taiwan University - Taipei, Taiwan

2006 - 2009

Bachelor's Degree in International Business

National Cheng Chi University - Taipei, Taiwan
