Yi Sheng Chan, Data Science and Machine Learning Developer in London, United Kingdom

Member since September 22, 2020
Yi is currently working at Apple as a software engineer, building a platform and framework for training machine learning models on hundreds of millions of Apple devices in a privacy-preserving way. He has designed and built scalable ML systems and data infrastructure in cloud environments since 2014, and his expertise spans DevOps, ML, data engineering (both batch and streaming), and back-end web services. Yi's strongest skills are Python, Java, Spark, and SQL, coupled with solid ML knowledge.


  • Apple
    Python, TensorFlow, PyTorch, Kubernetes, Java, Machine Learning...
  • WorldRemit
    ETL, Data Pipelines, APIs, Data Engineering, Data Science, Big Data...
  • Dressipi
    ETL, Data Science, Apache Spark, Spark, Redshift, Machine Learning...






Preferred Environment

Git, DataGrip, IntelliJ, PyCharm, Slack, Linux

The most amazing...

...project I've led, designed, and implemented was an end-to-end ML system that runs in production for a fintech company valued at a few billion dollars.


  • Senior Software Engineer

    2021 - PRESENT
    • Designed and maintained a critical client Python library for training ML models at massive scale.
    • Built a secure platform for massive-scale data aggregation.
    • Migrated critical web services for federated learning to run on Docker and Kubernetes.
    Technologies: Python, TensorFlow, PyTorch, Kubernetes, Java, Machine Learning, Deep Learning, Docker, Amazon Web Services (AWS), Python 3
  • Senior Data Engineer

    2018 - 2020
    • Built a scalable data infrastructure fully on AWS, including data pipelines, a data warehouse, a data lake, support for spiky usage patterns, monitoring and alerting, and data processing initiatives across batch and streaming datasets.
    • Led, designed, and implemented an end-to-end machine learning system for internal use to optimize marketing efforts.
    • Reduced the training time required for a machine learning model by 95%, from 20 hours to one hour.
    • Created an exactly-once stream processing pipeline, enabling self-service push notifications for user-defined queries.
    Technologies: ETL, Data Pipelines, APIs, Data Engineering, Data Science, Big Data, Amazon API Gateway, Amazon Athena, Amazon Elastic MapReduce (EMR), Spark, Redshift, NoSQL, GraphDB, Docker, Stream Processing, Apache Spark, Apache Airflow, Distributed Systems, Machine Learning, SQL, Amazon Web Services (AWS), Python, Python 3, PostgreSQL
  • Machine Learning Engineer

    2017 - 2018
    • Optimized the training and evaluation process for a machine learning model, reducing training time by 50%.
    • Improved the CTR of a recommendation system by 20% by implementing production-level code.
    • Provided architectural decision support by building proofs-of-concept and prototypes.
    Technologies: ETL, Data Science, Apache Spark, Spark, Redshift, Machine Learning, Recommendation Systems, SQL, Amazon Web Services (AWS), Ruby, Python, Python 3, PostgreSQL
  • Data Engineer

    2016
    • Designed and implemented a production-level stream processing pipeline in Scala, Akka, and Spark Streaming.
    • Implemented a real-time dashboard using Spark Streaming, Kafka, and server-sent events.
    • Conducted ad hoc data analysis, defined metrics, and produced data visualizations on a monitoring dashboard.
    Technologies: ETL, Data Pipelines, APIs, Data Engineering, Spark, Stream Processing, Relational Databases, Docker, Scala, Apache Kafka, SQL, Apache Spark, PostgreSQL
  • Data Science Software Engineer

    2014 - 2016
    Etu Corporation
    • Designed and implemented a Lambda architecture for a machine learning system, reducing refresh time from three hours to three minutes.
    • Initiated, researched, and built a data processing pipeline and NLP-based machine learning models to enhance the recommender system, improving the CTR by 50%.
    • Improved the CTR by 30% by designing and implementing a new architecture for ensemble machine learning models.
    • Implemented and optimized a large-scale, production-level data pipeline with Spark.
    Technologies: Data Engineering, Data Science, Spark, Big Data, Stream Processing, SQL, Relational Databases, Hadoop, Scala, Java, Apache Spark, Apache Hive, HBase, Machine Learning, Recommendation Systems, Python, Python 3


  • Churn Prediction System

    An automated, scalable machine learning system that processes hundreds of GBs of raw behavioral data and predicts the probability of each user churning. The system is written in Python and runs as fully automated daily batch jobs on AWS. It covers security compliance, networking, data processing, model training, and model serving via a web API. Each working component (i.e., data store, web service, data pipeline, data quality, and model predictions) is monitored using various metrics and linked to PagerDuty in order to meet the production service level agreement.

  • Fraud Detection System

    A highly complex and mission-critical system for fighting fraud at a unicorn fintech company. The system sources data from various data providers via APIs; stores the data in GraphDB, Apache Kafka, and a relational database in real time; and marks each transaction as fraud or non-fraud within one second, as required by the service level agreement.

  • Lambda Architecture on a Recommendation System

    A Lambda architecture for a recommendation system offered as a service by a SaaS company. The system includes a batch layer for hourly aggregation of data and generation of a list of recommendations for each user, and a speed layer for real-time data consumption and generation of recommendations for each user or session. The system is written in Python and Scala, using Spark and Spark Streaming with HDFS, and it uses HBase and Apache Kafka for storing the data and model output.


  • Languages

    Python, SQL, Python 3, Java, Scala, Ruby
  • Frameworks

    Apache Spark, Spark, Hadoop
  • Paradigms

    Data Science, ETL
  • Platforms

    Amazon Web Services (AWS), Apache Kafka, Docker, Linux, Kubernetes
  • Storage

    Relational Databases, Redshift, Data Pipelines, NoSQL, PostgreSQL, Apache Hive
  • Other

    Stream Processing, Machine Learning, Distributed Systems, Big Data, Data Engineering, Recommendation Systems, GraphDB, Amazon API Gateway, APIs, AWS, Deep Learning
  • Tools

    Apache Airflow, Git, Amazon Elastic MapReduce (EMR), Amazon Athena
  • Libraries/APIs

    TensorFlow, PyTorch


  • Master of Science Degree in Finance
    2010 - 2014
    National Taiwan University - Taipei, Taiwan
  • Bachelor's Degree in International Business
    2006 - 2009
    National Cheng Chi University - Taipei, Taiwan
