Alan Zhou

Machine Learning Developer in Singapore, Singapore

Member since June 13, 2017
Alan is an ex-Google Senior Software Engineer with a PhD in Applied Math from MIT and a silver medal in the International Olympiad in Informatics. He learns new skills fast and delivers results promptly and efficiently. Alan has worked on large-scale machine learning and data processing projects at Google and on database development at Dgraph. He is also familiar with Node.js and React.
Alan is now available for hire

Location

Singapore, Singapore

Availability

Part-time

Preferred Environment

Subversion (SVN), Git, Cloud9, Sublime Text, Linux

The most amazing...

...project I have worked on involves developing a machine learning pipeline to process hundreds of billions of data points each day.

Employment

  • Software Engineer

    2016 - PRESENT
    HFT firm
    • Led the entire team in the Singapore office.
    • Took charge of connectivity with Asian markets, a major source of the company's revenue.
    Technologies: Systems, Kernel Bypass, Architecture
  • Software Engineer

    2016 - 2017
    Dgraph
    • Decreased data loading time by 60%. Previously, the loader iterated over a hash map while it was being mutated, which forced us to use lock-free hash maps; these are slower than ordinary hash maps. I restructured the code to eliminate this contention, which made it both simpler and faster because it could now use ordinary hash maps.
    • Implemented many key features such as indexing, filtering, sorting, and pagination.
    • Replaced FlatBuffers with Protocol Buffers, improving usability without compromising efficiency. As new features were added, it became clear that FlatBuffers, which are immutable, were too painful to work with. I found that the team had previously been using and benchmarking Protocol Buffers incorrectly, and my analysis showed that Protocol Buffers, used correctly, were no slower than FlatBuffers. After the switch, the team no longer had to work around FlatBuffers, which everyone welcomed.
    • Piloted Badger, an LSM-tree key-value store written in Go. Implemented the memtables, compaction, and the framework that ties the components together. The project reached the top ten on Hacker News.
    Technologies: RocksDB, Raft Consensus Algorithm, gRPC, GoLand
  • Senior Software Engineer

    2013 - 2016
    Google (Mountain View)
    • Worked on a team that used large-scale machine learning to rank and price search ads, Google's main source of revenue. The goal was to increase satisfaction for both advertisers and users by showing more relevant ads; data-driven analysis and experiments were key to understanding the impact of each change.
    • Led a project to compute features offline with the help of query clustering. This improved the AUC loss (a machine learning metric) by 26% and ad quality by 8%, as judged by human raters.
    • Rewrote the MapReduce pipeline that archives ad landing-page data. The new version was more than 10x faster and led to an increase of approximately $150 million in annual revenue.
    • Reduced the amount of data moved through the team's training pipeline, making it 2x faster.
    Technologies: Flume, MapReduce, Scikit-learn, C++, Machine Learning
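
The Dgraph data-loading change described above (removing contention so that ordinary hash maps can replace lock-free ones) can be illustrated with a minimal Go sketch. It assumes one common way to remove such contention: each worker goroutine fills its own private map, and the results are merged by a single goroutine afterwards. The function buildIndex and the string records are hypothetical; this is not Dgraph's actual loader code.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// buildIndex is a hypothetical loader that counts occurrences of each record.
// Each worker goroutine owns a private map, so no map is ever read and written
// concurrently and Go's built-in map can be used without locks.
func buildIndex(records []string, workers int) map[string]int {
	locals := make([]map[string]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			local := make(map[string]int) // touched only by this goroutine
			for i := w; i < len(records); i += workers {
				local[records[i]]++
			}
			locals[w] = local // each goroutine writes a distinct slice element
		}(w)
	}
	wg.Wait()

	// Single-threaded merge: a plain map is safe here.
	merged := make(map[string]int)
	for _, local := range locals {
		for k, v := range local {
			merged[k] += v
		}
	}
	return merged
}

func main() {
	records := []string{"alice", "bob", "alice", "carol", "bob", "alice"}
	idx := buildIndex(records, 3)

	// Sort keys so the output order is deterministic.
	keys := make([]string, 0, len(idx))
	for k := range idx {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, idx[k])
	}
	// prints:
	// alice 3
	// bob 2
	// carol 1
}
```

Because no map is shared while it is being written, no locking or lock-free structure is needed; the trade-off is an extra merge pass and the memory for the per-worker maps.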

Experience

  • Query-landing page relevance at Google

    Our team worked on predicting whether an ad's landing page is relevant to the user's query.

    Latency is critical, so we could not spend much time computing features for our machine learning system.

    One way around this is to precompute feature values. However, the number of possible inputs is too large to precompute exhaustively, so we reduced it using techniques such as clustering. The project involved writing MapReduce pipelines to precompute these features, along with extensive A/B testing and data analysis.

  • Dgraph
    https://github.com/dgraph-io/dgraph

    Dgraph is a graph database; at its core, it is an inverted index that stores relationships. I implemented key features such as filtering, sorting, and pagination. By replacing lock-free hash maps with ordinary hash maps and simplifying the code, I reduced data loading time by 60%. I also helped implement the caching layer, which is essentially a diff applied over the posting lists stored in RocksDB.

    In addition, I worked on Badger, an LSM-tree key-value store that is likely the fastest Go-based key-value store available. Switching from RocksDB to Badger let Dgraph avoid cgo issues.

  • Matrix pencil sparse Fourier transform

    This is one of my PhD projects. I had some new ideas recently (2017) and would like to wrap it up soon.

    The FFT of a length-n signal takes O(n log n) time. However, if the signal is known to be sparse in frequency space (i.e., it has only a few modes), we want to compute the transform in O(s) time, ignoring log factors, where s is the sparsity, i.e., the number of modes to recover.
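
The textbook matrix pencil method, shown below for the noiseless case, illustrates why a frequency-sparse signal can be recovered from a number of samples that depends on s rather than n. This is the classical formulation, not the author's algorithm; the notation (amplitudes a_k, nodes z_k, Hankel matrices Y_0 and Y_1, Vandermonde matrix V) is introduced here for illustration.

```latex
% Signal with s modes (integer frequencies f_k in [0, n)):
x_t = \sum_{k=1}^{s} a_k z_k^{t}, \qquad z_k = e^{2\pi i f_k / n}, \qquad t = 0, 1, \dots, n-1.

% Hankel matrices built from the first 2s samples x_0, ..., x_{2s-1}:
Y_0 = \bigl[x_{i+j}\bigr]_{i,j=0}^{s-1}, \qquad Y_1 = \bigl[x_{i+j+1}\bigr]_{i,j=0}^{s-1}.

% With V_{ik} = z_k^{i}, A = diag(a_k), Z = diag(z_k), entry (i, j) of each product
% is \sum_k a_k z_k^{i+j} and \sum_k a_k z_k^{i+j+1}, respectively:
Y_0 = V A V^{\top}, \qquad Y_1 = V A Z V^{\top}
\quad\Longrightarrow\quad Y_1 - \lambda Y_0 = V A (Z - \lambda I) V^{\top}.

% If the z_k are distinct and every a_k \neq 0, then V and A are invertible, so the
% pencil is singular exactly at \lambda = z_k. The frequencies and amplitudes follow:
f_k = \frac{n}{2\pi} \arg z_k \pmod{n}, \qquad V a = (x_0, \dots, x_{s-1})^{\top}.
```

This toy version uses only 2s samples, independent of n, but solving the s × s pencil directly costs O(s^3) and is sensitive to noise; reaching the near-O(s) running time described above is the harder part.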

Skills

  • Languages

    Go, C++, Python, PHP, Ruby
  • Paradigms

    MapReduce, Database Design
  • Storage

    RocksDB
  • Other

    Numerical Programming, Statistics, Machine Learning, Cloud9, Raft Consensus Algorithm, Architecture, Kernel Bypass, Systems, GPU Computing, Deep Learning, Reinforcement Learning
  • Frameworks

    gRPC, Ruby on Rails (RoR)
  • Libraries/APIs

    Scikit-learn, React, Node.js
  • Tools

    Sublime Text, Git, Subversion (SVN), Flume, GoLand
  • Platforms

    Linux

Education

  • Ph.D. in Applied Mathematics
    2008 - 2013
    Massachusetts Institute of Technology - Cambridge, Massachusetts
  • Bachelor's degree in Mathematics
    2004 - 2007
    UC Berkeley - California
  • Bachelor's degree in Computer Science
    2004 - 2007
    UC Berkeley - California
