Alan Zhou, Developer in Singapore, Singapore

Alan Zhou

Verified Expert in Engineering

Machine Learning Developer

Location
Singapore, Singapore
Toptal Member Since
July 12, 2017

Alan is an ex-Google Senior Software Engineer with a PhD in Applied Math from MIT and a silver medal in the International Olympiad in Informatics. He learns new skills fast and delivers results promptly and efficiently. Alan has worked on large-scale machine learning and data processing projects at Google, and on database development at Dgraph. He is also familiar with Node.js and React.

Portfolio

HFT firm
Systems, Kernel Bypass, Architecture
Dgraph
RocksDB, Raft Consensus Algorithm, gRPC, GoLand
Google (Mountain View)
Flume, MapReduce, Scikit-learn, C++, Machine Learning

Availability

Part-time

Preferred Environment

Subversion (SVN), Git, Cloud9, Sublime Text, Linux

The most amazing...

...project I have worked on involves developing a machine learning pipeline to process hundreds of billions of data points each day.

Work Experience

Software Engineer

2016 - PRESENT
HFT firm
  • Led the entire team in the Singapore office.
  • Took charge of connectivity with Asia markets, a major source of the company's revenue.
Technologies: Systems, Kernel Bypass, Architecture

Software Engineer

2016 - 2017
Dgraph
  • Decreased database data loading time by 60%. Previously, the code iterated over a hash map while mutating it, which forced us to use lock-free hash maps that are inherently slower than ordinary hash maps. I restructured the code to remove this contention, which made it both simpler and faster because ordinary hash maps now suffice.
  • Implemented many key features such as indexing, filtering, sorting, and pagination.
  • Introduced development practices that improved usability without compromising efficiency. As features were added, it became clear that FlatBuffers, which are immutable, were too painful to work with. I found that the team had not been using or benchmarking Protocol Buffers correctly, and my analysis showed that Protocol Buffers, when used correctly, are no slower than FlatBuffers. We switched, and the team was glad to no longer deal with FlatBuffers.
  • Piloted the Badger project, a Go LSM-tree key-value store. Implemented memtables, compaction, and the framework joining everything together. This project reached the top ten on Hacker News.
Technologies: RocksDB, Raft Consensus Algorithm, gRPC, GoLand

Senior Software Engineer

2013 - 2016
Google (Mountain View)
  • Worked on a team that used large-scale machine learning to rank and price search ads, Google's main source of revenue. The goal was to increase satisfaction for advertisers and users by showing more relevant ads. Data-driven analysis and experiments were key to understanding the impact of our changes.
  • Led a project to do offline feature computation with the help of query clustering. This improved AUC loss (a machine learning metric) by 26% and ad quality by 8% according to human raters.
  • Rewrote the MapReduce pipeline for archiving advertisement landing page data. The new version achieved a greater than 10x speedup and led to an increase of approximately $150 million in revenue per year.
  • Reduced the amount of data being moved in our team's training pipeline, achieving a 2x speedup in the process.
Technologies: Flume, MapReduce, Scikit-learn, C++, Machine Learning

Query-landing page relevance at Google

Our team worked on the problem of deciding whether an ad's landing page is relevant to the user's query.

Latency is important. We cannot spend too much time computing features for our machine learning system.

One way around this is to precompute feature values. There are too many possible inputs to precompute them all, so we reduced their number using various techniques, e.g., clustering. The project involved writing MapReduce pipelines to precompute these features, along with a lot of A/B testing and data analysis.
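
The precompute-then-look-up idea can be illustrated with a minimal Python sketch. This is not the system used at Google: the clustering rule (`cluster_of`, which groups queries that share the same set of words), the stand-in feature (`expensive_feature`), and the sample data are all hypothetical, and a plain dictionary stands in for the output of an offline pipeline.

```python
from collections import defaultdict


def cluster_of(query: str) -> str:
    """Hypothetical clustering: queries with the same set of words share a cluster."""
    return " ".join(sorted(set(query.lower().split())))


def expensive_feature(queries, landing_page: str) -> float:
    """Stand-in for a costly feature: fraction of the cluster's query words found on the page."""
    words = {w for q in queries for w in q.lower().split()}
    page_words = set(landing_page.lower().split())
    return len(words & page_words) / len(words) if words else 0.0


def precompute(query_log, landing_pages):
    """Offline step: one feature value per (cluster, page) rather than per (query, page)."""
    clusters = defaultdict(list)
    for q in query_log:
        clusters[cluster_of(q)].append(q)
    return {
        (cid, url): expensive_feature(qs, text)
        for cid, qs in clusters.items()
        for url, text in landing_pages.items()
    }


def lookup(table, query: str, url: str, default: float = 0.0) -> float:
    """Serving step: a dictionary lookup, with no feature computation on the request path."""
    return table.get((cluster_of(query), url), default)


# Three logged queries collapse into two clusters, so only two values are stored.
table = precompute(
    ["cheap flights paris", "paris cheap flights", "running shoes"],
    {"example.com/flights": "Book cheap flights to Paris and Rome"},
)
```

The point of the sketch is the trade-off: clustering shrinks the table that has to be precomputed, at the cost of every query in a cluster sharing one feature value.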

Dgraph

https://github.com/dgraph-io/dgraph
Dgraph is a graph database. It is essentially an inverted index that stores relationships. I implemented key features such as filtering, sorting, and pagination. By replacing lock-free hash maps with regular hash maps and simplifying the code, I reduced data loading time by 60%. I also helped implement the caching layer, which is essentially a diff over the posting list stored in RocksDB.
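
To make "a diff over the posting list" concrete, here is a minimal Python sketch. It is not Dgraph's code: the class name `CachedPostingList`, its methods, and the use of plain integer IDs are assumptions made for illustration, and an in-memory list stands in for the list persisted in the key-value store.

```python
class CachedPostingList:
    """A sorted posting list of IDs with an in-memory diff layered on top.

    `stored` stands in for the list persisted in the key-value store.
    All names here are illustrative assumptions.
    """

    def __init__(self, stored):
        self._stored = sorted(set(stored))
        self._stored_set = set(self._stored)
        self._added = set()    # IDs added since the last flush
        self._deleted = set()  # stored IDs deleted since the last flush

    def add(self, uid):
        self._deleted.discard(uid)
        if uid not in self._stored_set:
            self._added.add(uid)

    def delete(self, uid):
        self._added.discard(uid)
        if uid in self._stored_set:
            self._deleted.add(uid)

    def read(self):
        """Merged view: stored entries minus deletions, plus additions, in sorted order."""
        live = [u for u in self._stored if u not in self._deleted]
        return sorted(live + list(self._added))

    def flush(self):
        """Fold the diff into the stored list, clear the diff, and return the new list."""
        self._stored = self.read()
        self._stored_set = set(self._stored)
        self._added.clear()
        self._deleted.clear()
        return self._stored


pl = CachedPostingList([1, 5, 9])
pl.add(3)
pl.delete(5)
merged = pl.read()  # the stored list is untouched until flush()
```

Writes only touch the small diff, so the stored list can stay immutable between flushes.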

I also worked on Badger, an LSM-tree key-value store that is likely the fastest Go-based key-value store available. The Dgraph project was able to use Badger instead of RocksDB to avoid cgo issues.
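
For readers unfamiliar with LSM trees, the following toy Python sketch shows the general shape of the memtable, flush, and compaction steps. It is a teaching sketch only and does not reflect Badger's actual design or API: the class `TinyLSM`, its methods, and the `memtable_limit` parameter are hypothetical, and everything lives in memory rather than on disk.

```python
import bisect

TOMBSTONE = object()  # marks a deleted key until compaction drops it


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs, newest first."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.tables = []  # each run is a sorted list of (key, value) pairs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)

    def _flush(self):
        """Freeze the memtable into a sorted run and start a fresh memtable."""
        self.tables.insert(0, sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key, default=None):
        """Check the memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
            return default if value is TOMBSTONE else value
        for run in self.tables:
            i = bisect.bisect_left(run, (key,))  # binary search on the key
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                return default if value is TOMBSTONE else value
        return default

    def compact(self):
        """Merge all runs into one, keeping the newest value and dropping tombstones."""
        merged = {}
        for run in reversed(self.tables):  # oldest first, so newer runs overwrite
            merged.update(run)
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.tables = [live] if live else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)    # memtable is full, so it is flushed to a run
db.put("a", 10)
db.delete("b")    # the second flush leaves two runs behind
```

Calling `db.compact()` afterwards merges the two runs into one and discards the tombstone for `"b"`.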

Matrix pencil sparse Fourier transform

This is one of my PhD projects. I came up with some new ideas recently (2017) and would like to wrap it up soon.

The FFT takes O(n log n) time. However, if we know that the signal is sparse in frequency space (i.e., it has only a few modes), we would like to compute its Fourier transform in O(s) time, ignoring log factors, where "s" is the sparsity, that is, the number of modes we want to recover.
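
The simplest case, s = 1, shows why far fewer than n samples can suffice. The Python sketch below is not the project's algorithm: it assumes a noiseless signal with exactly one mode, and the function name `recover_single_mode` and the test values are made up for illustration. With a single mode, the ratio of two consecutive samples reveals the frequency directly. The matrix pencil method extends this idea to several modes by recovering them as generalized eigenvalues of a pair of shifted Hankel matrices built from the samples.

```python
import cmath
import math


def recover_single_mode(x0: complex, x1: complex, n: int):
    """Recover (frequency, amplitude) of x[t] = a * exp(2j*pi*f*t/n) from x[0] and x[1].

    Assumes a noiseless signal with exactly one mode and an integer frequency f in [0, n).
    """
    phase = cmath.phase(x1 / x0)  # equals 2*pi*f/n modulo 2*pi
    f = round(phase * n / (2 * math.pi)) % n
    return f, x0  # x[0] is the amplitude because exp(0) = 1


n, f_true, a_true = 1024, 137, 2.0 - 0.5j
signal = [a_true * cmath.exp(2j * math.pi * f_true * t / n) for t in range(n)]

# Only two of the 1024 samples are read.
f_est, a_est = recover_single_mode(signal[0], signal[1], n)
```

With noise or more than one mode, two samples are no longer enough, which is where the log factors and the additional machinery come in.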

Languages

Go, C++, Python, PHP, Ruby

Paradigms

MapReduce, Database Design

Storage

RocksDB

Other

Numerical Programming, Statistics, Machine Learning, Cloud9, Raft Consensus Algorithm, Architecture, Kernel Bypass, Systems, GPU Computing, Deep Learning, Reinforcement Learning

Frameworks

gRPC, Ruby on Rails (RoR)

Libraries/APIs

Scikit-learn, React, Node.js

Tools

Sublime Text, Git, Subversion (SVN), Flume, GoLand

Platforms

Linux

2008 - 2013

Ph.D. in Applied Mathematics

Massachusetts Institute of Technology - Cambridge, Massachusetts

2004 - 2007

Bachelor's Degree in Mathematics

UC Berkeley - California

2004 - 2007

Bachelor's Degree in Computer Science

UC Berkeley - California
