Alan Zhou
Verified Expert in Engineering
Machine Learning Developer
Alan is an ex-Google Senior Software Engineer with a PhD in Applied Mathematics from MIT and a silver medal in the International Olympiad in Informatics. He learns new skills quickly and delivers results promptly and efficiently. Alan has worked on large-scale machine learning and data processing projects at Google and on database development at Dgraph. He is also familiar with Node.js and React.
Preferred Environment
Subversion (SVN), Git, Cloud9, Sublime Text, Linux
The most amazing project I have worked on involves developing a machine learning pipeline that processes hundreds of billions of data points each day.
Work Experience
Software Engineer
HFT firm
- Led the entire team in the Singapore office.
- Took charge of connectivity with Asian markets, a major source of the company's revenue.
Software Engineer
Dgraph
- Decreased database data loading time by 60%. Previously, the code iterated over a hash map while also mutating it, which forced us to use lock-free hash maps; these are inherently slower than normal hash maps. I restructured the code to eliminate this contention, making it both simpler and faster because it can now use normal hash maps.
- Implemented many key features such as indexing, filtering, sorting, and pagination.
- Improved usability without compromising efficiency by moving the team from FlatBuffers to Protocol Buffers. As new features were added, it became clear that FlatBuffers, which are immutable, were too painful to work with. I realized the team had not been using or benchmarking Protocol Buffers correctly, and my analysis showed that Protocol Buffers, when used correctly, are no slower than FlatBuffers. We switched, and everyone was glad to no longer deal with FlatBuffers.
- Piloted the Badger project, a Go LSM-tree key-value store. Implemented memtables, compaction, and the framework that ties everything together. The project reached the top ten on Hacker News.
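The first Dgraph bullet does not say exactly how the code was restructured. One common way to remove this kind of contention, assumed here purely for illustration, is to separate reading from writing: scan the map concurrently without mutating it, record the intended changes, and then apply them from a single goroutine so that a plain Go map is enough. The function `doubleSmallValues` and its update rule are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a change computed during the read-only phase.
type update struct {
	key   string
	value int
}

// doubleSmallValues (hypothetical example) doubles every value below limit.
// Instead of mutating m while several goroutines iterate over it, which would
// need a concurrent or lock-free map, it works in two phases:
//  1. read-only: workers scan disjoint key ranges and record pending updates;
//  2. write: a single goroutine applies all pending updates to the plain map.
func doubleSmallValues(m map[string]int, limit, workers int) {
	if workers < 1 {
		workers = 1
	}
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}

	// Phase 1: concurrent reads of a Go map are safe while nothing writes to it.
	// Each worker appends only to its own pending[w] slice.
	pending := make([][]update, workers)
	var wg sync.WaitGroup
	chunk := (len(keys) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(keys) {
			lo = len(keys)
		}
		if hi > len(keys) {
			hi = len(keys)
		}
		wg.Add(1)
		go func(w int, part []string) {
			defer wg.Done()
			for _, k := range part {
				if v := m[k]; v < limit {
					pending[w] = append(pending[w], update{k, 2 * v})
				}
			}
		}(w, keys[lo:hi])
	}
	wg.Wait()

	// Phase 2: only this goroutine writes, so no locking is needed.
	for _, ups := range pending {
		for _, u := range ups {
			m[u.key] = u.value
		}
	}
}

func main() {
	m := map[string]int{"a": 1, "b": 5, "c": 2}
	doubleSmallValues(m, 3, 2)
	fmt.Println(m) // map[a:2 b:5 c:4]
}
```

The design relies on Go's rule that concurrent map reads are safe as long as no goroutine writes. Here, every write happens after `wg.Wait()`.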
Senior Software Engineer
Google (Mountain View)
- Worked on a team that used large-scale machine learning to rank and price search ads, Google's main source of revenue. The goal was to increase satisfaction for advertisers and users by showing more relevant ads. Data-driven analysis and experiments were key to understanding the impact of our changes.
- Led a project to compute features offline with the help of query clustering. This improved the AUC loss (a machine learning metric) by 26% and improved ads quality by 8% according to human raters.
- Rewrote the MapReduce pipeline for archiving advertisement landing page data. The new version achieved a more than 10x speedup and led to a revenue increase of approximately $150 million per year.
- Reduced the amount of data moved in our team's training pipeline, making the pipeline 2x faster.
Experience
Query-landing page relevance at Google
Latency is critical: we cannot spend too much time computing features while our machine learning system is serving results.
One way around this is to precompute feature values. There are too many possible inputs to precompute them all, so we reduce their number using techniques such as clustering. The project involves writing MapReduce pipelines to precompute these features, along with extensive A/B testing and data analysis.
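Here is a minimal sketch of the precompute-then-look-up idea. It assumes queries are represented as numeric embeddings and grouped by their nearest cluster centroid. The `embedding` type, the centroids, and `expensiveFeature` are hypothetical stand-ins. In the real system, features also depend on the landing page and were precomputed by MapReduce pipelines rather than by a single function. For brevity, this sketch keys features by query cluster only.

```go
package main

import "fmt"

// embedding is a hypothetical numeric representation of a search query.
type embedding []float64

// sqDist returns the squared Euclidean distance between two embeddings.
func sqDist(a, b embedding) float64 {
	d := 0.0
	for i := range a {
		diff := a[i] - b[i]
		d += diff * diff
	}
	return d
}

// nearest returns the index of the centroid closest to q.
func nearest(q embedding, centroids []embedding) int {
	best := 0
	for i := 1; i < len(centroids); i++ {
		if sqDist(q, centroids[i]) < sqDist(q, centroids[best]) {
			best = i
		}
	}
	return best
}

// expensiveFeature is a hypothetical stand-in for a feature too slow to
// compute while serving; here it simply sums the coordinates.
func expensiveFeature(c embedding) float64 {
	s := 0.0
	for _, x := range c {
		s += x
	}
	return s
}

// precompute runs offline: one expensive computation per cluster, not per query.
func precompute(centroids []embedding) []float64 {
	table := make([]float64, len(centroids))
	for i, c := range centroids {
		table[i] = expensiveFeature(c)
	}
	return table
}

// serve runs online: map the query to its cluster and read the stored value.
func serve(q embedding, centroids []embedding, table []float64) float64 {
	return table[nearest(q, centroids)]
}

func main() {
	centroids := []embedding{{0, 0}, {10, 10}}
	table := precompute(centroids)
	fmt.Println(serve(embedding{1, 2}, centroids, table)) // 0
	fmt.Println(serve(embedding{9, 8}, centroids, table)) // 20
}
```

The serving cost is now a nearest-centroid search plus a table read, independent of how expensive the feature is. The trade-off is that every query in a cluster shares one approximate value.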
Dgraph
https://github.com/dgraph-io/dgraph
Badger, the LSM-tree key-value store I worked on, is likely the fastest Go-based key-value store available. It allowed the Dgraph project to use Badger instead of RocksDB and avoid cgo issues.
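For readers unfamiliar with LSM trees, here is a toy in-memory sketch of the structure Badger is built on. Writes go to a mutable memtable. Full memtables are frozen into sorted, immutable tables. Reads check the newest data first, and compaction merges the sorted tables into one. This is not Badger's API or implementation: it omits persistence, a write-ahead log, deletions (tombstones), and levels. The names `lsm`, `table`, and `merge` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type entry struct{ key, value string }

// table is an immutable run of entries sorted by key (an in-memory stand-in for an SSTable).
type table []entry

// get binary-searches the sorted table for key.
func (t table) get(key string) (string, bool) {
	i := sort.Search(len(t), func(i int) bool { return t[i].key >= key })
	if i < len(t) && t[i].key == key {
		return t[i].value, true
	}
	return "", false
}

// lsm is a toy log-structured merge tree.
type lsm struct {
	memtable map[string]string
	tables   []table // oldest first
	limit    int     // memtable size that triggers a flush
}

func newLSM(limit int) *lsm { return &lsm{memtable: map[string]string{}, limit: limit} }

func (db *lsm) Put(key, value string) {
	db.memtable[key] = value
	if len(db.memtable) >= db.limit {
		db.flush()
	}
}

// flush freezes the memtable into a sorted table and starts a fresh memtable.
func (db *lsm) flush() {
	t := make(table, 0, len(db.memtable))
	for k, v := range db.memtable {
		t = append(t, entry{k, v})
	}
	sort.Slice(t, func(i, j int) bool { return t[i].key < t[j].key })
	db.tables = append(db.tables, t)
	db.memtable = map[string]string{}
}

// Get checks the memtable, then tables from newest to oldest, so the latest write wins.
func (db *lsm) Get(key string) (string, bool) {
	if v, ok := db.memtable[key]; ok {
		return v, true
	}
	for i := len(db.tables) - 1; i >= 0; i-- {
		if v, ok := db.tables[i].get(key); ok {
			return v, true
		}
	}
	return "", false
}

// merge combines two sorted tables; on equal keys the entry from newer wins.
func merge(older, newer table) table {
	out := make(table, 0, len(older)+len(newer))
	i, j := 0, 0
	for i < len(older) && j < len(newer) {
		switch {
		case older[i].key < newer[j].key:
			out = append(out, older[i])
			i++
		case older[i].key > newer[j].key:
			out = append(out, newer[j])
			j++
		default: // same key: keep the newer value, drop the older one
			out = append(out, newer[j])
			i++
			j++
		}
	}
	out = append(out, older[i:]...)
	return append(out, newer[j:]...)
}

// Compact merges all tables, oldest to newest, into a single sorted table.
func (db *lsm) Compact() {
	if len(db.tables) < 2 {
		return
	}
	merged := db.tables[0]
	for _, t := range db.tables[1:] {
		merged = merge(merged, t)
	}
	db.tables = []table{merged}
}

func main() {
	db := newLSM(2)
	db.Put("a", "1")
	db.Put("b", "2") // memtable full: flushed to table 0
	db.Put("a", "3")
	db.Put("c", "4") // flushed to table 1, which shadows the old "a"
	db.Compact()
	v, _ := db.Get("a")
	fmt.Println(v, len(db.tables)) // 3 1
}
```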
Matrix pencil sparse Fourier transform
The FFT takes O(n log n) time. However, if we know that the signal is sparse in frequency space (i.e., it has only a few modes), we want to compute the transform in O(s) time, ignoring log factors, where s is the sparsity: the number of modes we want to recover.
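The entry does not describe the algorithm itself. As background, the noise-free textbook form of the matrix pencil idea shows how s modes can be recovered from only 2s samples instead of all n. The samples are arranged into two s × s Hankel matrices whose generalized eigenvalues are the mode locations. The symbols below (a_j, f_j, z_j, Y_0, Y_1, V, A, Z) are introduced for illustration. This is not necessarily the project's exact method, and it says nothing about the O(s) running time, which depends on parts of the algorithm the entry does not describe.

```latex
\begin{aligned}
x_k &= \sum_{j=1}^{s} a_j z_j^{k}, \qquad z_j = e^{2\pi i f_j / n}, \qquad k = 0, 1, \dots, 2s-1, \\
(Y_0)_{pq} &= x_{p+q}, \qquad (Y_1)_{pq} = x_{p+q+1}, \qquad 0 \le p, q < s, \\
Y_0 &= V A V^{\mathsf T}, \qquad Y_1 = V A Z V^{\mathsf T}, \qquad V_{pj} = z_j^{p},\; A = \operatorname{diag}(a_j),\; Z = \operatorname{diag}(z_j), \\
Y_1 - \lambda Y_0 &= V A \,(Z - \lambda I)\, V^{\mathsf T}, \qquad f_j \equiv \frac{n}{2\pi} \arg z_j \pmod{n}.
\end{aligned}
```

V is invertible when the z_j are distinct, and A is invertible when every a_j is nonzero. So Y_1 − λY_0 is singular exactly when λ equals one of the z_j. The amplitudes then follow from the linear system V a = (x_0, …, x_{s−1}).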
Skills
Languages
Go, C++, Python, PHP, Ruby
Paradigms
MapReduce, Database Design
Storage
RocksDB
Other
Numerical Programming, Statistics, Machine Learning, Cloud9, Raft Consensus Algorithm, Architecture, Kernel Bypass, Systems, GPU Computing, Deep Learning, Reinforcement Learning
Frameworks
gRPC, Ruby on Rails (RoR)
Libraries/APIs
Scikit-learn, React, Node.js
Tools
Sublime Text, Git, Subversion (SVN), Flume, GoLand
Platforms
Linux
Education
Ph.D. in Applied Mathematics
Massachusetts Institute of Technology - Cambridge, Massachusetts
Bachelor's Degree in Mathematics
UC Berkeley - California
Bachelor's Degree in Computer Science
UC Berkeley - California