Michael Henning, Performance Tuning Developer in New York, NY, United States

Member since June 28, 2021
Michael's professional experience includes deep learning and computer vision at Facebook and a venture-funded startup. He enjoys freelancing because of the flexibility and variety it provides. Michael is looking forward to working with clients who can use his specialized skills in deep learning and high-performance systems, as well as clients who need more general back-end development expertise.

Portfolio

  • Facebook
    C++, Python, PyTorch
  • GrokStyle
    C++, Python, PyTorch, Django, Amazon Web Services (AWS), Google Cloud
  • Google
    C++, Image Processing

Experience

  • C++ 3 years
  • Performance Tuning 3 years
  • Git 3 years
  • Python 3 years
  • Convolutional Neural Networks 2 years
  • PyTorch 2 years
  • C 2 years
  • Image Processing 1 year

Location

New York, NY, United States

Availability

Part-time

Preferred Environment

Linux, Visual Studio Code (VS Code), Git

The most amazing...

...improvement I've made to a production system cut search cost and latency by a factor of four at a startup where the search was our core focus.

Employment

  • Computer Vision Software Engineer

    2019 - 2021
    Facebook
    • Debugged and configured a production image inference system in C++ that processes thousands of queries per second. Solved a problem that negatively impacted accuracy for all image models and proactively caught a subtle use-after-free bug.
    • Contributed research directions and code to GrokNet, a model for fine-grained image recognition across several business verticals, including developing a novel way to aggregate contrastive loss pairs across GPUs.
    • Co-developed SimSearchNet++, a model for near-duplicate image detection.
    Technologies: C++, Python, PyTorch
  • Software Engineer

    2018 - 2019
    GrokStyle
    • Increased image-loading throughput more than eightfold to support multi-GPU training.
    • Led migration of the core training and inference stack from Caffe to PyTorch.
    • Cut search cost and latency by a factor of four by finding and implementing optimization opportunities in a core search routine, including replacing a BLAS-accelerated implementation with a custom C++ implementation.
    Technologies: C++, Python, PyTorch, Django, Amazon Web Services (AWS), Google Cloud
  • Software Engineering Intern

    2017
    Google
    • Ported an image super-resolution algorithm to run on custom coprocessors.
    • Leveraged symmetries to fit large lookup tables into 16 KB of on-chip memory.
    • Wrote a fixed-point square root approximation that achieved a twofold speedup.
    Technologies: C++, Image Processing

Experience

  • GrokNet

    I was integral to the development of a model for fine-grained image recognition across several business verticals.

    My contributions included a Python training system that loaded images fast enough to keep GPU utilization high during distributed training runs without sacrificing developer flexibility, a novel way to aggregate contrastive loss pairs across GPUs that improved model accuracy, and the discovery of several discrepancies between test systems and production that affected the model's final accuracy.

  • SimSearchNet++
    https://ai.facebook.com/blog/heres-how-were-using-ai-to-help-detect-misinformation

    I contributed accuracy and speed improvements to a neural network used for duplicate image detection, including comparisons of several architectures and quantization to int8. I was also involved in deploying the model to production traffic. During deployment, I caught a subtle issue that caused early load-testing results to underestimate resource usage by an order of magnitude. I also tracked down a subtle bug that produced different results for a fraction of a percent of production traffic; it turned out to stem from differences in CPU microarchitectures across our server cluster.

Skills

  • Languages

    C++, C, Python, Rust, Python 3, Java, Go, OCaml
  • Tools

    Git
  • Other

    Memory Management, Performance Tuning, Deep Learning, Convolutional Neural Networks, Image Processing
  • Libraries/APIs

    PyTorch, OpenGL
  • Frameworks

    Django, Presto DB
  • Paradigms

    Functional Programming
  • Platforms

    Linux, Visual Studio Code (VS Code), Amazon Web Services (AWS)
  • Storage

    Google Cloud, PostgreSQL

Education

  • Bachelor's Degree in Computer Science
    2014 - 2018
    Cornell University - Ithaca, NY, USA
