Steve Thomas

Software Developer in San Francisco, CA, United States

Member since July 9, 2020
Steve is a software engineer passionate about machine learning and automating data-intensive workflows. His goal is to free scientists and BI analysts from creating and maintaining complex cloud environments so they can focus on what they do best. He has experience collaborating with the interdisciplinary perception teams at some of the top autonomous vehicle companies and building tools to automate the model training lifecycle.
Steve is now available for hire

Portfolio

  • Engine ML
    Amazon Web Services (AWS), InfluxDB, Elasticsearch, PyTorch, TensorFlow...
  • Self-employed
    Amazon Web Services (AWS), Docker, PyTorch, TensorFlow, C++, Python

Location

San Francisco, CA, United States

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Linux, Kubernetes, Docker, IntelliJ, PyCharm

The most amazing...

...thing I've developed is my company's entire machine learning experiment-tracking product, which lets users collaborate and visually compare their runs.

Employment

  • Software Engineer/Resident Deep Learning Expert

    2018 - 2020
    Engine ML
    • Designed and programmed our local, freemium offering that allowed users to run deep learning experiments on their own hardware, persist all relevant logs and metrics, and compare the results in the engine dashboard alongside their cloud jobs.
    • Led research showing how layer-wise optimizers (e.g. LAMB) can train object detectors with large batch sizes in a fraction of the time without performance degradation. Results can be found on our company blog at https://bit.ly/35gfM0P.
    • Designed and programmed an alerting service that analyzed each experiment's log output and GPU utilization and notified users when an experiment entered a terminal state or had potentially hit a race condition.
    • Designed and programmed a feature to pre-fetch training data from S3 buckets, storing it in an in-memory read-through cache using Alluxio and Alluxio’s FUSE-based POSIX API, resulting in up to a 5x speedup when reading a remote file.
    • Built a cat detector that was trained live in five minutes on 64 GPUs at VentureBeat Transform 2019 using a TensorFlow implementation of RetinaNet. The demonstration by our CEO can be found at https://bit.ly/2YdMbnr.
    Technologies: Amazon Web Services (AWS), InfluxDB, Elasticsearch, PyTorch, TensorFlow, Gradle, Kotlin, Python, Kubernetes, Docker
  • Independent Machine Learning Researcher

    2016 - 2018
    Self-employed
    • Achieved the status of “Top Contender” in The Lyft Perception Challenge 2018, a semantic segmentation competition, using a tweaked version of Google’s DeepLabV3 with ResNet-152 as the backbone.
    • Designed and integrated perception, behavior planning, trajectory generation, and controller modules so Udacity’s driverless car could safely navigate a road with traffic lights (https://github.com/sathomas2/CarND-Capstone-Solution).
    • Taught myself the major developments in deep learning and the mathematical theory behind them by reading and replicating papers.
    Technologies: Amazon Web Services (AWS), Docker, PyTorch, TensorFlow, C++, Python

Experience

  • Training Mask-RCNN 10x Faster with LAMB

    I ran experiments showing that large-batch training does not hurt the performance of object detection neural networks. With the right optimizer and learning rate schedule, training time on a large cluster of distributed GPUs can be cut from weeks to less than a day. I ran all of my experiments on AWS spot instances, which drastically reduced computing costs.

  • Train a Cat Detector Live on 64 GPUs in Less Than Five Minutes
    https://www.youtube.com/watch?v=GKVbPFpEBHk&feature=youtu.be&t=6724

    I designed and coded a cat detector that was trained live in five minutes on 64 GPUs at VentureBeat Transform 2019 using a TensorFlow implementation of RetinaNet. The demonstration by our CEO can be found on YouTube. I was responsible for building the model and for creating the AWS cloud environment and Kubernetes cluster where it was trained.
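
For readers unfamiliar with LAMB, the layer-wise scaling behind the Mask-RCNN project above can be sketched roughly as follows. This is a simplified, single-layer illustration in plain Python, not the code used in those experiments; the function name `lamb_step`, the helper `l2_norm`, and all hyperparameter defaults are assumptions chosen for the example.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of floats."""
    return math.sqrt(sum(x * x for x in values))


def lamb_step(weights, grads, m, v, step, lr=0.01, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """Apply one LAMB-style update to a single layer, in place.

    weights, grads, m, v are equal-length lists of floats; m and v hold the
    running first and second moment estimates. step is the 1-based step count.
    Returns the trust ratio that was applied (hypothetical helper, for illustration).
    """
    # Adam-style moment estimates.
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g

    # Bias correction for the early steps.
    bc1 = 1 - beta1 ** step
    bc2 = 1 - beta2 ** step

    # Per-parameter update direction, with decoupled weight decay.
    update = [
        (m[i] / bc1) / (math.sqrt(v[i] / bc2) + eps) + weight_decay * weights[i]
        for i in range(len(weights))
    ]

    # Layer-wise trust ratio: scale the step by ||w|| / ||update||.
    w_norm = l2_norm(weights)
    u_norm = l2_norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    for i in range(len(weights)):
        weights[i] -= lr * trust_ratio * update[i]
    return trust_ratio
```

Because the step is rescaled per layer, its length is tied to the size of that layer's weights (`lr * ||w||`) rather than to the raw gradient magnitude, which is the property usually credited with keeping very large batch sizes stable.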

Skills

  • Languages

    Python, Kotlin, C++
  • Libraries/APIs

    TensorFlow, PyTorch
  • Platforms

    Docker, Kubernetes, Amazon Web Services (AWS), Linux
  • Other

    Leadership, Communication, Teamwork, Deep Learning, AWS, Algorithms, Machine Learning, Object Detection, Computer Vision, Analytical Thinking, Robot Operating System (ROS), Localization, PID Controllers, Reinforcement Learning
  • Frameworks

    Django
  • Tools

    PyCharm, IntelliJ, Gradle
  • Storage

    Elasticsearch, InfluxDB

Education

  • Master's Degree in Philosophy
    2012 - 2014
    New York University - New York, NY
  • Bachelor's Degree in English and Economics
    2006 - 2010
    Bowdoin College - Brunswick, ME

Certifications

  • Self-Driving Car Engineer Nanodegree
    APRIL 2018 - PRESENT
    Udacity
  • Graph Search, Shortest Paths, and Data Structures
    APRIL 2018 - PRESENT
    Coursera
  • Divide and Conquer, Sorting and Searching, and Randomized Algorithms
    APRIL 2018 - PRESENT
    Coursera
  • Deep Learning Foundation Nanodegree
    SEPTEMBER 2017 - PRESENT
    Udacity
  • Machine Learning
    FEBRUARY 2017 - PRESENT
    Coursera
