Nicholas Roth, Big Data Developer in Austin, TX, United States

Member since January 25, 2020
Nicholas is an efficient deep learning engineer and data scientist with experience in data curation, classical machine learning algorithms, and statistical modeling. His initial focus is on data extraction. Once an algorithm (and intuition) shows the most crucial areas, he shifts gears to get the "juice" out of that data before testing and deploying models in a client's infrastructure, delivering exceptional and detail-oriented work.







Preferred Environment

Vim Text Editor, Jupyter, Linux, Amazon Web Services (AWS), Pandas, NumPy, PyTorch

The most amazing...

...project I've worked on was at the startup where I designed and built a system that trains generic predictive models for nontechnical customers.


  • Principal Machine Learning Engineer

    2020 - PRESENT
    KUNGFU.AI Advanced Data Science Services
    • Characterized the behavior of an unsupervised learning model meant for government use and implemented code and tests for its deployment.
    • Developed a project plan and tracked it in Asana, applying some agile principles.
    • Delivered regular status updates to the client and facilitated requirements discussions.
    Technologies: Google Cloud Platform (GCP), TensorFlow, PyTorch, Python 3
  • Deep Learning Engineer (Hybrid Data Scientist and Data Engineer)

    2018 - 2020
    • Gave the company its first real AI capabilities by building a predictive modeling and analytics stack from the ground up with scalable AWS (ECS, EC2, Docker) and ML (feature engineering, RNNs, DNNs, and classifiers using PyTorch, XGBoost, Pandas) tools.
    • Improved Node's capabilities with new models and new features (e.g., LSTM/GRU sequence models, denoising autoencoder models for data enhancement and neural embeddings).
    • Consulted on coworker projects and acted as resident research-paper-reader.
    Technologies: PyTorch, Linux, Amazon Web Services (AWS), Amazon, Python
  • Research Assistant/Research Engineer

    2012 - 2018
    Oracle Labs
    • Expanded the market for Oracle's big data analytics offering by providing cutting-edge fraud detection capabilities.
    • Expanded Oracle PGX's analytics market share by building a linearly scalable asynchronous query engine for its distributed execution mode (see GRADES17 paper).
    • Introduced Oracle PGX graph analytics to the big data market by building its first large-scale distributed execution mode in C++ and Node.js.
    • Gave fine-grained performance optimization capabilities to the Oracle Database team by writing a custom Linux kernel.
    • Provided the option to run Oracle Database in a safe Java-like environment with on-demand profile-guided optimization using a dynamic C/C++ LLVM runtime.
    • Designed and demonstrated a new research product for selling big-data tools: a graphical system for creating dataflow graphs.
    • Demonstrated Oracle Coherence to customers for writing message-oriented middleware by building a highly scalable Java application server.
    • Showcased Oracle Labs' new Truffle/GraalVM compiler technology for optimized, scalable big data operations in legacy languages by writing an HTML5 programmer's notebook in Node.js.
    Technologies: Oracle RDBMS, Linux Kernel, C, C++, Linux, TensorFlow


  • Embeddings for Music (Development)

    I worked with a team to scrape music video playlists from YouTube and used the data to learn Doc2vec embeddings with my custom implementation of the algorithm in TensorFlow. To build a music recommender system from this, I took an existing “playlist vector” from a user’s listening history and sorted the songs in our dataset by cosine distance, returning the top ones. Before leaving, I suggested that the next step might be to use content-based embeddings derived from the hidden layer of an LSTM trained on the videos’ soundtracks.
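
    The ranking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `recommend`, the toy two-dimensional vectors, and the song ids are all hypothetical, and NumPy stands in for whatever the original TensorFlow pipeline used. It assumes the song and playlist embeddings have already been learned.

```python
import numpy as np


def recommend(playlist_vec, song_vecs, song_ids, top_k=3):
    """Return the top_k song ids closest to playlist_vec by cosine distance.

    Hypothetical helper: the embeddings are assumed to be precomputed.
    """
    song_vecs = np.asarray(song_vecs, dtype=float)
    playlist_vec = np.asarray(playlist_vec, dtype=float)

    # Cosine similarity = dot product divided by the product of the norms.
    song_norms = np.linalg.norm(song_vecs, axis=1)
    query_norm = np.linalg.norm(playlist_vec)
    similarity = (song_vecs @ playlist_vec) / (song_norms * query_norm + 1e-12)

    # Cosine distance is 1 - similarity; smaller means more similar.
    distance = 1.0 - similarity
    order = np.argsort(distance)[:top_k]
    return [song_ids[i] for i in order]


# Toy embeddings (assumed values, for illustration only).
songs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
ids = ["song_a", "song_b", "song_c", "song_d"]
top = recommend(np.array([1.0, 0.0]), songs, ids, top_k=2)
print(top)
```

    Sorting by cosine distance ignores vector magnitude, so a long listening history and a short one with the same taste profile would rank songs the same way.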


  • Languages

    C++, Python, C, Python 3
  • Libraries/APIs

    PyTorch, NumPy, Pandas, TensorFlow
  • Other

    Machine Learning, Artificial Intelligence (AI), Big Data, Algorithms, Natural Language Processing (NLP), Linux Kernel
  • Paradigms

    Data Science, Automation
  • Platforms

    Linux, Google Cloud Platform (GCP), Amazon, Amazon Web Services (AWS)
  • Tools

    Jupyter, Vim Text Editor
  • Storage

    Oracle RDBMS
