Nicholas Roth

Verified Expert in Engineering

Big Data Developer

Location

Austin, TX, United States

Toptal Member Since

March 10, 2020

Nicholas is an efficient machine learning engineer and data scientist with experience in data curation, classical machine learning algorithms, and statistical modeling. At the start of an engagement, he searches for data and works with the customer to decide what to build. Once an initial algorithm and intuition reveal the most crucial areas, Nicholas shifts gears to get the "juice" out of that data before testing and deploying models in the client's infrastructure, delivering exceptional, detail-oriented work.

Big Data, Machine Learning, Artificial Intelligence (AI), C++, PyTorch, Algorithms, Natural Language Processing (NLP), C, Linux, Python, Automation, Amazon Web Services (AWS), TensorFlow, Linux Kernel, Python 3

Portfolio

Google

Artificial Intelligence (AI), Python, C++, SQL, Hyperparameters

KUNGFU.AI Advanced Data Science Services

Google Cloud Platform (GCP), TensorFlow, PyTorch, Python 3

Node.io

PyTorch, Linux, Amazon Web Services (AWS), Amazon, Python

Experience

Linux - 9 years, Big Data - 7 years, Python - 6 years, Algorithms - 5 years, Artificial Intelligence (AI) - 5 years, Machine Learning - 5 years, Data Science - 3 years, GPT - 3 years

Availability

Part-time

Preferred Environment

Vim Text Editor, Jupyter, Linux, Amazon Web Services (AWS), Pandas, NumPy, PyTorch, TensorFlow

The most amazing...

...project I've designed and built is a system for the startup Node.io that trains deep learning models for non-technical customers.

Work Experience

Software Engineer | Machine Learning

2022 - PRESENT

Google

Performed relevant work in a horizontal machine-learning team.
Collaborated with stakeholders in organizations other than my own.
Built software that proved essential to the company.

Technologies: Artificial Intelligence (AI), Python, C++, SQL, Hyperparameters

Senior Machine Learning Engineer

2020 - PRESENT

KUNGFU.AI Advanced Data Science Services

Characterized the behavior of an unsupervised learning model meant for government use and implemented code and tests for its deployment.
Developed a project plan and tracked it in Asana using agile principles.
Delivered regular status updates to clients and facilitated requirements discussions.

Technologies: Google Cloud Platform (GCP), TensorFlow, PyTorch, Python 3

Deep Learning Engineer (Hybrid Data Scientist and Data Engineer)

2018 - 2020

Node.io

Gave Node.io its first real AI capabilities by building a predictive modeling and analytics stack from the ground up with scalable AWS (ECS, EC2, Docker) and ML (feature engineering, RNNs, DNNs, and classifiers using PyTorch, XGBoost, Pandas) tools.
Improved Node's capabilities with new models and new features (e.g., LSTM/GRU sequence models, denoising autoencoder models for data enhancement and neural embeddings).
Consulted on coworker projects and acted as resident research-paper-reader.

Technologies: PyTorch, Linux, Amazon Web Services (AWS), Amazon, Python

Research Assistant/Research Engineer

2012 - 2018

Oracle Labs

Expanded the market for Oracle's big data analytics offering by providing cutting-edge fraud detection capabilities.
Expanded Oracle PGX's analytics market share by building a linearly scalable asynchronous query engine for its distributed execution mode (see GRADES17 paper).
Introduced Oracle PGX graph analytics to the big data market by building its first large-scale distributed execution mode in C++ and Node.js.
Gave fine-grained performance optimization capabilities to the Oracle Database team by writing a custom Linux kernel.
Provided the option to run Oracle Database in a safe Java-like environment with on-demand profile-guided optimization using a dynamic C/C++ LLVM runtime.
Designed and demonstrated a new research product for selling big-data tools: a graphical system for creating dataflow graphs.
Demonstrated Oracle Coherence to customers for writing message-oriented middleware by building a highly scalable Java application server.
Showcased Oracle Labs' new Truffle/GraalVM compiler technology for optimized, scalable big data operations in legacy languages by writing an HTML5 programmer's notebook in Node.js.

Technologies: Oracle RDBMS, Linux Kernel, C, C++, Linux, TensorFlow

Experience

Embeddings for Music

http://ambii.io

I worked with a team to scrape music video playlists from YouTube and used the data to learn Doc2vec embeddings with my custom implementation of the algorithm in TensorFlow. To build a music recommender system from this, I took an existing “playlist vector” from a user’s listening history and sorted the songs in our dataset by cosine distance, returning the top ones. Before leaving, I suggested that the next step might be to use content-based embeddings derived from the hidden layer of an LSTM trained on the videos’ soundtracks.
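
The ranking step described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the names `cosine_distance` and `recommend`, the song titles, and the three-dimensional toy vectors are all hypothetical, and the embeddings are assumed to have been learned already (for example, with Doc2vec). It uses only the Python standard library to stay self-contained.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def recommend(playlist_vector, song_vectors, top_k=2):
    """Sort song titles by cosine distance to the playlist vector, closest first.

    song_vectors is a hypothetical dict mapping song title -> embedding.
    """
    ranked = sorted(
        song_vectors,
        key=lambda title: cosine_distance(playlist_vector, song_vectors[title]),
    )
    return ranked[:top_k]


# Toy embeddings standing in for learned vectors (assumed values).
song_vectors = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
    "song_d": [0.0, 0.0, 1.0],
}

# A "playlist vector" summarizing a user's listening history (assumed value).
playlist_vector = [1.0, 0.05, 0.0]

print(recommend(playlist_vector, song_vectors))  # ['song_a', 'song_b']
```

Sorting by ascending cosine distance is equivalent to sorting by descending cosine similarity, so the first entries returned are the songs whose embeddings point most nearly in the same direction as the playlist vector.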

Skills

Languages

C++, Python, C, Python 3, SQL

Libraries/APIs

PyTorch, NumPy, Pandas, TensorFlow

Other

Machine Learning, Artificial Intelligence (AI), Big Data, Algorithms, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Linux Kernel, Hyperparameters

Paradigms

Data Science, Automation

Platforms

Linux, Google Cloud Platform (GCP), Amazon, Amazon Web Services (AWS)

Tools

Jupyter, Vim Text Editor

Storage

Oracle RDBMS
