James Arnemann, Developer in San Francisco, CA, United States

James Arnemann

Verified Expert in Engineering

Statistics Developer

San Francisco, CA, United States
Toptal Member Since
August 2, 2019

James is an experienced data scientist and machine learning engineer with several years of industry experience and publications in leading journals. He has held positions researching and deploying machine learning and deep learning models at UC Berkeley, Intel, national laboratories, and other organizations.



Preferred Environment


The most amazing...

...thing I've implemented was a deep learning algorithm looking at simulated dark matter distributions to predict cosmological parameters that govern our universe.

Work Experience

Machine Learning Engineer

2019 - 2021
  • Created models for behavioral biometrics and continuous multi-factor authentication in the field of cybersecurity.
  • Designed and implemented a typing signature machine learning model used in production.
  • Built deep learning models using accelerometer, GPS, and other mobile data to create multiple behavioral biometrics for a Department of Defense contract.
Technologies: Python, Amazon Web Services (AWS), Data Science, Machine Learning, Agile, SQL, Git, PyCharm

Program Director of Research Science

2018 - 2019
New York-Presbyterian Hospital
  • Built models that use historical data to predict the number of patients in the emergency departments of the different NYP hospitals.
  • Cleaned and parsed millions of electronic health records to determine hospital-acquired venous thromboembolism (VTE) rates and metrics of how different hospitals and departments address it.
  • Developed analytics on oncology rates by department and cancer type throughout NYP's ambulatory care network.
  • Taught Python programming and data analysis courses to over 50 NYP employees.
Technologies: SQL, Python, Machine Learning, Data Science

Deep Learning Research Scientist

2017 - 2018
National Energy Research Scientific Computing Center (NERSC)
  • Implemented deep learning architectures on cosmology simulations to understand and predict the parameters that govern the evolution of the universe.
  • Collaborated with a diverse team from Lawrence Berkeley National Lab, Intel, and Cray to run these models at state-of-the-art performance on the world's eighth-largest supercomputer.
  • Published at SC18 (the International Conference for High Performance Computing, Networking, Storage, and Analysis).
Technologies: TensorFlow, Python

Graduate Student Researcher

2013 - 2018
UC Berkeley
  • Led multiple computational projects and developed novel algorithms in machine learning.
  • Developed a novel exploration algorithm using Bayesian non-parametric statistical analysis and information theory (accepted to NIPS 2014).
  • Trained an autoencoder neural network to learn temporal dynamics of cellular automata evolution.
  • Classified handwritten digits with an unsupervised neural network algorithm using only local learning rules.
  • Mentored research assistants to take on original research projects.
Technologies: TensorFlow, Python, Deep Learning, Artificial Intelligence (AI), Machine Learning, Data Science

Data Science Intern (Artificial Intelligence Group)

2017 - 2017
  • Implemented neural style transfer with VGG-19, a convolutional neural network.
  • Reconstructed audio spectrograms from hidden-layer activations of Deep Speech 2, a many-layered bidirectional recurrent neural network model for speech-to-text.
  • Developed a novel approach for style transfer applied to audio signals.
Technologies: TensorFlow, Python

Projects

Behavioral Biometric Phone App for Continuous MFA

A background app that uses multimodal data (e.g., accelerometer, GPS, and typing) for continuous multi-factor authentication. As the machine learning engineer, I analyzed data, developed models, and put them into production. I worked with mobile and back-end teams to store the models and data in the cloud and deploy the product on Android.

CosmoFlow: Using Deep Learning to Learn the Universe at Scale

I worked with a team from Intel, Cray, and LBNL to build a deep learning model for cosmological data that ran on the entire supercomputer at Lawrence Berkeley National Lab. I developed and trained the final model. Our work was published at the Supercomputing Conference 2018 (SC18).

You've Got Meal

As a data science fellow at Insight, I deployed a recipe recommendation web app built on collaborative filtering with implicit feedback, drawing on purchasing and online recipe data.

Skills


Python, SQL, C++


Scikit-learn, Pandas, NumPy, Matplotlib, TensorFlow, SciPy


Data Science, Agile Software Development, Agile


Jupyter Notebook, Amazon Web Services (AWS), Linux, Unix


Machine Learning, Principal Component Analysis (PCA), Data, Linear Regression, Logistic Regression, Data Analytics, Data Analysis, Modeling, Data Modeling, Data Scraping, Web Scraping, Random Forests, Deep Learning, K-means Clustering, Statistics, Data Cleaning, Data Visualization, Naive Bayes, Convolutional Neural Networks, Bayesian Statistics, Statistical Modeling, Predictive Analytics, Big Data, Support Vector Machines (SVM), Agile Data Science, Gradient Boosting, Artificial Intelligence (AI), Programming, Dashboards


MATLAB, Git, Jira, PyCharm

Education

2013 - 2018

Ph.D. Degree in Physics

University of California, Berkeley - Berkeley, CA

2010 - 2013

Master's Degree in Physics

University of California, Berkeley - Berkeley, CA

2006 - 2009

Bachelor's Degree in Mathematics

University of Illinois Urbana-Champaign - Urbana, IL