James Arnemann, Statistics Developer in San Francisco, CA, United States

Member since May 25, 2019
James is an experienced data scientist and machine learning engineer with several years of industry experience and publications in leading journals. He has held positions researching and deploying machine learning and deep learning models at UC Berkeley, Intel, national laboratories, and other organizations.

The most amazing thing I've implemented was a deep learning algorithm looking at simulated dark matter distributions to predict cosmological parameters that govern our universe.


  • Machine Learning Engineer

    2019 - 2021
    • Created models for behavioral biometrics and continuous multi-factor authentication in the field of cybersecurity.
    • Designed and implemented a typing signature machine learning model used in production.
    • Built deep learning models using accelerometer, GPS, and other mobile data to create multiple behavioral biometrics for a Department of Defense contract.
    Technologies: Python, Amazon Web Services (AWS), Data Science, Machine Learning, Agile, SQL, Git, PyCharm
  • Program Director of Research Science

    2018 - 2019
    New York-Presbyterian Hospital
    • Built models on historical data to predict the number of patients in the emergency departments at the different NYP hospitals.
    • Cleaned and parsed millions of electronic health records to determine hospital-acquired venous thromboembolism (VTE) rates and metrics of how different hospitals and departments address it.
    • Developed analytics for oncology rates across departments and cancer types throughout NYP's ambulatory care network.
    • Taught Python programming and data analysis courses to over 50 NYP employees.
    Technologies: SQL, Python, Machine Learning, Data Science
  • Deep Learning Research Scientist

    2017 - 2018
    National Energy Research Scientific Computing Center (NERSC)
    • Implemented deep learning architectures on cosmology simulations to understand and predict the parameters that govern the evolution of the universe.
    • Collaborated with a diverse team from Lawrence Berkeley National Lab, Intel, and Cray to run these models at state-of-the-art performance on the world's eighth-largest supercomputer.
    • Published in SC18 (The International Conference for High-Performance Computing, Networking, Storage, and Analysis).
    Technologies: TensorFlow, Python
  • Graduate Student Researcher

    2013 - 2018
    UC Berkeley
    • Led multiple computational projects and developed novel algorithms in machine learning.
    • Developed a novel exploration algorithm using Bayesian non-parametric statistical analysis and information theory (accepted to NIPS 2014).
    • Trained an autoencoder neural network to learn temporal dynamics of cellular automata evolution.
    • Classified handwritten digits with an unsupervised neural network algorithm using only local learning rules.
    • Mentored research assistants to take on original research projects.
    Technologies: TensorFlow, Python, Deep Learning, Artificial Intelligence (AI), Machine Learning, Data Science
  • Data Science Intern (Artificial Intelligence Group)

    2017 - 2017
    • Implemented Neural Style Transfer with VGG-19 (Convolutional Neural Network).
    • Reconstructed audio spectrograms from hidden-layer activations of Deep Speech 2 (a many-layered bidirectional recurrent neural network model for speech-to-text).
    • Developed a novel approach for style transfer applied to audio signals.
    Technologies: TensorFlow, Python


  • You've Got Meal

    As a data science fellow at Insight, I deployed a recipe recommendation web app built on collaborative filtering with implicit feedback from purchasing and online recipe data.

  • Behavioral Biometric Phone App for Continuous MFA

    A background app that uses multimodal data (e.g., accelerometer, GPS, and typing) for continuous multi-factor authentication. As the machine learning engineer, I analyzed data, developed models, and put them into production. I worked with mobile and back-end teams to store the models and data in the cloud and deploy the product on Android.

  • CosmoFlow: Using Deep Learning to Learn the Universe at Scale

    I worked with a team from Intel, Cray, and LBNL to build a deep learning model for cosmological data that ran on the entire supercomputer at Lawrence Berkeley National Lab. I developed and trained the final model. Our work was published at the Supercomputing Conference 2018 (SC18).


  • Languages

    Python, SQL, C++
  • Libraries/APIs

    Scikit-learn, Pandas, NumPy, Matplotlib, TensorFlow, SciPy
  • Paradigms

    Data Science, Agile Software Development, Agile
  • Platforms

    Jupyter Notebook, Amazon Web Services (AWS), Linux, Unix
  • Other

    Machine Learning, Principal Component Analysis (PCA), Data, Linear Regression, Logistic Regression, Data Analytics, Data Analysis, Modeling, Data Modeling, Data Analyst, Random Forests, Deep Learning, K-means Clustering, Statistics, Data Cleaning, Data Visualization, Naive Bayes, Convolutional Neural Networks, Bayesian Statistics, Statistical Modeling, Predictive Analytics, AWS, Big Data, Support Vector Machines (SVM), Agile Data Science, Gradient Boosting, Artificial Intelligence (AI), Programming
  • Tools

    MATLAB, Git, Jira, PyCharm


  • Ph.D. Degree in Physics
    2013 - 2018
    University of California, Berkeley - Berkeley, CA
  • Master's Degree in Physics
    2010 - 2013
    University of California, Berkeley - Berkeley, CA
  • Bachelor's Degree in Mathematics
    2006 - 2009
    University of Illinois Urbana-Champaign - Urbana, IL
