Jacob Bieker
Verified Expert in Engineering
Artificial Intelligence Developer
London, United Kingdom
Toptal member since November 10, 2022
Jacob is a machine learning researcher with extensive experience taking projects from research ideas and raw data through implementation to results. He has worked at Google, NASA, and Scale AI, building machine learning systems for astronomy, self-driving cars, and hyperspectral satellites.
Experience
- Computer Vision - 6 years
- Machine Learning - 6 years
- Data Pipelines - 6 years
- PyTorch - 3 years
- Medical Imaging - 2 years
- Geospatial Data - 2 years
- Satellite Images - 2 years
- Weather Research & Forecasting (WRF) - 1 year
Preferred Environment
Linux, PyCharm, PyTorch, Python
The most amazing...
...thing I've developed is a solar forecasting project that improved the forecasting performance by over 3x and is being used by the UK electric grid.
Work Experience
Machine Learning Research Engineer
Open Climate Fix
- Created a solar energy forecasting model that reduced the error of the UK national forecast threefold compared with the existing approach.
- Designed and built a data pipeline to handle terabytes of near-real-time satellite data to train and produce solar and weather forecasting models.
- Researched and implemented state-of-the-art forecasting models for time series photovoltaic (PV) prediction and open-sourced the data, training code, and models for the wider community.
- Mentored multiple junior team members and assisted them with research, including one whose paper was accepted to the Conference and Workshop on Neural Information Processing Systems (NeurIPS).
ML Engineer
Martian Learning Inc.
- Built an LLM routing experimentation framework and evaluated various model routing methods.
- Co-authored a research paper on LLM evaluation and routing.
- Fine-tuned embedding models and LLMs for model routing.
AI/ML Engineer
Pixelcut Inc.
- Developed and tested segmentation mask refinement workflows to speed up user-interactive segmentation.
- Designed and ran multiple experiments and reviewed the literature to determine the most effective approach for the project.
- Built the open-source foundation models in PyTorch.
Tech Lead
Insight Optics
- Developed training and inference pipelines that improved per-frame classification and doubled the quality of the final images.
- Built machine learning models that improved the stitching of iPhone video frames into a diagnosable quality image.
- Managed three junior team members, mentoring them on machine learning, production of models, and data processing.
Machine Learning Research Engineer
Scale AI
- Developed a sensor fusion pipeline combining LiDAR and camera images for autonomous vehicles, including building and training object detection and attribute classification models.
- Built and used state-of-the-art models for pre-labeling self-driving vehicle data and linting outputs from human labelers.
- Created data pipelines and ETL for massive amounts of multi-sensor data for training and serving models.
- Collaborated across 2D and 3D teams to support internal and external customer requests.
Software Engineering Intern
Google
- Developed and deployed a machine learning model for G Suite's growth team.
- Authored and presented technical documents proposing improvements to the team's machine learning deployment.
- Oversaw the launch of the machine learning model and the initial data collection on the effectiveness of its recommendations for users worldwide.
Machine Learning Intern
NASA
- Built machine learning models for hyperspectral image segmentation of Earth-observation satellite data, using Hyperion L1R imagery.
- Developed a data pipeline to efficiently load large amounts of hyperspectral imagery in a high-performance computing (HPC) environment.
- Liaised with other team members to ensure that the final models would run on the field-programmable gate array (FPGA) hardware.
Experience
Solar Forecasting
https://github.com/openclimatefix/nowcasting
I worked on creating the first public implementations of the state-of-the-art Google MetNet and MetNet-2 and DeepMind DGMR models, as well as graph weather models for forecasting the weather and solar power output directly. Additionally, I built our data transformation pipeline and released our training data and trained models on Hugging Face to make them easier for the community to access and build on. I worked with open-source contributors to make the code I wrote as broadly functional as possible and the data pipelines and models as efficient as possible.
FACTNN
https://github.com/jacobbieker/factnn
This was my dissertation for my bachelor's degree. I built the data pipeline from the original compressed data format into an easier-to-work-with format, created the neural networks, and succeeded in outperforming the current state-of-the-art model for classifying the type of event and determining its original energy using 3D convolutional neural networks (CNNs).
I ended up presenting the research at the American Astronomical Society meeting.
Project Resilience
https://www.itu.int/en/ITU-T/extcoop/ai-data-commons/Pages/project-resilience.aspx
I am a member of the MVP working groups for data and machine learning, where I have been gathering public datasets and converting them to a standard format for use in later machine learning models.
Diabetic Retinopathy Image Analysis and Stitching
I built the training pipeline and adapted new research models to stitch the video frames into high-quality images. I also optimized the models for production so they could run on more constrained hardware and helped deploy them to AWS.
Machine Learning for On-board Hyperspectral Image Segmentation for NASA
I used TensorFlow to build U-Net-derived models that segmented clouds directly in the raw, uncalibrated hyperspectral images, removing the need for preprocessing and reducing processing time.
LOFARNN
https://github.com/jacobbieker/lofarnn
I built the data ingestion pipeline, analyzed the crowdsourced data to find labeling errors, and developed and trained models that could successfully associate black holes with optical counterparts for cases where other methods failed.
Education
Master's Degree in Astronomy and Data Science
Leiden University - Leiden, Netherlands
Bachelor's Degree in Physics
University of Oregon - Eugene, Oregon
Skills
Libraries/APIs
PyTorch, Pandas, TensorFlow, OpenCV, Keras, Scikit-learn, DeepSpeed, XGBoost
Tools
PyCharm, MongoDB Atlas
Languages
Python 3, Python, R, Fortran, C, Java
Platforms
Docker, Amazon EC2, Amazon Web Services (AWS), Linux, Google Cloud Platform (GCP)
Storage
Data Pipelines, Amazon S3 (AWS S3), Google Cloud
Paradigms
Distributed Computing
Other
Deep Learning, Machine Learning, Computer Vision, Data Analysis, Big Data, Data Science, Artificial Intelligence (AI), Image Processing, Neural Networks, Computer Vision Algorithms, Convolutional Neural Networks (CNNs), Data Loading, Deep Neural Networks (DNNs), Supervised Machine Learning, Videos, Data Engineering, Data Scientist, Programming, AI Programming, Research, Image Analysis, GPU Computing, Image Recognition, Recurrent Neural Networks (RNNs), AI Research, Open-source Software (OSS), AI Model Training, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Open Source, Models, Model Tuning, Data Visualization, Weather Research & Forecasting (WRF), Satellite Images, Geospatial Data, Time Series, Time Series Analysis, Medical Imaging, Big Data Architecture, Data Inference, Fine-tuning, Object Tracking, Technical Leadership, Software Architecture, 3D Image Processing, 3D, Facial Recognition, Variational Autoencoders, Machine Learning Automation, Hugging Face, Machine Learning Operations (MLOps), Algorithms, Energy, Self-driving Cars, Cloud Architecture, Llama 2, Natural Language Processing (NLP), Google Colaboratory (Colab), Generative Pre-trained Transformers (GPT), Optimization, Transformers, OpenAI, Cloud Point, Open-source LLMs, Embeddings from Language Models (ELMo), Data Mining, Computational Biology, Graphs, Weather, Geospatial Analytics, Solar, Technical Writing, Sensor Data, Medical Software, Sensor Fusion, Autonomous Navigation, Autonomous Robots, Mentorship, Point Clouds, Object Detection, APIs, Optical Character Recognition (OCR), Google Cloud Machine Learning, Amazon Machine Learning, Integration, User Interface (UI), Autonomous AI