Jacob is available for hire

Jacob Bieker

Verified Expert in Engineering

Artificial Intelligence Developer

Location

London, United Kingdom

Toptal Member Since

November 10, 2022

Jacob is a machine learning researcher with extensive experience going from research ideas and raw data to implementation and results, especially with machine learning methods. He has worked at Google, NASA, and Scale AI, building machine learning systems for astronomy, self-driving cars, and hyperspectral satellites.

Portfolio

Open Climate Fix

PyTorch, Graphs, Weather, Weather Research & Forecasting (WRF)...

Martian Learning Inc.

Machine Learning, Artificial Intelligence (AI), Python, PyTorch...

Pixelcut Inc.

Computer Vision, Python, PyTorch, Machine Learning, Data Loading...

Experience

Computer Vision - 6 years
Machine Learning - 6 years
Data Pipelines - 6 years
PyTorch - 3 years
Medical Imaging - 2 years
Geospatial Data - 2 years
Satellite Images - 2 years
Weather Research & Forecasting (WRF) - 1 year

Availability

Part-time

Preferred Environment

Linux, PyCharm, PyTorch, Python

The most amazing...

...thing I've developed is a solar forecasting project that improved forecasting performance by over 3x and is used by the UK electric grid.

Work Experience

Machine Learning Research Engineer

2021 - PRESENT

Open Climate Fix

Created a solar energy forecasting model that reduced error for the national UK forecast by 3x compared to the current approach.
Designed and built a data pipeline to handle terabytes of near-real-time satellite data to train and produce solar and weather forecasting models.
Researched and implemented state-of-the-art forecasting models for time series PV prediction and open-sourced data, training code, and models for the wider community.
Mentored multiple junior team members and assisted them with research; one of them had a paper accepted to the Conference on Neural Information Processing Systems (NeurIPS).

Technologies: PyTorch, Graphs, Weather, Weather Research & Forecasting (WRF), Satellite Images, Geospatial Data, Geospatial Analytics, Computer Vision, Data Mining, Data Visualization, Machine Learning, Deep Learning, Python 3, Amazon Web Services (AWS), Data Pipelines, Solar, Python, Big Data, Big Data Architecture, Data Science, DeepSpeed, Data Inference, Fine-tuning, Artificial Intelligence (AI), Image Processing, Technical Leadership, Software Architecture, APIs, Neural Networks, Computer Vision Algorithms, Convolutional Neural Networks (CNN), 3D Image Processing, 3D, Data Loading, Deep Neural Networks, Supervised Machine Learning, Machine Learning Automation, Videos, Data Engineering, Hugging Face, Machine Learning Operations (MLOps), Google Cloud Machine Learning, Amazon Machine Learning, Data Scientist, Algorithms, XGBoost, Energy, Distributed Computing, Programming, AI Programming, Research, Google Cloud, Google Cloud Platform (GCP), Data Analysis, Scikit-learn, Pandas, Amazon S3 (AWS S3), Amazon EC2, Image Analysis, GPU Computing, Cloud Architecture, Llama 2, Natural Language Processing (NLP), Recurrent Neural Networks (RNNs), Google Colaboratory (Colab), AI Research, Generative Pre-trained Transformers (GPT), Open-source Software (OSS), AI Model Training, Optimization, Transformers, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Docker, Open-source LLMs, Open Source

ML Engineer

2023 - 2024

Martian Learning Inc.

Built an LLM routing experimentation framework and evaluated various model routing methods.
Co-authored a research paper on LLM evaluation and routing.
Fine-tuned embedding models and LLMs for model routing.

Technologies: Machine Learning, Artificial Intelligence (AI), Python, PyTorch, Distributed Computing, AI Model Training, Optimization, Transformers, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Docker, OpenAI, Open-source LLMs, Open Source, MongoDB Atlas, Embeddings from Language Models (ELMo)

AI/ML Engineer

2023 - 2023

Pixelcut Inc.

Developed and tested segmentation mask refinement workflows to speed up user-interactive segmentation.
Designed and ran multiple experiments and reviewed the literature to determine the most effective approach to tackle the project.
Built the open-source foundation models in PyTorch.

Technologies: Computer Vision, Python, PyTorch, Machine Learning, Data Loading, Deep Neural Networks, Facial Recognition, Supervised Machine Learning, Videos, Data Engineering, Hugging Face, Machine Learning Operations (MLOps), Data Scientist, Programming, Integration, User Interface (UI), AI Programming, Research, Google Cloud, Google Cloud Platform (GCP), Data Analysis, Amazon S3 (AWS S3), Amazon EC2, Image Analysis, GPU Computing, Cloud Architecture, Image Recognition, AI Research, AI Model Training, Optimization, Transformers, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Docker, Open Source

Tech Lead

2021 - 2023

Insight Optics

Developed training and inference pipelines that improved per-frame classification and the quality of final images twofold.
Built machine learning models that improved the stitching of iPhone video frames into a diagnosable-quality image.
Managed three junior team members, mentoring them on machine learning, production of models, and data processing.

Technologies: Python 3, Machine Learning, Medical Imaging, Python, Deep Learning, Big Data, Data Inference, Fine-tuning, Artificial Intelligence (AI), Object Tracking, Image Processing, Technical Leadership, Software Architecture, APIs, Neural Networks, OpenCV, OCR, Computer Vision Algorithms, Convolutional Neural Networks (CNN), Data Loading, Deep Neural Networks, Supervised Machine Learning, Videos, Data Engineering, Hugging Face, Machine Learning Operations (MLOps), Google Cloud Machine Learning, Amazon Machine Learning, Data Scientist, Algorithms, XGBoost, Keras, Programming, Integration, User Interface (UI), AI Programming, Research, Google Cloud, Google Cloud Platform (GCP), Data Analysis, Scikit-learn, Pandas, Amazon S3 (AWS S3), Amazon EC2, Image Analysis, GPU Computing, Cloud Architecture, Image Recognition, Recurrent Neural Networks (RNNs), AI Research, Open-source Software (OSS), AI Model Training, Optimization, Transformers, Generative Artificial Intelligence (GenAI), Docker, Open Source, MongoDB Atlas

Machine Learning Research Engineer

2020 - 2021

Scale AI

Developed a sensor fusion pipeline combining LiDAR and camera images for autonomous vehicles, including building and training object detection and attribute classification models.
Built and used state-of-the-art models for pre-labeling self-driving vehicle data and linting outputs from human labelers.
Created data pipelines and ETL for massive amounts of multi-sensor data for training and serving models.
Collaborated across 2D and 3D teams to support internal and external customer requests.

Technologies: PyTorch, Machine Learning, Sensor Fusion, Computer Vision, Autonomous Navigation, Autonomous Robots, Amazon Web Services (AWS), Mentorship, Point Clouds, Object Detection, Python, Deep Learning, Big Data, Big Data Architecture, Data Science, Data Inference, Fine-tuning, Artificial Intelligence (AI), Object Tracking, Image Processing, Technical Leadership, Software Architecture, APIs, Neural Networks, OpenCV, Computer Vision Algorithms, Convolutional Neural Networks (CNN), 3D Image Processing, 3D, Data Loading, Deep Neural Networks, Variational Autoencoders, Supervised Machine Learning, Machine Learning Automation, Videos, Data Engineering, Machine Learning Operations (MLOps), Amazon Machine Learning, Data Scientist, Algorithms, Distributed Computing, Programming, Integration, AI Programming, Research, Self-driving Cars, Data Analysis, Pandas, Amazon S3 (AWS S3), Amazon EC2, Image Analysis, GPU Computing, Cloud Architecture, Image Recognition, Recurrent Neural Networks (RNNs), AI Research, AI Model Training, Optimization, Transformers, Generative Artificial Intelligence (GenAI), Docker, Cloud Point, Open Source, MongoDB Atlas

Software Engineering Intern

2019 - 2019

Google

Developed and deployed a machine learning model for G Suite's growth team.
Authored and presented technical documents proposing improvements to the team's machine learning deployment.
Oversaw the launch of the machine learning model and initial data collection on the effectiveness of the model recommendations on worldwide users.

Technologies: Java, Machine Learning, Deep Learning, Big Data, Big Data Architecture, Data Science, Data Inference, Artificial Intelligence (AI), Software Architecture, APIs, Neural Networks, Data Loading, Supervised Machine Learning, Machine Learning Automation, Data Scientist, Algorithms, Distributed Computing, Programming, Integration, AI Programming, Google Cloud, Google Cloud Platform (GCP), Data Analysis, AI Model Training, Optimization, Docker, Open Source

Machine Learning Intern

2019 - 2019

NASA

Built machine learning technology for hyperspectral image segmentation for Earth satellite observations using Hyperion L1R data.
Developed a data pipeline to efficiently load large amounts of hyperspectral imagery in a high-performance computing (HPC) environment.
Liaised with other team members to ensure that final models would work on the field-programmable gate array (FPGA) hardware.

Technologies: Python 3, TensorFlow, Satellite Images, Geospatial Data, Python, Deep Learning, Big Data, Data Science, Data Inference, Artificial Intelligence (AI), Object Tracking, Image Processing, Software Architecture, Neural Networks, Computer Vision Algorithms, Convolutional Neural Networks (CNN), 3D Image Processing, Data Loading, Deep Neural Networks, Variational Autoencoders, Supervised Machine Learning, Geospatial Analytics, Data Engineering, Data Scientist, Algorithms, Keras, AI Programming, Research, Data Analysis, Scikit-learn, Pandas, Image Analysis, GPU Computing, Image Recognition, Recurrent Neural Networks (RNNs), AI Research, Open-source Software (OSS), AI Model Training, Optimization, Generative Artificial Intelligence (GenAI), Docker, Open Source

Experience

Solar Forecasting

https://github.com/openclimatefix/nowcasting

A short-term PV energy output forecasting project for the UK's National Grid. I was one of the research engineers who built the satellite data processing application that takes live EUMETSAT RSS imagery every five minutes and transforms it into a Zarr store.

I worked on creating the first public implementations of the state-of-the-art Google MetNet, MetNet-2, and DeepMind DGMR models, as well as graph weather models for forecasting the weather and solar power output directly. Additionally, I built our data transformation pipeline, released our training data, and published trained models on Hugging Face to enable easier access and research by the community. I worked with open-source contributors to make the code I wrote as widely functional as possible and the data pipelines and models as efficient as possible.

FACTNN

https://github.com/jacobbieker/factnn

A TensorFlow-based application for analyzing, classifying, and estimating the original energy of events detected at the First G-APD Cherenkov Telescope (FACT).

This was my dissertation for my bachelor's degree. I built the data pipeline from the original compressed data format into an easier-to-work-with format, created the neural networks, and succeeded in outperforming the current state-of-the-art model for classifying the type of event and determining its original energy using 3D convolutional neural networks (CNNs).

I presented this research at the American Astronomical Society meeting.

Project Resilience

https://www.itu.int/en/ITU-T/extcoop/ai-data-commons/Pages/project-resilience.aspx

A project run by the UN agency, the International Telecommunication Union (ITU), to help local and regional governments choose the most cost- or CO2-effective actions to reduce their carbon emissions.

I am on the MVP working groups for data and machine learning, where I have been pulling together public datasets and converting them to a standard format for use in later machine learning models.

Diabetic Retinopathy Image Analysis and Stitching

A PyTorch-based project that stitches together and filters iPhone videos of eyes to find diagnosable-quality images to send to an ophthalmologist for diagnosis.

I built the training pipeline and took new research models to stitch the video frames into high-quality images. I also productionized the models to fit on more constrained hardware and helped deploy the models to AWS.

Machine Learning for On-board Hyperspectral Image Segmentation for NASA

I designed and built models to segment out clouds on board future hyperspectral Earth observation satellites at NASA.

I used TensorFlow to build U-Net-derived models that worked well to segment out clouds in the raw, uncalibrated hyperspectral images, which removed the need for preprocessing and shortened processing time.

LOFARNN

https://github.com/jacobbieker/lofarnn

A PyTorch-based master's dissertation project that uses machine learning to find supermassive black holes and outbursts in a radio map of the northern hemisphere and associate them with optical counterparts.

I built the data ingestion pipeline, analyzed the crowdsourced data to find labeling errors, and developed and trained models that could successfully associate black holes with optical counterparts for cases where other methods failed.

Education

2018 - 2021

Master's Degree in Astronomy and Data Science

Leiden University - Leiden, Netherlands

2014 - 2018

Bachelor's Degree in Physics

University of Oregon - Eugene, Oregon

Skills

Libraries/APIs

PyTorch, Pandas, TensorFlow, OpenCV, Keras, Scikit-learn, XGBoost

Tools

PyCharm, MongoDB Atlas

Languages

Python 3, Python, R, Fortran, C, Java

Platforms

Docker, Amazon EC2, Amazon Web Services (AWS), Linux, Google Cloud Platform (GCP)

Paradigms

Data Science, Distributed Computing

Storage

Data Pipelines, Amazon S3 (AWS S3), Google Cloud

Other

Deep Learning, Machine Learning, Computer Vision, Data Analysis, Big Data, Artificial Intelligence (AI), Image Processing, Neural Networks, Computer Vision Algorithms, Convolutional Neural Networks (CNN), Data Loading, Deep Neural Networks, Supervised Machine Learning, Videos, Data Engineering, Data Scientist, Programming, AI Programming, Research, Image Analysis, GPU Computing, Image Recognition, Recurrent Neural Networks (RNNs), AI Research, Open-source Software (OSS), AI Model Training, Generative Artificial Intelligence (GenAI), Open Source, Data Visualization, Weather Research & Forecasting (WRF), Satellite Images, Geospatial Data, Time Series, Time Series Analysis, Medical Imaging, Big Data Architecture, Data Inference, Fine-tuning, Object Tracking, Technical Leadership, Software Architecture, 3D Image Processing, 3D, Facial Recognition, Variational Autoencoders, Machine Learning Automation, Hugging Face, Machine Learning Operations (MLOps), Algorithms, Energy, Self-driving Cars, Cloud Architecture, Llama 2, Natural Language Processing (NLP), Google Colaboratory (Colab), Generative Pre-trained Transformers (GPT), Optimization, Transformers, Large Language Models (LLMs), OpenAI, Cloud Point, Open-source LLMs, Embeddings from Language Models (ELMo), Data Mining, Computational Biology, Graphs, Weather, Geospatial Analytics, Solar, Technical Writing, Sensor Data, Medical Software, Sensor Fusion, Autonomous Navigation, Autonomous Robots, Mentorship, Point Clouds, Object Detection, DeepSpeed, APIs, OCR, Google Cloud Machine Learning, Amazon Machine Learning, Integration, User Interface (UI), Autonomous AI
