
Ben Summers
Verified Expert in Engineering
Data Engineer and Machine Learning Developer
Uppsala, Sweden
Toptal member since June 27, 2019
With a PhD in pure maths, Ben would describe himself as an academic at heart, which means he is deeply passionate about his work. Since finishing his PhD in 2012, he has worked professionally as a back-end and data engineer for a large global company and a small startup. Since 2015, he has been obsessed with machine learning, especially neural networks, and enjoys applying these techniques to solve real-world problems. Ben has been freelancing via Toptal since 2019.
Portfolio
Experience
- SQL - 20 years
- Linux - 17 years
- Python - 16 years
- Machine Learning - 7 years
- Google Cloud Platform (GCP) - 5 years
- Apache Airflow - 4 years
- Apache Spark - 4 years
- BigQuery - 3 years
Preferred Environment
Linux, Git, PyCharm, Jupyter, Python, Python 3, Generative Pre-trained Transformers (GPT), Docker, GitHub, Google Cloud Platform (GCP), Google Cloud
The most amazing...
...project I've done is my Ph.D. thesis—writing didn't come naturally and it posed a real challenge.
Work Experience
AI and Data Consultant
Sonbol Consulting AB
- Developed a proof of concept in 3D machine learning using PyTorch and PyTorch3D.
- Developed a Shopify app for generating product descriptions using GPT.
- Created a sourdough monitor using modern computer vision techniques.
- Built out a data warehouse for a client using Fivetran and data build tool (dbt).
Principal Scientist
The AI Framework
- Developed a bespoke solution for the vehicle routing problem based on deep reinforcement learning with transformers and PPO, enabling new use cases.
- Deployed and supported HEAVY.AI for a large telecommunications client, which impressed various stakeholders.
- Set up VMs and bare-metal hardware for various purposes, resulting in much faster development to the POC stage.
- Set up Ollama, Flowise, and Qdrant for RAG prototyping.
Senior/Mid-senior Data Engineer
Birchbox
- Created pipelines to populate the data warehouse (Redshift) from various sources using Fivetran, including custom connectors in AWS Lambda with Terraform.
- Built out data warehouse (Redshift) with dbt for defining transformations.
- Created reverse ETL pipelines from Redshift into Braze using dbt and Airflow.
- Migrated data from a legacy Magento store into a new Shopify store using Python scripts.
Airflow Engineer (via Toptal)
Idelic
- Ported existing ETL jobs from a legacy Celery-based system to run on Airflow (Astronomer-hosted). The sources included Amazon S3, REST APIs, and SOAP APIs.
- Guided the team in employing Apache Airflow best practices/conventions.
- Deepened existing experience with PyCharm, Python, Apache Airflow, and Git.
3D Graphics Machine Learning Engineer
Toptal Client
- Designed and implemented a 3D reconstruction pipeline.
- Constructed a dataset for a high-quality 3D reconstruction.
- Reviewed literature to select the best approach for the client's requirements.
- Used Azure virtual machines to train machine learning models with Weights and Biases for experiment tracking.
Research Programmer
USC ISI (via Toptal)
- Improved the cross-lingual query summarization system, helping the team win the evaluation period despite ranking second before the summarization stage.
- Sped up experiment runs by switching embedding lookups to an approximate k-nearest neighbors algorithm (the Annoy library) after identifying the bottleneck with py-spy.
- Increased iteration speed and reliability by enforcing design decisions with tests and structuring code.
Data Scientist
Instabridge
- Migrated a data system from AWS to Google Cloud.
- Developed models to identify moving WiFi hotspots, e.g., hotspots on trains or mobile devices.
- Built models to estimate locations of WiFi hotspots from scans and connections by Android devices.
- Wrote and deployed data models with dbt (data build tool).
- Produced various ad-hoc analyses for stakeholders.
- Deployed Snowplow event pipelines on the Google Cloud Platform (GCP) with Cloud Pub/Sub, Dataflow, BigQuery, and Google Compute Engine.
Back-end Developer
Instabridge
- Designed and implemented the back-end architecture utilizing Heroku, AWS, and GCP.
- Implemented data pipelines in Spark running on EMR scheduled with Airflow.
- Applied machine learning to solve core data problems such as estimating the location and quality of WiFi hotspots, classifying hotspots as moving or stationary and as public or private, and matching hotspots to venues.
- Implemented near real-time data pipelines using AWS Kinesis, Lambda functions, and DynamoDB.
Solutions Engineer
Cadence Design Systems
- Developed internal productivity/process web applications for one of the two leading electronic design automation companies.
- Improved my ability to work effectively in teams.
- Developed communication skills.
- Evaluated and continuously ranked priorities based on the business value.
Associate Tutor
University of East Anglia
- Successfully communicated difficult concepts to a range of students.
- Marked coursework of undergraduate mathematics students.
- Helped undergraduate mathematics students with coursework problems.
Experience
Web-based Server Monitor and Admin Tool for Medal of Honor
Fivetran Custom Connectors for a Subscription Box Service
Shopify App for AI-generated Product Descriptions
https://www.sonbol.se
Education
B2 CEFR in Greek Language and Culture
University of Ioannina - Ioannina, Greece
Ph.D. in Mathematics
University of East Anglia - Norwich, UK
Master's Degree in Mathematics
University of East Anglia - Norwich, UK
Skills
Libraries/APIs
LSTM, PyTorch, TensorFlow, Fast.ai, Spark ML, FFmpeg, Keras, PySpark, REST APIs, Scikit-learn, Natural Language Toolkit (NLTK), ZeroMQ, Pandas, NumPy, OpenCV, PyTorch3D, Requests, Beautiful Soup, Node.js
Tools
BigQuery, Amazon Elastic MapReduce (EMR), Spark SQL, Apache Airflow, Cron, Microsoft Excel, Jupyter, PyCharm, Git, Perforce, Gensim, Doccano, RabbitMQ, Google Compute Engine (GCE), Terraform, Cloud Dataflow, Google Cloud Composer, GitHub, OpenAI Gym, Amazon Athena, Looker, ChatGPT, AWS Glue, Open Neural Network Exchange (ONNX), Google Analytics
Languages
Python, SQL, Python 3, JavaScript, HTML, PHP, Haskell, Scala, Java, Stored Procedure, R
Paradigms
ETL, Database Design, Functional Programming, Object-oriented Programming (OOP), Business Intelligence (BI), Serverless Architecture, Agile, Search Engine Optimization (SEO), DevOps
Platforms
Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, Heroku, Jupyter Notebook, Oracle, Blackboard, Arduino, Anaconda, Azure, Docker, NVIDIA CUDA, Databricks, AWS IoT, Shopify
Storage
Amazon S3 (AWS S3), JSON, Databases, Database Structure, Database Transactions, Redshift, PostgreSQL, NoSQL, Data Pipelines, Data Integration, Redis, Data Lakes, Google Cloud, Google Cloud Storage, Data Validation, MySQL, MongoDB, PL/SQL
Frameworks
Apache Spark, Spark, Flask, Django, Ruby on Rails (RoR), Selenium, Hadoop, Next.js
Other
EMR, Convolutional Neural Networks (CNNs), Linear Algebra, Google BigQuery, Neural Networks, Deep Learning, Artificial Intelligence (AI), Machine Learning, Data Science, Data Engineering, Deep Neural Networks (DNNs), CSV, Cloud Storage, Data Analysis, APIs, Data Aggregation, Pipelines, Back-end, Data Analytics, AI Programming, Transactions, Data Architecture, Data, EDA, Exploratory Data Analysis, Technical Architecture, ETL Tools, Architecture, Research, API Design, Machine Learning Automation, Supervised Machine Learning, ClickStream, Data Migration, Analytics, Data Transformation, Natural Language Processing (NLP), Probability Theory, Stream Processing, IP Networks, Image Recognition, Statistics, Deep Reinforcement Learning, Computer Vision, Audio, Audio Processing, Digital Signal Processing, Data Modeling, Data Warehousing, Data Warehouse Design, Data Visualization, Data Build Tool (dbt), Infrastructure, Data Reporting, ELT, Generative Pre-trained Transformers (GPT), Web Development, Language Models, Software Architecture, CI/CD Pipelines, Security, Modeling, Dashboards, Computer Vision Algorithms, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Geospatial Data, Large-scale Data Migration, Amazon Redshift, Reporting, Dashboard Design, Dashboard Development, Data Warehouse Testing, Predictive Modeling, AI Model Training, RESTFul APIs, Machine Learning Algorithms, API Integration, Serverless, Big Data, Amazon API Gateway, Reinforcement Learning, Amazon Kinesis, Microsoft 365, Pen & Paper, Generative Adversarial Networks (GANs), Google Data Studio, Lambda Functions, Fivetran, Lean, OpenAI GPT-3 API, OpenAI GPT-4 API, Machine Learning Operations (MLOps), LangChain, Monitoring, FastAPI, Web Scraping, Generative Pre-trained Transformer 3 (GPT-3), Benchmarking, Transformer Models, Open-source LLMs, Large Language Model Operations (LLMOps), Optical Character Recognition (OCR), OpenAI, SaaS