Ben Summers, Data Engineer and Machine Learning Developer in Uppsala, Sweden

Member since May 16, 2019
With a Ph.D. in pure maths, Ben would describe himself as an academic at heart, which means he is deeply passionate about his work. Since finishing his Ph.D. in 2012, he has worked professionally as a back-end and data engineer for a large global company and a small startup. Since 2015, he has been obsessed with machine learning, especially neural networks, and enjoys applying these techniques to solve real-world problems. Ben has been freelancing via Toptal since 2019.

Portfolio

  • Idelic (via Toptal)
    Apache Airflow, Python, Python 3, Requests, Pandas, Git, GitHub, Docker, AWS...
  • Toptal Client
    PyTorch, PyTorch3D, Azure, Data Pipelines, Python, CSV, Machine Learning...
  • USC ISI (via Toptal)
    Doccano, Jupyter, PyCharm, ZeroMQ, Flask, Gensim, NLTK, Python, Python 3...

Location

Uppsala, Sweden

Availability

Full-time

Preferred Environment

Linux, Git, PyCharm, Jupyter, Python, Python 3

The most amazing...

...project I've done is my Ph.D. thesis. Writing didn't come naturally and it posed a real challenge, but it taught me that what I want to do is explore.

Employment

  • Airflow Engineer for a Data Management Platform

    2021 - 2022
    Idelic (via Toptal)
    • Ported existing ETL jobs from a legacy Celery-based system to run on Airflow (Astronomer-hosted). Sources included S3, REST APIs, and SOAP APIs.
    • Guided the team to employ Apache Airflow best practices/conventions.
    • Deepened existing experience with PyCharm, Python, Apache Airflow, and Git.
    Technologies: Apache Airflow, Python, Python 3, Requests, Pandas, Git, GitHub, Docker, AWS, Cloud Storage, Infrastructure, APIs, Data Integration, Amazon S3 (AWS S3), Data Aggregation, Pipelines, Beautiful Soup
  • 3D Graphics Machine Learning Engineer

    2020 - 2021
    Toptal Client
    • Designed and implemented a 3D reconstruction pipeline.
    • Constructed a dataset for a high-quality 3D reconstruction.
    • Reviewed literature to select the best approach for the client's requirements.
    Technologies: PyTorch, PyTorch3D, Azure, Data Pipelines, Python, CSV, Machine Learning, Python 3, Computer Vision, Data Science, Git, Linear Algebra, Convolutional Neural Networks, Neural Networks, Probability Theory, Image Recognition, Data Visualization, Cloud Storage, Infrastructure, Data Analysis, Data Reporting, APIs, Data Integration, Data Aggregation, Pipelines, Artificial Intelligence (AI), Deep Learning
  • Research Programmer

    2019 - 2020
    USC ISI (via Toptal)
    • Improved a cross-lingual query summarization system; the team won the evaluation period despite being in second place before the summarization stage.
    • Sped up experiment runs by using the Annoy library's approximate k-nearest neighbors search for embedding lookups, after identifying the bottleneck with py-spy.
    • Increased iteration speed and reliability by enforcing design decisions with tests and structuring code.
    Technologies: Doccano, Jupyter, PyCharm, ZeroMQ, Flask, Gensim, NLTK, Python, Python 3, Linux, NumPy, Git, Data Pipelines, CSV, Machine Learning, Data Science, Probability Theory, Data Visualization, Infrastructure, Data Analysis, Data Reporting, APIs, Microsoft Excel, Data Integration, Data Aggregation, Beautiful Soup, Artificial Intelligence (AI), Deep Learning
  • Data Scientist

    2018 - 2019
    Instabridge
    • Migrated a data system from AWS to Google Cloud.
    • Developed models to identify moving WiFi hotspots, e.g., those hotspots on trains or mobile devices.
    • Built models to estimate locations of WiFi hotspots from scans and connections by Android devices.
    • Wrote and deployed data models with dbt (data build tool).
    • Produced various ad-hoc analyses for stakeholders.
    • Deployed Snowplow event pipelines on the Google Cloud Platform (GCP) with Cloud Pub/Sub, Dataflow, BigQuery, and Google Compute Engine.
    Technologies: Data Flows, Keras, TensorFlow, Scikit-learn, Pandas, PyTorch, Spark, BigQuery, EMR, ETL, Apache Airflow, Spark ML, AWS EMR, Google Cloud Platform (GCP), Spark SQL, Python 3, Linux, Big Data, AWS Kinesis, Redshift, AWS Athena, Agile, NumPy, Scala, Git, NoSQL, Data Modeling, Data Pipelines, Data Engineering, Google Data Studio, CSV, Machine Learning, SQL, Python, Computer Vision, Apache Spark, Data Science, Serverless, Linear Algebra, Neural Networks, LSTM, Probability Theory, Cloud Dataflow, Data Warehousing, Data Warehouse Design, Data Visualization, Data Building Tool (DBT), Cloud Storage, Infrastructure, Data Analysis, Data Reporting, APIs, Microsoft Excel, Data Integration, Amazon S3 (AWS S3), Data Aggregation, Lambda Functions, Pipelines, AWS, Beautiful Soup, Artificial Intelligence (AI), Deep Learning, OpenAI Gym, Amazon Athena
  • Back-end Developer

    2015 - 2018
    Instabridge
    • Designed and implemented the back-end architecture utilizing Heroku, AWS, and GCP.
    • Implemented data pipelines in Spark running on EMR scheduled with Airflow.
    • Applied machine learning to solve core data problems such as estimating the locations and quality of WiFi hotspots, classifying hotspots as moving or stationary and as public or private, and matching hotspots to venues.
    • Implemented near real-time data pipelines using AWS Kinesis, Lambda functions, and DynamoDB.
    Technologies: Amazon Web Services (AWS), Spark, MongoDB, RabbitMQ, Google Cloud Platform (GCP), AWS, Heroku, Ruby on Rails (RoR), ETL, Apache Airflow, Spark ML, AWS EMR, PostgreSQL, JavaScript, Spark SQL, BigQuery, Linux, Big Data, AWS Kinesis, Redshift, AWS Athena, Agile, NumPy, Scala, Git, NoSQL, Data Modeling, Data Pipelines, Data Engineering, CSV, Machine Learning, SQL, Apache Spark, Data Science, Serverless, Neural Networks, LSTM, Probability Theory, Cloud Dataflow, Data Warehousing, Data Warehouse Design, Data Visualization, Cloud Storage, Infrastructure, Data Analysis, Data Reporting, APIs, Microsoft Excel, Data Integration, Amazon S3 (AWS S3), Data Aggregation, Lambda Functions, Pipelines, Beautiful Soup, Artificial Intelligence (AI), Deep Learning, OpenAI Gym, Amazon Athena
  • Solutions Engineer

    2013 - 2014
    Cadence Design Systems
    • Developed internal productivity/process web applications for one of the two leading electronic design automation companies.
    • Improved my ability to work effectively in teams.
    • Developed communication skills.
    • Evaluated and continuously ranked priorities based on the business value.
    Technologies: Microsoft 365, Linux, Oracle, Perforce, MySQL, PHP, JavaScript, Data Modeling, CSV, SQL, Infrastructure, APIs, Microsoft Excel, Data Integration, Data Aggregation, Selenium
  • Associate Tutor

    2008 - 2012
    University of East Anglia
    • Successfully communicated difficult concepts to a range of students.
    • Marked coursework.
    Technologies: Blackboard, Pen & Paper, Cloud Storage

Experience

  • Web-based Server Monitor and Admin Tools for Medal of Honor

    This was written in PHP and involved a lot of socket programming, session handling, and user authentication. The client tool was built using C# and .NET.

Skills

  • Languages

    Python, SQL, Python 3, JavaScript, PHP, Haskell, Scala
  • Libraries/APIs

    LSTM, PyTorch, TensorFlow, Fast.ai, Spark ML, FFmpeg, Keras, PySpark, Scikit-learn, NLTK, ZeroMQ, Pandas, NumPy, OpenCV, Requests, Beautiful Soup
  • Tools

    BigQuery, Amazon Elastic MapReduce (EMR), Spark SQL, Apache Airflow, Microsoft Excel, AWS Athena, Jupyter, PyCharm, Git, Perforce, Gensim, Doccano, RabbitMQ, Google Compute Engine (GCE), Terraform, Cloud Dataflow, Google Cloud Composer, GitHub, OpenAI Gym, Amazon Athena
  • Platforms

    Linux, Google Cloud Platform (GCP), Amazon Web Services (AWS), Heroku, AWS Kinesis, AWS Lambda, Oracle, Blackboard, Arduino, Anaconda, Azure, Docker
  • Storage

    Amazon S3 (AWS S3), PostgreSQL, NoSQL, Data Pipelines, Data Integration, Redshift, MySQL, MongoDB
  • Other

    EMR, Convolutional Neural Networks, Linear Algebra, Google BigQuery, Neural Networks, Deep Learning, Artificial Intelligence (AI), Machine Learning, Data Engineering, Deep Neural Networks, CSV, Cloud Storage, Data Analysis, APIs, Data Aggregation, Pipelines, AWS, Natural Language Processing (NLP), Probability Theory, Stream Processing, IP Networks, Image Recognition, Statistics, Deep Reinforcement Learning, Computer Vision, Audio, Audio Processing, Digital Signal Processing, Data Modeling, Data Warehousing, Data Warehouse Design, Data Visualization, Data Building Tool (DBT), Infrastructure, Data Reporting, Serverless, Big Data, AWS API Gateway, Reinforcement Learning, Microsoft 365, Pen & Paper, Generative Adversarial Networks (GANs), PyTorch3D, Google Data Studio, Lambda Functions
  • Frameworks

    Apache Spark, AWS EMR, Spark, Flask, Django, Ruby on Rails (RoR), Selenium
  • Paradigms

    Functional Programming, Object-oriented Programming (OOP), ETL, Data Science, Business Intelligence (BI), Serverless Architecture, Agile

Education

  • B2 CEFR in Greek Language and Culture
    2014 - 2015
    University of Ioannina - Ioannina, Greece
  • PhD in Mathematics
    2008 - 2012
    University of East Anglia - Norwich, UK
  • Master's Degree in Mathematics
    2004 - 2008
    University of East Anglia - Norwich, UK
