David Subiros, Developer in Bristol, United Kingdom

David Subiros

Verified Expert in Engineering

Bio

David is a senior back-end developer with experience working on scalable, robust projects in organizations like HP and Dyson. He is passionate about problem-solving and writing scalable and efficient code. David enjoys collaborating with other technical people on delivering professional end-to-end solutions and has experience with graphs, data lakes, and ETL technologies.

Portfolio

Digital Wholesale Solutions
Amazon Web Services (AWS), Spark, AWS Glue, Python 3, Go, Software Architecture...
KidsLoop
Spark, Terraform, Amazon Web Services (AWS), Data Engineering, Python 3...
Office for National Statistics
Go, Vault, Terraform, Amazon Web Services (AWS), Apache Kafka, GraphDB...

Experience

  • Distributed Computing - 10 years
  • Concurrent Programming - 10 years
  • Amazon Web Services (AWS) - 7 years
  • Go - 6 years
  • Terraform - 5 years
  • Algorithms - 5 years
  • GraphDB - 3 years
  • Data Engineering - 3 years

Availability

Full-time

Preferred Environment

Visual Studio Code (VS Code), Slack, Amazon Web Services (AWS), macOS, Ubuntu, Trello

The most amazing...

...project I've worked on is The Machine, a disruptive new computing architecture based on memory pools and SoCs.

Work Experience

Lead Data Engineer

2022 - 2022
Digital Wholesale Solutions
  • Analyzed different technologies for the company's new data lake project.
  • Built landing layer pipelines, generic Terraform modules, and Python code to trigger a crawler and subsequent extract jobs for required tables.
  • Created a concurrent, modular scraper in Go that calls the Companies House API and incorporated it into a directed acyclic graph, triggered by Amazon EventBridge and driven by the company IDs in the corresponding landing layer columns (a sketch of the worker-pool pattern follows this entry).
Technologies: Amazon Web Services (AWS), Spark, AWS Glue, Python 3, Go, Software Architecture, Amazon S3 (AWS S3), Data Modeling, Architecture, ETL, Git, Distributed Systems, SQL, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, Solution Architecture, AWS SDK, Docker, Linux, Technical Architecture, Python, REST, Databases, Data Pipelines, Data Warehousing, Cloud, Data Scraping, Event-driven Architecture, Amazon DynamoDB, Microservices Architecture, RESTful APIs, Identity & Access Management (IAM)
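
The scraper described above follows a classic bounded worker-pool pattern in Go. Below is a minimal, illustrative sketch of that pattern, not the production scraper: the response struct is reduced to a few fields, the worker count is arbitrary, and the real Companies House API additionally requires an API key sent as basic-auth credentials, which is omitted here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
	"time"
)

// CompanyProfile keeps only a few fields; the real Companies House response
// is much richer. Struct and endpoint usage here are illustrative.
type CompanyProfile struct {
	CompanyName   string `json:"company_name"`
	CompanyNumber string `json:"company_number"`
	Status        string `json:"company_status"`
}

func fetchCompany(client *http.Client, id string) (*CompanyProfile, error) {
	// The real API requires an API key via basic auth; omitted in this sketch.
	resp, err := client.Get("https://api.company-information.service.gov.uk/company/" + id)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("company %s: status %d", id, resp.StatusCode)
	}
	var p CompanyProfile
	if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
		return nil, err
	}
	return &p, nil
}

func main() {
	// Company IDs would come from the landing layer columns; hardcoded here.
	ids := []string{"00000001", "00000002", "00000003"}
	jobs := make(chan string)
	results := make(chan *CompanyProfile)
	client := &http.Client{Timeout: 10 * time.Second}

	// A fixed-size worker pool bounds concurrency against the external API.
	var wg sync.WaitGroup
	for w := 0; w < 5; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range jobs {
				if p, err := fetchCompany(client, id); err == nil {
					results <- p
				}
			}
		}()
	}

	go func() {
		for _, id := range ids {
			jobs <- id
		}
		close(jobs)
		wg.Wait()
		close(results)
	}()

	for p := range results {
		fmt.Println(p.CompanyNumber, p.CompanyName, p.Status)
	}
}
```

Bounding the pool keeps the scraper polite toward the external API, while the channels keep it modular: feeding IDs in and collecting results are independent of the fetch logic.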

Senior Data Engineer

2022 - 2022
KidsLoop
  • Collaborated with data scientists and analysts on implementing data pipelines to provide an adaptive review system that recommends exercises to students based on their performance.
  • Conducted AWS infrastructure provisioning using Terraform, which included working on batch jobs, an API gateway, EventBridge, and AWS Glue.
  • Utilized Python and PySpark to create Spark jobs and deployed them using GitHub Actions, gaining exposure to Databricks.
Technologies: Spark, Terraform, Amazon Web Services (AWS), Data Engineering, Python 3, Back-end, Software Architecture, API Development, Amazon API Gateway, Amazon S3 (AWS S3), Data Modeling, Architecture, ETL, Git, Startups, Technical Leadership, Distributed Systems, SQL, Scalability, APIs, REST APIs, API Integration, AWS SDK, Docker, Linux, Bitbucket, Technical Architecture, Python, REST, Databases, Data Pipelines, Data Warehousing, Cloud, Event-driven Architecture, Amazon DynamoDB, Microservices Architecture, RESTful APIs, Identity & Access Management (IAM)

Back-end Software Contractor

2019 - 2022
Office for National Statistics
  • Improved an Apache Kafka client library with a new concurrency model, abstracted the handlers, and implemented a state machine that starts or stops consuming depending on the service health status.
  • Reimplemented Amazon Neptune database queries to generate subgraph copies for hierarchical data structures, supporting 100,000 new nodes and edges.
  • Collaborated with a front-end developer to assess and significantly improve the performance of end-to-end user journeys, bringing key operations down to O(log N) by adding better REST API endpoints and database queries.
  • Implemented a key auditing feature using Apache Kafka messages, plus ETag and If-Match headers derived from data hashes, updated atomically in MongoDB.
  • Added pagination to REST API endpoints using an offset and count query-parameter strategy (see the sketch after this entry).
Technologies: Go, Vault, Terraform, Amazon Web Services (AWS), Apache Kafka, GraphDB, Back-end, Software Architecture, API Development, Amazon S3 (AWS S3), Data Modeling, Architecture, Git, Technical Leadership, Distributed Systems, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, AWS SDK, Docker, Linux, Technical Architecture, REST, Databases, Data Pipelines, Cloud, Event-driven Architecture, Microservices Architecture, RESTful APIs
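
The offset/count pagination mentioned in the last bullet is straightforward to sketch in Go. The handler below is illustrative rather than actual ONS code; the resource name, defaults, and cap are assumptions, but the shape (validate the parameters, clamp the window, return items plus offset, count, and total_count) is the strategy described above.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"strconv"
)

const defaultCount, maxCount = 20, 1000 // illustrative defaults

type page struct {
	Items      []string `json:"items"`
	Offset     int      `json:"offset"`
	Count      int      `json:"count"`
	TotalCount int      `json:"total_count"`
}

// parsePage validates the offset and count query parameters, falling back to
// safe defaults and capping count so one request cannot ask for everything.
func parsePage(r *http.Request) (offset, count int) {
	offset, _ = strconv.Atoi(r.URL.Query().Get("offset"))
	if offset < 0 {
		offset = 0
	}
	count, err := strconv.Atoi(r.URL.Query().Get("count"))
	if err != nil || count <= 0 {
		count = defaultCount
	}
	if count > maxCount {
		count = maxCount
	}
	return offset, count
}

func listHandler(items []string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		offset, count := parsePage(r)
		// Clamp the window to the available items.
		if offset > len(items) {
			offset = len(items)
		}
		end := offset + count
		if end > len(items) {
			end = len(items)
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(page{
			Items: items[offset:end], Offset: offset, Count: end - offset, TotalCount: len(items),
		})
	}
}

func main() {
	// Hypothetical resource; e.g. GET /datasets?offset=2&count=2 returns c, d.
	http.HandleFunc("/datasets", listHandler([]string{"a", "b", "c", "d"}))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```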

Back-end Software Contractor

2019 - 2019
Veea
  • Implemented secure data transfer to IoT devices using WAMP and built new REST API endpoints based on requirements.
  • Automated the AWS infrastructure provisioning and lifecycle using Terraform and Drone CI.
  • Modularized Terraform descriptions for consistency so that all environments shared common Git repositories.
  • Deployed and created microservices using Docker and orchestrated them using Docker Swarm.
  • Created a backup and recovery system for PostgreSQL, with a 10-minute recovery point objective and a 5-minute recovery time objective.
  • Utilized Jira, Bitbucket, and GitHub, with a feature-based Git branching strategy while working in an Agile environment.
  • Introduced testing to the development cycle and collaborated on creating an SSHLibrary over a WAMP library.
Technologies: Go, PostgreSQL, Amazon Web Services (AWS), Back-end, Software Architecture, API Development, Amazon S3 (AWS S3), Data Modeling, Architecture, Git, Distributed Systems, SQL, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, Solution Architecture, AWS SDK, Docker, Linux, Jira, Technical Architecture, REST, Databases, Cloud, Microservices Architecture, RESTful APIs

Back-end Software Contractor

2018 - 2018
Computing Distribution Group
  • Re-architected provisioning and deployment of AWS environments, enabling automatic triggering of Terraform, Docker, and Nomad from Jenkins.
  • Generalized new environment provisioning using the Jenkins shared libraries.
  • Implemented an automated lifecycle for AWS environments that modifies Auto Scaling groups and Amazon RDS clusters in testing environments, significantly reducing costs (a sketch follows this entry).
Technologies: Terraform, Java, Amazon Web Services (AWS), Back-end, Software Architecture, API Development, Amazon S3 (AWS S3), Data Modeling, Architecture, Git, Distributed Systems, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, AWS SDK, Linux, Jira, REST, Databases, Cloud, Microservices Architecture, RESTful APIs, Identity & Access Management (IAM)
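
As a rough illustration of the environment lifecycle idea, this sketch uses the AWS SDK for Go v2 to scale a test environment's Auto Scaling group down to zero instances. The group name is hypothetical, and the original implementation drove this through Jenkins and Terraform rather than bespoke Go code; a complete lifecycle job would also stop the matching Amazon RDS cluster and reverse both changes on a morning schedule.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/autoscaling"
)

// scaleDown sets a test environment's Auto Scaling group to zero instances,
// so no EC2 capacity is billed out of hours. The group name is illustrative.
func scaleDown(ctx context.Context, client *autoscaling.Client, group string) error {
	_, err := client.UpdateAutoScalingGroup(ctx, &autoscaling.UpdateAutoScalingGroupInput{
		AutoScalingGroupName: aws.String(group),
		MinSize:              aws.Int32(0),
		DesiredCapacity:      aws.Int32(0),
	})
	return err
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx) // uses the standard AWS credential chain
	if err != nil {
		log.Fatal(err)
	}
	if err := scaleDown(ctx, autoscaling.NewFromConfig(cfg), "test-env-asg"); err != nil {
		log.Fatal(err)
	}
}
```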

Data Engineer

2017 - 2018
Dyson
  • Automated a distributed system orchestrated with Apache Airflow.
  • Worked with the IoT message protocol used by the air purifiers and robot devices and developed the corresponding data models.
  • Conducted unit and integration testing using Docker containers and Docker Swarm and mocked all in-use AWS services.
Technologies: Spark, Python, Java, Apache Airflow, Amazon S3 (AWS S3), Data Modeling, Architecture, ETL, Git, Distributed Systems, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, Solution Architecture, AWS SDK, Docker, Linux, Jira, Technical Architecture, REST, Databases, Data Pipelines, Data Warehousing, Cloud, Event-driven Architecture, Microservices Architecture, RESTful APIs

Senior Software Engineer and Researcher

2015 - 2017
Hewlett Packard Enterprise (HPE)
  • Contributed to The Machine, HPE Labs' main research project, by working on Loom, The Machine's manageability tool, which provided visualization and easy interaction with the hardware.
  • Served as the technical lead of The Machine's monitoring service, which collects telemetry metrics, saves them in InfluxDB and SQLite, and lets users query them through a REST API. The service scales by queuing metrics and aggregating them before storage (a sketch of this pattern follows this entry).
  • Assisted with automatic rule creation from anomalies detected in HPE proprietary datasets; we tried several anomaly detection algorithms and used a decision tree to derive simple rules that an expert could validate and report.
  • Built a semantic rule management system for Ruler that quantifies partial redundancy and detects redundancies and similarities between rules even when they don't overlap.
  • Collaborated on MoNanas, an open-source ML orchestration framework, leveraging its ML solution-sharing capabilities while reducing time-to-insight by around 80%.
  • Sped up graph queries on graph databases using Grapher, making them 100 times faster than the leading native graph databases.
Technologies: Java, Distributed Computing, Internet of Things (IoT), Declarative Programming, Machine Learning, Data Science, Back-end, Data Modeling, Architecture, Git, Distributed Systems, Scalability, APIs, REST APIs, Solution Architecture, Docker, Linux, Jira, Technical Architecture, REST, Databases, Data Warehousing, Cloud, Event-driven Architecture, Microservices Architecture, RESTful APIs
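
The queue-and-aggregation approach behind the monitoring service can be sketched in Go with a buffered channel and batched flushes. This is a simplification under stated assumptions: the metric shape, batch size, and flush interval are illustrative, and the storage write (InfluxDB and SQLite in the real service) is stubbed.

```go
package main

import (
	"fmt"
	"time"
)

// Metric is a simplified telemetry sample; fields are illustrative.
type Metric struct {
	Name  string
	Value float64
	At    time.Time
}

// aggregate drains the queue and flushes batches, so producers never block on
// storage latency: this queue-plus-batching is what lets the service scale.
func aggregate(queue <-chan Metric, batchSize int, flushEvery time.Duration) {
	batch := make([]Metric, 0, batchSize)
	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()
	flush := func() {
		if len(batch) == 0 {
			return
		}
		// Stub: the real service wrote the batch to InfluxDB/SQLite here.
		fmt.Printf("flushing %d metrics\n", len(batch))
		batch = batch[:0]
	}
	for {
		select {
		case m, ok := <-queue:
			if !ok { // queue closed: flush what remains and exit
				flush()
				return
			}
			batch = append(batch, m)
			if len(batch) == batchSize {
				flush()
			}
		case <-ticker.C: // periodic flush so small trickles still get stored
			flush()
		}
	}
}

func main() {
	queue := make(chan Metric, 1024) // buffered queue decouples collectors from storage
	go func() {
		for i := 0; i < 10; i++ {
			queue <- Metric{Name: "cpu", Value: float64(i), At: time.Now()}
		}
		close(queue)
	}()
	aggregate(queue, 4, time.Second)
}
```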

Software Engineer and Researcher

2012 - 2015
HP
  • Improved the system call interception for a data leak prevention project while working in the printing and content delivery lab.
  • Made the system more robust by implementing a central policy service and remote policy enforcement agents.
  • Classified data using a support vector machine and an HP-proprietary feature extraction algorithm implemented by a PhD student working for the company.
Technologies: Java, Windows System Calls, C++, Python, Back-end, Software Architecture, Data Modeling, Architecture, Git, Distributed Systems, Scalability, APIs, REST APIs, Solution Architecture, REST, Databases, Event-driven Architecture, RESTful APIs

Cloud Software Engineer

2010 - 2012
HP
  • Developed a cloud computing service core based on OpenStack.
  • Utilized OpenStack's Nova for computing and scheduling modules and improved the concurrency model.
  • Productized cloud technologies researched by HP Labs, replacing the research prototypes with OpenStack.
Technologies: Python, OpenStack, Back-end, API Development, Data Modeling, Git, Distributed Systems, Scalability, APIs, REST APIs, API Integration, Cloud Architecture, Solution Architecture, REST, Databases, Cloud, Event-driven Architecture, Microservices Architecture, RESTful APIs

Projects

dp-kafka

https://github.com/ONSdigital/dp-kafka
The Sarama-based Apache Kafka client library used by digital publishing services at the Office for National Statistics (ONS). I made significant contributions: a new concurrency model that gives services more control over concurrent consumption and flows, a state machine that starts and stops consuming (with an interface to do the same for specific events), and integration with the ONS health-check library, so a service stops consuming while it is unhealthy and resumes once it recovers. A simplified sketch of the state machine follows.
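
Here is a minimal Go sketch of the start/stop state machine idea, assuming plain channels for messages and health updates; the real dp-kafka library manages Sarama consumer groups and more states, so treat this purely as an illustration of the pattern.

```go
package main

import (
	"fmt"
	"time"
)

type state int

const (
	stopped state = iota
	consuming
)

// run consumes from msgs only while healthy and pauses when the health check
// reports the service unhealthy; while stopped, messages are not received.
func run(msgs <-chan string, healthy <-chan bool, done <-chan struct{}) {
	st := consuming
	for {
		switch st {
		case consuming:
			select {
			case m := <-msgs:
				fmt.Println("handled:", m)
			case h := <-healthy:
				if !h {
					st = stopped
					fmt.Println("unhealthy: stopped consuming")
				}
			case <-done:
				return
			}
		case stopped:
			// While stopped, ignore msgs entirely; only health can restart us.
			select {
			case h := <-healthy:
				if h {
					st = consuming
					fmt.Println("healthy: resumed consuming")
				}
			case <-done:
				return
			}
		}
	}
}

func main() {
	msgs := make(chan string)
	healthy := make(chan bool)
	done := make(chan struct{})
	go run(msgs, healthy, done)
	msgs <- "event-1"
	healthy <- false // e.g. a dependency check failed
	healthy <- true  // dependency recovered
	msgs <- "event-2"
	time.Sleep(100 * time.Millisecond)
	close(done)
}
```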

dp-graph

https://github.com/ONSdigital/dp-graph
The graph database library used at the ONS. I made major contributions to the algorithm that clones subgraphs for new datasets and improved query performance by limiting the number of starting nodes per query, so each query is very likely to complete in a reasonable time, and by parallelizing queries when required (a sketch of this pattern follows this entry).

This library is used in several ONS digital publishing microservices to provide a hierarchical structure to data sets and allow queries once they are imported.
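
Below is a minimal Go sketch of the bounded-chunk, parallel-query pattern described above. The per-chunk query is stubbed, since the real implementation issues graph queries against the database; the chunking and fan-out/fan-in structure is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// queryChunk stands in for a graph query over a bounded set of starting
// nodes; the bound keeps each query's traversal small enough to finish in a
// predictable time. The query itself is stubbed here.
func queryChunk(nodes []string) []string {
	out := make([]string, 0, len(nodes))
	for _, n := range nodes {
		out = append(out, "edge-from-"+n) // stub result
	}
	return out
}

// queryAll splits the starting nodes into chunks of at most chunkSize and
// runs the per-chunk queries in parallel, merging the results.
func queryAll(nodes []string, chunkSize int) []string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results []string
	)
	for start := 0; start < len(nodes); start += chunkSize {
		end := start + chunkSize
		if end > len(nodes) {
			end = len(nodes)
		}
		chunk := nodes[start:end]
		wg.Add(1)
		go func() {
			defer wg.Done()
			res := queryChunk(chunk)
			mu.Lock()
			results = append(results, res...)
			mu.Unlock()
		}()
	}
	wg.Wait()
	return results
}

func main() {
	nodes := []string{"n1", "n2", "n3", "n4", "n5"}
	fmt.Println(queryAll(nodes, 2)) // at most 2 starting nodes per query
}
```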

dp-cantabular-csv-exporter

https://github.com/ONSdigital/dp-cantabular-csv-exporter
Created an ONS digital publishing microservice based on Kafka that processes census data coming from Cantabular into the required CSV format. This involved building a Kafka consumer, a Kafka producer, health checks, and everything else needed for a production-ready microservice.

Education

2003 - 2009

Master's Degree in Telecommunications

Universitat Politècnica de Catalunya (UPC) - Barcelona, Spain

Certifications

JANUARY 2017 - PRESENT

Neural Networks for Machine Learning

University of Toronto | via Coursera

DECEMBER 2014 - PRESENT

Computational Investing

Georgia Institute of Technology

DECEMBER 2014 - PRESENT

Mining Massive Datasets

Stanford University | via Coursera

OCTOBER 2013 - PRESENT

Competitive Strategy

LMU München | via Coursera

SEPTEMBER 2013 - PRESENT

Algorithms: Design and Analysis, Part 1

Stanford University | via Coursera

SEPTEMBER 2013 - PRESENT

Algorithms: Design and Analysis, Part 2

Stanford University | via Coursera

Skills

Libraries/APIs

API Development, REST APIs

Tools

Terraform, Git, AWS SDK, Jira, Slack, Trello, Vault, Apache Airflow, AWS Glue, Bitbucket

Languages

Go, Python, Python 3, SQL, C, Java, C++

Paradigms

Concurrent Programming, REST, Event-driven Architecture, Microservices Architecture, Distributed Computing, ETL, Declarative Programming

Storage

Databases, Amazon DynamoDB, PostgreSQL, Amazon S3 (AWS S3), Data Pipelines

Platforms

Amazon Web Services (AWS), Apache Kafka, Docker, Linux, Visual Studio Code (VS Code), macOS, Ubuntu, OpenStack

Frameworks

Spark

Other

Back-end, APIs, API Integration, Cloud, RESTful APIs, Data Engineering, Algorithms, GraphDB, Software Architecture, Amazon API Gateway, Data Modeling, Architecture, Startups, Technical Leadership, Distributed Systems, Scalability, Cloud Architecture, Solution Architecture, Technical Architecture, Data Warehousing, Data Scraping, Identity & Access Management (IAM), Signal Filtering, Mathematics, Physics, Signal Theory, Electronics, Internet of Things (IoT), Machine Learning, Data Science, Windows System Calls, Neural Networks, Data Mining, Investing, Strategy, Game Theory, Networks, Concurrency, Amazon Neptune, Streaming
