
David Subiros
Verified Expert in Engineering
Back-end Developer
Bristol, United Kingdom
Toptal member since November 22, 2022
David is a senior back-end developer with experience working on scalable, robust projects in organizations like HP and Dyson. He is passionate about problem-solving and writing scalable and efficient code. David enjoys collaborating with other technical people on delivering professional end-to-end solutions and has experience with graphs, data lakes, and ETL technologies.
Experience
- Distributed Computing - 10 years
- Concurrent Programming - 10 years
- Amazon Web Services (AWS) - 7 years
- Go - 6 years
- Terraform - 5 years
- Algorithms - 5 years
- GraphDB - 3 years
- Data Engineering - 3 years
Preferred Environment
Visual Studio Code (VS Code), Slack, Amazon Web Services (AWS), macOS, Ubuntu, Trello
The most amazing...
...project I've worked on is The Machine, a disruptive new computing architecture based on memory pools and SoCs.
Work Experience
Lead Data Engineer
Digital Wholesale Solutions
- Analyzed different technologies for the company's new data lake project.
- Built landing layer pipelines, generic Terraform modules, and Python code to trigger a crawler and subsequent extract jobs for required tables.
- Created a concurrent, modular scraper in Go that calls the Companies House API and incorporated it into a directed acyclic graph triggered by Amazon EventBridge and driven by the company IDs in the corresponding landing layer columns.
Senior Data Engineer
KidsLoop
- Collaborated with data scientists and analysts on implementing data pipelines to provide an adaptive review system that recommends exercises to students based on their performance.
- Provisioned AWS infrastructure using Terraform, including batch jobs, an API gateway, EventBridge, and AWS Glue.
- Utilized Python and PySpark to create Spark jobs and deployed them using GitHub Actions with Databricks exposure.
Back-end Software Contractor
Office for National Statistics
- Improved an Apache Kafka client library with a new concurrency model, abstracted the handlers, and implemented a state machine that starts or stops consuming depending on the service health status.
- Reimplemented Amazon Neptune database queries to generate subgraph copies for hierarchical data structures, supporting 100,000 new nodes and edges.
- Collaborated with a front-end developer on assessing and significantly improving the performance of end-to-end user journeys, reducing their complexity to O(log N) in big O notation by adding better REST API endpoints and database queries.
- Implemented a key auditing feature based on Apache Kafka messages, together with ETag and If-Match headers derived from data hashes that are updated atomically in MongoDB.
- Added pagination for REST API endpoints with an offset and count query parameters strategy.
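As a rough illustration of the offset-and-count pagination mentioned above, the following Go sketch shows one way such a strategy can work. It is not the ONS implementation: the query parameter names `offset` and `count`, the default and maximum page sizes, and the `Page`, `parsePaging`, and `paginate` names are all assumptions made for this example.

```go
package main

import (
	"errors"
	"net/url"
	"strconv"
)

// Hypothetical limits chosen for the example only.
const (
	defaultCount = 20
	maxCount     = 1000
)

// ErrInvalidParam is returned when offset or count is not a non-negative integer.
var ErrInvalidParam = errors.New("offset and count must be non-negative integers")

// Page is one window over a larger result set, plus the metadata a client
// needs to request the next window.
type Page struct {
	Items      []string
	Offset     int // index of the first returned item
	Count      int // number of items actually returned
	TotalCount int // size of the full result set
}

// parsePaging reads the "offset" and "count" query parameters, applying a
// default count when it is absent and capping it at maxCount.
func parsePaging(q url.Values) (offset, count int, err error) {
	offset, count = 0, defaultCount
	if v := q.Get("offset"); v != "" {
		if offset, err = strconv.Atoi(v); err != nil || offset < 0 {
			return 0, 0, ErrInvalidParam
		}
	}
	if v := q.Get("count"); v != "" {
		if count, err = strconv.Atoi(v); err != nil || count < 0 {
			return 0, 0, ErrInvalidParam
		}
	}
	if count > maxCount {
		count = maxCount
	}
	return offset, count, nil
}

// paginate returns the slice of items selected by offset and count, clamped
// to the bounds of the result set. It expects non-negative arguments, as
// produced by parsePaging.
func paginate(items []string, offset, count int) Page {
	total := len(items)
	if offset > total {
		offset = total
	}
	end := offset + count
	if end > total {
		end = total
	}
	return Page{
		Items:      items[offset:end],
		Offset:     offset,
		Count:      end - offset,
		TotalCount: total,
	}
}
```

In an HTTP handler, `parsePaging(r.URL.Query())` would supply the arguments for `paginate`. In a real service the offset and count would normally be pushed down into the database query rather than applied to an in-memory slice.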
Back-end Software Contractor
Veea
- Implemented a secure data transfer to IoT devices using WAMP and new REST API endpoints based on requirements.
- Automated the AWS infrastructure provisioning and lifecycle using Terraform and Drone CI.
- Modularized Terraform descriptions for consistency so that all environments shared common Git repositories.
- Deployed and created microservices using Docker and orchestrated them using Docker Swarm.
- Created a backup and recovery system for PostgreSQL, with a 10-minute recovery point objective and a 5-minute recovery time objective.
- Utilized Jira, Bitbucket, and GitHub, with a feature-based Git branching strategy while working in an Agile environment.
- Introduced testing to the development cycle and collaborated on creating an SSHLibrary over a WAMP library.
Back-end Software Contractor
Computing Distribution Group
- Re-architected provisioning and deployment of AWS environments, enabling automatic triggering of Terraform, Docker, and Nomad from Jenkins.
- Generalized new environment provisioning using the Jenkins shared libraries.
- Implemented an automated lifecycle for AWS environments that modifies autoscaling groups and Amazon RDS clusters in testing environments, significantly reducing costs.
Data Engineer
Dyson
- Automated a distributed system managed using Apache Airflow.
- Followed the IoT messages protocol received by air purifiers and robot devices and worked on data models.
- Conducted unit and integration testing using Docker containers and Docker Swarm and mocked all in-use AWS services.
Senior Software Engineer and Researcher
Hewlett Packard Enterprise (HPE)
- Contributed to The Machine, HPE Labs' main research project, by working on Loom, The Machine's manageability tool, which provided visualization and easy interaction with the hardware.
- Served as a technical lead of The Machine's monitoring service, which collects telemetry metrics, saves them in InfluxDB and SQLite, and enables users to query them from a REST API. The service's scalability is enabled using a queue and aggregation.
- Assisted with the automatic rule creation from anomalies detected in HPE proprietary data sets. We tried several anomaly detection algorithms and used a decision tree to create easy rules that can be validated and reported by an expert.
- Built a semantic rule management system for Ruler that quantifies partial redundancies and detects redundancies and similarities between rules even if they don't overlap.
- Collaborated on MoNanas, an open source ML orchestration framework, leveraging the ML solution and sharing capabilities while reducing the time-to-insight by around 80%.
- Sped up graph queries on graph databases using Grapher, making them 100 times faster than the leading native graph databases.
Software Engineer and Researcher
HP
- Improved the system call interception for a data leak prevention project while working in the printing and content delivery lab.
- Made the system more robust by implementing a central policy service and remote policy enforcement agents.
- Classified data using a support vector machine and an HP-proprietary feature extraction algorithm implemented by a PhD student working for the company.
Cloud Software Engineer
HP
- Developed a cloud computing service core based on OpenStack.
- Utilized OpenStack's Nova for computing and scheduling modules and improved the concurrency model.
- Productized cloud technologies researched by HP Labs, replacing them with OpenStack.
Experience
Apache Kafka Digital Publishing
https://github.com/ONSdigital/dp-kafka
dp-graph
https://github.com/ONSdigital/dp-graph
This library is used in several ONS digital publishing microservices to provide a hierarchical structure to data sets and allow queries once they are imported.
dp-cantabular-csv-exporter
https://github.com/ONSdigital/dp-cantabular-csv-exporter
Education
Master's Degree in Telecommunications
Universitat Politècnica de Catalunya (UPC) - Barcelona, Spain
Certifications
Neural Networks for Machine Learning
University of Toronto | via Coursera
Computational Investment
Georgia Institute of Technology
Mining Massive Datasets
Stanford University | via Coursera
Competitive Strategy
LMU München | via Coursera
Algorithms: Design and Analysis, Part 1
Stanford University | via Coursera
Algorithms: Design and Analysis, Part 2
Stanford University | via Coursera
Skills
Libraries/APIs
API Development, REST APIs
Tools
Terraform, Git, AWS SDK, Jira, Slack, Trello, Vault, Apache Airflow, AWS Glue, Bitbucket
Languages
Go, Python, Python 3, SQL, C, Java, C++
Paradigms
Concurrent Programming, REST, Event-driven Architecture, Microservices Architecture, Distributed Computing, ETL, Declarative Programming
Storage
Databases, Amazon DynamoDB, PostgreSQL, Amazon S3 (AWS S3), Data Pipelines
Platforms
Amazon Web Services (AWS), Apache Kafka, Docker, Linux, Visual Studio Code (VS Code), macOS, Ubuntu, OpenStack
Frameworks
Spark
Other
Back-end, APIs, API Integration, Cloud, RESTful APIs, Data Engineering, Algorithms, GraphDB, Software Architecture, Amazon API Gateway, Data Modeling, Architecture, Startups, Technical Leadership, Distributed Systems, Scalability, Cloud Architecture, Solution Architecture, Technical Architecture, Data Warehousing, Data Scraping, Identity & Access Management (IAM), Signal Filtering, Mathematics, Physics, Signal Theory, Electronics, Internet of Things (IoT), Machine Learning, Data Science, Windows System Calls, Neural Networks, Data Mining, Investing, Strategy, Game Theory, Networks, Concurrency, Amazon Neptune, Streaming