David Smith, Developer in Denver, CO, United States

David Smith

Verified Expert in Engineering

Data Engineer and Developer

Location
Denver, CO, United States
Toptal Member Since
April 1, 2019

David is a developer specializing in big data, back-end services, and full-stack SaaS products who has worked in various engineering, product, architecture, and management roles. Regardless of the position, he’s a hands-on coder who enjoys building products and delivering value to others. David always seeks to understand and build what the client needs to take them to the next step, whether it is a large scalable architecture or a quick MVP to test.

Portfolio

Gentle Valley Digital
Docker, Google Cloud, Flask, Python, React, Architecture, Data Engineering...
Reclaim
Python 3, Flask, Amazon Web Services (AWS), Amazon SageMaker...
AllCloud
AWS IoT, Amazon Timestream, Terraform, Redshift, Data Architecture...

Availability

Part-time

Preferred Environment

MacOS, Linux

The most amazing product I've built is a robust and scalable big data platform for healthcare to analyze terabytes of sensor data in clinical studies.

Work Experience

Founder

2019 - PRESENT
Gentle Valley Digital
  • Launched an MVP for a SaaS product that detects copyright threats to artistic work on Google Cloud, running a Flask API back end and a React front end on GKE (Kubernetes) with Docker images, utilizing Cloud SQL, Cloud Vision, and Cloud Storage.
  • Integrated with Stripe for monthly subscription payments for a paying customer.
  • Worked with the founder of a successful textile design company to establish business-level KPIs, built data pipelines to process and store sales and marketing data, and used analytics tools and machine learning to measure and forecast sales.
Technologies: Docker, Google Cloud, Flask, Python, React, Architecture, Data Engineering, APIs, Back-end, Databases, Pandas, CTO, OpenAI, Large Language Models (LLMs), Leadership, Software Architecture, SEO Tools, Machine Learning, CI/CD Pipelines, REST APIs

Chief Technology Officer

2021 - 2023
Reclaim
  • Improved the product back-end scalability by replacing the initial FaaS MVP with a more stable and distributed architecture using Celery, SQS, Docker, ECS, and auto-scaling to meet an expected 100x increase in API traffic.
  • Introduced proper product monitoring and logging by creating CloudWatch dashboards, custom metrics, and alarms to give the development team visibility into operations and alert them to adverse conditions.
  • Managed the entire engineering organization and mentored the product team, providing them with growth opportunities involving more product ownership and improving design, testing, and considerations around architecture, security, and scale.
  • Led the effort to become SOC 2 and HIPAA compliant and provided pre-sales support in closing new customers.
Technologies: Python 3, Flask, Amazon Web Services (AWS), Amazon SageMaker, Large Scale Distributed Systems, PostgreSQL, Amazon CloudWatch, Docker, Cloud Architecture, Software Architecture, Celery, Amazon Elastic Container Service (Amazon ECS), Python, Requirements Analysis, Technical Requirements, Data Engineering, APIs, Back-end, Databases, Pandas, CTO, Unit Testing, Pytest, pylint, SaaS, Leadership, Architecture, Microservices, Full-stack, Machine Learning, JavaScript, Full-stack Development, CI/CD Pipelines, REST APIs, Event-driven Architecture

Data Architect

2021 - 2021
AllCloud
  • Prototyped a multi-tenant near real-time IoT data pipeline for a manufacturing test suite company to remotely control and monitor sensors on the factory floor while being alerted to failures.
  • Delivered AWS architecture recommendations for an InfoSec SaaS company with high-security requirements and flexibility around tenant isolation and on-premise needs of certain tenants.
  • Outlined the technical strategy for the company to grow its US AWS Data and Machine Learning business, focusing on the manufacturing sector.
Technologies: AWS IoT, Amazon Timestream, Terraform, Redshift, Data Architecture, AWS Cloud Architecture, Python, Technical Requirements, APIs, Back-end, Databases, Unit Testing, SaaS, Architecture, Software Architecture, Machine Learning, REST APIs, Amazon Kinesis

Volunteer Data Engineer for the State of Colorado (Single Project)

2020 - 2020
Citizen Software Engineers
  • Delivered an automated data pipeline using AWS EMR and PySpark to process and load large amounts of raw Xmode location data for every Colorado resident daily into AWS Athena for analysis in Domo for social distancing reporting.
  • Stabilized the codebase by upgrading it to Python 3, improved the code design, and incorporated unit testing on top of the initial prototype.
  • Properly architected the project's AWS environment using Terraform and best architecture and security practices.
Technologies: Terraform, ETL, PySpark, Amazon Elastic MapReduce (EMR), Domo, SQL, Python 3, Amazon Athena, Python, Unit Testing, Pytest, pylint, Event-driven Architecture

Data, Cloud, and Software Architect Consultant

2019 - 2020
Reclaim
  • Architected and built efficient, idempotent data pipelines for processing health insurance eligibility and claims files, as well as scalable and cost-efficient serverless back-end services for a consumer-facing mobile app in AWS.
  • Built a real-time similar-person prediction REST API service running in ECS, using a Scikit-learn nearest neighbor machine learning model developed in collaboration with a data scientist in Jupyter notebooks.
  • Transformed the development team culture to focus on quality and speed by putting test automation, code linting, pull request reviews, and test coverage into practice.
Technologies: SQL, Microservices Architecture, Jupyter Notebook, Machine Learning, Scikit-learn, ETL, Cloud Architecture, Data Architecture, API Architecture, Amazon CloudWatch, Terraform, Amazon Cognito, Amazon API Gateway, Amazon Kinesis, Docker, Amazon Elastic Container Service (Amazon ECS), Flask-RESTful, AWS Lambda, Zappa, Python 3, Python, Technical Requirements, Data Engineering, APIs, Back-end, Databases, Flask, Pandas, Unit Testing, Pytest, pylint, SaaS, Leadership, Architecture, Software Architecture, Microservices, Full-stack, JavaScript, Full-stack Development, CI/CD Pipelines, REST APIs, Event-driven Architecture

Director of Technology

2017 - 2019
Evidation Health
  • Designed and implemented the data platform, a distributed and scalable system that runs Python ETL scripts to continuously and reliably process hundreds of gigabytes of raw data from third-party sensors, surveys, media, and studies into a data lake.
  • Created a method for idempotently merging large, partitioned data sets into a data lake using Amazon EMR for processing into an S3 back end, allowing schema changes and backfilling to occur without system downtime.
  • Performed major structural changes to the product architecture and release process to isolate customer environment-specific code from core services with minimal downtime.
  • Served as the team’s product manager and quality assurance engineer as the team succeeded in delivering our MVP with the most complex digital biomarker study protocol ever designed as its first use case.
  • Built additional web services on top of our platform for monitoring, delivering data, and quarantining problematic data for root cause analysis and repair.
  • Hired, managed, and mentored two engineering teams.
Technologies: Kibana, Elasticsearch, SQL, Amazon Web Services (AWS), Graphite, PostgreSQL, Hadoop, Amazon EC2, Amazon Virtual Private Cloud (VPC), AWS Lambda, Amazon Elastic MapReduce (EMR), Amazon S3 (AWS S3), JavaScript, React, Apache Airflow, Pandas, Jupyter, Flask, Python, Spark, Kubernetes, Requirements Analysis, Technical Requirements, Data Engineering, APIs, Back-end, Databases, Unit Testing, Databricks, Pytest, pylint, SaaS, Leadership, Architecture, Software Architecture, Full-stack, Machine Learning, Full-stack Development, CI/CD Pipelines, REST APIs, Amazon Kinesis, Event-driven Architecture
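
The idempotent merge of partitioned data sets mentioned in the bullets above can be illustrated with a minimal sketch. This is not the platform's actual code: it uses in-memory dictionaries in place of Amazon EMR and S3, and the function name `merge_partitions`, the `id` key, and the date-style partition names are all hypothetical. The idea it shows is that each touched partition is rebuilt by upserting rows on a key, so replaying the same batch (for example, during a backfill) leaves the result unchanged.

```python
def merge_partitions(lake, batch, key="id"):
    """Merge a batch into the lake one partition at a time.

    `lake` and `batch` map a partition name to a list of row dicts.
    Rows are upserted by `key`, so applying the same batch twice
    produces the same lake as applying it once (idempotent).
    """
    merged = dict(lake)  # untouched partitions are carried over as-is
    for partition, rows in batch.items():
        # Index the partition's existing rows by key.
        existing = {row[key]: row for row in lake.get(partition, [])}
        for row in rows:
            existing[row[key]] = row  # insert new rows, overwrite changed ones
        # Rewrite the whole partition in a deterministic order.
        merged[partition] = sorted(existing.values(), key=lambda r: r[key])
    return merged


if __name__ == "__main__":
    lake = {"2019-01-01": [{"id": 1, "v": "a"}]}
    batch = {
        "2019-01-01": [{"id": 1, "v": "b"}, {"id": 2, "v": "c"}],
        "2019-01-02": [{"id": 3, "v": "d"}],
    }
    once = merge_partitions(lake, batch)
    twice = merge_partitions(once, batch)
    print(once == twice)
```

Replacing whole partitions rather than appending rows is one common way to get this property; the source does not say which approach the platform used.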

Tech Lead

2015 - 2017
AppFolio
  • Acted as the product owner and technical lead for the value-added service that integrates background check web services from partner systems to provide background check reports to our customers, a service that makes up 21% of annual revenue.
  • Prototyped our first in-house solution for a background search service, including a data comparison analysis and evaluating big data tools to perform thousands of queries over a billion records with multiple indexes for instant results.
  • Led the development of the property manager product's HOA support initiative, which involved several refactors to import features such as recurring invoice scheduling and reporting.
  • Analyzed customer segments and rapidly prototyped an MVP to perform market validation for a new business offering that involved automatic scanned bill scraping, saving $1.5 million after the validation surfaced high investment costs, low scan quality, and low adoption.
  • Directly managed and mentored four software engineers.
Technologies: SQL, CSS, Redis, MySQL, React, jQuery, JavaScript, Ruby on Rails (RoR), Technical Requirements, APIs, Back-end, Databases, Unit Testing, SaaS, Leadership, Full-stack, Full-stack Development, CI/CD Pipelines, REST APIs

Director of Product and Engineering

2014 - 2015
cielo24
  • Led the development of initiatives such as building our own task management system to delegate inbound media transcription jobs from customer systems to external partner APIs and workforce clouds at a large scale with strict SLA requirements.
  • Gathered stakeholder requirements and provided hands-on technical design and level-of-effort estimates for complex software features, allowing solution discovery to iterate rapidly without extensive redesign by engineers.
  • Defined product roadmaps, served as the scrum master, and managed weekly releases in Jira.
  • Reviewed pull requests to ensure that the requirements of the feature were met, that coding best practices were applied, and that test cases were covered.
  • Managed onsite software engineers, offsite contractors, and partnerships.
  • Used linear regression to analyze historical Amazon Mechanical Turk worker quality scores and recommended a new scoring structure to promote alignment across different phases of the media transcription and proofreading process.
Technologies: Amazon Mechanical Turk, CSS, jQuery, JavaScript, PostgreSQL, Celery, Django, Python, Requirements Analysis, Technical Requirements, APIs, Back-end, Databases, Unit Testing, SaaS, Leadership, Full-stack, Full-stack Development, CI/CD Pipelines, REST APIs

Product Manager

2012 - 2014
Maker Studios (acquired by Disney)
  • Built the company's first automated data retrieval service to pull daily analytics from social media for over 60,000 channels into Amazon Redshift to identify better campaign targets for affiliate and brand advertisers.
  • Served as the team's database lead by designing new tables, optimizing query performance, and advising other team members on design and troubleshooting.
  • Designed and automated the customer onboarding workflow to record and sign required tax forms using web form validation, which reduced onboarding time from two weeks to just minutes and allowed the company to drastically scale customer conversions.
  • Led product roadmaps for many high-profile products and authored several major patents.
Technologies: Google APIs, Salesforce, Redshift, Redis, MySQL, Node.js, Requirements Analysis, Technical Requirements, APIs, Databases, SaaS, Full-stack, JavaScript, Full-stack Development, REST APIs

Programmer | Analyst III

2010 - 2012
USC Information Sciences Institute
  • Created a Django-based content management system for the National Institutes of Health's Non-Human Primate Research Centers to allow their pathologists to classify specimens and annotate progressively rendered large virtual microscopy images.
  • Built a REST web service using Java Spring and used it for researching data transfer performance in a computing cluster.
  • Co-authored three academic papers that were published and used to attract future funding for projects.
Technologies: Hadoop, MySQL, jQuery, JavaScript, Spring, Java 6, Django, Python, Technical Requirements, APIs, Back-end, Databases, Full-stack, Full-stack Development, REST APIs

Senior Software Engineer

2005 - 2009
Computer Associates
  • Implemented the product integrations between the Spectrum Network Management product and several other Computer Associates products, such as single sign-on, service desk, and CMDB, as well as SAP BusinessObjects reports.
  • Acted as the key engineer on the Spectrum Network Reporting product team.
  • Aggregated data in MySQL from a distributed network of SNMP devices and built a Java-based web application on top of it for better visibility into overall network trends.
Technologies: JavaScript, Java, APIs, Back-end, Databases, Unit Testing, Full-stack, Full-stack Development, REST APIs

Evidation Data Platform

I built this big data platform for data scientists using sensors, media, and other metadata to gain insights into health outcomes during clinical trials and retrospective data analysis. This platform allowed users to create cohorts from a large data lake and process the raw data through feature computers to enable them to analyze quickly and easily without managing the challenges of large data sets and infrastructure.

Pysphinx-autoindex

https://github.com/suburbanmtman/pysphinx-autoindex
I developed an open-source utility to inspect Python modules and classes to autogenerate an in-depth index of code for documentation in Sphinx.
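
As an illustration of the general idea (inspecting a module and emitting an index for Sphinx), here is a minimal sketch built on Python's standard `inspect` module. It is not the utility's actual code: the function names `build_index` and `to_rst` are hypothetical, and the output simply uses the standard `autoclass`, `automethod`, and `autofunction` directives from Sphinx's autodoc extension.

```python
import importlib
import inspect


def build_index(module_name):
    """Return sorted (kind, qualified_name) pairs for a module's public API."""
    module = importlib.import_module(module_name)
    entries = []
    for name, obj in inspect.getmembers(module):
        # Skip private names and anything imported from another module.
        if name.startswith("_"):
            continue
        if getattr(obj, "__module__", None) != module.__name__:
            continue
        qualified = f"{module.__name__}.{name}"
        if inspect.isclass(obj):
            entries.append(("class", qualified))
            # Include the class's public methods written in Python.
            for meth_name, _ in inspect.getmembers(obj, inspect.isfunction):
                if not meth_name.startswith("_"):
                    entries.append(("method", f"{qualified}.{meth_name}"))
        elif inspect.isfunction(obj):
            entries.append(("function", qualified))
    return sorted(entries)


def to_rst(entries):
    """Render index entries as Sphinx autodoc directives, one per line."""
    directive = {
        "class": "autoclass",
        "method": "automethod",
        "function": "autofunction",
    }
    return "\n".join(f".. {directive[kind]}:: {name}" for kind, name in entries)


if __name__ == "__main__":
    # Index a standard-library module as a quick demonstration.
    print(to_rst(build_index("json.decoder")))
```

Filtering on `__module__` keeps re-exported names out of the index, which is one reasonable design choice; the real project may handle this differently.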

Hands-on Airflow Introduction

https://github.com/suburbanmtman/airflow-intro
I created a repo and a Medium article to help people new to Airflow quickly launch a production-like environment using Docker. The article was featured in The Startup publication and is available at https://medium.com/swlh/write-code-in-airflow-within-minutes-e248d00c2b75.

Crystal W Design

I built a website showcasing designs for one of the top-selling artists on print-on-demand sites such as Spoonflower and Society6. It is a Django-driven site with an admin page and template customizations, running on AWS Elastic Beanstalk with Pinterest and Mailchimp integrations.

Languages

SQL, Python, Python 3, JavaScript, HTML, Java 6, CSS, Bash, Java, Scala, ECMAScript (ES6)

Frameworks

Flask, Ruby on Rails (RoR), Ruby on Rails 4, Django, Spark, Hadoop, Spring

Libraries/APIs

Flask-RESTful, REST APIs, Zappa, Pandas, YouTube API, jQuery, PySpark, Google APIs, Scikit-learn, NumPy, Spark ML, React, Node.js

Tools

Celery, Git, Apache Airflow, Pytest, pylint, Terraform, NGINX, Docker Compose, Jupyter, Amazon Athena, Amazon Elastic Container Service (Amazon ECS), Amazon Cognito, Amazon CloudWatch, Domo, Kibana, AWS Glue, Amazon QuickSight, AWS Step Functions, Amazon SageMaker, AWS Key Management Service (KMS), AWS CloudTrail, AWS Systems Manager, Amazon Elastic MapReduce (EMR), Amazon EBS, Google Analytics, RabbitMQ, Jenkins, TeamCity, CircleCI, SaltStack, Tableau, Superset, Amazon Simple Queue Service (SQS), Amazon Virtual Private Cloud (VPC), Grafana

Paradigms

Unit Testing, Lambda Architecture, ETL, Agile, Kanban, Requirements Analysis, Microservices, Event-driven Architecture, Data Science, API Architecture, Microservices Architecture, Management

Platforms

Amazon Web Services (AWS), Linux, New Relic, Jupyter Notebook, AWS Lambda, Docker, Databricks, MacOS, AWS IoT, Kubernetes, Salesforce, Amazon EC2, Heroku

Storage

Databases, MySQL, PostgreSQL, Redshift, Amazon S3 (AWS S3), Redis, Google Cloud, Elasticsearch, Amazon DynamoDB, Amazon EFS, Apache Hive

Other

Data Engineering, Architecture, Software Architecture, APIs, Back-end, SaaS, Leadership, Full-stack, Full-stack Development, CI/CD Pipelines, Machine Learning, Amazon Kinesis, Graphite, Technical Requirements, CTO, Amazon Mechanical Turk, Amazon API Gateway, Data Architecture, Cloud Architecture, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNN), Modeling, Amazon Route 53, AWS Database Migration Service (DMS), AWS Server Migration Service (SMS), Large Scale Distributed Systems, Amazon RDS, Data Analytics, Strategy, Product Management, Computer Science, Amazon Timestream, AWS Cloud Architecture, OpenAI, Large Language Models (LLMs), SEO Tools

Education

2011 - 2014

Master of Business Administration (MBA) Degree in Business Administration

University of Southern California - Los Angeles, CA, USA

2000 - 2005

Bachelor's Degree in Computer Science

University of New Hampshire - Durham, NH, USA

Certifications

APRIL 2021 - APRIL 2024

AWS Certified Database Specialty

AWS

JANUARY 2021 - JANUARY 2024

AWS Certified Solutions Architect Professional

AWS

DECEMBER 2020 - DECEMBER 2023

AWS Certified Security Specialty

Amazon Web Services

DECEMBER 2020 - DECEMBER 2023

AWS Certified Machine Learning Specialty

Amazon Web Services

NOVEMBER 2020 - NOVEMBER 2023

AWS Certified Data Analytics Specialty

AWS

DECEMBER 2019 - DECEMBER 2021

Google Cloud Associate Cloud Engineer

Google Cloud

SEPTEMBER 2019 - PRESENT

Databricks Certified Associate Developer for Apache Spark 2.4

Databricks
