David Smith
Verified Expert in Engineering
Data Engineer and Developer
David is a developer specializing in big data, back-end services, and full-stack SaaS products who has worked in various engineering, product, architecture, and management roles. Regardless of the position, he's a hands-on coder who enjoys building products and delivering value to others. David always seeks to understand and build what the client needs to take them to the next step, whether it is a large scalable architecture or a quick MVP to test.
Preferred Environment
MacOS, Linux
The most amazing product I've built is a robust and scalable big data platform for healthcare to analyze terabytes of sensor data in clinical studies.
Work Experience
Founder
Gentle Valley Digital
- Launched an MVP for artistic work copyright threat detection SaaS on Google Cloud, running a Flask API back end and a React front end on GKE (Kubernetes) with Docker images, utilizing Cloud SQL, Cloud Vision, and Cloud Storage.
- Integrated with Stripe for monthly subscription payments for a paying customer.
- Worked with the founder of a successful textile design company to establish business-level KPIs, built data pipelines to process and store sales and marketing data, and used analytics tools and machine learning to measure and forecast sales.
Chief Technology Officer
Reclaim
- Improved the product back-end scalability by replacing the initial FaaS MVP with a more stable and distributed architecture using Celery, SQS, Docker, ECS, and auto-scaling to meet an expected 100x increase in API traffic.
- Introduced proper product monitoring and logging by creating CloudWatch dashboards, custom metrics, and alarms to give the development team visibility into operations and alert them to adverse conditions.
- Managed the entire engineering organization and mentored the product team, providing them with growth opportunities involving more product ownership and improving design, testing, and considerations around architecture, security, and scale.
- Led the effort to become SOC 2 and HIPAA compliant and provided pre-sales support in closing new customers.
Data Architect
AllCloud
- Prototyped a multi-tenant near real-time IoT data pipeline for a manufacturing test suite company to remotely control and monitor sensors on the factory floor while being alerted to failures.
- Delivered AWS architecture recommendations for an InfoSec SaaS company with stringent security requirements that needed flexibility around tenant isolation and the on-premises needs of certain tenants.
- Outlined the technical strategy for the company to grow its US AWS Data and Machine Learning business, focusing on the manufacturing sector.
Volunteer Data Engineer for the State of Colorado (Single Project)
Citizen Software Engineers
- Delivered an automated data pipeline using Amazon EMR and PySpark to process large amounts of raw Xmode location data for every Colorado resident each day and load it into Amazon Athena, where it was analyzed in Domo for social distancing reporting.
- Stabilized the codebase by upgrading it to Python 3, improved the code design, and incorporated unit testing on top of the initial prototype.
- Properly architected the project's AWS environment using Terraform and best architecture and security practices.
Data, Cloud, and Software Architect Consultant
Reclaim
- Architected and built efficient, idempotent data pipelines for processing health insurance eligibility and claims files, as well as scalable and cost-efficient serverless back-end services for a consumer-facing mobile app in AWS.
- Built a real-time similar-person prediction REST API service running in ECS, using a Scikit-learn nearest-neighbor machine learning model developed in collaboration with a data scientist in Jupyter notebooks.
- Transformed the development team culture to focus on quality and speed by putting test automation, code linting, pull request reviews, and test coverage into practice.
Director of Technology
Evidation Health
- Designed and implemented the data platform, a distributed and scalable system that runs Python ETL scripts to continuously and reliably process hundreds of gigabytes of raw data from third-party sensors, surveys, media, and studies into a data lake.
- Created a method for idempotently merging large, partitioned data sets into a data lake using Amazon EMR for processing into an S3 back end, allowing schema changes and backfilling to occur without system downtime.
- Performed major structural changes to the product architecture and release process to isolate customer environment-specific code from core services with minimal downtime.
- Served as the team’s product manager and quality assurance engineer as the team succeeded in delivering our MVP with the most complex digital biomarker study protocol ever designed as its first use case.
- Built additional web services on top of our platform for monitoring, delivering data, and quarantining problematic data for root cause analysis and repair.
- Hired, managed, and mentored two engineering teams.
Tech Lead
AppFolio
- Acted as the product owner and technical lead for the value-added service that integrates background check web services from partner systems to provide background check reports to our customers, a service that makes up 21% of annual revenue.
- Prototyped our first in-house solution for a background search service, including a data comparison analysis and evaluating big data tools to perform thousands of queries over a billion records with multiple indexes for instant results.
- Led the development of the property manager product's HOA support initiative which involved several refactors to import features such as recurring invoice scheduling and reporting.
- Analyzed customer segments and rapidly prototyped an MVP to perform market validation for a new business offering that involved automatic scanned bill scraping, saving $1.5 million after the results revealed high required investment, low scan quality, and low adoption.
- Directly managed and mentored four software engineers.
Director of Product and Engineering
cielo24
- Led the development of initiatives such as building our own task management system to delegate inbound media transcription jobs from customer systems to external partner APIs and workforce clouds at a large scale with strict SLA requirements.
- Gathered stakeholder requirements and provided hands-on technical design and level-of-effort estimates for complex software features, allowing solution discovery to iterate rapidly without extensive redesign by engineers.
- Defined product roadmaps, served as the scrum master, and managed weekly releases in Jira.
- Reviewed pull requests to ensure that the requirements of the feature were met, that coding best practices were applied, and that test cases were covered.
- Managed onsite software engineers, offsite contractors, and partnerships.
- Analyzed historical Amazon Mechanical Turk worker quality scores to recommend a new scoring structure to promote alignment across different phases of the media transcription and proofreading process using linear regression.
Product Manager
Maker Studios (acquired by Disney)
- Built the company's first automated data retrieval service to pull daily analytics from social media for over 60,000 channels into Amazon Redshift to identify better campaign targets for affiliate and brand advertisers.
- Served as the team's database lead by designing new tables, optimizing query performance, and advising other team members on design and troubleshooting.
- Designed and automated the customer onboarding workflow to record and sign required tax forms using web form validation, which reduced onboarding time from two weeks to just minutes and allowed the company to drastically scale converted customers.
- Led product roadmaps for many high-profile products and authored several major patents.
Programmer | Analyst III
USC Information Sciences Institute
- Created a Django-based content management system for the National Institutes of Health's Non-Human Primate Research Centers to allow their pathologists to classify specimens and annotate progressively rendered large virtual microscopy images.
- Built a REST web service using Java Spring and used it for researching data transfer performance in a computing cluster.
- Co-authored three academic papers that were published and used to attract future funding for projects.
Senior Software Engineer
Computer Associates
- Implemented the product integrations between the Spectrum Network Management product and several other Computer Associates products, such as single sign-on, service desk, and CMDB, as well as SAP BusinessObjects reports.
- Acted as the key engineer on the Spectrum Network Reporting product team.
- Aggregated data in MySQL from a distributed network of SNMP devices and built a Java-based web application on top of it for better visibility into overall network trends.
Experience
Evidation Data Platform
Pysphinx-autoindex
https://github.com/suburbanmtman/pysphinx-autoindex
Hands-on Airflow Introduction
https://github.com/suburbanmtman/airflow-intro
Crystal W Design
Skills
Languages
SQL, Python, Python 3, JavaScript, HTML, Java 6, CSS, Bash, Java, Scala, ECMAScript (ES6)
Frameworks
Flask, Ruby on Rails (RoR), Ruby on Rails 4, Django, Spark, Hadoop, Spring
Libraries/APIs
Flask-RESTful, REST APIs, Zappa, Pandas, YouTube API, jQuery, PySpark, Google APIs, Scikit-learn, NumPy, Spark ML, React, Node.js
Tools
Celery, Git, Apache Airflow, Pytest, pylint, Terraform, NGINX, Docker Compose, Jupyter, Amazon Athena, Amazon Elastic Container Service (Amazon ECS), Amazon Cognito, Amazon CloudWatch, Domo, Kibana, AWS Glue, Amazon QuickSight, AWS Step Functions, Amazon SageMaker, AWS Key Management Service (KMS), AWS CloudTrail, AWS Systems Manager, Amazon Elastic MapReduce (EMR), Amazon EBS, Google Analytics, RabbitMQ, Jenkins, TeamCity, CircleCI, SaltStack, Tableau, Superset, Amazon Simple Queue Service (SQS), Amazon Virtual Private Cloud (VPC), Grafana
Paradigms
Unit Testing, Lambda Architecture, ETL, Agile, Kanban, Requirements Analysis, Microservices, Event-driven Architecture, Data Science, API Architecture, Microservices Architecture, Management
Platforms
Amazon Web Services (AWS), Linux, New Relic, Jupyter Notebook, AWS Lambda, Docker, Databricks, MacOS, AWS IoT, Kubernetes, Salesforce, Amazon EC2, Heroku
Storage
Databases, MySQL, PostgreSQL, Redshift, Amazon S3 (AWS S3), Redis, Google Cloud, Elasticsearch, Amazon DynamoDB, Amazon EFS, Apache Hive
Other
Data Engineering, Architecture, Software Architecture, APIs, Back-end, SaaS, Leadership, Full-stack, Full-stack Development, CI/CD Pipelines, Machine Learning, Amazon Kinesis, Graphite, Technical Requirements, CTO, Amazon Mechanical Turk, Amazon API Gateway, Data Architecture, Cloud Architecture, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNN), Modeling, Amazon Route 53, AWS Database Migration Service (DMS), AWS Server Migration Service (SMS), Large Scale Distributed Systems, Amazon RDS, Data Analytics, Strategy, Product Management, Computer Science, Amazon Timestream, AWS Cloud Architecture, OpenAI, Large Language Models (LLMs), SEO Tools
Education
Master of Business Administration (MBA)
University of Southern California - Los Angeles, CA, USA
Bachelor's Degree in Computer Science
University of New Hampshire - Durham, NH, USA
Certifications
AWS Certified Database Specialty
AWS
AWS Solutions Architect Professional
Amazon Web Services
AWS Certified Solutions Architect Professional
AWS
AWS Certified Security Specialty
Amazon Web Services
AWS Certified Machine Learning Specialty
Amazon Web Services
AWS Certified Data Analytics Specialty
AWS
Google Cloud Associate Cloud Engineer
Google Cloud
Databricks Certified Associate Developer for Apache Spark 2.4
Databricks