Data Architect and Engineer (Contract)
2022 - PRESENT
Tovuti
- Redesigned and normalized the company's Aurora MySQL OLTP database, improving performance by 40% and reducing storage.
- Optimized slow-running queries to ensure that response SLAs were met.
- Advised on data-related technologies, design patterns, and database architecture choices.
- Spearheaded design and implementation of a new database MVP, enabling the automation of client configuration. The project included scoping, data modeling, database-as-code solution, Python-based API, and synthetic data generation procedures.
Technologies: Amazon Aurora, PHP, MySQL, Data Architecture, Database Design, Query Optimization, Data Engineering, Data Modeling, Data Synthesis, Database Management Systems (DBMS), RDBMS, Serverless Architecture, SQL, Database Architecture, Database Optimization, Amazon Web Services (AWS), Amazon RDS, OLTP, Design Patterns, Data Protection, Business Analysis, Database Schema Design, Reporting, Data Reporting, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Minimum Viable Product (MVP), Python 3, Python, Zapier

Database Administrator
2022 - 2023
VIDA (via Toptal)
- Optimized the top 20 longest-running MySQL queries, improving performance by 40 times.
- Developed data purge mechanisms to address disk space shortages in an on-premises replicated MySQL environment. The resulting process handles 2TB+ tables that are replicated to read-only instances.
- Built multiple batch-based solutions for handling I/O heavy database tasks in a replicated environment.
Technologies: MySQL, PostgreSQL, Database Administration (DBA), SQL, Query Optimization, Database Replication

Data Architect
2022 - 2022
US Service Animals (via Toptal)
- Piloted a time-boxed data discovery project touching on the transactional, analytical, and transformational aspects of the client's data platform.
- Uncovered a set of pre-existing conceptual misalignment issues between the transactional model and data collection logic, causing data quality and lineage issues in the downstream analytics layer.
- Generated a set of architecture, infrastructure, tooling, and approach recommendations detailing important changes across the platform and data collection business rules.
- Documented and presented findings and recommendations to audiences, such as C-level executives and product and technical teams.
Technologies: Database Design, SQL, Database Schema Design, Reporting, Business Intelligence (BI), Integration, Amazon S3 (AWS S3), Customer Relationship Management (CRM), Tableau, Data Analytics, Amazon Aurora, MySQL, PostgreSQL, Data Architecture, Business Analysis, Architecture, Roadmaps, Data Management, Minimum Viable Product (MVP)

Data Architect Consultant
2019 - 2021
Covax Data
- Rearchitected the company's main PostgreSQL OLTP database.
- Designed and implemented a major portion of the OLTP data access layer.
- Implemented an interim PostgreSQL-based search and record-paging functionality.
- Proposed, configured, and tested a PostgreSQL high-availability infrastructure using pgBackRest, including backup, streaming replication, and a dedicated backup/WAL repository.
Technologies: OLTP, Data Warehousing, Data Modeling, High-availability Clusters, SQL, Data Governance, Data Engineering, ETL, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Amazon RDS, Amazon EC2, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), Business Analysis, Database Schema Design, Reporting, PL/SQL, Data Migration, Database Migration, GDPR, Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Data Management

Postgres Advisor
2019 - 2020
Cherre
- Advised the client on optimization issues related to the PostgreSQL engine, storage, and long-running queries.
- Provided recommendations on indexing and permission strategies as well as benchmarks for the proposed optimizations.
- Evaluated and refactored the SQL codebase, making it compatible with the PostgreSQL 10/11 upgrade.
Technologies: Google Cloud Platform (GCP), PostgreSQL, Query Optimization, Database Optimization, Data Engineering, SQL, Database Architecture, REST, Data Architecture, Google Cloud, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Data Reporting, Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, BigQuery, Google BigQuery, Real Estate

Senior Data Engineer
2018 - 2019
Lyft
- Managed and expanded the in-house Python ETL framework, including MySQL ingestion, geocoding scripts, nightly checks and counts, and POS terminal data replication scripts.
- Re-implemented data extraction scripts in Python, allowing a single, generic, and configuration-based extraction script to handle all relational data sources, replacing many legacy scripts.
- Handled the merge and deduplication processes in Airflow and Hive, allowing data science to retrieve data for point-in-time analysis.
- Optimized the legacy Python ETL process through code refactoring, query optimization, and resource management improvements. Reduced data pipeline execution time from 12 to 3.5 hours.
- Conducted Redshift cluster performance analysis, code optimization, benchmarking, and documentation of the outcomes. Reduced the cluster workload and storage size and improved query performance.
- Configured and productionalized binlog replication-based ETL to allow complete change data capture and versioning.
Technologies: MySQL, Redshift, Redshift Spectrum, Python, Apache Hive, Amazon DynamoDB, Stitch Data, Amazon S3 (AWS S3), REST, Apache Airflow, SQL, Data Governance, Data Engineering, Big Data, ETL, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Presto DB, Amazon Athena, Database Optimization, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon RDS, Amazon EC2, OLTP, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Database Schema Design, Reporting, Integration, PL/SQL, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Data Build Tool (dbt), NoSQL, Zapier

Principal Data Architect
2016 - 2018
BlocPower
- Spearheaded the design, architecture, and implementation of the BlocPower data platform from the ground up, including all aspects of transactional and analytical processing, storage, and data access layer.
- Handled requirements gathering, discovery, and analysis of existing and new data sources in the company's data funnel.
- Implemented configuration-based ETL models and frameworks, data cleansing, and confirmation routines using Pentaho Data Integration, PL/pgSQL, and shell scripting, all used for ingesting and processing municipal data.
- Developed a scoring algorithm to rank every building in a given city using publicly available data for retrofit targeting and business development.
- Performed analysis and structural overhaul of existing marketing, sales, and retrofit workflow management processes. Optimized Salesforce workflows, object structures, and data integrity practices.
Technologies: Pentaho Data Integration (Kettle), PostgreSQL, Amazon RDS, Amazon S3 (AWS S3), Data Architecture, Data Engineering, Database Design, Data Modeling, Data Analytics, Business Intelligence (BI), OLTP, OLAP, Business Analysis, SQL, Database Architecture, Database Optimization, ETL, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon EC2, REST, JasperReports, Data Warehouse Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Customer Relationship Management (CRM), Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Minimum Viable Product (MVP), Python, Zapier, Geospatial Data

Data Architect and Engineer
2015 - 2016
Bytemark
- Designed and implemented Bytemark's first data warehouse and analytics platform using Pentaho Data Integration, including data modeling, ETL, and reporting dashboards.
- Acted as a member of the Bytemark architects' team to redesign the mobile ticketing platform from the ground up. Oversaw the data modeling, data access layer, and conversion script implementation.
- Designed a flexible OLTP data model for handling different transit agency business rules related to trips, fare structures, and scheduling.
Technologies: Amazon RDS, MySQL, Pentaho, Data Architecture, ETL, Query Optimization, Data Warehousing, SQL, Database Design, Data Engineering, Big Data, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Amazon Web Services (AWS), Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, JasperReports, Data Warehouse Design, Design Patterns, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, Model View Presenter (MVP), Geospatial Data

Data Warehouse - Senior Data Engineer
2014 - 2015
OnDeck Capital
- Designed and built a marketing campaign generation procedure allowing OnDeck to directly contact millions of potential customers. Reduced the mailing cycle from 18 to 6 days and improved response rates and customer tracking.
- Acted as a major contributor to the central OLTP database design and optimizations used as a back end to the award-winning OnDeck Online (ODO) credit decision engine. Contributed to the company's IPO in December 2014.
- Developed data ingestion scripts to facilitate data movement from disparate business systems into the newly built OLTP environment.
- Introduced best practices for data consumers, coding style guides, and other company data standards.
- Designed and implemented a business system report generation process. Provided an aggregated view of the enterprise data funnels, from prospective clients to closed loans.
Technologies: Greenplum, PostgreSQL, SQL, ETL, Database Optimization, Query Optimization, Pentaho Data Integration (Kettle), Data Engineering, Python, Database Architecture, Amazon Web Services (AWS), Data Analytics, Data Warehousing, Pentaho, Amazon RDS, Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Integration, Customer Relationship Management (CRM), Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Financial Services, Data Management

Data Engineer
2011 - 2014
Google
- Developed and expanded a dynamic ETL model (DBLoader) designed to push data from an in-memory data grid in Oracle Coherence into a relational database of the customer's choice. Supported PostgreSQL, Greenplum, SQL Server 2005/2008, and Db2.
- Oversaw the continuous data warehouse normalization and denormalization efforts to balance between consistency, performance, and access complexities due to the constant introduction of new business rules and data entities and increasing data volumes.
- Optimized and tuned long-running analytical queries and ETL upsert processes. Reduced data landing time by 80% for the company's largest client.
- Provided database support for recent acquisitions such as Meebo (Google+), Nik Software (Google Photos), Channel Intelligence (Google Shopping), and DoubleClick (Google AdSense).
- Wrote the documentation of database design and procedures for newly acquired companies. Created the playbook and troubleshooting manuals.
- Automated the SQL Server installation, replication, and data integration tasks using PowerShell and T-SQL scripts.
Technologies: Perl, SQL Server DBA, Greenplum, IBM Db2, Windows PowerShell, ETL, JasperReports, Data Warehousing, JVM, Oracle Coherence, SQL, Data Engineering, Big Data, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Distributed Databases, Amazon RDS, Microsoft SQL Server, Amazon EC2, Amazon S3 (AWS S3), Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, T-SQL (Transact-SQL), Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Integration, Analytics, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, NoSQL

DBA and Data Engineer Consultant
2008 - 2010
American Express
- Led, architected, and actively participated in the complete overhaul of the American Express Global AML Investigation Tracking System (GAITS) from inception through several major releases.
- Completed several data remodeling iterations of the GAITS OLTP and OLAP data models to meet Bank Secrecy Act, Patriot Act data collection, and FinCEN suspicious activity filing requirements.
- Implemented data pipelines in Perl and SQL Server to ingest AML investigation data and supporting documents into the GAITS transactional database.
- Led the design and implementation of the first OLTP and OLAP databases and application servers for the Amex FI unit from the ground up. This included server hardware and software installation and configuration, RAID configuration, and security.
- Implemented data warehouse ETL scripts in Perl, SQL Server, and SSIS. Added downstream audience-specific data marts, as well as report generation and publishing mechanisms, which shortened the audit preparation process by 90%.
Technologies: OLTP, Data Warehouse Design, SQL Server DBA, T-SQL (Transact-SQL), SSIS Custom Components, ETL, Database Architecture, Finance, Market Insights, Anti-money Laundering (AML), Business Intelligence (BI), Crystal Reports, SQL, Database Optimization, Query Optimization, Data Analytics, Data Warehousing, Microsoft SQL Server, Perl, Data Modeling, Data Architecture, Windows PowerShell, Data Engineering, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, OLAP, Business Analysis, Reporting, Integration, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Oracle8i

Data Engineer
1999 - 2006
Metropolitan Jewish Health System
- Developed Claim Remittance Data Integration scripts in Perl and MSSQL Server, allowing remitted claim lines to be merged into the centralized claims database. This software eliminated the need for the manual remittance process that was previously in place.
- Implemented a pharmacy data ingestion process in MSSQL DTS. The implementation included an insurance member matching algorithm to aid with processing handwritten prescriptions, reducing the member mismatch rate from 65% to 5%.
- Implemented a Nursing Home Supply Inventory data model, ETL, and report generation and delivery mechanism in MSSQL Server and Crystal Reports. The solution provided a previously unsupported analysis of supply usage, frequency, and spending.
Technologies: SQL, Microsoft SQL Server, SQL Server DBA, Perl, ETL, Data, Data Analysis, Reporting, Database Optimization, Query Optimization, Analytics, Data Warehousing, DTS, SQL Server Integration Services (SSIS), Healthcare Services, Healthcare, Data Reporting, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design