Mikhail Fishzon, Data Engineer and Developer in Santa Fe, NM, United States

Member since June 22, 2022
Mikhail is an alumnus of Google and Lyft and a data engineer who enjoys turning real-life business problems into scalable data platforms. Over the last two decades, he has helped companies architect, build, and maintain performance-optimized databases and data pipelines. Mikhail is knowledgeable in database optimization techniques and has improved slow-running query performance by as much as 99%.
Location

Santa Fe, NM, United States

Availability

Full-time

Preferred Environment

SQL, PostgreSQL, MySQL, Pentaho, Amazon Web Services (AWS), Amazon RDS, Redshift, Database Architecture, Python, Google Cloud, Amazon Aurora, Google Cloud Platform (GCP)

The most amazing...

...thing I've worked on is a database design that could support all possible transit agency fares and scheduling business models.

Employment

  • Data Architect and Engineer (Contract)

    2022 - PRESENT
    Tovuti
    • Redesigned and normalized the company's Aurora MySQL OLTP database, improving performance by 40% and reducing storage requirements.
    • Optimized slow-running queries to ensure that response SLAs were met.
    • Advised on data-related technologies, design patterns, and database architecture choices.
    • Spearheaded design and implementation of a new database MVP, enabling the automation of client configuration. The project included scoping, data modeling, database-as-code solution, Python-based API, and synthetic data generation procedures.
    Technologies: Amazon Aurora, PHP, MySQL, Data Architecture, Database Design, Query Optimization, Data Engineering, Data Modeling, Data Synthesis, Database Management Systems (DBMS), RDBMS, Serverless Architecture, SQL, Database Architecture, Database Optimization, Amazon Web Services (AWS), Amazon RDS, OLTP, Design Patterns, Data Protection, Business Analysis, Database Schema Design, Reporting, Data Reporting, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Minimum Viable Product (MVP), Python 3, Python, Zapier
  • Database Administrator

    2022 - 2023
    VIDA (via Toptal)
    • Optimized the top 20 longest-running MySQL queries, improving performance by 40 times.
    • Developed data purge mechanisms to address disk space shortages in an on-premise replicated MySQL environment, producing a robust process that handles 2TB+ tables replicated to read-only instances.
    • Built multiple batch-based solutions for handling I/O heavy database tasks in a replicated environment.
    Technologies: MySQL, PostgreSQL, Database Administration (DBA), SQL, Query Optimization, Database Replication
  • Data Architect

    2022 - 2022
    US Service Animals (via Toptal)
    • Piloted a time-boxed data discovery project touching on the transactional, analytical, and transformational aspects of the client's data platform.
    • Uncovered pre-existing conceptual misalignments between the transactional model and the data collection logic that were causing data quality and lineage issues in the downstream analytics layer.
    • Generated a set of architecture, infrastructure, tooling, and approach recommendations detailing important changes across the platform and data collection business rules.
    • Documented and presented findings and recommendations to audiences, such as C-level executives and product and technical teams.
    Technologies: Database Design, SQL, Database Schema Design, Reporting, Business Intelligence (BI), Integration, Amazon S3 (AWS S3), Customer Relationship Management (CRM), Tableau, Data Analytics, Amazon Aurora, MySQL, PostgreSQL, Data Architecture, Business Analysis, Architecture, Roadmaps, Data Management, Minimum Viable Product (MVP)
  • Data Architect Consultant

    2019 - 2021
    Covax Data
    • Rearchitected the company's main PostgreSQL OLTP database.
    • Designed and implemented a major portion of the OLTP data access layer.
    • Implemented an interim PostgreSQL-based search and record-paging functionality.
    • Proposed, configured, and tested PostgreSQL high-availability infrastructure using pgBackRest, including backups, streaming replication, and a dedicated backup and WAL repository.
    Technologies: OLTP, Data Warehousing, Data Modeling, High-availability Clusters, SQL, Data Governance, Data Engineering, ETL, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Amazon RDS, Amazon EC2, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), Business Analysis, Database Schema Design, Reporting, PL/SQL, Data Migration, Database Migration, GDPR, Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Data Management
  • Postgres Advisor

    2019 - 2020
    Cherre
    • Advised the client on optimization issues related to the PostgreSQL engine, storage, and long-running queries.
    • Provided recommendations on indexing and permission strategies as well as benchmarks for the proposed optimizations.
    • Evaluated and refactored the SQL codebase, making it compatible with the PostgreSQL 10/11 upgrade.
    Technologies: Google Cloud Platform (GCP), PostgreSQL, Query Optimization, Database Optimization, Data Engineering, SQL, Database Architecture, REST, Data Architecture, Google Cloud, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Data Reporting, Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, BigQuery, Google BigQuery, Real Estate
  • Senior Data Engineer

    2018 - 2019
    Lyft
    • Managed and expanded the in-house Python ETL framework, including MySQL ingestion, geocoding scripts, nightly checks and counts, and POS terminal data replication scripts.
    • Re-implemented data extraction scripts in Python, allowing a single, generic, and configuration-based extraction script to handle all relational data sources, replacing many legacy scripts.
    • Handled the merge and deduplication processes in Airflow and Hive, allowing the data science team to retrieve data for point-in-time analysis.
    • Optimized the legacy Python ETL process through code refactoring, query optimization, and resource management improvements. Reduced data pipeline execution time from 12 to 3.5 hours.
    • Conducted Redshift cluster performance analysis, code optimization, benchmarking, and documentation of the outcomes. Reduced the cluster workload and storage size and improved query performance.
    • Configured and productionized binlog replication-based ETL to enable complete change data capture and versioning.
    Technologies: MySQL, Redshift, Redshift Spectrum, Python, Apache Hive, Amazon DynamoDB, Stitch Data, Amazon S3 (AWS S3), REST, Apache Airflow, SQL, Data Governance, Data Engineering, Big Data, ETL, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Presto DB, Amazon Athena, Database Optimization, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon RDS, Amazon EC2, OLTP, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Database Schema Design, Reporting, Integration, PL/SQL, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Data Build Tool (dbt), NoSQL, Zapier
  • Principal Data Architect

    2016 - 2018
    Blocpower
    • Spearheaded the design, architecture, and implementation of the BlocPower data platform from the ground up, including all aspects of transactional and analytical processing, storage, and data access layer.
    • Handled requirements gathering, discovery, and analysis of existing and new data sources in the company's data funnel.
    • Implemented configuration-based ETL models and frameworks, along with data cleansing and validation routines, using Pentaho Data Integration, PL/pgSQL, and shell scripting for ingesting and processing municipal data.
    • Developed a scoring algorithm to rank every building in a given city using publicly available data for retrofit targeting and business development.
    • Performed analysis and structural overhaul of existing marketing, sales, and retrofit workflow management processes. Optimized Salesforce workflows, object structures, and data integrity practices.
    Technologies: Pentaho Data Integration (Kettle), PostgreSQL, Amazon RDS, Amazon S3 (AWS S3), Data Architecture, Data Engineering, Database Design, Data Modeling, Data Analytics, Business Intelligence (BI), OLTP, OLAP, Business Analysis, SQL, Database Architecture, Database Optimization, ETL, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon EC2, REST, JasperReports, Data Warehouse Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Customer Relationship Management (CRM), Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Minimum Viable Product (MVP), Python, Zapier, Geospatial Data
  • Data Architect and Engineer

    2015 - 2016
    Bytemark
    • Designed and implemented Bytemark's first data warehouse and analytics platform using Pentaho Data Integration, including data modeling, ETL, and reporting dashboards.
    • Acted as a member of the Bytemark architects' team to redesign the mobile ticketing platform from the ground up. Oversaw the data modeling, data access layer, and conversion script implementation.
    • Designed a flexible OLTP data model for handling different transit agency business rules related to trips, fare structures, and scheduling.
    Technologies: Amazon RDS, MySQL, Pentaho, Data Architecture, ETL, Query Optimization, Data Warehousing, SQL, Database Design, Data Engineering, Big Data, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Amazon Web Services (AWS), Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, JasperReports, Data Warehouse Design, Design Patterns, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, Model View Presenter (MVP), Geospatial Data
  • Data Warehouse - Senior Data Engineer

    2014 - 2015
    OnDeck Capital
    • Designed and built a marketing campaign generation procedure allowing OnDeck to directly contact millions of potential customers. Reduced mailing cycle from 18 to 6 days and improved response rates and customer tracking.
    • Acted as a major contributor to the central OLTP database design and optimizations used as a back end to the award-winning OnDeck Online (ODO) credit decision engine. Contributed to the company's IPO in December 2014.
    • Developed data ingestion scripts to facilitate data movement from disparate business systems into the newly built OLTP environment.
    • Introduced best practices for data consumers, coding style guides, and other company data standards.
    • Designed and implemented a business system report generation process. Provided an aggregated view of the enterprise data funnels—from prospective clients to closed loans.
    Technologies: Greenplum, PostgreSQL, SQL, ETL, Database Optimization, Query Optimization, Pentaho Data Integration (Kettle), Data Engineering, Python, Database Architecture, Amazon Web Services (AWS), Data Analytics, Data Warehousing, Pentaho, Amazon RDS, Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Integration, Customer Relationship Management (CRM), Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Financial Services, Data Management
  • Data Engineer

    2011 - 2014
    Google
    • Developed and expanded a dynamic ETL model (DBLoader) designed to push data from an Oracle Coherence in-memory data grid into a relational database of the customer's choice. Supported PostgreSQL, Greenplum, SQL Server 2005/2008, and Db2.
    • Oversaw continuous data warehouse normalization and denormalization efforts to balance consistency, performance, and access complexity as new business rules and data entities were constantly introduced and data volumes grew.
    • Optimized and tuned long-running analytical queries and ETL upsert processes. Reduced data landing time by 80% for the largest company client.
    • Provided database support for recent acquisitions such as Meebo (Google+), Nik Software (Google Photos), Channel Intelligence (Google Shopping), and DoubleClick (Google AdSense).
    • Wrote the documentation of database design and procedures for newly acquired companies. Created the playbook and troubleshooting manuals.
    • Automated the SQL Server installation, replication, and data integration tasks using PowerShell and T-SQL scripts.
    Technologies: Perl, SQL Server DBA, Greenplum, IBM Db2, Windows PowerShell, ETL, JasperReports, Data Warehousing, JVM, Oracle Coherence, SQL, Data Engineering, Big Data, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Distributed Databases, Amazon RDS, Microsoft SQL Server, Amazon EC2, Amazon S3 (AWS S3), Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, T-SQL (Transact-SQL), Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Integration, Analytics, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, NoSQL
  • DBA and Data Engineer Consultant

    2008 - 2010
    American Express
    • Led, architected, and actively participated in the complete overhaul of the American Express Global AML Investigation Tracking System (GAITS) from inception through several major releases.
    • Completed several data remodeling iterations of the GAITS OLTP and OLAP data models to meet Bank Secrecy Act, Patriot Act data collection, and FinCEN suspicious activity filing requirements.
    • Implemented data pipelines in Perl and SQL Server to ingest AML investigation data and supporting documents into the GAITS transactional database.
    • Led the design and implementation of the first OLTP and OLAP databases and application servers for the Amex FI unit from the ground up. This included server hardware and software installation, RAID configuration, and security setup.
    • Implemented data warehouse ETL scripts in Perl, SQL Server, and SSIS. Added downstream audience-specific data marts, as well as report generation and publishing mechanisms, which reduced audit preparation time by 90%.
    Technologies: OLTP, Data Warehouse Design, SQL Server DBA, T-SQL (Transact-SQL), SSIS Custom Components, ETL, Database Architecture, Finance, Market Insights, Anti-money Laundering (AML), Business Intelligence (BI), Crystal Reports, SQL, Database Optimization, Query Optimization, Data Analytics, Data Warehousing, Microsoft SQL Server, Perl, Data Modeling, Data Architecture, Windows PowerShell, Data Engineering, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, OLAP, Business Analysis, Reporting, Integration, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Oracle8i
  • Data Engineer

    1999 - 2006
    Metropolitan Jewish Health System
    • Developed Claim Remittance Data Integration scripts in Perl and MSSQL Server, allowing remitted claim lines to be merged into the centralized claims database. This eliminated the previously manual remittance process.
    • Built a pharmacy data ingestion process in MSSQL DTS. The implementation included an insurance member matching algorithm to aid in processing hand-written prescriptions, reducing member mismatches from 65% to 5%.
    • Implemented a nursing home supply inventory data model, ETL, and report generation and delivery mechanism in MSSQL Server and Crystal Reports. The solution provided previously unavailable analysis of supply usage, frequency, and spending.
    Technologies: SQL, Microsoft SQL Server, SQL Server DBA, Perl, ETL, Data, Data Analysis, Reporting, Database Optimization, Query Optimization, Analytics, Data Warehousing, DTS, SQL Server Integration Services (SSIS), Healthcare Services, Healthcare, Data Reporting, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design

Experience

  • ETL Model for Data Extraction from In-memory Grid into Reporting Database
    http://www.txvia.com

    TxVia was a company in the e-payments space that developed a diagrammatic IDE (TxVia IDE) and a set of configurable e-payments models. The TxVia IDE allowed users to create e-payment platforms based on a client's business rules and workflows. The system of record was a distributed in-memory data grid, and my job was to build an ETL solution to load platform data into a relational database of the client's choice.

    I developed and expanded a dynamic, configuration-based ETL model (DBLoader) that generated data pipeline code within the context of the TxVia IDE. It used client-specific e-payment models to generate table mappings for each upcoming release.

    This ETL model allowed a single person to keep up with the client-specific source system changes, improve ETL mechanics, and simultaneously manage over 30 instances of the ETL process.

    Database: Postgres (EC2), Greenplum (EC2), SQL Server (on-premise), Db2 (on-premise)
    Company: TxVia (acquired by Google)
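
    A configuration-driven loader of this kind can be sketched roughly as follows. This is an illustrative Python sketch, not the actual DBLoader: the mapping format, entity names, and the build_insert helper are all hypothetical.

    ```python
    # Sketch of a configuration-driven ETL mapping generator (hypothetical).
    # A per-release mapping config drives the translation of grid records
    # into parameterized INSERT statements for the target relational database.

    MAPPINGS = {
        # grid entity -> (target table, {grid field: target column})
        "Account": ("dw_account", {"id": "account_id", "bal": "balance"}),
        "Txn": ("dw_transaction", {"id": "txn_id", "amt": "amount"}),
    }

    def build_insert(entity: str, record: dict) -> tuple[str, list]:
        """Generate one parameterized INSERT for a single grid record."""
        table, columns = MAPPINGS[entity]
        cols = [columns[f] for f in record if f in columns]
        vals = [record[f] for f in record if f in columns]
        placeholders = ", ".join("?" for _ in cols)
        sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
        return sql, vals

    sql, params = build_insert("Account", {"id": 42, "bal": 99.5})
    ```

    Because the mappings live in configuration rather than code, a new release only requires regenerating the mapping table, which is what let one person manage 30+ ETL instances.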

  • Unified Data Model for Transit Agency's Business Rules

    Bytemark is a transit fare collection company providing payments as a service solution to agencies worldwide.

    I designed an OLTP data model using a supertype-subtype pattern and completed it with data access methods to support all possible transit agency fare and scheduling business rules and models.

    The unified data model did away with costly client-specific implementations, improved query performance, and reduced cloud infrastructure costs by 50%.

    Database: Amazon RDS for MySQL
    Company: Bytemark (acquired by Siemens AG)
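
    The supertype-subtype pattern can be sketched as below. The table and column names here are hypothetical illustrations (SQLite stands in for MySQL), not Bytemark's actual schema: a fare_product supertype holds the attributes every fare shares, while subtype tables add model-specific ones.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    -- Supertype: attributes shared by every fare product.
    CREATE TABLE fare_product (
        product_id   INTEGER PRIMARY KEY,
        product_type TEXT NOT NULL,      -- discriminator: 'flat' or 'zonal'
        agency_id    INTEGER NOT NULL,
        price_cents  INTEGER NOT NULL
    );
    -- Subtype: zone-based fares add origin/destination zones.
    CREATE TABLE zonal_fare (
        product_id  INTEGER PRIMARY KEY REFERENCES fare_product(product_id),
        origin_zone TEXT NOT NULL,
        dest_zone   TEXT NOT NULL
    );
    INSERT INTO fare_product VALUES (1, 'flat', 10, 275);
    INSERT INTO fare_product VALUES (2, 'zonal', 20, 450);
    INSERT INTO zonal_fare VALUES (2, 'A', 'C');
    """)

    # One query serves both fare models; subtype columns are NULL for flat fares.
    rows = conn.execute("""
        SELECT p.product_id, p.product_type, p.price_cents, z.origin_zone
        FROM fare_product p
        LEFT JOIN zonal_fare z USING (product_id)
        ORDER BY p.product_id
    """).fetchall()
    ```

    New fare models become new subtype tables rather than new client-specific schemas, which is what keeps per-client implementation costs down.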

  • Optimization of Long-running ETL Process

    Motivate International was the largest bike-share company in the US.

    I optimized the legacy Python ETL process through code refactoring, query optimization, and improvements to resource management, reducing data pipeline execution time from 12 to 3.5 hours.

    Database: Amazon RDS for MySQL, Amazon Redshift
    Company: Motivate International (acquired by Lyft)
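
    A typical ingredient of such an optimization is replacing row-at-a-time database calls with set-based, batched ones. The sketch below is illustrative only (SQLite and invented table names, not the actual Motivate/Lyft pipeline), showing the batching pattern itself.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (ride_id INTEGER PRIMARY KEY, km REAL)")

    def load_batched(rows, batch_size=1000):
        """Insert rows in fixed-size batches instead of one statement per row,
        cutting round trips and per-statement overhead."""
        batch = []
        for row in rows:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO rides VALUES (?, ?)", batch)
                batch.clear()
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO rides VALUES (?, ?)", batch)
        conn.commit()

    # Stream 2,500 synthetic rows through the loader in batches of 1,000.
    load_batched(((i, i * 1.5) for i in range(2500)), batch_size=1000)
    count = conn.execute("SELECT COUNT(*) FROM rides").fetchone()[0]
    ```

    The same idea applies to reads and updates: operating on chunks keeps memory bounded while amortizing query overhead across many rows.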

Skills

  • Languages

    SQL, Perl, T-SQL (Transact-SQL), Python, Python 3, PHP
  • Tools

    Redshift Spectrum, Stitch Data, Apache Airflow, Pentaho Data Integration (Kettle), Amazon Athena, Crystal Reports, DTS, Oracle Coherence, Tableau, BigQuery, Zapier
  • Paradigms

    ETL, Design Patterns, Database Design, Business Intelligence (BI), REST, OLAP, Serverless Architecture
  • Platforms

    Amazon Web Services (AWS), Pentaho, Amazon EC2, Google Cloud Platform (GCP), Oracle, JVM
  • Storage

    Database Architecture, PostgreSQL, Distributed Databases, MySQL, Greenplum, Microsoft SQL Server, Amazon S3 (AWS S3), OLTP, SQL Server DBA, Amazon Aurora, PL/SQL, Database Management Systems (DBMS), RDBMS, Database Migration, SQL Server Integration Services (SSIS), Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Database Replication, Redshift, Google Cloud, Oracle PL/SQL, Data Lakes, NoSQL, IBM Db2, Apache Hive, Amazon DynamoDB
  • Other

    Query Optimization, Database Optimization, Data Analytics, Data Warehousing, Amazon RDS, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Data Engineering, ETL Tools, GDPR, Data Protection, Business Analysis, Market Insights, Database Schema Design, Reporting, Integration, Data Migration, Analytics, Data Reporting, ELT, Relational Database Design, Architecture, Roadmaps, Data Management, Finance, Minimum Viable Product (MVP), SSIS Custom Components, Data Governance, Big Data, Data Wrangling, Data Analysis, Data Synthesis, Anti-money Laundering (AML), Customer Relationship Management (CRM), Healthcare Services, Data, Data Visualization, Big Data Architecture, Financial Services, Oracle8i, Geospatial Data, Data Build Tool (dbt), Google BigQuery, Real Estate
  • Frameworks

    Windows PowerShell, Presto DB
  • Libraries/APIs

    JasperReports
  • Industry Expertise

    Healthcare

Education

  • Coursework in Digital and Graphic Design
    2004 - 2006
    Parsons New School of Design - New York, NY, USA
  • Bachelor's Degree in Computer Science
    1995 - 1999
    Binghamton University - Binghamton, NY, USA

Certifications

  • Complete Python Bootcamp
    DECEMBER 2022 - PRESENT
    Udemy
