Mikhail Fishzon

Data Engineer and Developer in Santa Fe, United States

Member since June 22, 2022
Mikhail is an alumnus of Google and Lyft and a data engineer who enjoys turning real-life business problems into scalable data platforms. Over the last two decades, he has helped companies architect, build, and maintain performance-optimized databases and data pipelines. Mikhail is well-versed in database optimization techniques and has improved slow-running query performance by as much as 99%.

Portfolio

  • TOVUTI LMS
    Amazon Aurora, PHP, MySQL, Data Architecture, Database Design...
  • Covax Data
    OLTP, Data Warehousing, Data Modeling, High-availability Clusters, SQL...
  • Cherre
    Google Cloud Platform (GCP), PostgreSQL, Query Optimization...

Experience

Location

Santa Fe, United States

Availability

Part-time

Preferred Environment

SQL, PostgreSQL, MySQL, Pentaho, AWS, AWS RDS, Redshift, Database Architecture, Python, Google Cloud, Amazon Aurora, Google Cloud Platform (GCP)

The most amazing...

...thing I've worked on is a database design that could support all possible transit agency fares and scheduling business models.

Employment

  • Data Architect | Engineer Contractor

    2022 - PRESENT
    TOVUTI LMS
    • Redesigned and normalized the company's Aurora MySQL OLTP database, improving performance by 40% and reducing storage requirements.
    • Optimized slow-running queries to ensure that response SLAs were met.
    • Advised on data-related technologies, design patterns, and database architecture choices.
    Technologies: Amazon Aurora, PHP, MySQL, Data Architecture, Database Design, Query Optimization, Data Engineering, Data Modeling, Data Synthesis, Database Management Systems (DBMS), RDBMS, Serverless Architecture, SQL, Database Architecture, Database Optimization, AWS, AWS RDS, OLTP, Design Patterns, Data Protection, Business Analysis, Database Schema Design, Reporting, High-availability database, Data Reporting, Database Administration (DBA)
  • Data Architect Consultant

    2019 - 2021
    Covax Data
    • Rearchitected the company's main PostgreSQL OLTP database.
    • Designed and implemented a major portion of the OLTP data access layer.
    • Implemented an interim PostgreSQL-based search and record-paging functionality.
    • Proposed, configured, and tested PostgreSQL high-availability infrastructure using pgBackRest, including backups, streaming replication, and a dedicated backup and WAL repository.
    Technologies: OLTP, Data Warehousing, Data Modeling, High-availability Clusters, SQL, Data Governance, Data Engineering, ETL, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, AWS, AWS RDS, Amazon EC2 (Amazon Elastic Compute Cloud), Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), Business Analysis, Database Schema Design, Reporting, High-availability database, PL/SQL, Data Migration, Database Migration, GDPR, Data Pipelines, Database Administration (DBA)
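The pgBackRest setup described above centers on a dedicated repository that holds both backups and archived WAL. A minimal sketch of such a configuration follows; the stanza name, paths, and retention values are illustrative, not Covax Data's actual settings.

```ini
# /etc/pgbackrest/pgbackrest.conf (illustrative values)
[global]
# Dedicated backup and WAL repository
repo1-path=/var/lib/pgbackrest
# Keep two full backups before expiring the oldest
repo1-retention-full=2

# Stanza name is hypothetical
[demo]
pg1-path=/var/lib/postgresql/13/main
```

In `postgresql.conf`, the server would then archive WAL into the repository with `archive_command = 'pgbackrest --stanza=demo archive-push %p'`, and standbys can restore from the same repository to support streaming replication.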
  • Postgres Advisor

    2019 - 2020
    Cherre
    • Advised client on issues related to optimizations of Postgres engine, storage, and long-running queries.
    • Provided recommendations on indexing and permission strategies and benchmarks for recommended optimizations.
    • Evaluated and refactored the SQL codebase to make it compatible with the Postgres 10/11 upgrade.
    Technologies: Google Cloud Platform (GCP), PostgreSQL, Query Optimization, Database Optimization, Data Engineering, SQL, Database Architecture, REST, Data Architecture, Google Cloud, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Data Reporting
  • Senior Data Engineer

    2018 - 2019
    Lyft
    • Managed and expanded in-house Python ETL framework, including MySQL ingestion, geo-coding scripts, nightly checks and counts, and POS terminal data replication scripts.
    • Reimplemented data extraction scripts in Python, allowing a single, generic, and configuration-based extraction script to handle all relational data sources, replacing many legacy scripts.
    • Implemented the merge and deduplication process in Airflow and Hive, allowing the data science team to retrieve data for point-in-time analysis.
    • Optimized the legacy Python ETL process through code refactoring, query optimization, and improvements to resource management. Reduced data pipeline execution time from 12 to 3.5 hours.
    • Conducted Redshift cluster performance analysis, code optimization, benchmarking, and documentation of the outcomes. Reduced the cluster workload and storage size and improved query performance.
    • Configured and productionized binlog replication-based ETL to enable complete change data capture and versioning.
    Technologies: MySQL, Redshift, Redshift Spectrum, Python, Apache Hive, Amazon DynamoDB, Stitch Data, Amazon S3 (AWS S3), REST, Apache Airflow, SQL, Data Governance, Data Engineering, Big Data, ETL, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Presto DB, Amazon Athena, Database Optimization, Query Optimization, AWS, Data Warehousing, Distributed Databases, AWS RDS, Amazon EC2 (Amazon Elastic Compute Cloud), OLTP, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Database Schema Design, Reporting, Integration, High-availability database, PL/SQL, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA)
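The merge-and-deduplication step above feeds point-in-time analysis: every version of a row is retained, exact replays are dropped, and a query can reconstruct the state of the data as of any timestamp. A minimal Python sketch of that logic, with hypothetical row fields (`id`, `updated_at`, `status`), not Lyft's actual Airflow/Hive implementation:

```python
def merge_changes(history, changes):
    """Append-only merge: keep every version of a row, skipping exact
    replays of the same (id, updated_at) pair (deduplication)."""
    seen = {(r["id"], r["updated_at"]) for r in history}
    for row in changes:
        key = (row["id"], row["updated_at"])
        if key not in seen:
            history.append(row)
            seen.add(key)
    return history

def as_of(history, ts):
    """Point-in-time view: the latest version of each id at or before ts.
    ISO-8601 date strings compare correctly as plain strings."""
    latest = {}
    for row in sorted(history, key=lambda r: r["updated_at"]):
        if row["updated_at"] <= ts:
            latest[row["id"]] = row
    return list(latest.values())

history = []
merge_changes(history, [{"id": 1, "updated_at": "2019-01-01", "status": "new"}])
merge_changes(history, [
    {"id": 1, "updated_at": "2019-02-01", "status": "active"},
    {"id": 1, "updated_at": "2019-01-01", "status": "new"},  # duplicate replay
])
```

Querying `as_of(history, "2019-01-15")` returns the "new" version, while a later timestamp returns "active" — the same idea, at scale, is what the Hive tables provided.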
  • Principal Data Architect

    2016 - 2018
    Blocpower
    • Spearheaded design, architecture, and implementation of Blocpower Data Platform from the ground up, including all aspects of transactional and analytical processing, storage, and data access layer.
    • Gathered requirements and performed discovery and analysis of existing and new data sources in the company's data funnel.
    • Implemented configuration-based ETL models and frameworks, data cleansing, and conforming routines using Pentaho Data Integration, PL/pgSQL, and shell scripting, used for the ingestion and processing of municipal data.
    • Developed a scoring algorithm to rank every building in a given city using publicly available data for retrofit targeting and business development.
    • Performed analysis and structural overhaul of existing marketing, sales, and retrofit workflow management processes. Optimized Salesforce workflows, object structures, and data integrity practices.
    Technologies: Pentaho Data Integration (Kettle), PostgreSQL, AWS RDS, Amazon S3 (AWS S3), Data Architecture, Data Engineering, Database Design, Data Modeling, Data Analytics, Business Intelligence (BI), OLTP, OLAP, Business Analysis, SQL, Database Architecture, Database Optimization, ETL, Query Optimization, AWS, Data Warehousing, Distributed Databases, Amazon EC2 (Amazon Elastic Compute Cloud), REST, JasperReports, Data Warehouse Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Customer Relationship Management (CRM), Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA)
  • Data Architect and Engineer

    2015 - 2016
    Bytemark
    • Designed and implemented the first data warehouse and analytics platform for Bytemark using Pentaho Data Integration, including data modeling, ETL, and reporting dashboards.
    • Acted as a member of the Bytemark architects' team to redesign the mobile ticketing platform from the ground up. Oversaw data modeling, data access layer, and conversion script implementation.
    • Designed a flexible OLTP data model for handling different transit agency business rules related to trips, fare structures, and scheduling.
    Technologies: AWS RDS, MySQL, Pentaho, Data Architecture, ETL, Query Optimization, Data Warehousing, SQL, Database Design, Data Engineering, Big Data, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, AWS, Amazon EC2 (Amazon Elastic Compute Cloud), Amazon S3 (AWS S3), OLTP, Data Modeling, JasperReports, Data Warehouse Design, Design Patterns, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA)
  • Data Warehouse - Senior Data Engineer

    2014 - 2015
    OnDeck Capital
    • Designed and built a marketing campaign generation procedure allowing OnDeck to directly contact millions of potential customers. Reduced the mailing cycle from 18 days to 6 and improved response rates and customer tracking.
    • Acted as a major contributor to the design and optimization of the central OLTP database used as the back end of the award-winning OnDeck Online (ODO) credit decision engine. Contributed to the company's IPO in December 2014.
    • Developed data ingestion scripts to facilitate data movement from disparate business systems into the newly built OLTP environment.
    • Introduced best practices for data consumers, coding style guides, and other company data standards.
    • Designed and implemented a business system report generation process. Provided an aggregated view of the enterprise data funnels from prospect clients to closed loans.
    Technologies: Greenplum, PostgreSQL, SQL, ETL, Database Optimization, Query Optimization, Pentaho Data Integration (Kettle), Data Engineering, Python, Database Architecture, AWS, Data Analytics, Data Warehousing, Pentaho, AWS RDS, Amazon EC2 (Amazon Elastic Compute Cloud), Amazon S3 (AWS S3), OLTP, Data Modeling, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Integration, Customer Relationship Management (CRM), Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA)
  • Data Engineer

    2011 - 2014
    Google
    • Developed and expanded a dynamic ETL model (DBLoader) designed to push data from an in-memory data grid in Oracle Coherence into a relational database of the customer's choice. Supported PostgreSQL, Greenplum, SQL Server 2005/2008, and Db2.
    • Oversaw continuous data warehouse normalization and denormalization efforts to balance consistency, performance, and access complexity amid the constant introduction of new business rules and data entities and increasing data volumes.
    • Optimized and tuned long-running analytical queries and ETL upsert processes. Reduced data landing time by 80% for the largest company client.
    • Provided database support for recent acquisitions such as Meebo (Google+), Nik Software (Google Photos), Channel Intelligence (Google Shopping), and DoubleClick (Google AdSense).
    • Wrote the documentation of database design and procedures for newly acquired companies. Created the playbook and troubleshooting manuals.
    • Automated the SQL Server installation, replication, and data integration tasks using PowerShell and T-SQL scripts.
    Technologies: Perl, SQL Server DBA, Greenplum, IBM Db2, Windows PowerShell, ETL, JasperReports, Data Warehousing, JVM, Oracle Coherence, SQL, Data Engineering, Big Data, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, AWS, Distributed Databases, AWS RDS, Microsoft SQL Server, Amazon EC2 (Amazon Elastic Compute Cloud), Amazon S3 (AWS S3), Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, T-SQL, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Integration, Transact-SQL, Analytics, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA)
  • DBA/Data Engineer Consultant

    2008 - 2010
    American Express
    • Led, architected, and actively participated in a complete overhaul of the American Express global AML investigation tracking system (GAITS), from inception through several major releases.
    • Completed several iterations of remodeling the GAITS OLTP and OLAP data models to satisfy Bank Secrecy Act, Patriot Act data collection, and FinCEN suspicious activity filing requirements.
    • Implemented data pipelines in Perl and SQL Server to ingest AML investigation data and supporting documents into GAITS transactional database.
    • Oversaw the design and implementation of the first OLTP and OLAP databases and application servers for the AMEX FI unit from the ground up, including server hardware and software installation and configuration and RAID configuration and security.
    • Implemented data warehouse ETL scripts in Perl, SQL Server, and SSIS. Added downstream audience-specific data marts and report generation and publishing mechanisms, which reduced audit preparation time by 90%.
    Technologies: OLTP, Data Warehouse Design, SQL Server DBA, T-SQL, SSIS Custom Components, ETL, Database Architecture, Financial Intelligence, Anti-money Laundering (AML), Business Intelligence (BI), Crystal Reports, SQL, Database Optimization, Query Optimization, Data Analytics, Data Warehousing, Microsoft SQL Server, Perl, Data Modeling, Data Architecture, Windows PowerShell, Data Engineering, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, OLAP, Business Analysis, Reporting, Integration, Transact-SQL, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA)
  • Data Engineer

    1999 - 2006
    Metropolitan Jewish Health System
    • Developed claim remittance data integration scripts in Perl and MS SQL Server, allowing remitted claim lines to be merged into the centralized claims database and eliminating the previously manual remittance process.
    • Implemented a pharmacy data ingestion process in MSSQL DTS, including an insurance member matching algorithm to aid with processing hand-written prescriptions, reducing member mismatches from 65% to 5%.
    • Implemented a nursing home supply inventory data model, ETL, and report generation and delivery mechanism in MSSQL Server and Crystal Reports. The solution provided previously unavailable analysis of supply usage, frequency, and spending.
    Technologies: SQL, Microsoft SQL Server, SQL Server DBA, Perl, ETL, clinical data, Data Analysis, Reporting, Database Optimization, Query Optimization, Analytics, Data Warehousing, DTS, SQL Server Integration Services (SSIS), Healthcare Services, Healthcare, Data Reporting, Oracle, Oracle PL/SQL, Database Administration (DBA)

Experience

  • ETL Model for Data Extraction from In-memory Grid into Reporting Database
    http://www.txvia.com

    TxVia was a company in the e-payments space that developed a diagrammatic IDE (TxVia IDE) and a set of configurable e-payments models. TxVia IDE allowed users to create e-payment platforms based on the client's business rules and workflow. The system of record was a distributed in-memory data grid, and my job was to build an ETL solution to load platform data into a relational database of the client's choice.

    I developed and expanded a dynamic, configuration-based ETL model (DBLoader) that generated data pipeline code within the context of the TxVia IDE. It used client-specific e-payment models to generate table mappings for each upcoming release.

    This ETL model allowed a single person to keep up with the client-specific source system changes, improve ETL mechanics, and simultaneously manage over 30 instances of the ETL process.

    Database: Postgres (EC2), Greenplum (EC2), SQL Server (on-premise), Db2 (on-premise)
    Company: TxVia (acquired by Google)
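The core idea behind a configuration-driven loader like DBLoader is that table mappings are data, not code: each release's model produces mappings, and the pipeline SQL is generated from them. A small Python sketch of that idea, targeting only a Postgres-style upsert (the table, key, and column names are hypothetical, and the real DBLoader also emitted Greenplum, SQL Server, and Db2 dialects):

```python
def generate_upsert(mapping):
    """Render a PostgreSQL-style upsert statement from one declarative
    table mapping: insert new rows, update existing ones by key."""
    cols = mapping["columns"]
    key = mapping["key"]
    col_list = ", ".join(cols)
    params = ", ".join(f"%({c})s" for c in cols)  # psycopg2-style named params
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols if c != key)
    return (
        f"INSERT INTO {mapping['table']} ({col_list}) "
        f"VALUES ({params}) "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )

# Hypothetical mapping, as a release model might emit it
mapping = {
    "table": "card_account",
    "key": "account_id",
    "columns": ["account_id", "balance_cents", "status"],
}
upsert_sql = generate_upsert(mapping)
```

Regenerating these statements for each release is what let a single person keep up with client-specific schema changes across 30+ ETL instances.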

  • Unified Data Model for Transit Agency's Business Rules

    Bytemark is a transit fare collection company providing payments as a service solution to agencies worldwide.

    I designed an OLTP data model using a supertype-subtype pattern and paired it with data access methods to support all possible transit agency fare and scheduling business rules and models.

    The unified data model eliminated costly client-specific implementations, improved query performance, and reduced cloud infrastructure costs by 50%.

    Database: Amazon RDS for MySQL
    Company: Bytemark (acquired by Siemens AG)

  • Optimization of Long-running ETL Process

    Motivate International was the largest bike-share company in the US.

    I optimized the legacy Python ETL process through code refactoring, query optimization, and improvements to resource management. I also reduced data pipeline execution time from 12 to 3.5 hours.

    Database: Amazon RDS for MySQL, Amazon Redshift
    Company: Motivate International (acquired by Lyft)

Skills

  • Languages

    SQL, T-SQL, Transact-SQL, Python, Perl, PHP
  • Paradigms

    ETL, Database Design, Business Intelligence (BI), REST, Design Patterns, OLAP, Serverless Architecture
  • Platforms

    Amazon EC2 (Amazon Elastic Compute Cloud), Pentaho, Google Cloud Platform (GCP), Oracle, JVM
  • Storage

    Database Architecture, PostgreSQL, MySQL, Amazon S3 (AWS S3), OLTP, SQL Server DBA, PL/SQL, Database Management Systems (DBMS), RDBMS, Database Migration, SQL Server Integration Services (SSIS), Data Pipelines, Database Administration (DBA), Redshift, Distributed Databases, Greenplum, Microsoft SQL Server, Google Cloud, Amazon Aurora, Oracle PL/SQL, IBM Db2, Apache Hive, Amazon DynamoDB
  • Other

    Query Optimization, Database Optimization, AWS, Data Analytics, Data Warehousing, AWS RDS, Data Modeling, Data Architecture, Data Warehouse Design, Data Engineering, ETL Tools, GDPR, Data Protection, Business Analysis, Database Schema Design, Reporting, Integration, Data Migration, Analytics, Data Reporting, ELT, High-availability Clusters, SSIS Custom Components, Data Governance, Big Data, Data Wrangling, Data Analysis, High-availability database, Data Synthesis, Financial Intelligence, Anti-money Laundering (AML), Customer Relationship Management (CRM), Healthcare Services, clinical data, Data Visualization
  • Frameworks

    Windows PowerShell, Presto DB
  • Libraries/APIs

    JasperReports
  • Tools

    Redshift Spectrum, Stitch Data, Apache Airflow, Pentaho Data Integration (Kettle), Amazon Athena, Crystal Reports, DTS, Oracle Coherence
  • Industry Expertise

    Healthcare

Education

  • Coursework in Digital and Graphic Design
    2004 - 2006
    Parsons New School of Design - New York, NY, USA
  • Bachelor's Degree in Computer Science
    1995 - 1999
    Binghamton University - Binghamton, NY, USA
