Mikhail Fishzon

Verified Expert in Engineering

Data Engineer and Developer

Location

Santa Fe, NM, United States

Toptal Member Since

June 22, 2022

Mikhail is an alumnus of Google and Lyft and a data engineer who enjoys turning real-life business problems into scalable data platforms. Over the last two decades, he has helped companies architect, build, and maintain performance-optimized databases and data pipelines. Mikhail is knowledgeable in database optimization techniques and has achieved a 99% increase in slow-running query performance.

Portfolio

Tovuti

Amazon Aurora, PHP, MySQL, Data Architecture, Database Design...

Pirate Ship

Database Administration (DBA), Database Performance, MySQL...

BetCloud Pty Ltd

PostgreSQL, SQL, Google Cloud Platform (GCP), Solution Architecture...

Experience

Query Optimization - 20 years
SQL - 20 years
ETL - 18 years
Data Warehousing - 17 years
PostgreSQL - 14 years
Python - 4 years
Snowflake - 2 years
AWS Glue - 2 years

Availability

Full-time

Preferred Environment

SQL, PostgreSQL, MySQL, Pentaho, Amazon Web Services (AWS), Amazon RDS, Redshift, Database Architecture, Python, Google Cloud, Amazon Aurora, Google Cloud Platform (GCP)

The most amazing...

...thing I've worked on is a database design that could support all possible transit agency fares and scheduling business models.

Work Experience

Data Architect and Engineer

2022 - PRESENT

Tovuti

Initiated the design and implementation of a normalized data model from the constantly evolving Aurora MySQL database. Completed an MVP, including all models, a self-updating catalog, ETL and documentation, and a Snowflake-based target database.
Optimized slow-running queries to ensure that response SLAs were met.
Spearheaded the design and implementation of a DevOps database, enabling the automation of client configuration. The project included scoping, data modeling, a database-as-code solution, a Python-based API, and synthetic data generation procedures.
Advised on data-related technologies, design patterns, and database architecture choices.

Technologies: Amazon Aurora, PHP, MySQL, Data Architecture, Database Design, Query Optimization, Data Engineering, Data Modeling, Data Synthesis, Database Management Systems (DBMS), RDBMS, Serverless Architecture, SQL, Database Architecture, Database Optimization, Amazon Web Services (AWS), Amazon RDS, OLTP, Design Patterns, Data Protection, Business Analysis, Database Schema Design, Reporting, Data Reporting, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Minimum Viable Product (MVP), Python 3, Python, Zapier, Microservices, Snowflake, AWS Glue, Snowpark

Senior Database Expert

2023 - 2023

Pirate Ship

Assisted the client with migrating the on-premises MySQL database to Amazon Aurora MySQL.
Advised the client on an Aurora database engine configuration suitable for the client's database load.
Analyzed and documented performance issues and outages that occurred on the legacy MySQL database during the months leading up to the Aurora migration.

Technologies: Database Administration (DBA), Database Performance, MySQL, Amazon Web Services (AWS), Data Engineering, MySQL Performance Tuning, Amazon RDS, Amazon Aurora

PostgreSQL Database Administrator

2023 - 2023

BetCloud Pty Ltd

Analyzed and resolved GCP PostgreSQL configuration issues, minimizing replication lag and reducing canceled queries against read replicas.
Proposed, designed, documented, and tested the company's first database disaster recovery plan.
Analyzed GCP PostgreSQL performance metrics and assisted the client with choosing optimal database engine configuration settings that improved performance and reduced resource utilization.

Technologies: PostgreSQL, SQL, Google Cloud Platform (GCP), Solution Architecture, Database Performance

Database Administrator

2022 - 2023

VIDA

Optimized the top 20 longest-running MySQL queries, improving performance by 40 times.
Developed data purge mechanisms to address disk space shortages in an on-premises replicated MySQL environment. Produced a process that handles 2TB+ tables being replicated to read-only instances.
Built multiple batch-based solutions for handling I/O heavy database tasks in a replicated environment.
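
To illustrate what a batch-based purge can look like in general, here is a minimal sketch. It is not the process built for this client. It uses an in-memory SQLite database as a stand-in for MySQL, and the `event_log` table, its columns, and the function names are all hypothetical. The idea is to delete old rows in small, separately committed batches so that each transaction stays short and replicas are not handed one very large change at once. On MySQL, the same pattern is usually written with a single-table `DELETE ... ORDER BY ... LIMIT` instead of the subquery used here.

```python
import sqlite3
import time


def build_demo_db(rows: int = 2500) -> sqlite3.Connection:
    """Create a hypothetical event_log table; every other row is 'old'."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE event_log (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO event_log (created_at) VALUES (?)",
        [("2020-01-01" if i % 2 == 0 else "2024-01-01",) for i in range(rows)],
    )
    conn.commit()
    return conn


def purge_in_batches(
    conn: sqlite3.Connection,
    cutoff: str,
    batch_size: int = 1000,
    pause_s: float = 0.0,
) -> int:
    """Delete rows older than `cutoff` in committed batches; return rows deleted."""
    total = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM event_log
            WHERE id IN (
                SELECT id FROM event_log
                WHERE created_at < ?
                ORDER BY id
                LIMIT ?
            )
            """,
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
        time.sleep(pause_s)  # optional pause to let replicas catch up
    return total


if __name__ == "__main__":
    demo = build_demo_db()
    print(purge_in_batches(demo, "2023-01-01", batch_size=500))
```

The batch size and pause are tuning knobs: smaller batches and longer pauses trade total runtime for lower I/O pressure and replication lag.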

Technologies: MySQL, PostgreSQL, Database Administration (DBA), SQL, Query Optimization, Database Replication, Database Performance, MySQL Performance Tuning

Data Architect

2022 - 2022

US Service Animals

Piloted a time-boxed data discovery project touching on the transactional, analytical, and transformational aspects of the client's data platform.
Uncovered a set of pre-existing conceptual misalignment issues between the transactional model and data collection logic, causing data quality and lineage issues in the downstream analytics layer.
Generated a set of architecture, infrastructure, tooling, and approach recommendations detailing important changes across the platform and data collection business rules.
Documented and presented findings and recommendations to audiences, such as C-level executives and product and technical teams.

Technologies: Database Design, SQL, Database Schema Design, Reporting, Business Intelligence (BI), Integration, Amazon S3 (AWS S3), Customer Relationship Management (CRM), Tableau, Data Analytics, Amazon Aurora, MySQL, PostgreSQL, Data Architecture, Business Analysis, Architecture, Roadmaps, Data Management, Minimum Viable Product (MVP), Microservices, Solution Architecture, Database Performance

Data Architect Consultant

2019 - 2021

Covax Data

Rearchitected the company's main PostgreSQL OLTP database.
Designed and implemented a major portion of the OLTP data access layer.
Implemented an interim PostgreSQL-based search and record-paging functionality.
Proposed, configured, and tested PostgreSQL high availability infrastructure using pgBackRest, including backup, streaming replication, and a dedicated backup and WAL repository.

Technologies: OLTP, Data Warehousing, Data Modeling, High-availability Clusters, SQL, Data Governance, Data Engineering, ETL, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Amazon RDS, Amazon EC2, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), Business Analysis, Database Schema Design, Reporting, PL/SQL, Data Migration, Database Migration, GDPR, Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Data Management, Microservices, Solution Architecture, Database Performance

PostgreSQL Advisor

2019 - 2020

Cherre

Advised the client on optimization issues related to the PostgreSQL engine, storage, and long-running queries.
Provided recommendations on indexing and permission strategies as well as benchmarks for the proposed optimizations.
Evaluated and refactored the SQL codebase, making it compatible with the PostgreSQL 10/11 upgrade.

Technologies: Google Cloud Platform (GCP), PostgreSQL, Query Optimization, Database Optimization, Data Engineering, SQL, Database Architecture, REST, Data Architecture, Google Cloud, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Data Reporting, Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, BigQuery, Google BigQuery, Real Estate, Microservices, Solution Architecture, Database Performance

Senior Data Engineer

2018 - 2019

Lyft

Managed and expanded the in-house Python ETL framework, including MySQL ingestion, geocoding scripts, nightly checks and counts, and POS terminal data replication scripts.
Re-implemented data extraction scripts in Python, allowing a single, generic, and configuration-based extraction script to handle all relational data sources, replacing many legacy scripts.
Handled the merge and deduplication processes in Airflow and Hive, allowing data science to retrieve data for point-in-time analysis.
Optimized the legacy Python ETL process through code refactoring, query optimization, and resource management improvements. Reduced data pipeline execution time from 12 to 3.5 hours.
Conducted Redshift cluster performance analysis, code optimization, benchmarking, and documentation of the outcomes. Reduced the cluster workload and storage size and improved query performance.
Configured and productionalized binlog replication-based ETL to allow complete change data capture and versioning.

Technologies: MySQL, Redshift, Amazon Redshift Spectrum, Python, Apache Hive, Amazon DynamoDB, Stitch Data, Amazon S3 (AWS S3), REST, Apache Airflow, SQL, Data Governance, Data Engineering, Big Data, ETL, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Presto, Amazon Athena, Database Optimization, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon RDS, Amazon EC2, OLTP, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Database Design, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Database Schema Design, Reporting, Integration, PL/SQL, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Data Build Tool (dbt), NoSQL, Zapier, Data Integration, Microservices, Solution Architecture, Database Performance

Principal Data Architect

2016 - 2018

BlocPower

Spearheaded the design, architecture, and implementation of the BlocPower data platform from the ground up, including all aspects of transactional and analytical processing, storage, and data access layer.
Handled requirements gathering, discovery, and analysis of existing and new data sources in the company's data funnel.
Implemented configuration-based ETL models and frameworks, data cleansing, and confirmation routines using Pentaho Data Integration, PL/pgSQL, and shell scripting. These were used for ingesting and processing municipal data.
Developed a scoring algorithm to rank every building in a given city using publicly available data for retrofit targeting and business development.
Performed analysis and structural overhaul of existing marketing, sales, and retrofit workflow management processes. Optimized Salesforce workflows, object structures, and data integrity practices.

Technologies: Pentaho Data Integration (Kettle), PostgreSQL, Amazon RDS, Amazon S3 (AWS S3), Data Architecture, Data Engineering, Database Design, Data Modeling, Data Analytics, Business Intelligence (BI), OLTP, OLAP, Business Analysis, SQL, Database Architecture, Database Optimization, ETL, Query Optimization, Amazon Web Services (AWS), Data Warehousing, Distributed Databases, Amazon EC2, REST, JasperReports, Data Warehouse Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Reporting, Integration, Customer Relationship Management (CRM), Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Data Lakes, Data Management, Minimum Viable Product (MVP), Python, Zapier, Geospatial Data, Data Integration, Microservices, Solution Architecture, Database Performance

Data Architect and Engineer

2015 - 2016

Bytemark

Designed and implemented Bytemark's first data warehouse and analytics platform using Pentaho Data Integration, including data modeling, ETL, and reporting dashboards.
Acted as a member of the Bytemark architects' team to redesign the mobile ticketing platform from the ground up. Oversaw the data modeling, data access layer, and conversion script implementation.
Designed a flexible OLTP data model for handling different transit agency business rules related to trips, fare structures, and scheduling.

Technologies: Amazon RDS, MySQL, Pentaho, Data Architecture, ETL, Query Optimization, Data Warehousing, SQL, Database Design, Data Engineering, Big Data, ETL Tools, Data Analytics, Data Wrangling, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Amazon Web Services (AWS), Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, JasperReports, Data Warehouse Design, Design Patterns, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, GDPR, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, Geospatial Data, Data Integration, Microservices, Solution Architecture, Database Performance, MySQL Performance Tuning

Data Warehouse - Senior Data Engineer

2014 - 2015

OnDeck Capital

Designed and built a marketing campaign generation procedure allowing OnDeck to directly contact millions of potential customers. Reduced mailing cycle from 18 to 6 days and improved response rates and customer tracking.
Acted as a major contributor to the central OLTP database design and optimizations used as a back end to the award-winning OnDeck Online (ODO) credit decision engine. Contributed to the company's IPO in December 2014.
Developed data ingestion scripts to facilitate data movement from disparate business systems into the newly built OLTP environment.
Introduced best practices for data consumers, coding style guides, and other company data standards.
Designed and implemented a business system report generation process. Provided an aggregated view of the enterprise data funnels—from prospective clients to closed loans.

Technologies: Greenplum, PostgreSQL, SQL, ETL, Database Optimization, Query Optimization, Pentaho Data Integration (Kettle), Data Engineering, Python, Database Architecture, Amazon Web Services (AWS), Data Analytics, Data Warehousing, Pentaho, Amazon RDS, Amazon EC2, Amazon S3 (AWS S3), OLTP, Data Modeling, Data Architecture, Data Warehouse Design, Design Patterns, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Integration, Customer Relationship Management (CRM), Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, ELT, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Python 3, Big Data Architecture, Roadmaps, Financial Services, Data Management, Data Integration, Solution Architecture, Database Performance

Data Engineer

2011 - 2014

Google

Developed and expanded a dynamic ETL model (DBLoader) designed to push data from an in-memory data grid in Oracle Coherence into a relational database of the customer's choice. Supported PostgreSQL, Greenplum, SQL Server 2005/2008, and Db2.
Oversaw the continuous data warehouse normalization and denormalization efforts to balance between consistency, performance, and access complexities due to the constant introduction of new business rules and data entities and increasing data volumes.
Optimized and tuned long-running analytical queries and ETL upsert processes. Reduced data landing time by 80% for the largest company client.
Provided database support for recent acquisitions such as Meebo (Google+), Nik Software (Google Photos), Channel Intelligence (Google Shopping), and DoubleClick (Google AdSense).
Wrote the documentation of database design and procedures for newly acquired companies. Created the playbook and troubleshooting manuals.
Automated the SQL Server installation, replication, and data integration tasks using PowerShell and T-SQL scripts.

Technologies: Perl, SQL Server DBA, Greenplum, IBM Db2, Windows PowerShell, ETL, JasperReports, Data Warehousing, JVM, Oracle Coherence, SQL, Data Engineering, Big Data, Data Analytics, Data Analysis, Database Architecture, PostgreSQL, Database Optimization, Query Optimization, Amazon Web Services (AWS), Distributed Databases, Amazon RDS, Microsoft SQL Server, Amazon EC2, Amazon S3 (AWS S3), Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, T-SQL (Transact-SQL), Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, Business Intelligence (BI), OLAP, Business Analysis, Reporting, Integration, Analytics, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Roadmaps, Financial Services, Data Management, NoSQL, Data Integration, Solution Architecture, Database Performance

DBA and Data Engineer Consultant

2008 - 2010

American Express

Led, architected, and actively participated in the complete overhaul of the American Express Global AML Investigation Tracking System (GAITS) from inception through several major releases.
Completed several data remodeling iterations of the GAITS OLTP and OLAP data models to meet the bank secrecy, Patriot Act data collection, and FinCEN suspicious activity filing requirements.
Implemented data pipelines in Perl and SQL Server to ingest AML investigation data and supporting documents into the GAITS transactional database.
Led the design and implementation of the first OLTP and OLAP databases and application servers for the Amex FI unit from the ground up. This included server hardware and software installation and configuration, RAID configuration, and security.
Implemented data warehouse ETL scripts in Perl, SQL Server, and SSIS. Added downstream audience-specific data marts, as well as report generation and publishing mechanisms, which reduced the audit preparation process by 90%.

Technologies: OLTP, Data Warehouse Design, SQL Server DBA, T-SQL (Transact-SQL), SSIS Custom Components, ETL, Database Architecture, Market Insights, Finance, Anti-money Laundering (AML), Business Intelligence (BI), Crystal Reports, SQL, Database Optimization, Query Optimization, Data Analytics, Data Warehousing, Microsoft SQL Server, Perl, Data Modeling, Data Architecture, Windows PowerShell, Data Engineering, Database Design, Data Governance, Data Protection, Database Management Systems (DBMS), RDBMS, OLAP, Business Analysis, Reporting, Integration, Data Migration, Database Migration, Analytics, Data Visualization, Data Reporting, Data Pipelines, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Architecture, Data Management, Oracle8i, Data Integration, Solution Architecture, Database Performance

Data Engineer

1999 - 2006

Metropolitan Jewish Health System

Developed Claim Remittance Data Integration scripts in Perl and MSSQL Server, allowing remitted claim lines to be merged into the centralized claims database. This software eliminated the need for a manual remittance process that was previously in place.
Implemented a pharmacy data ingestion process in MSSQL DTS. The implementation included an insurance member matching algorithm to aid with processing handwritten prescriptions, reducing the member mismatch rate from 65% to 5%.
Implemented Nursing Home Supply Inventory data model, ETL, and report generation and delivery mechanism in MSSQL Server and Crystal Reports. The solution provided a previously unsupported analysis of supply usage, frequency, and spending.

Technologies: SQL, Microsoft SQL Server, SQL Server DBA, Perl, ETL, Data, Data Analysis, Reporting, Database Optimization, Query Optimization, Analytics, Data Warehousing, DTS, SQL Server Integration Services (SSIS), Healthcare Services, Healthcare, Data Reporting, Oracle, Oracle PL/SQL, Database Administration (DBA), Databases, Relational Databases, Relational Database Design, Data Integration, Database Performance

Experience

ETL Model for Data Extraction from In-memory Grid into Reporting Database

http://www.txvia.com

TxVia was a company in the e-payments space that developed a diagrammatic IDE (TxVia IDE) and a set of configurable e-payments models. TxVia IDE allowed users to create e-payment platforms based on the client's business rules and workflow. The system of record was a distributed in-memory data grid, and my job was to build an ETL solution to load platform data into a relational database of the client's choice.

I developed and expanded a dynamic, configuration-based ETL model (DBLoader) that generated data pipeline code within the context of the TxVia IDE. It used client-specific e-payment models to generate table mappings for each upcoming release.

This ETL model allowed a single person to keep up with the client-specific source system changes, improve ETL mechanics, and simultaneously manage over 30 instances of the ETL process.
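
As a rough illustration of the configuration-based idea (not the actual DBLoader code, which is not shown here), the sketch below generates table definitions and load statements from a model description rather than hand-writing them per client. The `FieldSpec` class, the `MODEL_CONFIG` contents, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str      # attribute name in the source model (hypothetical)
    sql_type: str  # column type to use in the target database


# Hypothetical model configuration: entity name -> list of fields.
MODEL_CONFIG = {
    "card_account": [
        FieldSpec("account_id", "BIGINT"),
        FieldSpec("status", "VARCHAR(20)"),
        FieldSpec("balance", "NUMERIC(18, 2)"),
    ],
}


def create_table_sql(entity: str, fields: list[FieldSpec]) -> str:
    """Render a CREATE TABLE statement from one entity's field list."""
    columns = ",\n".join(f"    {f.name} {f.sql_type}" for f in fields)
    return f"CREATE TABLE {entity} (\n{columns}\n);"


def insert_sql(entity: str, fields: list[FieldSpec]) -> str:
    """Render a parameterized INSERT statement for the same entity."""
    names = ", ".join(f.name for f in fields)
    placeholders = ", ".join("?" for _ in fields)
    return f"INSERT INTO {entity} ({names}) VALUES ({placeholders});"


def generate_pipeline(config: dict[str, list[FieldSpec]]) -> dict[str, dict[str, str]]:
    """Produce the DDL and load statement for every entity in the config."""
    return {
        entity: {
            "ddl": create_table_sql(entity, fields),
            "load": insert_sql(entity, fields),
        }
        for entity, fields in config.items()
    }


if __name__ == "__main__":
    for entity, statements in generate_pipeline(MODEL_CONFIG).items():
        print(statements["ddl"])
        print(statements["load"])
```

With this shape, a change to a client's model only requires regenerating the statements from the updated configuration, which is one plausible reason a single person could maintain many instances.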

Database: PostgreSQL (EC2), Greenplum (EC2), SQL Server (on-premise), Db2 (on-premise)
Company: TxVia (acquired by Google)

Unified Data Model for Transit Agency's Business Rules

Bytemark is a transit fare collection company providing a payments-as-a-service solution to agencies worldwide.

I designed an OLTP data model using a supertype-subtype pattern and completed it with data access methods to support all possible transit agency fares and scheduling business rules and models.

The unified data model addressed the problem of costly client implementations, improved query performance, and offered a 50% reduction in cloud infrastructure costs.
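
For readers unfamiliar with the supertype-subtype pattern, here is a minimal sketch of the general technique. It is not the Bytemark schema. It uses an in-memory SQLite database in place of MySQL, and every table, column, and function name is hypothetical. Shared attributes live in one supertype table, each fare variant gets its own subtype table keyed by the same ID, and a data access function hides the difference from callers.

```python
import sqlite3

SCHEMA = """
-- Supertype: attributes shared by every fare product.
CREATE TABLE fare_product (
    fare_product_id INTEGER PRIMARY KEY,
    agency          TEXT NOT NULL,
    name            TEXT NOT NULL,
    fare_type       TEXT NOT NULL CHECK (fare_type IN ('flat', 'zone'))
);

-- Subtype: a single fixed price.
CREATE TABLE flat_fare (
    fare_product_id INTEGER PRIMARY KEY REFERENCES fare_product (fare_product_id),
    price_cents     INTEGER NOT NULL
);

-- Subtype: price depends on the number of zones traveled.
CREATE TABLE zone_fare (
    fare_product_id INTEGER PRIMARY KEY REFERENCES fare_product (fare_product_id),
    base_cents      INTEGER NOT NULL,
    per_zone_cents  INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema and load one product of each subtype."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO fare_product VALUES (?, ?, ?, ?)",
        [(1, "Agency A", "Single ride", "flat"), (2, "Agency B", "Zone ticket", "zone")],
    )
    conn.execute("INSERT INTO flat_fare VALUES (1, 275)")
    conn.execute("INSERT INTO zone_fare VALUES (2, 200, 50)")
    conn.commit()
    return conn


def price_cents(conn: sqlite3.Connection, fare_product_id: int, zones: int = 1) -> int:
    """Data access method: resolve a price regardless of the fare subtype."""
    row = conn.execute(
        """
        SELECT COALESCE(f.price_cents, z.base_cents + z.per_zone_cents * ?)
        FROM fare_product p
        LEFT JOIN flat_fare f ON f.fare_product_id = p.fare_product_id
        LEFT JOIN zone_fare z ON z.fare_product_id = p.fare_product_id
        WHERE p.fare_product_id = ?
        """,
        (zones, fare_product_id),
    ).fetchone()
    if row is None or row[0] is None:
        raise LookupError(f"no price for fare product {fare_product_id}")
    return row[0]


if __name__ == "__main__":
    db = build_demo_db()
    print(price_cents(db, 1), price_cents(db, 2, zones=3))
```

Supporting a new fare model under this pattern means adding a subtype table and extending the access method, while the supertype table and existing subtypes stay unchanged.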

Database: Amazon RDS for MySQL
Company: Bytemark (acquired by Siemens AG)

Optimization of Long-running ETL Process

Motivate International was the largest bike-share company in the US.

I optimized the legacy Python ETL process through code refactoring, query optimization, and improvements to resource management. This reduced data pipeline execution time from 12 to 3.5 hours.

Database: Amazon RDS for MySQL, Amazon Redshift
Company: Motivate International (acquired by Lyft)

Skills

Languages

SQL, Python, Perl, T-SQL (Transact-SQL), Python 3, Snowflake, PHP

Tools

Amazon Redshift Spectrum, MySQL Performance Tuning, Stitch Data, Apache Airflow, Pentaho Data Integration (Kettle), Amazon Athena, Crystal Reports, DTS, AWS Glue, Oracle Coherence, Tableau, BigQuery, Zapier

Paradigms

ETL, Design Patterns, Database Design, Business Intelligence (BI), Microservices, REST, OLAP, Serverless Architecture

Platforms

Amazon Web Services (AWS), Pentaho, Amazon EC2, Google Cloud Platform (GCP), Oracle, JVM

Storage

Database Architecture, PostgreSQL, Distributed Databases, MySQL, Greenplum, Microsoft SQL Server, Amazon S3 (AWS S3), OLTP, SQL Server DBA, Amazon Aurora, PL/SQL, Database Management Systems (DBMS), RDBMS, Database Migration, SQL Server Integration Services (SSIS), Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Database Replication, Data Integration, Database Performance, Redshift, Google Cloud, Oracle PL/SQL, Data Lakes, NoSQL, IBM Db2, Apache Hive, Amazon DynamoDB

Other

Query Optimization, Database Optimization, Data Analytics, Data Warehousing, Amazon RDS, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Data Engineering, ETL Tools, GDPR, Data Protection, Business Analysis, Market Insights, Database Schema Design, Reporting, Integration, Data Migration, Analytics, Data Reporting, ELT, Relational Database Design, Architecture, Roadmaps, Data Management, Finance, Minimum Viable Product (MVP), Solution Architecture, SSIS Custom Components, Data Governance, Big Data, Data Wrangling, Data Analysis, Data Synthesis, Anti-money Laundering (AML), Customer Relationship Management (CRM), Healthcare Services, Data, Data Visualization, Big Data Architecture, Financial Services, Oracle8i, Geospatial Data, Snowpark, Data Build Tool (dbt), Google BigQuery, Real Estate

Frameworks

Windows PowerShell, Presto

Libraries/APIs

JasperReports

Industry Expertise

Healthcare

Education

2004 - 2006

Coursework in Digital and Graphic Design

Parsons New School of Design - New York, NY, USA

1995 - 1999

Bachelor's Degree in Computer Science

Binghamton University - Binghamton, NY, USA

Certifications

DECEMBER 2022 - PRESENT

Complete Python Bootcamp

Udemy
