Mikhail Fishzon
Verified Expert in Engineering
Data Engineer and Developer
Santa Fe, NM, United States
Toptal member since June 22, 2022
Mikhail is an alumnus of Google and Lyft and a data engineer who enjoys turning real-life business problems into scalable data platforms. Over the last two decades, he has helped companies architect, build, and maintain performance-optimized databases and data pipelines. Mikhail is knowledgeable in database optimization techniques and has achieved a 99% increase in slow-running query performance.
Portfolio
Experience
- Query Optimization - 20 years
- SQL - 20 years
- ETL - 18 years
- Data Warehousing - 17 years
- PostgreSQL - 14 years
- Python - 4 years
- Snowflake - 2 years
- AWS Glue - 2 years
Availability
Preferred Environment
PostgreSQL, MySQL, Amazon Web Services (AWS), Redshift, Database Architecture, Python, Amazon Aurora, Google Cloud Platform (GCP), AWS Glue, Snowflake, Performance, SQL Performance
The most amazing...
...thing I've worked on is a database design that could support all possible transit agency fares and scheduling business models.
Work Experience
Data Architect and Engineer
Tovuti
- Led a data discovery process to reverse-engineer business logic from an existing Aurora MySQL transactional database and application code. Developed a normalized data model capable of handling the transactional load with just one-fifth of the computing power.
- Spearheaded the design and implementation of a Snowflake data warehouse using the normalized data model. Successfully developed and showcased an MVP ETL process with AWS Glue, PySpark, and Snowpark.
- Rearchitected sections of the transactional Aurora MySQL database to improve data integrity and performance. Enforced data collection of previously untracked data required by downstream systems.
- Established and automated DBA tasks and tools for the Aurora MySQL transactional database, including monitoring, DDL deployment in multi-tenant environments, managing database-as-code, and more.
- Refined the top N slow-running transactional queries, improved performance, and reduced database workload. Achieved an average of 60% reduction in query execution time.
- Examined Aurora MySQL cluster resource usage and optimized the database engine to slash cloud database costs by 50%.
- Created a DevOps-centric PostgreSQL database to automate client configuration in a multi-tenant environment. This project included scoping, data modeling, Python-based APIs, and synthetic test data generation.
Python Data Engineer
Looka Inc
- Designed and implemented a relational model in Amazon Redshift to efficiently store and manage user, subscription, and payment data retrieved through the Stripe API.
- Developed an Airflow ETL job to retrieve, process, and store user, subscription, and payment data from the Stripe API.
- Provided strategic guidance to the client on updating infrastructure components that had reached end-of-life. Designed, tested, and documented streamlined procedures for setting up a simplified local Airflow development environment.
Senior Database Expert
Pirate Ship
- Assisted the client with migrating the on-premises MySQL database to Amazon Aurora MySQL.
- Advised the client on an Aurora database engine configuration suitable for the client's database load.
- Analyzed and documented performance issues and outages that occurred on the legacy MySQL database during the months leading up to the Aurora migration.
PostgreSQL Database Administrator
BetCloud Pty Ltd
- Analyzed and resolved GCP PostgreSQL configuration issues, minimizing replication lag and reducing canceled queries against read replicas.
- Proposed, designed, documented, and tested the company's first database disaster recovery plan.
- Analyzed GCP PostgreSQL performance metrics and assisted the client with choosing optimal database engine configuration settings that improved performance and reduced resource utilization.
Database Administrator
VIDA
- Optimized the top 20 longest-running MySQL queries, improving their performance 40-fold.
- Developed data purge mechanisms to address disk space shortage issues in an on-premises replicated MySQL environment. Produced a process that handles tables larger than 2TB being replicated to read-only instances.
- Built multiple batch-based solutions for handling I/O heavy database tasks in a replicated environment.
Data Architect
US Service Animals
- Piloted a time-boxed data discovery project touching on the transactional, analytical, and transformational aspects of the client's data platform.
- Uncovered a set of pre-existing conceptual misalignment issues between the transactional model and data collection logic, causing data quality and lineage issues in the downstream analytics layer.
- Generated a set of architecture, infrastructure, tooling, and approach recommendations detailing important changes across the platform and data collection business rules.
- Documented and presented findings and recommendations to audiences, such as C-level executives and product and technical teams.
Data Architect Consultant
Covax Data
- Rearchitected the company's main PostgreSQL OLTP database.
- Designed and implemented a major portion of the OLTP data access layer.
- Implemented an interim PostgreSQL-based search and record-paging functionality.
- Proposed, configured, and tested PostgreSQL high availability infrastructure using pgBackRest, including backup, streaming replication, and a dedicated backup or WAL repository.
PostgreSQL Advisor
Cherre
- Advised the client on optimization issues related to the PostgreSQL engine, storage, and long-running queries.
- Provided recommendations on indexing and permission strategies as well as benchmarks for the proposed optimizations.
- Evaluated and refactored the SQL codebase, making it compatible with the PostgreSQL 10/11 upgrade.
Senior Data Engineer
Lyft
- Managed and expanded the in-house Python ETL framework, including MySQL ingestion, geocoding scripts, nightly checks and counts, and POS terminal data replication scripts.
- Re-implemented data extraction scripts in Python, allowing a single, generic, and configuration-based extraction script to handle all relational data sources, replacing many legacy scripts.
- Handled the merge and deduplication processes in Airflow and Hive, allowing data science to retrieve data for point-in-time analysis.
- Optimized the legacy Python ETL process through code refactoring, query optimization, and resource management improvements. Reduced data pipeline execution time from 12 to 3.5 hours.
- Conducted Redshift cluster performance analysis, code optimization, benchmarking, and documentation of the outcomes. Reduced the cluster workload and storage size and improved query performance.
- Configured and productionalized binlog replication-based ETL to allow complete change data capture and versioning.
Principal Data Architect
BlocPower
- Spearheaded the design, architecture, and implementation of the BlocPower data platform from the ground up, including all aspects of transactional and analytical processing, storage, and data access layer.
- Handled requirements gathering, discovery, and analysis of existing and new data sources in the company's data funnel.
- Implemented configuration-based ETL models and frameworks, data cleansing, and confirmation routines using Pentaho Data Integration, PL/pgSQL, and shell scripting. These were used for ingestion and processing of municipal data.
- Developed a scoring algorithm to rank every building in a given city using publicly available data for retrofit targeting and business development.
- Performed an analysis and structural overhaul of existing marketing, sales, and retrofit workflow management processes. Optimized Salesforce workflows, object structures, and data integrity practices.
Data Architect and Engineer
Bytemark
- Designed and implemented Bytemark's first data warehouse and analytics platform using Pentaho Data Integration, including data modeling, ETL, and reporting dashboards.
- Acted as a member of the Bytemark architects' team to redesign the mobile ticketing platform from the ground up. Oversaw the data modeling, data access layer, and conversion script implementation.
- Designed a flexible OLTP data model for handling different transit agency business rules related to trips, fare structures, and scheduling.
Data Warehouse - Senior Data Engineer
OnDeck Capital
- Designed and built a marketing campaign generation procedure allowing OnDeck to directly contact millions of potential customers. Reduced mailing cycle from 18 to 6 days and improved response rates and customer tracking.
- Acted as a major contributor to the central OLTP database design and optimizations used as a back end to the award-winning OnDeck Online (ODO) credit decision engine. Contributed to the company's IPO in December 2014.
- Developed data ingestion scripts to facilitate data movement from disparate business systems into the newly built OLTP environment.
- Introduced best practices for data consumers, coding style guides, and other company data standards.
- Designed and implemented a business system report generation process. Provided an aggregated view of the enterprise data funnels—from prospective clients to closed loans.
Data Engineer
- Developed and expanded a dynamic ETL model (DBLoader) designed to push data from an in-memory data grid in Oracle Coherence into a relational database of the customer's choice. Supported PostgreSQL, Greenplum, SQL Server 2005/2008, and Db2.
- Oversaw the continuous data warehouse normalization and denormalization efforts to balance consistency, performance, and access complexity amid the constant introduction of new business rules and data entities and increasing data volumes.
- Optimized and tuned long-running analytical queries and ETL upsert processes. Reduced data landing time by 80% for the largest company client.
- Provided database support for recent acquisitions such as Meebo (Google+), Nik Software (Google Photos), Channel Intelligence (Google Shopping), and DoubleClick (Google AdSense).
- Wrote the documentation of database design and procedures for newly acquired companies. Created the playbook and troubleshooting manuals.
- Automated the SQL Server installation, replication, and data integration tasks using PowerShell and T-SQL scripts.
DBA and Data Engineer Consultant
American Express
- Led, architected, and actively participated in the complete overhaul of the American Express Global AML Investigation Tracking System (GAITS) from inception through several major releases.
- Completed several data remodeling iterations of the GAITS OLTP and OLAP data models to meet the Bank Secrecy Act, Patriot Act data collection, and FinCEN suspicious activity filing requirements.
- Implemented data pipelines in Perl and SQL Server to ingest AML investigation data and supporting documents into the GAITS transactional database.
- Led the design and implementation of the first OLTP and OLAP databases and application servers for the Amex FI unit from the ground up. This included server hardware and software installation and configuration, RAID configuration, and security.
- Implemented data warehouse ETL scripts in Perl, SQL Server, and SSIS. Added downstream audience-specific data marts, as well as report generation and publishing mechanisms, which reduced the audit preparation process by 90%.
Data Engineer
Metropolitan Jewish Health System
- Developed Claim Remittance Data Integration scripts in Perl and MSSQL Server, allowing remitted claim lines to be merged into the centralized claims database. This software eliminated the need for a manual remittance process that was previously in place.
- Implemented a pharmacy data ingestion process in MSSQL DTS. The implementation included an insurance member matching algorithm to aid with processing handwritten prescriptions, reducing the rate of member mismatches from 65% to 5%.
- Implemented a Nursing Home Supply Inventory data model, ETL, and report generation and delivery mechanism in MSSQL Server and Crystal Reports. The solution provided a previously unsupported analysis of supply usage, frequency, and spending.
Experience
ETL Model for Data Extraction from In-memory Grid into Reporting Database
http://www.txvia.com
I developed and expanded a dynamic, configuration-based ETL model (DBLoader) that generated data pipeline code within the context of the TxVia IDE. It used client-specific e-payment models to generate table mappings for each upcoming release.
This ETL model allowed a single person to keep up with the client-specific source system changes, improve ETL mechanics, and simultaneously manage over 30 instances of the ETL process.
Database: PostgreSQL (EC2), Greenplum (EC2), SQL Server (on-premise), Db2 (on-premise)
Company: TxVia (acquired by Google)
Unified Data Model for Transit Agency's Business Rules
I designed an OLTP data model using a supertype-subtype pattern and completed it with data access methods to support all possible transit agency fare and scheduling business rules and models.
The unified data model addressed costly client implementations, improved query performance, and offered a 50% reduction in cloud infrastructure costs.
Database: Amazon RDS for MySQL
Company: Bytemark (acquired by Siemens AG)
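As a rough illustration of the supertype-subtype pattern named above, here is a minimal sketch. It is not the actual Bytemark schema: the table and column names (fare_product, flat_fare, zone_fare) and the pricing rule are hypothetical, and SQLite stands in for MySQL so the example is self-contained.

```python
import sqlite3

# Hypothetical schema: one supertype table holds the attributes shared by
# every fare product; each subtype table adds type-specific columns and
# reuses the supertype's primary key as its own.
SCHEMA = """
CREATE TABLE fare_product (
    fare_product_id INTEGER PRIMARY KEY,
    agency          TEXT NOT NULL,
    name            TEXT NOT NULL,
    fare_type       TEXT NOT NULL CHECK (fare_type IN ('flat', 'zone'))
);
CREATE TABLE flat_fare (
    fare_product_id INTEGER PRIMARY KEY REFERENCES fare_product (fare_product_id),
    price_cents     INTEGER NOT NULL
);
CREATE TABLE zone_fare (
    fare_product_id      INTEGER PRIMARY KEY REFERENCES fare_product (fare_product_id),
    base_price_cents     INTEGER NOT NULL,
    per_zone_price_cents INTEGER NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database with one fare product of each subtype."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO fare_product VALUES (1, 'Agency A', 'Single Ride', 'flat')")
    conn.execute("INSERT INTO flat_fare VALUES (1, 275)")
    conn.execute("INSERT INTO fare_product VALUES (2, 'Agency B', 'Zone Ticket', 'zone')")
    conn.execute("INSERT INTO zone_fare VALUES (2, 200, 50)")
    conn.commit()
    return conn


def price_cents(conn, fare_product_id, zones=1):
    """Single data access method that hides the subtype tables from callers."""
    row = conn.execute(
        """
        SELECT p.fare_type, f.price_cents, z.base_price_cents, z.per_zone_price_cents
        FROM fare_product AS p
        LEFT JOIN flat_fare AS f ON f.fare_product_id = p.fare_product_id
        LEFT JOIN zone_fare AS z ON z.fare_product_id = p.fare_product_id
        WHERE p.fare_product_id = ?
        """,
        (fare_product_id,),
    ).fetchone()
    if row is None:
        raise KeyError(fare_product_id)
    fare_type, flat, base, per_zone = row
    if fare_type == "flat":
        return flat
    # Illustrative zone rule: base price plus a per-zone increment.
    return base + per_zone * zones
```

In this sketch, supporting a new fare model would mean adding one subtype table and one branch in the access method, while the shared fare_product table and its callers stay unchanged.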
Optimization of Long-running ETL Process
I optimized the legacy Python ETL process through code refactoring, query optimization, and improvements to resource management. These changes reduced data pipeline execution time from 12 to 3.5 hours.
Database: Amazon RDS for MySQL, Amazon Redshift
Company: Motivate International (acquired by Lyft)
Education
Coursework in Digital and Graphic Design
Parsons New School of Design - New York, NY, USA
Bachelor's Degree in Computer Science
Binghamton University - Binghamton, NY, USA
Certifications
Complete Python Bootcamp
Udemy
Skills
Libraries/APIs
JasperReports, Snowpark, PySpark, Stripe API
Tools
Amazon Redshift Spectrum, MySQL Performance Tuning, Stitch Data, Apache Airflow, Pentaho Data Integration (Kettle), Amazon Athena, Crystal Reports, DTS, AWS Glue, Microsoft Excel, Looker, Oracle Coherence, Tableau, BigQuery, Zapier, Microsoft Power BI
Languages
SQL, Python, Perl, T-SQL (Transact-SQL), Snowflake, Python 3, PHP
Paradigms
ETL, Design Patterns, Database Design, Business Intelligence (BI), Microservices, REST, OLAP, Serverless Architecture
Platforms
Amazon Web Services (AWS), Pentaho, Amazon EC2, Google Cloud Platform (GCP), Oracle, Salesforce, Databricks, JVM
Storage
Database Architecture, PostgreSQL, Redshift, Distributed Databases, MySQL, Greenplum, Microsoft SQL Server, Amazon S3 (AWS S3), OLTP, SQL Server DBA, Amazon Aurora, PL/SQL, Database Management Systems (DBMS), RDBMS, Database Migration, SQL Server Integration Services (SSIS), Data Pipelines, Database Administration (DBA), Databases, Relational Databases, Database Replication, Data Integration, Database Performance, SQL Performance, Google Cloud, Oracle PL/SQL, Data Lakes, NoSQL, IBM Db2, Apache Hive, Amazon DynamoDB
Frameworks
Windows PowerShell, Presto
Industry Expertise
Healthcare
Other
Query Optimization, Database Optimization, Data Analytics, Data Warehousing, Amazon RDS, Data Modeling, High-availability Clusters, Data Architecture, Data Warehouse Design, Data Engineering, ETL Tools, GDPR, Data Protection, Business Analysis, Market Insights, Database Schema Design, Reporting, Integration, Data Migration, Analytics, Data Reporting, ELT, Relational Database Design, Architecture, Roadmaps, Data Management, Finance, Minimum Viable Product (MVP), Solution Architecture, eCommerce, Performance, SSIS Custom Components, Data Governance, Big Data, Data Wrangling, Data Analysis, Data Synthesis, Anti-money Laundering (AML), Customer Relationship Management (CRM), Healthcare Services, Data, Data Visualization, Big Data Architecture, Financial Services, Oracle8i, Geospatial Data, Fivetran, Data Build Tool (dbt), Google BigQuery, Real Estate