Yuriy Margulis, Big Data Developer in Los Angeles, CA, United States
Member since June 19, 2016
Yuriy is a data specialist with over 15 years of experience in data warehousing, data engineering, feature engineering, big data, ETL/ELT, and business intelligence. As a big data architect and engineer, Yuriy specializes in AWS and Azure frameworks, Spark/PySpark, Databricks, Hive, Redshift, Snowflake, relational databases, tools like Fivetran, Airflow, DBT, Presto/Athena, and data DevOps frameworks and toolsets.
Preferred Environment

AWS, Spark, Snowflake

The most amazing...

...project I've worked on was growing the PriceGrabber data warehouse from five to 17 subject areas through many platform changes, writing a great deal of SQL and other scripting code along the way.


  • Data Engineer

    2020 - PRESENT
    Maisonette (via Toptal)
    • Built a data platform and data lake.
    Technologies: AWS, PostgreSQL, Snowflake, Airflow, Python, Looker, Fivetran, DBT
  • Consultant | Co-founder | CEO

    2016 - PRESENT
    Crowd Consulting
    • Worked on full data warehouse implementations for multiple clients.
    • Provided big data training and support.
    • Engineered and built an ETL pipeline for AWS S3 data warehouse using AWS Kinesis, Lambda, Hive, Presto, and Spark. The pipeline was written in Python.
    Technologies: AWS EMR, Hadoop, Spark, Databricks, Presto/Athena, Hive, AWS Lambda, AWS Redshift, AWS RDS: Postgres, MySQL, DynamoDB, AWS S3, Python, Scala, Luigi, Tableau
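The Kinesis-to-S3 pipeline described above can be sketched, in greatly simplified form, as a Lambda handler that decodes incoming Kinesis records into newline-delimited JSON ready for an S3 write. This is an illustrative sketch only, not the original implementation; the bucket name and the commented-out boto3 call are assumptions:

```python
import base64
import json


def transform_records(event):
    """Decode base64-encoded Kinesis records and return
    newline-delimited JSON suitable for an S3 object body."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        row = json.loads(payload)
        # Illustrative cleanup step: normalize keys to lowercase.
        rows.append({k.lower(): v for k, v in row.items()})
    return "\n".join(json.dumps(r) for r in rows)


def lambda_handler(event, context):
    body = transform_records(event)
    # In a real pipeline this would land in the data lake, e.g.:
    # boto3.client("s3").put_object(Bucket="my-lake", Key="...", Body=body)
    return {"rows": body.count("\n") + 1 if body else 0}
```

Hive, Presto, and Spark would then query the resulting S3 objects downstream; that layer is omitted here.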
  • Data Engineering Architect

    2020 - 2020
    CVS Health (via Toptal)
    • Performed ETL and feature engineering for a personalization engine.
    Technologies: Azure, Databricks, Spark, Python, Scala, RapidsAI
  • Data Engineer

    2019 - 2020
    Teespring (via Toptal)
    • Migrated a data warehouse ETL pipeline from Airflow/Redshift to Fivetran, Databricks, and Snowflake.
    Technologies: AWS, Fivetran, Databricks, Snowflake, Spark, Python, Airflow, Redshift, APIs
  • Data Engineer

    2018 - 2019
    BCG GAMMA (via Toptal, three contracts)
    • Provided engineering support for data scientists.
    • Designed and built a feature engineering data mart and a customer 360° data lake in AWS S3.
    • Designed and developed a dynamic S3-to-S3 ETL system in Spark and Hive.
    • Completed various DevOps tasks, including an Airflow installation, development of Ansible playbooks, and history backloads.
    • Worked on a feature engineering project which involved Hortonworks, Spark, Python, Hive, and Airflow.
    • Built a one-to-one marketing feature engineering pipeline in PySpark on Microsoft Azure and Databricks (using ADF, ADL, Databricks Delta Lake, and ADW as a source).
    Technologies: Python, Spark, Hive, Presto, Athena, Glue, RDS, PostgreSQL, Airflow, Boto 3 API, Ansible
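A dynamic S3-to-S3 ETL system like the one above is typically driven by per-table configuration that generates the Spark/Hive SQL at runtime, so new feeds need configuration rather than new code. The sketch below shows that idea under stated assumptions: the table names, S3 paths, and column mappings are hypothetical, and it only renders the SQL that a SparkSession would execute:

```python
def build_etl_sql(table_conf):
    """Render a CREATE TABLE ... AS SELECT statement for one table
    from a declarative config dictionary."""
    cols = ",\n  ".join(
        f"{expr} AS {alias}" for alias, expr in table_conf["columns"].items()
    )
    return (
        f"CREATE TABLE {table_conf['target']}\n"
        f"USING PARQUET LOCATION '{table_conf['target_path']}' AS\n"
        f"SELECT\n  {cols}\nFROM {table_conf['source']}"
    )


conf = {
    "source": "raw.orders",             # hypothetical source table over S3
    "target": "mart.orders_daily",      # hypothetical target table
    "target_path": "s3://lake/mart/orders_daily",
    "columns": {"order_id": "id", "amount_usd": "CAST(amount AS DOUBLE)"},
}
sql = build_etl_sql(conf)
# A Spark job would then run: spark.sql(sql)
```

The design choice here is that the pipeline's shape lives in data, not code: adding a table means adding a config entry, which is what makes an S3-to-S3 ETL "dynamic".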
  • Vice President, Data

    2017 - 2018
    • Managed the data engineering, BI reporting, and data science teams.
    • Worked as a hands-on data engineer.
    • Built a data lake on AWS.
    • Developed a reporting system with Redash/Presto.
    Technologies: AWS EMR: Hadoop, Spark, Presto, Hive, AWS Redshift, AWS RDS: PostgreSQL, MySQL, AuroraDB, AWS S3, Python, Airflow, Redash
  • Big Data Architect

    2016 - 2017
    • Worked in a full-time position as a data architect for a transaction cost analysis system.
    • Installed a four-node Apache Hadoop/Spark cluster on ITG's private cloud.
    • Conducted platform POC embedding Apache Spark technology into ITG's data platform.
    • Supported the development of a platform POC for Kx Kdb+; also converted Sybase IQ queries to Kdb+ Q language.
    Technologies: Apache Hadoop, Hive, Spark, Python, Sybase ASE, Sybase IQ, Informatica, Kdb+, Q
  • Data Engineer

    2016 - 2017
    American Taekwondo Association (via Toptal)
    • Converted data from a legacy Oracle database to a newly designed SQL Server database.
    • Wrote SQL scripts, stored procedures, and Kettle transformations.
    • Administered two databases.
    • Performed extensive data cleansing and validation.
    Technologies: MS SQL Server, Oracle, Pentaho (Kettle)
  • Director, Data Warehouse

    2015 - 2016
    • Managed two data warehouses and the BI teams for both PriceGrabber and Shopzilla (Connexity operates the PriceGrabber, Shopzilla, and BizRate brands).
    • Handled operational support for the PriceGrabber data warehouse. Recovered data warehouse after the data center migration.
    • Merged one data warehouse into the other and retired it. Designed the business and data integration architecture hands-on; developed data validation scripts and ETL integration code. Managed the transfer of the BI reporting system from Cognos to OBIEE and Tableau.
    • Defined the technology platform change strategy for the combined data warehouse.
    • Created PL/SQL stored procedures, packages, and anonymous scripts for ETL and data validation.
    • Completed an Amazon Redshift project.
    • Worked on and completed a Cloudera Impala project.
    Technologies: Oracle, PL/SQL, AWS Redshift, Hadoop, Impala, Cognos, OBIEE, Tableau, Perl, Python, Linux
  • Director, Data Warehouse

    2008 - 2015
    • Oversaw the company's data services, defined the overall and technical strategy for data warehousing, business intelligence, and big data environments.
    • Hired and managed a mixed on-shore (US)/off-shore (India) engineering team.
    • Replatformed the data warehouse to an Oracle Exadata X3/Oracle ZFS combination and added big data and machine learning components to the data warehousing environment.
    • Supported 24x7x365 operations in compliance with the company's top-level production SLA.
    • Wrote thousands of lines of PL/SQL, PL/pgSQL, MySQL, and HiveQL code.
    • Wrote ETL scripts in Perl, Python, and JavaScript (embedded in Kettle).
    • Worked with big data on multiple types of projects (Hadoop, Pig, Hive, and Mahout).
    • Developed a tool-based ETL for a Pentaho (Kettle) CE ETL redesign project.
    • Worked on machine learning for various types of projects (Python, SciPy, NumPy, and Pandas).
    Technologies: Oracle, Hadoop, Pig, Hive, PostgreSQL, MySQL, Perl, Python, Pentaho (Kettle), Linux
  • Director, Data Warehouse

    2007 - 2008
    • Managed a data warehouse team and project pipeline; supported operations.
    • Created PL/SQL stored procedures, packages, and anonymous scripts for ETL and data validation.
    • Worked on a tool-based ETL for multiple Informatica projects.
    Technologies: Oracle, Informatica, Perl, Linux
  • Manager, Data Warehouse

    2003 - 2007
    Universal Music Group
    • Managed, developed, and operated a CRM data warehouse.
    • Wrote PL/SQL, MySQL, and Perl code.
    • Administered a Cognos reporting system.
    • Used C# on multiple projects supporting the OLAP reporting system.
    • Designed and developed an MSAS OLAP cube system.
    Technologies: Oracle, SQL Server, MySQL, Cognos, C#, Perl, Linux
  • Director, Decision Support and Financial Systems

    2001 - 2003
    MediaLive International
    • Managed a data warehouse, BI, and CRM systems.
    • Assumed responsibility for an Oracle EBS application team.
    • Developed the PL/SQL coding for a data warehouse ETL and Oracle Application integration.
    • Worked with SQL Server on multiple Transact-SQL and Analysis Services projects.
    • Worked on tool-based ETL for multiple Epiphany EPI*Channel projects.
    Technologies: Oracle, Oracle EBS, SQL Server, VB, Epiphany, Unix
  • Senior Principal Consultant (Professional Services, Essbase Practice)

    1999 - 2001
    Hyperion (now part of Oracle)
    • Led an Essbase consulting practice serving multiple clients.
    • Developed Essbase satellite systems: relational data warehouses and data marts, reporting systems, ETL systems, CRMs, and EPPs, including ETL into, out of, and within Essbase itself.
    • Worked on multiple PL/SQL projects, providing full support of the team's Oracle project pipeline.
    • Supported SQL Server on multiple Transact-SQL and Analysis Services projects.
    • Developed a tool-based ETL for an Informatica project.
    • Worked on Hyperion Essbase, Enterprise, Pillar, Planning, and Financial Analyzer projects, as well as VBA projects.
    Technologies: Oracle, SQL Server, Hyperion Essbase, VBA, Informatica


  • Languages

    Python, SQL, PL/pgSQL, Snowflake, Perl
  • Frameworks

    Apache Spark, AWS EMR, Hadoop
  • Tools

    Apache Airflow, AWS Athena, Pentaho Data Integration (Kettle), AWS Glue, Informatica PowerCenter
  • Paradigms

    ETL, Business Intelligence (BI), Management, Database Design
  • Platforms

    Oracle, Databricks, Azure
  • Storage

    PostgreSQL, Apache Hive, Databases, Oracle PL/SQL, Redshift, Microsoft SQL Server, MySQL, AWS RDS, Cassandra, Essbase
  • Other

    Data Warehousing, Data Architecture, Leadership, Team Mentoring, Technology Strategy & Architecture, Big Data, Software Development, Fivetran, Data Warehouse, perlpod, Unix Shell Scripting, MSAS, Cognos 10


  • Certificate of Completion in Data Science and Engineering with Apache Spark
    2016 - 2016
    UC BerkeleyX (Online Courses from Berkeley) - Berkeley, California (USA)
  • Certificate of Completion in Cloudera Developer Training for Apache Hadoop
    2012 - 2012
    Cloudera University - New York, New York (USA)
  • Certificate of Completion in Oracle Database Administration
    1995 - 1995
    UCI Extension - Irvine, California (USA)
  • Diploma (Master of Science equivalent) degree in Applied Mathematics
    1975 - 1980
    Odessa I.I. Mechnikov University - Odessa, Ukraine
