Sergey Dmitriev

Software Architecture Developer in Seattle, United States

Member since August 24, 2018
Sergey is a senior data management professional, solution architect, and cloud architect with over 20 years of experience developing data-intensive applications and building and leading technical teams to deliver challenging software development and migration projects. He is skilled in all aspects of software design and development, with demonstrated expertise in application delivery planning, design, and development.
Sergey is now available for hire

Portfolio

  • Dropbox
    SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive...
  • Facebook
    SQL, Python, Apache Hive, Presto DB, Spark SQL, Spark, Database Development...
  • Gradient
    Google Cloud, Apache Airflow, Python, Google BigQuery, AWS...

Experience

  • Data Integration 20 years
  • SQL 20 years
  • ETL 20 years
  • Software Architecture 10 years
  • Python 6 years
  • Snowflake 5 years
  • PySpark 5 years
  • Databricks 3 years

Location

Seattle, United States

Availability

Part-time

Preferred Environment

AWS, Google Cloud Platform (GCP), Spark, PySpark, Python, Apache Hive, Databricks, Snowflake, Apache Airflow, Fivetran

The most amazing...

...project I've completed: anomaly detection for an authorization service (Facebook - Data Infrastructure Security)

Employment

  • Staff Data Engineer

    2021 - PRESENT
    Dropbox
    • Build a reliable integration with Google Ads services.
    • Migrate data pipelines from a legacy platform to Spark and Airflow.
    • Consolidate legacy data platform elements into a strategic one.
    Technologies: SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive, Database Development, ETL, Database Design, AWS S3, Amazon EC2, Git, Databases, Data Modeling, Software Architecture, Shell Scripting, PySpark, Linux, Data Integration
  • Staff Data Engineer

    2019 - 2021
    Facebook
    • Built an anomaly detection platform for the Data Platform authorization service.
    • Implemented infrastructure analytics for real-time communication between Instagram and Messenger.
    • Created a unified analytics platform for operational health monitoring of real-time communication products.
    Technologies: SQL, Python, Apache Hive, Presto DB, Spark SQL, Spark, Database Development, ETL, Database Design, Databases, Data Modeling, Software Design, Shell Scripting, PySpark, Linux, Data Integration
  • Senior Backend Engineer

    2018 - 2019
    Gradient
    • Built a data platform to ingest and process data from different sources.
    • Created data pipelines to power BI analytics and ML algorithms.
    • Built a monitoring and alerting tool to simplify data platform operations.
    Technologies: Google Cloud, Apache Airflow, Python, Google BigQuery, AWS, Database Development, ETL, Database Design, AWS S3, Amazon EC2, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, Linux, Data Integration
  • Lead Solution Architect, Data Platforms

    2017 - 2018
    Amazon Web Services
    • Planned and implemented relational database migrations to AWS.
    • Designed and implemented data warehouses, data lakes, and operational data stores in AWS.
    • Designed and implemented data pipelines on AWS using SQL and Python.
    • Created data models and database components for databases (Oracle) hosted on AWS RDS and EC2.
    • Optimized performance of reports and SQL queries for databases (Oracle) hosted in AWS.
    • Created labs, sales demos, and conference activities for database migrations to AWS.
    Technologies: Python, SQL, Amazon Web Services (AWS), Relational Databases, Database Development, ETL, Oracle RDBMS, Database Design, Amazon EC2, AWS CloudFormation, Amazon Aurora, Redshift, Pandas, Hadoop, Spark, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, PySpark, Linux, AWS EMR, Data Integration
  • Lead Data Architect

    2005 - 2017
    Deutsche Bank
    • Consolidated legacy Oracle databases onto an Exadata cluster (merging models and data, migrating data, and modifying PL/SQL, shell, and Java code). Database size: 100 TB.
    • Optimized the performance of reporting components on Exadata, reducing running times from hours to seconds.
    • Created the data model and database components (Oracle) for a high-throughput application managing the lifecycle of listed derivatives transactions (2,000 transactions/sec).
    • Designed and implemented the data model and database code for a risk management platform, capturing the risk model parameters of every risk calculation for compliance reporting.
    • Designed and implemented a dynamically configured reporting engine (in PL/SQL) for processing a 30 TB dataset.
    • Designed and implemented the data model and database code for a real-time warehouse for the Sales IT department, receiving information from 150+ feeds and applying complicated logic to calculate sales commissions.
    Technologies: Shell Scripting, Databases, PL/SQL, Java, SQL, Exadata, Oracle, Python, Database Development, ETL, Oracle RDBMS, Database Design, Data Modeling, Software Design, Software Architecture, Linux, Data Integration
  • Senior Database Developer, DBA

    2000 - 2005
    INIT-S
    • Designed and developed document management and resource management systems for nuclear power plants, managing each plant's entire documentation process.
    • Enhanced and automated resource management, migrated databases between platforms (Sybase ASE, MS SQL Server, Oracle), administered database servers, created deployment packages, and consulted with customers.
    • Migrated critical databases from Sybase ASE to Oracle.
    Technologies: SQL, Erwin, C#, Sybase, Microsoft SQL Server, Oracle, Database Development, ETL, Oracle RDBMS, Database Design, Databases, Data Modeling, Shell Scripting, Data Integration

Experience

  • Transformation Program for Core Banking Equity Settlement Application

    Rebuilt a platform of COBOL, C++, and EJB 1.0 components, with six Oracle 10g databases totaling 100 TB, into a Java application hosted on an on-premises cloud platform, with the databases consolidated onto an Oracle Exadata cluster.

  • Data Pipelines on Google Cloud Platform

    I built data pipelines loading data from JSON-formatted files in Google Cloud Storage into BigQuery, applying complicated transformation logic in SQL, then aggregating and loading the data into a data mart in Postgres (Google Cloud SQL) for use by the application UI.
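
    As an illustration only (not the production code), a minimal pure-Python sketch of the aggregation step in such a pipeline; the `user_id` and `amount` fields are hypothetical stand-ins for the real schema:

    ```python
    import json
    from collections import defaultdict

    def aggregate_events(json_lines):
        """Parse newline-delimited JSON events and roll them up per user,
        a stand-in for the SQL aggregation feeding the data mart."""
        totals = defaultdict(float)
        for line in json_lines:
            event = json.loads(line)
            totals[event["user_id"]] += event["amount"]
        # Emit one data-mart row per user, sorted for deterministic output.
        return [{"user_id": u, "total_amount": t} for u, t in sorted(totals.items())]
    ```

    In the real pipeline, the extract step would read the JSON files from Cloud Storage and the load step would write the aggregated rows into the Postgres data mart; only the transformation shape is sketched here.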

  • Data Pipelines on AWS

    I used Hadoop as a source for data pipelines as well as an execution platform to run Pig, Hive, and Presto. I used Spark in data pipelines to do ETL in batch mode.

    I have Python experience building data pipelines for data warehouses and data science projects. I have experience building on-premises and cloud data pipelines as well as serverless backend cloud APIs on AWS. I used Spark and Spark SQL for compute-intensive data pipelines in batch mode. I'm very comfortable working with Spark and will learn new use cases quickly.

  • Anomaly Detection for Data Platform Access Authorization

    The solution monitored and validated decisions of the Data Platform authorization service and identified suspicious behavior. Such behavior might be caused by a service bug or a security configuration issue.
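
    For illustration only, a minimal sketch of one anomaly-detection idea applicable to such a service: flagging days whose authorization-denial counts deviate sharply from the norm via a z-score check. The thresholds and inputs are hypothetical, not the production logic:

    ```python
    import statistics

    def flag_anomalies(daily_denial_counts, threshold=2.0):
        """Return indices of days whose denial count deviates from the mean
        by more than `threshold` population standard deviations."""
        mean = statistics.mean(daily_denial_counts)
        stdev = statistics.pstdev(daily_denial_counts)
        if stdev == 0:
            return []  # no variation, nothing to flag
        return [i for i, count in enumerate(daily_denial_counts)
                if abs(count - mean) / stdev > threshold]
    ```

    A production system would add seasonality handling and per-service baselines; this sketch only shows the core statistical check.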

  • Infrastructure Analytics for a Real-time Communication Platform

    Collected logs from infrastructure services and client applications and built data pipelines to create core datasets, metrics, and dashboards. Created ML pipelines to calculate the results of A/B tests. Implemented a tool to attribute metric movements using a root cause analysis ML algorithm.
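
    As a hedged illustration of the A/B-test calculation step (not the actual pipeline), a standard two-proportion z-statistic on conversion counts; all inputs here are hypothetical:

    ```python
    import math

    def ab_test_z(conv_a, n_a, conv_b, n_b):
        """Two-proportion z-statistic comparing conversion rates of
        variant B against variant A (pooled standard error)."""
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        return (p_b - p_a) / se
    ```

    A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test; the real pipelines computed such statistics at scale over experiment logs.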

Skills

  • Languages

    SQL, Snowflake, Python
  • Frameworks

    Presto DB, AWS EMR, Spark, Hadoop, Apache Thrift
  • Libraries/APIs

    PySpark, Pandas
  • Tools

    Apache Airflow, Erwin, Oracle Exadata, Spark SQL, AWS CloudFormation, Git, Tableau
  • Paradigms

    Database Development, ETL, Database Design
  • Platforms

    Oracle, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Amazon EC2, Linux
  • Storage

    Relational Databases, Databases, Oracle RDBMS, Redshift, AWS S3, Data Integration, Apache Hive, Amazon Aurora, Google Cloud
  • Other

    Data Modeling, Shell Scripting, Google BigQuery, Software Design, Software Architecture, AWS, AWS RDS

Education

  • Master's Degree in Computer Science
    1997 - 2003
    Moscow Power Engineering Institute - Moscow, Russia

Certifications

  • AWS Certified Solutions Architect - Associate
    JULY 2017 - JULY 2019
    AWS
  • Oracle Certified Professional (DBA)
    MARCH 2005 - PRESENT
    Oracle
