Sergey Dmitriev, Software Architect and Developer in Seattle, United States

Member since June 18, 2020
Sergey is a senior data management professional, solution architect, and cloud architect with over 20 years of experience developing data-intensive applications and building and leading technical teams that deliver challenging software development and migration projects. He is skilled in all aspects of software design and development, with demonstrated expertise in application delivery planning, design, and development.

Portfolio

  • Dropbox
    SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive...
  • Facebook
    SQL, Python, Apache Hive, Presto DB, Spark SQL, Spark, Database Development...
  • Gradient
    Google Cloud, Apache Airflow, Python, Google BigQuery...

Experience

  • Data Integration 20 years
  • SQL 20 years
  • ETL 20 years
  • Software Architecture 10 years
  • Python 6 years
  • Snowflake 5 years
  • PySpark 5 years
  • Databricks 3 years

Location

Seattle, United States

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Google Cloud Platform (GCP), Spark, PySpark, Python, Apache Hive, Databricks, Snowflake, Apache Airflow, Fivetran

The most amazing...

...project I've completed involved anomaly detection for an authorization service (Facebook, data infrastructure security).

Employment

  • Staff Data Engineer

    2021 - PRESENT
    Dropbox
    • Implemented a reliable integration with Google Ads Services.
    • Migrated data pipelines from a legacy platform to Spark and Airflow.
    • Consolidated legacy data platform elements into a single strategic platform.
    Technologies: SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive, Database Development, ETL, Database Design, Amazon S3 (AWS S3), Amazon EC2, Git, Databases, Data Modeling, Software Architecture, Shell Scripting, PySpark, Linux, Data Integration
  • Staff Data Engineer

    2019 - 2021
    Facebook
    • Built an anomaly detection platform for a data platform authorization service.
    • Implemented infrastructure analytics for real-time communication between Instagram and Messenger.
    • Created a unified analytics platform for operational health monitoring for real-time communication products.
    Technologies: SQL, Python, Apache Hive, Presto DB, Spark SQL, Spark, Database Development, ETL, Database Design, Databases, Data Modeling, Software Design, Shell Scripting, PySpark, Linux, Data Integration
  • Senior Back-end Engineer

    2018 - 2019
    Gradient
    • Built a data platform to ingest and process data from different sources.
    • Created data pipelines for Power BI analytics and ML algorithms.
    • Constructed a monitoring and alerting tool to simplify data platform operations.
    Technologies: Google Cloud, Apache Airflow, Python, Google BigQuery, Amazon Web Services (AWS), Database Development, ETL, Database Design, Amazon S3 (AWS S3), Amazon EC2, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, Linux, Data Integration
  • Lead Solution Architect of Data Platforms

    2017 - 2018
    Amazon Web Services
    • Planned and implemented relational database migrations to AWS.
    • Designed and implemented data warehouses, data lakes, and operational data stores in AWS.
    • Designed and implemented data pipelines on AWS using SQL and Python.
    • Created data models and database components for databases (Oracle) hosted on AWS RDS and EC2.
    • Optimized the performance of reports and SQL queries for databases (Oracle) hosted in AWS.
    • Created labs, sales demos, and conference activities for database migrations to AWS.
    Technologies: Python, SQL, Amazon Web Services (AWS), Relational Databases, Database Development, ETL, Oracle RDBMS, Database Design, Amazon EC2, AWS CloudFormation, Amazon Aurora, Redshift, Pandas, Hadoop, Spark, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, PySpark, Linux, AWS EMR, Data Integration
  • Lead Data Architect

    2005 - 2017
    Deutsche Bank
    • Consolidated legacy Oracle databases (up to 100TB in size) onto Exadata, which included merging models and data, migrating data, and modifying PL/SQL, shell, and Java code.
    • Optimized reporting components on Exadata, cutting run times from hours to seconds.
    • Created a data model and database components (Oracle) for an application managing the lifecycle of listed derivatives transactions (2,000 transactions per second).
    • Designed and implemented a data model and database code for a risk management platform to capture risk model parameters for every risk calculation for compliance reporting.
    • Designed and implemented a dynamically configured reporting engine (in PL/SQL) for processing a 30TB dataset.
    • Designed and implemented a data model and database code for a real-time warehouse for the sales IT department that receives information from 150+ feeds and applies complex logic to calculate sales commissions.
    Technologies: Shell Scripting, Databases, PL/SQL, Java, SQL, Exadata, Oracle, Python, Database Development, ETL, Oracle RDBMS, Database Design, Data Modeling, Software Design, Software Architecture, Linux, Data Integration
  • Senior Database Developer | Database Administrator

    2000 - 2005
    INIT-S
    • Designed and developed document management systems and resource management systems for nuclear power plants to manage each power plant’s entire documentation process.
    • Enhanced and automated resource management, migrated databases between platforms (Sybase ASE, MS SQL Server, Oracle), administered database servers, created deployment packages, and consulted customers.
    • Migrated critical databases from Sybase ASE to Oracle.
    Technologies: SQL, Erwin, C#, Sybase, Microsoft SQL Server, Oracle, Database Development, ETL, Oracle RDBMS, Database Design, Databases, Data Modeling, Shell Scripting, Data Integration

Experience

  • Transformation Program for a Core Banking Equity Settlement Application

    I rebuilt a platform made up of COBOL, C++, and EJB 1.0 components, with six databases 100TB in size on Oracle 10g, into a Java application hosted on an on-premises cloud platform with a consolidated database on an Oracle Exadata cluster.

  • Data Pipelines on Google Cloud Platform

    I built data pipelines that load data from JSON-formatted files in Google Cloud Storage into BigQuery, apply complex transformation logic in SQL, and then aggregate and load the results into a data mart in PostgreSQL (Google Cloud SQL) used by an application's UI.

  • Data Pipelines on AWS

    I used Hadoop as a source for data pipelines as well as an execution platform to run Pig, Hive, and Presto. I used Spark in data pipelines to do ETL in batch mode.

    My Python experience includes building data pipelines for data warehouses and data science projects. I've also built on-premises and cloud data pipelines as well as back-end serverless cloud APIs on AWS.

    I've also used Spark and Spark SQL for compute-intensive data pipelines in batch mode, so I'm very comfortable working with Spark and can pick up new use cases quickly.

  • Anomaly Detection for Data Platform Access Authorization

    The solution monitored and validated the decisions of a data platform authorization service and identified suspicious behavior. This suspicious behavior might be caused by a service bug or security configuration issues.

  • Infrastructure Analytics for a Real-time Communication Platform

    This project involved collecting logs from infrastructure services and client applications and building data pipelines to create core datasets, metrics, and dashboards. I also built ML pipelines to calculate the results of A/B tests and implemented a tool that attributes metric movements using a root cause analysis ML algorithm.

Skills

  • Languages

    SQL, Snowflake, Python, COBOL
  • Frameworks

    Presto DB, AWS EMR, Spark, Hadoop, Apache Thrift
  • Libraries/APIs

    PySpark, Pandas
  • Tools

    Apache Airflow, Erwin, Oracle Exadata, Spark SQL, AWS CloudFormation, Git, Tableau
  • Paradigms

    Database Development, ETL, Database Design
  • Platforms

    Oracle, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Amazon EC2, Linux, Apache Pig
  • Storage

    Relational Databases, Databases, Oracle RDBMS, Redshift, Amazon S3 (AWS S3), Data Integration, Apache Hive, Amazon Aurora, Google Cloud, PostgreSQL
  • Other

    Data Modeling, Shell Scripting, Google BigQuery, Software Design, Software Architecture, Amazon RDS

Education

  • Master's Degree in Computer Science
    1997 - 2003
    Moscow Power Engineering Institute - Moscow, Russia

Certifications

  • AWS Certified Solutions Architect - Associate
    JULY 2017 - JULY 2019
    AWS
  • Oracle Certified Professional (DBA)
    MARCH 2005 - PRESENT
    Oracle
