Sergey Dmitriev, Developer in Seattle, United States

Sergey Dmitriev

Verified Expert  in Engineering

Software Architect and Developer

Location
Seattle, United States
Toptal Member Since
June 18, 2020

Sergey is a senior data management professional, solution architect, and cloud architect with over 20 years of experience developing data-intensive applications and building and leading technical teams to deliver challenging software development and migration projects. He is skilled in all aspects of software design and development, with demonstrated expertise in application delivery planning, design, and development.

Portfolio

Dropbox
SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive...
Facebook
SQL, Python, Apache Hive, Presto, Spark SQL, Spark, Database Development, ETL...
Gradient
Google Cloud, Apache Airflow, Python, Google BigQuery...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Google Cloud Platform (GCP), Spark, PySpark, Python, Apache Hive, Databricks, Snowflake, Apache Airflow, Fivetran

The most amazing...

...project I've completed involved anomaly detection for an authorization service (Facebook, data infrastructure security).

Work Experience

Staff Data Engineer

2021 - PRESENT
Dropbox
  • Implemented a reliable integration with Google Ads Services.
  • Migrated data pipelines from a legacy platform to Spark and Airflow.
  • Consolidated legacy data platform elements into a strategic one.
Technologies: SQL, Python, Apache Airflow, Spark, Spark SQL, Apache Hive, Database Development, ETL, Database Design, Amazon S3 (AWS S3), Amazon EC2, Git, Databases, Data Modeling, Software Architecture, Shell Scripting, PySpark, Linux, Data Integration

Staff Data Engineer

2019 - 2021
Facebook
  • Built an anomaly detection platform for a data platform authorization service.
  • Implemented infrastructure analytics for real-time communication between Instagram and Messenger.
  • Created a unified analytics platform for operational health monitoring for real-time communication products.
Technologies: SQL, Python, Apache Hive, Presto, Spark SQL, Spark, Database Development, ETL, Database Design, Databases, Data Modeling, Software Design, Shell Scripting, PySpark, Linux, Data Integration

Senior Back-end Engineer

2018 - 2019
Gradient
  • Built a data platform to ingest and process data from different sources.
  • Created data pipelines for Power BI analytics and ML algorithms.
  • Constructed a monitoring and alerting tool to simplify data platform operations.
Technologies: Google Cloud, Apache Airflow, Python, Google BigQuery, Amazon Web Services (AWS), Database Development, ETL, Database Design, Amazon S3 (AWS S3), Amazon EC2, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, Linux, Data Integration

Lead Solution Architect of Data Platforms

2017 - 2018
Amazon Web Services
  • Planned and implemented relational database migrations to AWS.
  • Designed and implemented data warehouses, data lakes, and operational data stores in AWS.
  • Designed and implemented data pipelines on AWS using SQL and Python.
  • Created data models and database components for databases (Oracle) hosted on AWS RDS and EC2.
  • Optimized the performance of reports and SQL queries for databases (Oracle) hosted in AWS.
  • Created labs, sales demos, and conference activities for database migrations to AWS.
Technologies: Python, SQL, Amazon Web Services (AWS), Relational Databases, Database Development, ETL, Oracle RDBMS, Database Design, Amazon EC2, AWS CloudFormation, Amazon Aurora, Redshift, Pandas, Hadoop, Spark, Git, Databases, Data Modeling, Software Design, Software Architecture, Shell Scripting, PySpark, Linux, Amazon Elastic MapReduce (EMR), Data Integration

Lead Data Architect

2005 - 2017
Deutsche Bank
  • Consolidated legacy Oracle databases (up to 100TB in size) on Exadata, which included merging models and data, migrating data, and modifying PL/SQL, shell, and Java code.
  • Optimized the performance for reporting components on Exadata from hours-long running time to seconds.
  • Created data model and database components (Oracle) for an application managing a lifecycle of listed derivatives transactions (2,000 transactions per second).
  • Designed and implemented a data model and database code for a risk management platform to capture risk model parameters for every risk calculation for compliance reporting.
  • Designed and implemented a dynamically configured reporting engine (in PL/SQL) for processing a 30TB dataset.
  • Designed and implemented a data model and database code for a real-time warehouse for the sales IT department, receiving information from 150+ feeds, and applying complicated logic to calculate sales commissions.
Technologies: Shell Scripting, Databases, SQL, Oracle, Python, Database Development, ETL, Oracle RDBMS, Database Design, Data Modeling, Software Design, Software Architecture, Linux, Data Integration

Senior Database Developer | Database Administrator

2000 - 2005
INIT-S
  • Designed and developed document management systems and resource management systems for nuclear power plants to manage each power plant’s entire documentation process.
  • Enhanced and automated resource management; performed database migrations between platforms (Sybase ASE, MS SQL Server, Oracle); administered database servers; created deployment packages; and consulted with customers.
  • Migrated critical databases from Sybase ASE to Oracle.
Technologies: SQL, Erwin, Oracle, Database Development, ETL, Oracle RDBMS, Database Design, Databases, Data Modeling, Shell Scripting, Data Integration

Projects

Transformation Program for a Core Banking Equity Settlement Application

I rebuilt a platform composed of COBOL, C++, and EJB 1.0 components, with six databases, 100TB in size, on Oracle 10g, into a Java application hosted on an on-premises cloud platform with a consolidation database on an Oracle Exadata cluster.

Data Pipelines on Google Cloud Platform

I built data pipelines that load JSON-formatted files from Google Cloud Storage into BigQuery, apply complex transformation logic in SQL, and aggregate and load the results into a PostgreSQL (Google Cloud SQL) data mart that backs the application's UI.
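The aggregation step of such a pipeline can be illustrated with a minimal, stdlib-only Python sketch (the event schema and field names here are hypothetical; in the production pipeline this logic ran as SQL inside BigQuery):

```python
import json
from collections import defaultdict

# Newline-delimited JSON, as it might land in Google Cloud Storage.
# The event schema (user_id, amount) is hypothetical.
raw_events = """\
{"user_id": "u1", "amount": 10.0}
{"user_id": "u2", "amount": 5.0}
{"user_id": "u1", "amount": 2.5}
"""

def aggregate(ndjson: str) -> dict:
    """Aggregate per-user totals, mimicking the SQL GROUP BY step."""
    totals = defaultdict(float)
    for line in ndjson.splitlines():
        record = json.loads(line)
        totals[record["user_id"]] += record["amount"]
    return dict(totals)

mart_rows = aggregate(raw_events)
print(mart_rows)  # {'u1': 12.5, 'u2': 5.0}
```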

Data Pipelines on AWS

I used Hadoop both as a source for data pipelines and as an execution platform for Pig, Hive, and Presto. I used Spark in data pipelines for batch-mode ETL.

My Python experience includes building data pipelines for data warehouses and data science projects. I've also built on-premises and cloud data pipelines as well as back-end serverless cloud APIs on AWS.

I've also used Spark for compute-intensive data pipelines in batch mode, along with Spark SQL, so I'm very comfortable working with Spark and will pick up new use cases quickly.
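The filter-and-derive shape of such a batch ETL step can be sketched in stdlib-only Python (a Spark job would express the same step with DataFrame operations; the records and field names below are hypothetical):

```python
# Stdlib-only sketch of a batch ETL step in the shape a PySpark job would
# take (read -> filter -> derive -> write); records and fields are hypothetical.
rows = [
    {"event": "click", "ms": 120},
    {"event": "error", "ms": 0},
    {"event": "click", "ms": 80},
]

def transform(records):
    # Roughly: df.filter(col("event") == "click").withColumn("s", col("ms") / 1000)
    return [dict(r, s=r["ms"] / 1000) for r in records if r["event"] == "click"]

out = transform(rows)
print(out)
```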

Anomaly Detection for Data Platform Access Authorization

The solution monitored and validated the decisions of a data platform authorization service and flagged suspicious behavior, which might be caused by a service bug or a security configuration issue.
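A minimal sketch of the idea, assuming a simple z-score check over a hypothetical baseline of flagged-decision counts (the production platform was more sophisticated than this):

```python
import statistics

# Hypothetical daily counts of suspicious authorization decisions for one
# principal; a z-score check serves as a stdlib stand-in for the detector.
history = [4, 5, 3, 6, 4, 5, 4]  # baseline window
today = 42                       # new observation

mean = statistics.mean(history)
stdev = statistics.stdev(history)
z = (today - mean) / stdev
is_anomalous = z > 3.0  # flag large deviations for human review
print(is_anomalous)
```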

Infrastructure Analytics for a Real-time Communication Platform

This project involved collecting logs from infrastructure services and client applications and building data pipelines to create core datasets, metrics, and dashboards. I also built ML pipelines to calculate A/B test results and implemented a tool that attributes metric movements using a root cause analysis ML algorithm.
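The A/B test calculation can be sketched as a two-proportion z-test over hypothetical conversion counts (stdlib-only; the production pipelines computed richer statistics):

```python
import math

# Hypothetical per-variant counts: (conversions, users). A minimal
# two-proportion z-test standing in for the pipeline's statistics step.
conv_a, n_a = 200, 5000
conv_b, n_b = 260, 5000

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
# Two-sided p-value from the normal distribution via erfc.
p_value = math.erfc(abs(z) / math.sqrt(2))
significant = p_value < 0.05
print(significant)
```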
Education

1997 - 2003

Master's Degree in Computer Science

Moscow Power Engineering Institute - Moscow, Russia

Certifications

JULY 2017 - JULY 2019

AWS Certified Solutions Architect - Associate

AWS

MARCH 2005 - PRESENT

Oracle Certified Professional (DBA)

Oracle

Skills

Libraries/APIs

PySpark, Pandas

Tools

Apache Airflow, Erwin, Oracle Exadata, Spark SQL, Amazon Elastic MapReduce (EMR), AWS CloudFormation, Git, Tableau

Frameworks

Presto, Spark, Hadoop, Apache Thrift

Languages

SQL, Snowflake, Python, COBOL

Paradigms

Database Development, ETL, Database Design

Platforms

Oracle, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Amazon EC2, Linux, Apache Pig

Storage

Relational Databases, Databases, Oracle RDBMS, Redshift, Amazon S3 (AWS S3), Data Integration, Apache Hive, Amazon Aurora, Google Cloud, PostgreSQL

Other

Data Modeling, Shell Scripting, Google BigQuery, Software Design, Software Architecture, Amazon RDS
