Lead Big Data Engineer | 2016 - Present | EPAM Systems
Technologies: REST APIs, Apache Kafka, OpenShift, Docker, Hadoop, Spark, Python, Scala, Java
- Implemented the first stable, repeatable, and reusable data ingestion ETL pipeline for the fifth-largest US bank.
- Identified and resolved an issue in the Cloudera cluster resource manager configuration and proposed further changes to the default cluster configuration. Applying these changes reduced job latency and helped save $2.5 billion annually.
- Implemented a streaming ETL pipeline that collects and unifies insurance company information about brokers and agencies. The resulting runtime source of truth eliminated most of the sales team's manual data collection effort.
- Implemented a reusable DSL-based data ingestion framework that provided out-of-the-box flexibility for implementing new ETL jobs, log collection, and deployments.
- Implemented a crowdsourcing platform on top of the Amazon Mechanical Turk service to collect a golden data set for subsequent supervised training of an ML platform.
Senior Big Data Engineer | 2014 - 2016 | Lohika Systems (Altran Group)
Technologies: Spark, Hadoop, Java
- Redesigned and reimplemented a Hadoop v1 batch-oriented ETL pipeline as a Storm-based streaming architecture.
- Migrated a fintech company's custom computation engine to a Spark-based one. Migrated data from Neo4j and an RDBMS to Parquet files.
- Built a content enrichment service for a system predicting trends across social network content.
Senior Java Engineer | 2011 - 2014 | EPAM Systems
Technologies: Amazon Web Services (AWS), Hibernate, Spring, Jenkins, MySQL, Java
- Implemented one of the industry's first Big Data ETL and analytics systems for the largest advertising company in the US market, which PayPal later acquired on the strength of its performance.
- Implemented key functionality in an MVP for the EU government, using RDF triples, Semantic Web technologies, SPARQL, and AllegroGraph to find matching and related laws across EU countries.
- Implemented a crowdsourcing system for collecting a golden data set, speeding up the training of a supervised ML platform and improving the quality of its results. A media company used the results for more precise tagging and labeling of sold media content.