Selahattin Gungormus

Verified Expert in Engineering

Data Engineer and Developer

Location

Istanbul, Turkey

Toptal Member Since

May 4, 2021

Selahattin is a data engineer with several years of hands-on experience building scalable data integration solutions using open-source technologies. He excels at developing data applications on distributed processing platforms such as Hadoop, Spark, and Kafka. Selahattin also has practical experience with cloud platforms such as AWS and Azure, as well as developing microservices using Python and JavaScript frameworks.

Data Engineering, Data Warehousing, Data Warehouse Design, ETL, SQL, PL/SQL, Data Pipelines, Databases, Python, Apache Airflow, Apache Spark, Spark, Hadoop, Big Data Architecture, Apache Kafka, MapReduce, Greenplum

Portfolio

Afiniti

Apache Spark, Python, Redis, Greenplum, Kubernetes, TypeScript, SQL...

Iyzico

Apache Airflow, Spark, Spark Streaming, Python, ETL Development...

Majestech

Apache Spark, Python, Apache Airflow, Node.js, Hadoop, SQL, Data Modeling...

Experience

ETL Development - 10 years, SQL - 9 years, Databases - 7 years, Python - 6 years, Apache Spark - 5 years, Apache Airflow - 5 years, Amazon Web Services (AWS) - 4 years, Azure - 3 years

Availability

Part-time

Preferred Environment

Apache Airflow, Visual Studio Code (VS Code), Apache Spark, Amazon Web Services (AWS), Azure, Jupyter Notebook

The most amazing...

...thing I've done is to build a product that leverages Apache Spark for data processing and can be operated with drag-n-drop visual interfaces.

Work Experience

Lead Data and Back-end Engineer

2019 - PRESENT

Afiniti

Built a highly scalable, containerized data integration platform using Spark, Docker/Kubernetes, Python, and Greenplum database.
Wrapped the entire data pipeline procedure in an easy-to-deploy templating system capable of running at scale with good performance. That effort made the data pipeline process 70% faster.
Created data models and pipelines for the application, which power dashboard reports covering over 10 million events.
Established and standardized CI/CD pipeline processes across the team using Jenkins, Bitbucket, and Kubernetes.
Built and maintained an app's back-end service using Node.js, JavaScript, and GraphQL.

Technologies: Apache Spark, Python, Redis, Greenplum, Kubernetes, TypeScript, SQL, Data Modeling, Database Design, Apache Kafka, Data Pipelines, Data Engineering

Senior Data Engineer

2019 - 2019

Iyzico

Reengineered and optimized the existing data pipeline processes by creating a new technology stack using Airflow, Python, Spark, and Exasol database.
Migrated over 300 data pipeline jobs from Talend to the new data platform, which improved daily ETL performance by 60% (from eight hours to three hours).
Created a real-time data feed from transactional systems to dashboards using Spark Streaming and Kafka. That new functionality boosted operational efficiency for performance monitoring during peak hours.
Integrated the platform with AWS and delivered daily data marts to Amazon Redshift so that daily reports were available to the global board.

Technologies: Apache Airflow, Spark, Spark Streaming, Python, ETL Development, Amazon Web Services (AWS), Data Engineering

Owner | Big Data Engineer | Instructor

2015 - 2019

Majestech

Provided consultancy and training services to transform data architectures of SMEs with cloud-based alternatives such as Amazon Web Services and Azure.
Delivered over ten data integration projects for businesses in the retail, banking, and telecommunications sectors. Transformed data integration processes to utilize cloud platforms such as AWS and Azure.
Built a clickstream data application to collect web traces of app users and store them in a data lake with minimal latency. Used Kafka and Spark Streaming on AWS as the technology base.
Launched Integer8, a cloud-based data integration product, on the AWS platform.
Built a visual interface for non-developer data professionals who wanted to leverage Hadoop and Spark distributed processing capabilities.
Provided big data engineering training with Cloudera partnership (over 20 training sessions).
Created data integration pipelines on a Snowflake cloud database hosted on AWS, using Apache Airflow and S3 connectors.

Technologies: Apache Spark, Python, Apache Airflow, Node.js, Hadoop, SQL, Data Modeling, Apache Kafka, Amazon EC2, Amazon Web Services (AWS), Data Engineering

Data Engineer

2012 - 2015

i2i Systems

Implemented data quality testing automation with Python and used Oracle metadata information to produce daily automated tasks assessing possible issues on daily pipelines.
Created daily integration pipelines to feed enterprise data warehouse on ODS and RDS layers.
Built the data preparation layer of a market optimization project for a telecommunications operator. Data on more than 35 million subscribers was collected from five different source systems into a denormalized data structure with Oracle Data Integrator.

Technologies: Oracle, PL/SQL, Data Warehouse Design, Oracle Data Integrator 11g, Python, Data Pipelines, Data Engineering

Experience

Integer8 Data Integrator

https://www.f6s.com/integer8

A visual data integration product designed to run on a web application with drag-n-drop components. Any data professional without coding experience can use it to build data pipelines with a 100% visual experience. It leverages the Apache Spark execution engine and works on top of Hadoop platforms.

I founded my startup with two developers in 2015 to launch the Integer8 product in both local and international marketplaces. I designed and led the development effort to make the product feasible for local SMEs. At the end of the first year, we deployed our platform to two different retail companies.

I became a cloud partner for Microsoft Azure in Turkey and spent one more year making Integer8 eligible for Azure Marketplace. At the end of this effort, Integer8 successfully became an official Azure Marketplace product.

Data Warehouse Transformation for a Mobile Payment Company

Over 300 data pipeline tasks were migrated from Talend to Airflow on a Python/Spark data architecture running on distributed Celery. The daily denormalized payment dataset is refreshed on Azure Blob Storage. The daily ETL duration was reduced by 60%.

I designed and implemented the entire data pipeline process as the responsible data engineer for the new data platform. I built a CDC mechanism from the MySQL database into Kafka to provide a pub/sub event system for near-real-time integration. I then prepared live Spark Streaming jobs that consume the Kafka topics to refresh the target data stores. That helped the marketing and operations teams monitor the workload on the system and detect anomalies.
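The CDC flow described above can be illustrated with a minimal sketch of how a consumer might apply change events to a target store. This is not the project's code: the event fields (op, key, after) are a hypothetical shape chosen for illustration, and a plain dictionary stands in for both the Kafka topic and the target data store.

```python
import json


def apply_change_event(store, raw_message):
    """Apply one CDC change event (a JSON string) to an in-memory target store.

    Hypothetical event shape, assumed for this sketch only:
      {"op": "c" | "u" | "d", "key": <primary key>, "after": <row or null>}
    """
    event = json.loads(raw_message)
    op = event["op"]
    key = event["key"]
    if op in ("c", "u"):
        # Creates and updates both overwrite the row with its latest image.
        store[key] = event["after"]
    elif op == "d":
        # Deletes remove the row; a missing key is ignored so replays are safe.
        store.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return store


def replay(messages):
    """Rebuild the target store by applying messages in arrival order."""
    store = {}
    for message in messages:
        apply_change_event(store, message)
    return store
```

Because every event carries the full latest row image and deletes tolerate missing keys, replaying the same messages twice leaves the store in the same state, which is a useful property when a streaming job restarts.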

All data sources were consolidated into two main data marts for the Tableau reporting layer. Daily pre-aggregated tables helped live reports perform 400% faster than the previous implementation. That also encouraged power users across the organization to adopt the reporting tools.
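As a rough illustration of why pre-aggregated tables speed up reports, the sketch below rolls raw payment rows up to one row per day and merchant, so a dashboard reads a few summary rows instead of scanning every transaction. The column names (day, merchant, amount) are assumptions made for this example, and amounts are kept as integers in minor currency units to avoid rounding issues.

```python
from collections import defaultdict


def pre_aggregate(payments):
    """Roll raw payment rows up to one summary row per (day, merchant)."""
    totals = defaultdict(lambda: {"count": 0, "amount": 0})
    for row in payments:
        bucket = totals[(row["day"], row["merchant"])]
        bucket["count"] += 1
        bucket["amount"] += row["amount"]
    return dict(totals)
```

In a real warehouse the same roll-up would typically be a scheduled GROUP BY query writing into a summary table; the function above only shows the shape of the computation.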

Cloud ETL Automation on AWS

I prepared a cloud ETL automation solution using AWS Lambda with Python for a client project. In this project, I was responsible for connecting REST APIs and event sources for new events and collecting the CRM information from AWS S3 buckets. I used AWS EC2 layers to provide additional support for S3 and Pandas in Python environments. Individual Lambda functions were created to collect, cleanse, and transform data sources on a serverless architecture and write them into the destination database.

I used Amazon Redshift as the target database. Individual events are emitted from Amazon EventBridge into Lambda functions and accumulated in the Redshift database for further analysis.
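The event flow above might look roughly like the following sketch of a Lambda handler. It is an illustration under stated assumptions, not the client project's code: the envelope keys (id, source, detail-type, time, detail) follow the standard EventBridge event structure, while the email field inside detail and the write_row callback (a stand-in for the actual Redshift write) are hypothetical.

```python
def to_row(event):
    """Flatten an EventBridge event envelope into a row for the target table."""
    detail = event.get("detail") or {}
    # Cleansing step: trim and lower-case the (hypothetical) email field,
    # storing None when it is missing or blank.
    email = (detail.get("email") or "").strip().lower() or None
    return {
        "event_id": event["id"],
        "source": event["source"],
        "event_type": event["detail-type"],
        "event_time": event["time"],
        "customer_email": email,
    }


def lambda_handler(event, context, write_row=print):
    """Lambda entry point: cleanse one event and hand it to the writer.

    write_row is a placeholder for the database write; it defaults to print
    so that the sketch runs without any AWS resources.
    """
    row = to_row(event)
    write_row(row)
    return {"status": "ok", "event_id": row["event_id"]}
```

Keeping the transformation in a pure function separate from the handler makes the cleansing logic easy to test locally without deploying anything.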

Skills

Languages

Python, SQL, JavaScript, Scala, Snowflake, TypeScript

Frameworks

Apache Spark, Hadoop, Spark

Tools

Apache Airflow, Amazon CloudWatch

Paradigms

ETL, MapReduce, Database Design

Storage

PL/SQL, Databases, Data Pipelines, Redis, Greenplum, HDFS, HBase, Apache Hive, Amazon S3 (AWS S3)

Other

Data Modeling, Data Warehousing, Data Warehouse Design, ETL Development, Data Engineering, Data Architecture, Big Data Architecture, OOP Designs, Data Structures, Algorithms

Libraries/APIs

Spark Streaming, Node.js, Pandas

Platforms

Azure, Apache Kafka, Oracle, Amazon Web Services (AWS), Docker, Visual Studio Code (VS Code), Kubernetes, Jupyter Notebook, Oracle Data Integrator 11g, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2

Education

2005 - 2010

Bachelor's Degree in Computer Engineering

Istanbul Technical University - Istanbul, Turkey

Certifications

SEPTEMBER 2013 - PRESENT

Cloudera Certified Developer for Apache Hadoop

Cloudera
