
Selahattin Gungormus

Verified Expert in Engineering

Data Engineer and Developer

Istanbul, Turkey

Toptal member since May 4, 2021

Expertise

Data Warehouse, Data Engineering, Database, ETL, Python, SQL, Big Data Architecture, Apache Airflow, Spark, Hadoop, AWS, Databricks, AWS Lambda

Bio

Selahattin is a senior data engineer with 10+ years of experience designing and building scalable data platforms using cloud-native and open-source technologies. He has a proven track record of developing high-performance data pipelines with Snowflake, dbt, Databricks, and Airflow, and significantly improving data reliability and accessibility. Selahattin is highly proficient in SQL and Python, with hands-on expertise in AWS, Azure, Kafka, and data modeling.

Portfolio

Pandora

Databricks, Azure Databricks, Azure Data Lake, Azure Synapse, Azure Event Hubs...

BCG - Corporate Marketing

Python, AWS Lambda, Amazon S3 (AWS S3), Snowflake, Data Build Tool (dbt)...

Pex

Snowflake, Data Build Tool (dbt), Data Vault 2.0, Apache Airflow, Python...

Experience

Python - 9 years
SQL - 9 years
Apache Spark - 8 years
Amazon Web Services (AWS) - 7 years
Apache Airflow - 5 years
Apache Kafka - 5 years
Data Build Tool (dbt) - 5 years
Snowflake - 3 years

Preferred Environment

Apache Airflow, Apache Spark, Snowflake, Data Build Tool (dbt), Databricks, Azure Data Lake, Azure Data Factory (ADF), AWS Glue, AWS Lambda, Apache Kafka

The most amazing...

...thing I've done is design and develop a highly scalable, containerized data integration platform using Apache Spark, Kubernetes, Python, and Greenplum database.

Work Experience

Senior Data Engineer

2025 - PRESENT

Pandora

Developed data flows to efficiently ingest near-real-time data from Kafka into Azure Blob using Databricks, laying the foundation for the Data Lake layer and improved data accessibility.
Prepared data models for Common Data Model (CDM) and Reference Data Model (RDM), ensuring that data organization met business needs and enhanced reporting capabilities.
Developed T-SQL procedures to populate data in a day-minus-one fashion in Azure SQL, which streamlined data availability and supported timely decision-making.
Used Azure Synapse pipelines to build robust data movement and orchestration flows, automating processes and increasing overall efficiency in data handling.

Technologies: Databricks, Azure Databricks, Azure Data Lake, Azure Synapse, Azure Event Hubs, Apache Kafka, Transact-SQL (T-SQL), Microsoft Power BI, Azure SQL Databases

Senior Data Engineer

2021 - 2025

BCG - Corporate Marketing

Designed and developed data integration pipelines using AWS Glue, Python, Snowflake, and dbt to build a data lake for the reporting requirements of the BCG Marketing team.
Prepared CI/CD pipelines to standardize development and deployment processes using GitHub Actions.
Designed a configuration-driven data ingestion framework for the Sprinklr Social Media Management platform, enabling efficient incremental API ingestion at scale.
Contributed to a CMS migration by designing data models compatible with both legacy and target environments and integrating existing system data into the new CMS data model.
Designed event-based data ingestion pipelines using Python, AWS Lambda, Amazon S3, and Snowflake.

Technologies: Python, AWS Lambda, Amazon S3 (AWS S3), Snowflake, Data Build Tool (dbt), dbt Cloud, AWS Glue, ETL Development, Data Lakes, Data Integration, APIs, CI/CD Pipelines, Git, Streaming Data, Stream Processing
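
As an illustration of the configuration-driven, incremental ingestion idea mentioned above, a minimal Python sketch might look like the following. It is not the actual framework: the config keys, the `fetch_page` callable, and the in-memory `state` dictionary are hypothetical names chosen for this example, and no real Sprinklr API call is shown.

```python
from typing import Callable, Dict, Iterable, List


def ingest_incremental(
    source: Dict[str, str],
    fetch_page: Callable[[str, str], Iterable[dict]],
    state: Dict[str, str],
) -> List[dict]:
    """Return only records newer than the stored watermark for one source.

    `source` is a hypothetical config entry; `state` maps source name to the
    last seen cursor value (ISO-8601 strings compare correctly as text).
    """
    name = source["name"]
    cursor_field = source["cursor_field"]
    watermark = state.get(name, source["initial_watermark"])

    # Ask the source for data since the watermark, then filter defensively
    # in case the source returns records at or before it.
    records = [
        r for r in fetch_page(source["endpoint"], watermark)
        if r[cursor_field] > watermark
    ]

    # Advance the watermark only when something new arrived.
    if records:
        state[name] = max(r[cursor_field] for r in records)
    return records
```

Adding a new source in this style means adding one more config entry rather than writing a new pipeline, which is the main appeal of a configuration-driven design.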

Senior Data Engineer

2022 - 2024

Pex

Contributed to the development of a data lakehouse within the analytics team, which significantly improved our capacity to address complex analytic requirements. This enabled more efficient data analysis and the generation of valuable insights.
Used dbt and Snowflake to construct transformation pipelines using the Data Vault modeling method, helping to streamline data management and enhance accessibility. This approach made it easier for analysts to retrieve and utilize data effectively.
Developed a robust data ingestion framework using Airflow and Python, which efficiently synchronized billions of rows to Snowflake, ensuring timely data availability for analysis.

Technologies: Snowflake, Data Build Tool (dbt), Data Vault 2.0, Apache Airflow, Python, Apache Spark, ETL Development, Data Lakes, Git, Data Integration, PySpark, Batch
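
Data Vault 2.0 models typically identify hubs and links by a hash of the normalized business key. The sketch below shows that idea in plain Python as an illustration only; the function name, the `||` delimiter, and the choice of MD5 are assumptions for this example and are not taken from the project described above, where such logic would more likely live in dbt models.

```python
import hashlib


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Build a deterministic hash key from one or more business keys.

    Keys are trimmed and upper-cased so that formatting differences between
    source systems do not produce different hashes. The delimiter keeps
    ("a", "b") distinct from ("ab",).
    """
    normalized = delimiter.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Because the key is computed from the data itself, independent loads can derive the same hub key without a lookup against the target table.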

Senior Data Engineer

2022 - 2022

Gartner

Developed efficient data pipelines for ingesting and transforming data, which streamlined internal reporting and enhanced product capabilities.
Utilized Python, AWS Batch, and Terraform to build and deploy data applications, ensuring reliable performance and scalability across our systems.
Collaborated with cross-functional teams to identify data needs, which helped align our analytics efforts with business objectives.
Implemented best practices for data management, improving data quality and accessibility for stakeholders across the organization.

Technologies: Python, AWS Batch, Amazon S3 (AWS S3), Terraform, ETL Development, Batch, Data Integration, Git, CI/CD Pipelines

Lead Data and Back-end Engineer

2019 - 2021

Afiniti

Developed a highly scalable, containerized data integration platform using Apache Spark, Kubernetes, Python, and Greenplum database, which improved our infrastructure's adaptability to varying workloads.
Led a back-end team of five developers within a larger international product development group of 50 members, which fostered collaboration and increased project efficiency.
Packaged the entire set of data pipeline procedures into an easy-to-deploy templating system, increasing data pipeline process speed by 70%.
Built the back-end architecture for a web-based AI product utilizing TypeScript, Node.js, and GraphQL, which provided a robust foundation for our applications and improved their performance.
Standardized CI/CD pipeline processes across the team using Jenkins, Bitbucket, and Kubernetes, which streamlined deployment workflows and ensured consistency in our development practices.

Technologies: Apache Spark, Python, Redis, Greenplum, Kubernetes, TypeScript, SQL, Data Modeling, Database Design, Apache Kafka, Data Pipelines, Data Engineering, ETL Development, Spark, PySpark, Data Integration, CI/CD Pipelines, APIs

Senior Data Engineer

2019 - 2019

Iyzico/PayU

Re-engineered data warehouse processes by developing a new technology stack using Airflow, Python, Apache Spark, and Exasol DB, which streamlined operations and improved efficiency.
Migrated over 300 ETL jobs from Talend to the new platform, significantly enhancing the overall data processing capabilities.
Created a real-time data feed from transactional systems to dashboards using Spark Streaming and Kafka, which improved performance monitoring during peak hours.
Reduced daily ETL duration from eight to just three hours, which freed up valuable time for the team to focus on more strategic tasks.
Created reusable data transformation modules for Airflow, enabling Type-1 and Type-2 transformations, which improved data handling flexibility and consistency.
Prepared data mart layers for efficient reporting by building pre-processed aggregated tables, which significantly sped up response times for reporting requests.
Improved the performance of the most frequently used dashboards by 70%, enhancing decision-making capabilities for business users.
Prepared data ingestion solutions using AWS Lambda and Amazon S3 to consume event-based data generated in third-party systems.

Technologies: Apache Airflow, Spark, Spark Streaming, Python, Amazon Web Services (AWS), Data Engineering, ELT, ETL Development
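
For readers unfamiliar with the Type-1 and Type-2 transformations mentioned above: Type 1 overwrites a dimension attribute in place, while Type 2 closes the current row and appends a new version so history is preserved. The following is a minimal in-memory sketch of the two behaviors, not the Airflow modules themselves; `DimRow`, `apply_type1`, and `apply_type2` are hypothetical names for this illustration.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    key: str                          # business key
    attrs: tuple                      # tracked attribute values
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version


def apply_type1(rows: List[DimRow], key: str, attrs: tuple) -> List[DimRow]:
    """Type 1: overwrite attributes in place; no history is kept."""
    return [replace(r, attrs=attrs) if r.key == key else r for r in rows]


def apply_type2(rows: List[DimRow], key: str, attrs: tuple, as_of: date) -> List[DimRow]:
    """Type 2: close the current version and append a new one on change."""
    out: List[DimRow] = []
    seen = changed = False
    for r in rows:
        if r.key == key and r.valid_to is None:
            seen = True
            if r.attrs != attrs:
                # Close the old version instead of overwriting it.
                out.append(replace(r, valid_to=as_of))
                changed = True
                continue
        out.append(r)
    # Append a new current row for changed or previously unseen keys.
    if changed or not seen:
        out.append(DimRow(key, attrs, valid_from=as_of))
    return out
```

In a warehouse, the same logic is usually expressed as a merge statement or a pair of update and insert steps; the sketch only shows the row-level semantics.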

Owner | Cloud Architect | Instructor

2015 - 2019

Majestech

Provided consulting and training to SMEs, guiding their transition to cloud-based data architectures on AWS and Azure, improving data scalability and management capabilities.
Completed over 10 projects across various industries, including retail, banking, and telecommunications, demonstrating a proven track record of delivering impactful data solutions.
Built a real-time clickstream data application using Apache Kafka and Apache Spark, capturing user web events and storing them in a Data Lake with minimal latency to support analytics and monitoring.
Developed scalable data models for a retail company, leveraging Azure Data Factory and Azure Data Lake to deliver a reliable, production-ready reporting platform.
Built a visual interface for non-developer data professionals who wanted to leverage Hadoop and Spark distributed processing capabilities.
Instructed 20+ Big Data engineering courses in partnership with Cloudera, helping to elevate the skills of aspiring data professionals and fostering a deeper understanding of Big Data technologies.

Technologies: Apache Spark, Python, Apache Airflow, Hadoop, SQL, Data Modeling, Apache Kafka, Amazon Web Services (AWS), Data Engineering, Azure, ETL Development, Databricks

Data Engineer

2012 - 2015

i2i Systems

Designed and implemented automated data quality testing using Python, leveraging Oracle database metadata to run daily validation tasks and proactively identify issues in ETL pipelines.
Developed and maintained daily integration pipelines using Oracle Data Integrator, loading data into ODS and RDS layers to support the Enterprise Data Warehouse (EDW) and improve cross-department data accessibility.
Built the data preparation layer of a market optimization project for a telecommunications operator, collecting data on over 35 million subscribers from five source systems into a denormalized data structure using Oracle Data Integrator.
Maintained the data sources for the Market Optimization tool, which played a key role in generating targeted offers for Telco customers, ultimately enhancing marketing effectiveness.
Contributed to the development of ELT pipelines for the Enterprise Data Warehouse (EDW) of a large telecommunications operator, enabling analytics and reporting across customer, campaign, and offer domains using high-volume CDR data.

Technologies: Oracle, PL/SQL, Data Warehouse Design, Python, Data Pipelines, Data Engineering, SQL, ELT, ETL Development, Databases
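
The metadata-driven data quality testing described above can be pictured as generating validation queries from column metadata. The sketch below assumes rows shaped like Oracle's `ALL_TAB_COLUMNS` view (`TABLE_NAME`, `COLUMN_NAME`, `NULLABLE` with values `Y` or `N`) and only builds the SQL text; the function name is hypothetical, and the original implementation may have worked quite differently.

```python
from typing import Dict, Iterable, List


def build_null_checks(columns: Iterable[Dict[str, str]]) -> List[str]:
    """Generate one null-count query per column declared NOT NULL.

    Each input row is assumed to carry TABLE_NAME, COLUMN_NAME, and NULLABLE
    keys, mirroring Oracle's ALL_TAB_COLUMNS dictionary view.
    """
    return [
        f"SELECT COUNT(*) FROM {c['TABLE_NAME']} WHERE {c['COLUMN_NAME']} IS NULL"
        for c in columns
        if c["NULLABLE"] == "N"
    ]
```

A daily job could run each generated query against the loaded tables and flag any non-zero count as a pipeline issue.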

Experience

Integer8 Data Integrator

ACCOMPLISHMENTS
• Founded and led the development of Integer8, a web-based visual data integration platform with drag-and-drop pipeline design, enabling non-technical users to build data workflows without coding.
• Architected the platform on Apache Spark running on Hadoop ecosystems, delivering scalable, high-performance data processing through a 100% visual user experience.
• Led the engineering team, driving end-to-end product development, architecture design, and go-to-market readiness for local SME adoption.
• Successfully deployed Integer8 to two retail enterprise customers within the first year of launch.
• Became an official Microsoft Azure Partner and led the technical and compliance efforts to onboard Integer8 to the Azure Marketplace, resulting in its acceptance as a listed marketplace product.

Data Warehouse Transformation for a Mobile Payment Company

ACCOMPLISHMENTS
• Migrated 300+ data pipeline tasks from Talend to Apache Airflow on a Python/Spark architecture running on distributed Celery, reducing daily ETL runtime by 70% and refreshing denormalized payment datasets in Azure Blob Storage.
• Designed and implemented the end-to-end data platform, including a CDC pipeline from MySQL to Kafka, to enable near real-time pub/sub integrations.
• Built Spark Streaming applications to consume Kafka topics and continuously refresh downstream data stores, enabling real-time workload monitoring and anomaly detection for marketing and operations teams.
• Consolidated all data sources into two centralized data marts for the Tableau reporting layer, delivering 400% faster report performance through daily pre-aggregations and driving increased adoption among power users.

Education

2005 - 2010

Bachelor's Degree in Computer Engineering

Istanbul Technical University - Istanbul, Turkey

Certifications

SEPTEMBER 2013 - PRESENT

Cloudera Certified Developer for Apache Hadoop

Cloudera

Skills

Libraries/APIs

Spark Streaming, PySpark

Tools

Apache Airflow, dbt Cloud, AWS Glue, AWS Batch, Terraform, Git, Microsoft Power BI

Languages

Python, SQL, Transact-SQL (T-SQL), Snowflake, TypeScript, Batch

Frameworks

Apache Spark, Hadoop, Spark

Paradigms

ETL, Database Design

Storage

PL/SQL, Databases, Data Pipelines, Redis, Greenplum, Apache Hive, Amazon S3 (AWS S3), Data Lakes, Data Integration, Azure SQL Databases

Platforms

Azure, Apache Kafka, Oracle, Amazon Web Services (AWS), Docker, AWS Lambda, Databricks, Azure Synapse, Azure Synapse Analytics, Kubernetes, Azure Event Hubs

Other

Data Modeling, Data Warehousing, Data Warehouse Design, ETL Development, Data Engineering, Data Build Tool (dbt), ELT, Azure Databricks, Data Structures, Azure Data Lake, Azure Data Factory (ADF), Data Vault 2.0, APIs, CI/CD Pipelines, Streaming Data, Stream Processing
