
Selahattin Gungormus
Verified Expert in Engineering
Data Engineer and Developer
Istanbul, Turkey
Toptal member since May 4, 2021
Selahattin is a senior data engineer with 10+ years of experience designing and building scalable data platforms using cloud-native and open-source technologies. He has a proven track record of developing high-performance data pipelines with Snowflake, dbt, Databricks, and Airflow, and significantly improving data reliability and accessibility. Selahattin is highly proficient in SQL and Python, with hands-on expertise in AWS, Azure, Kafka, and data modeling.
Experience
- Python - 9 years
- SQL - 9 years
- Apache Spark - 8 years
- Amazon Web Services (AWS) - 7 years
- Apache Airflow - 5 years
- Apache Kafka - 5 years
- Data Build Tool (dbt) - 5 years
- Snowflake - 3 years
Preferred Environment
Apache Airflow, Apache Spark, Snowflake, Data Build Tool (dbt), Databricks, Azure Data Lake, Azure Data Factory (ADF), AWS Glue, AWS Lambda, Apache Kafka
The most amazing...
...thing I've done is design and develop a highly scalable, containerized data integration platform using Apache Spark, Kubernetes, Python, and Greenplum database.
Work Experience
Senior Data Engineer
Pandora
- Developed data flows to efficiently ingest near-real-time data from Kafka into Azure Blob using Databricks, laying the foundation for the Data Lake layer and improved data accessibility.
- Prepared data models for Common Data Model (CDM) and Reference Data Model (RDM), ensuring that data organization met business needs and enhanced reporting capabilities.
- Developed T-SQL procedures to populate data in a day-minus-one fashion in Azure SQL, which streamlined data availability and supported timely decision-making.
- Built robust data movement and orchestration flows with Azure Synapse pipelines, helping to automate processes and increase overall efficiency in data handling.
Senior Data Engineer
BCG - Corporate Marketing
- Designed and developed data integration pipelines using AWS Glue, Python, Snowflake, and dbt to build a data lake for the reporting requirements of the BCG Marketing team.
- Prepared CI/CD pipelines to standardize development and deployment processes using GitHub Actions.
- Designed a configuration-driven data ingestion framework for the Sprinklr Social Media Management platform, enabling efficient incremental API ingestion at scale.
- Contributed to a CMS migration by designing data models compatible with both legacy and target environments and integrating existing system data into the new CMS data model.
- Designed event-based data ingestion pipelines using Python, AWS Lambda, Amazon S3, and Snowflake.
Senior Data Engineer
Pex
- Contributed to the development of a data lakehouse within the analytics team, which significantly improved our capacity to address complex analytic requirements. This enabled more efficient data analysis and the generation of valuable insights.
- Used dbt and Snowflake to construct transformation pipelines using the Data Vault modeling method, helping to streamline data management and enhance accessibility. This approach made it easier for analysts to retrieve and utilize data effectively.
- Developed a robust data ingestion framework using Airflow and Python, which efficiently synchronized billions of rows to Snowflake, ensuring timely data availability for analysis.
Senior Data Engineer
Gartner
- Developed efficient data pipelines for ingesting and transforming data, which streamlined internal reporting and enhanced product capabilities.
- Utilized Python, AWS Batch, and Terraform to build and deploy data applications, ensuring reliable performance and scalability across our systems.
- Collaborated with cross-functional teams to identify data needs, which helped align our analytics efforts with business objectives.
- Implemented best practices for data management, improving data quality and accessibility for stakeholders across the organization.
Lead Data and Back-end Engineer
Afiniti
- Developed a highly scalable, containerized data integration platform using Apache Spark, Kubernetes, Python, and Greenplum database, which improved our infrastructure's adaptability to varying workloads.
- Led a back-end team of five developers within a larger international product development group of 50 members, which fostered collaboration and increased project efficiency.
- Wrapped the entire set of data pipeline procedures into an easy-to-deploy templating system, increasing data pipeline process speed by 70%.
- Built the back-end architecture for a web-based AI product utilizing TypeScript, Node.js, and GraphQL, which provided a robust foundation for our applications and improved their performance.
- Standardized CI/CD pipeline processes across the team using Jenkins, Bitbucket, and Kubernetes, which streamlined deployment workflows and ensured consistency in our development practices.
Senior Data Engineer
Iyzico/PayU
- Re-engineered data warehouse processes by developing a new technology stack using Airflow, Python, Apache Spark, and Exasol DB, which streamlined operations and improved efficiency.
- Migrated more than 300 ETL jobs from Talend to the new platform, significantly enhancing overall data processing capabilities.
- Created a real-time data feed from transactional systems to dashboards using Spark Streaming and Kafka. That new functionality boosted operational efficiency for performance monitoring during peak hours.
- Reduced daily ETL duration from eight to just three hours, which freed up valuable time for the team to focus on more strategic tasks.
- Created reusable data transformation modules for Airflow, enabling Type-1 and Type-2 transformations, which improved data handling flexibility and consistency.
- Prepared data mart layers for efficient reporting by building a pre-processed aggregated table, which significantly sped up response times for reporting requests.
- Improved the performance of the most frequently used dashboards by 70%, enhancing decision-making capabilities for business users.
- Prepared data ingestion solutions using AWS Lambda and Amazon S3 to consume event-based data generated in third-party systems.
Owner | Cloud Architect | Instructor
Majestech
- Provided consulting and training to SMEs, guiding their transition to cloud-based data architectures on AWS and Azure, improving data scalability and management capabilities.
- Completed more than 10 projects across various industries, including retail, banking, and telecommunications, demonstrating a proven track record of delivering impactful data solutions.
- Built a real-time clickstream data application using Apache Kafka and Apache Spark, capturing user web events and storing them in a Data Lake with minimal latency to support analytics and monitoring.
- Developed scalable data models for a retail company, leveraging Azure Data Factory and Azure Data Lake to deliver a reliable, production-ready reporting platform.
- Built a visual interface for non-developer data professionals who wanted to leverage Hadoop and Spark distributed processing capabilities.
- Instructed 20+ Big Data engineering courses in partnership with Cloudera, helping to elevate the skills of aspiring data professionals and fostering a deeper understanding of Big Data technologies.
Data Engineer
i2i Systems
- Designed and implemented automated data quality testing using Python, leveraging Oracle database metadata to run daily validation tasks and proactively identify issues in ETL pipelines.
- Developed and maintained daily integration pipelines using Oracle Data Integrator, loading data into ODS and RDS layers to support the Enterprise Data Warehouse (EDW) and improve cross-department data accessibility.
- Built the data preparation layer of a market optimization project for a telecommunications operator. Data from over 35 million subscribers was collected from five different source systems into a denormalized data structure using Oracle Data Integrator.
- Maintained the data sources for the Market Optimization tool, which played a key role in generating targeted offers for Telco customers, ultimately enhancing marketing effectiveness.
- Contributed to the development of ELT pipelines for the Enterprise Data Warehouse (EDW) of a large telecommunications operator, enabling analytics and reporting across customer, campaign, and offer domains using high-volume CDR data.
Experience
Integer8 Data Integrator
- Founded and led the development of Integer8, a web-based visual data integration platform with drag-and-drop pipeline design, enabling non-technical users to build data workflows without coding.
- Architected the platform on Apache Spark running on Hadoop ecosystems, delivering scalable, high-performance data processing through a 100% visual user experience.
- Led the engineering team, driving end-to-end product development, architecture design, and go-to-market readiness for local SME adoption.
- Successfully deployed Integer8 to two retail enterprise customers within the first year of launch.
- Became an official Microsoft Azure Partner and led the technical and compliance efforts to onboard Integer8 to the Azure Marketplace, resulting in its acceptance as a listed marketplace product.
Data Warehouse Transformation for a Mobile Payment Company
- Migrated 300+ data pipeline tasks from Talend to Apache Airflow on a Python/Spark architecture running on distributed Celery, reducing daily ETL runtime by 70% and refreshing denormalized payment datasets in Azure Blob Storage.
- Designed and implemented the end-to-end data platform, including a CDC pipeline from MySQL to Kafka, to enable near real-time pub/sub integrations.
- Built Spark Streaming applications to consume Kafka topics and continuously refresh downstream data stores, enabling real-time workload monitoring and anomaly detection for marketing and operations teams.
- Consolidated all data sources into two centralized data marts for the Tableau reporting layer, delivering 400% faster report performance through daily pre-aggregations and driving increased adoption among power users.
Education
Bachelor's Degree in Computer Engineering
Istanbul Technical University - Istanbul, Turkey
Certifications
Cloudera Certified Developer for Apache Hadoop
Cloudera
Skills
Libraries/APIs
Spark Streaming, PySpark
Tools
Apache Airflow, dbt Cloud, AWS Glue, AWS Batch, Terraform, Git, Microsoft Power BI
Languages
Python, SQL, Transact-SQL (T-SQL), Snowflake, TypeScript, Batch
Frameworks
Apache Spark, Hadoop
Paradigms
ETL, Database Design
Storage
PL/SQL, Databases, Data Pipelines, Redis, Greenplum, Apache Hive, Amazon S3 (AWS S3), Data Lakes, Data Integration, Azure SQL Databases
Platforms
Azure, Apache Kafka, Oracle, Amazon Web Services (AWS), Docker, AWS Lambda, Databricks, Azure Synapse, Azure Synapse Analytics, Kubernetes, Azure Event Hubs
Other
Data Modeling, Data Warehousing, Data Warehouse Design, ETL Development, Data Engineering, Data Build Tool (dbt), ELT, Azure Databricks, Data Structures, Azure Data Lake, Azure Data Factory (ADF), Data Vault 2.0, APIs, CI/CD Pipelines, Streaming Data, Stream Processing