Verified Expert in Engineering
Data Engineer and Developer
Apache Airflow, Visual Studio Code (VS Code), Apache Spark, Amazon Web Services (AWS), Azure, Jupyter Notebook
The most amazing...
...thing I've done is to build a product that leverages Apache Spark for data processing and can be operated with drag-n-drop visual interfaces.
Lead Data and Back-end Engineer
- Built a highly scalable, containerized data integration platform using Spark, Docker/Kubernetes, Python, and Greenplum database.
- Wrapped entire data pipeline procedures in an easy-to-deploy templating system capable of running at scale with good performance. That effort made the data pipeline process 70% faster.
- Created data models and pipelines for the application that power dashboard reports covering over 10 million events.
- Established and standardized CI/CD pipeline processes across the team using Jenkins, Bitbucket, and Kubernetes.
Senior Data Engineer
- Reengineered and optimized the existing data pipeline processes by creating a new technology stack using Airflow, Python, Spark, and Exasol database.
- Migrated over 300 data pipeline jobs from Talend to the new data platform, which cut the daily ETL run time by roughly 60% (from eight hours to three hours).
- Created a real-time data feed from transactional systems to dashboards using Spark Streaming and Kafka. That new functionality boosted operational efficiency for performance monitoring during peak hours.
- Integrated the platform with AWS and delivered daily data marts to Amazon Redshift so that daily reports were available to the global board.
Owner | Big Data Engineer | Instructor
- Provided consultancy and training services to transform data architectures of SMEs with cloud-based alternatives such as Amazon Web Services and Azure.
- Delivered over ten data integration projects for businesses in the retail, banking, and telecommunications sectors. Transformed data integration processes to utilize cloud platforms such as AWS and Azure.
- Built a clickstream data application to collect web traces of app users and store them in a data lake with minimal latency. Used Kafka and Spark Streaming on AWS as the technology base.
- Launched Integer8, a cloud-based data integration product, on the AWS platform.
- Built a visual interface for non-developer data professionals who wanted to leverage Hadoop and Spark distributed processing capabilities.
- Provided big data engineering training with Cloudera partnership (over 20 training sessions).
- Created data integration pipelines on the Snowflake cloud database (running on AWS) using Apache Airflow and S3 connectors.
- Implemented data quality testing automation with Python, using Oracle metadata to generate daily automated tasks that flag possible issues in the daily pipelines.
- Created daily integration pipelines to feed the ODS and RDS layers of the enterprise data warehouse.
- Built the data preparation layer of a market optimization project for a telecommunications operator. Data from 35+ million subscribers was collected from five different source systems into a denormalized data structure with Oracle Data Integrator.
Integer8 Data Integrator
https://www.f6s.com/integer8
I created my startup with two developers in 2015 to launch the Integer8 product on both local and international marketplaces. I designed and led the development effort to make the product feasible for local SMEs. At the end of the first year, we deployed our platform to two different retail companies.
I became a cloud partner for Microsoft Azure in Turkey and spent one more year making Integer8 eligible for Azure Marketplace. At the end of this effort, Integer8 successfully became an official Azure Marketplace product.
Data Warehouse Transformation for a Mobile Payment Company
As the responsible data engineer for the new data platform, I designed and implemented the entire data pipeline. I built a CDC mechanism from the MySQL database into Kafka to provide a pub/sub event system for near-real-time integration. I then prepared live Spark Streaming jobs that consume the Kafka topics and refresh the target data stores. That helped the marketing and operations teams monitor the workload on the system and detect anomalies.
All data sources were consolidated into two main data marts for the Tableau reporting layer. Daily pre-aggregated tables helped live reports perform 400% faster than the previous implementation. That also encouraged power users across the organization to adopt the reporting tools.
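The refresh step of a CDC feed like the one described above can be illustrated with a minimal sketch. It is not the project's code: the event shape (`op`, `key`, `row`) and the `apply_changes` helper are hypothetical, and a plain dictionary stands in for the target data store that the streaming jobs would update.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(target: Dict[int, Row], events: Iterable[Dict[str, Any]]) -> Dict[int, Row]:
    """Apply ordered change events to an in-memory stand-in for the target store.

    The event format here is an assumption made for illustration only.
    """
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Inserts and updates are both treated as upserts by primary key.
            target[key] = dict(event["row"])
        elif op == "delete":
            # Deleting a missing key is tolerated so replays stay idempotent.
            target.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return target


# Hypothetical change events, in the order they would arrive on a topic.
events = [
    {"op": "insert", "key": 1, "row": {"id": 1, "status": "pending"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "status": "pending"}},
    {"op": "update", "key": 1, "row": {"id": 1, "status": "paid"}},
    {"op": "delete", "key": 2},
]
store = apply_changes({}, events)
```

Treating inserts and updates as upserts keeps the apply step safe to replay, which matters when a consumer restarts and rereads part of a topic.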
Cloud ETL Automation on AWS
I used Amazon Redshift as the target database. Individual events are emitted from Amazon EventBridge into Lambda functions and accumulated in the Redshift database for further analysis.
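A flow of this kind might look roughly like the sketch below. It assumes the standard EventBridge envelope fields (`id`, `source`, `detail-type`, `time`, `detail`); the `events_raw` table, the `make_handler` factory, and the injected `execute` callable are hypothetical stand-ins, since the actual schema and Redshift client are not described here.

```python
import json
from typing import Any, Callable, Dict, Tuple

Row = Tuple[str, str, str, str, str]

# Hypothetical target table and columns; the real schema is not given.
INSERT_SQL = (
    "INSERT INTO events_raw (event_id, source, detail_type, event_time, detail) "
    "VALUES (%s, %s, %s, %s, %s)"
)


def event_to_row(event: Dict[str, Any]) -> Row:
    """Flatten the EventBridge envelope into one row for the target table."""
    return (
        event["id"],
        event["source"],
        event["detail-type"],
        event["time"],
        # The free-form payload is stored as a JSON string.
        json.dumps(event.get("detail", {}), sort_keys=True),
    )


def make_handler(execute: Callable[[str, Row], None]):
    """Build a Lambda-style handler around an injected SQL executor.

    `execute` is an assumed wrapper around whatever Redshift client is used.
    """
    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        row = event_to_row(event)
        execute(INSERT_SQL, row)
        return {"inserted": 1, "event_id": row[0]}

    return handler
```

Injecting the executor keeps the handler testable without a database connection. For higher volumes, batching events and loading them in bulk would usually be preferable to row-by-row inserts.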
Apache Spark, Hadoop
Apache Airflow, Amazon CloudWatch
ETL, MapReduce, Database Design
PL/SQL, Databases, Data Pipelines, Redis, Greenplum, HDFS, HBase, Apache Hive, Amazon S3 (AWS S3)
Data Modeling, Data Warehousing, Data Warehouse Design, ETL Development, Data Engineering, Data Architecture, Big Data Architecture, OOP Designs, Data Structures, Algorithms
Spark Streaming, Node.js, Pandas
Azure, Apache Kafka, Oracle, Amazon Web Services (AWS), Docker, Visual Studio Code (VS Code), Kubernetes, Jupyter Notebook, Oracle Data Integrator 11g, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2
Bachelor's Degree in Computer Engineering
Istanbul Technical University - Istanbul, Turkey
Cloudera Certified Developer for Apache Hadoop