Mehmet Sahin, Developer in London, United Kingdom

Mehmet Sahin

Verified Expert in Engineering

Data Engineer and Developer

Location
London, United Kingdom
Toptal Member Since
June 18, 2020

Mehmet is a developer who works with all major RDBMS and NoSQL databases, the AWS and GCP cloud providers, and popular big data tools such as Hadoop, Hive, Spark, PySpark, Kafka, and Elasticsearch. Thanks to a profound knowledge of ETL tools like Oracle GoldenGate and Informatica, he has developed several Python apps, most of them related to ETL, where he especially shines. Recently, Mehmet has been fascinated by graph databases, PyTorch, Keras, and deep learning.

Portfolio

Sentium Consulting -- Roche Pharmaceuticals -- Genentech
Hadoop, Spark, PySpark, Python, Pandas, SQL, Kedro, Google Cloud Platform (GCP)...
Seguridad, Inc
Python, Data Engineering, Amazon Web Services (AWS), AWS Lambda...
Turkish Department of Information Technologies
Apache Airflow, Big Data, Elasticsearch, Linux, Oracle, Gremlin, ETL Tools...

Experience

Availability

Full-time

Preferred Environment

Linux, Informatica, SQL, Python, RDBMS, NoSQL, ETL, Amazon Web Services (AWS), Google Cloud Platform (GCP), Big Data

The most amazing...

...project I've done was an ETL system that processes 2 TB of data daily into an HBase graph format. Users can now easily see previously hidden relationships.

Work Experience

Senior Data Engineer

2021 - 2023
Sentium Consulting -- Roche Pharmaceuticals -- Genentech
  • Created ETL pipelines using Dataproc clusters on Google Cloud Platform. Created Spark jobs and manipulated data with PySpark, Python, and Pandas. Built reproducible, maintainable, and modular machine learning pipelines using the Kedro package.
  • Collected terabytes of data from various sources like sensors, RDBMS, and flat files and loaded them to GCP BigQuery.
  • Enabled the data science team to gain insights from the prepared data using machine learning algorithms.
Technologies: Hadoop, Spark, PySpark, Python, Pandas, SQL, Kedro, Google Cloud Platform (GCP), Google BigQuery, Google Cloud Dataproc, Jira, Confluence, Data Engineering, PyTorch, ETL
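The modular-pipeline style described above (each step a small, pure, composable function, as in Kedro) can be sketched in plain Python. This is an illustrative toy, not the actual Roche/Genentech code; all function and field names are hypothetical:

```python
# Minimal sketch of a Kedro-style modular pipeline: each node is a pure
# function, and the pipeline is just their composition. Illustrative only.

def extract(raw_rows):
    """Keep only rows that actually carry a sensor reading."""
    return [r for r in raw_rows if r.get("reading") is not None]

def transform(rows):
    """Normalize readings to floats and tag the record source."""
    return [{**r, "reading": float(r["reading"]), "source": "sensor"} for r in rows]

def load(rows, sink):
    """Append transformed rows to a sink (a list stands in for BigQuery here)."""
    sink.extend(rows)
    return sink

def run_pipeline(raw_rows):
    """Wire the nodes together; swapping or testing a node touches one function."""
    return load(transform(extract(raw_rows)), [])

result = run_pipeline([{"reading": "1.5"}, {"reading": None}, {"reading": "2"}])
```

Keeping each step a pure function is what makes such pipelines reproducible and easy to unit-test node by node.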

Data Engineer

2022 - 2022
Seguridad, Inc
  • Developed AWS Lambda services to collect, transform, aggregate, clean, and store structured and unstructured pharmacy data using Python.
  • Built the MySQL AWS Aurora database and loaded data with Python and the SQLAlchemy ORM.
  • Loaded data to S3, built the Athena database, and created Athena SQL queries.
  • Built a fully automated system to collect, aggregate, and store data using AWS services.
Technologies: Python, Data Engineering, Amazon Web Services (AWS), AWS Lambda, Amazon Simple Queue Service (SQS), HIPAA Compliance, MySQL, Amazon Aurora
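A Lambda service of the kind described above is, at its core, a handler that cleans and aggregates the records delivered in the event payload. The sketch below is hypothetical (event shape, field names, and aggregation are illustrative, not the actual Seguridad code) and runs locally without AWS:

```python
import json

# Illustrative Lambda-style handler: clean incoming pharmacy records and
# aggregate claim counts per pharmacy. Field names are hypothetical.

def handler(event, context=None):
    records = [json.loads(r["body"]) for r in event.get("Records", [])]
    # Basic cleaning: drop records missing an identifier.
    clean = [r for r in records if r.get("pharmacy_id")]
    # Aggregate claim counts per pharmacy.
    totals = {}
    for r in clean:
        totals[r["pharmacy_id"]] = totals.get(r["pharmacy_id"], 0) + r.get("claims", 0)
    return {"statusCode": 200, "body": json.dumps(totals)}

# Local invocation with an SQS-style event:
event = {"Records": [
    {"body": json.dumps({"pharmacy_id": "A1", "claims": 3})},
    {"body": json.dumps({"pharmacy_id": "A1", "claims": 2})},
    {"body": json.dumps({"claims": 9})},  # missing id, dropped by cleaning
]}
response = handler(event)
```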

Project Manager | Developer

2018 - 2021
Turkish Department of Information Technologies
  • Managed a big data platform that included Hadoop, Hive, HBase, Spark, Kafka, Solr, and Elasticsearch on 18 nodes with Cloudera.
  • Created Spark jobs and manipulated data with PySpark and Spark Streaming.
  • Developed graph databases, executed a migration project from RDBMS to a graph database, and developed apps with Gremlin.
  • Processed 2 TB of data daily by converting it to a graph format, enabling users to easily identify previously unseen relationships.
  • Prepared Python and Bash scripts for the transfer of external data. These were monitored and scheduled with Apache Airflow.
  • Brought a new perspective to the data by running various graph algorithms, such as shortest path and PageRank.
Technologies: Apache Airflow, Big Data, Elasticsearch, Linux, Oracle, Gremlin, ETL Tools, Oracle GoldenGate, Apache Kafka, PySpark, Spark, SQL, Python, Solr, HBase, Apache Hive, Hadoop, Data Engineering, GRAPH, Keras, GIS, ETL
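The core of an RDBMS-to-graph migration like the one above is mapping relational rows onto property-graph vertices and edges. The toy below shows that mapping in plain Python; the production system wrote to HBase and was queried with Gremlin, and the table and column names here are hypothetical:

```python
# Illustrative RDBMS-to-property-graph conversion: one table becomes
# vertices, a relationship table becomes edges. Names are hypothetical.

def rows_to_graph(person_rows, call_rows):
    """Build vertex and edge collections from relational rows."""
    vertices = {r["id"]: {"label": "person", "name": r["name"]} for r in person_rows}
    edges = [
        {"label": "called", "out": r["caller_id"], "in": r["callee_id"]}
        for r in call_rows
        # Skip dangling foreign keys so every edge connects real vertices.
        if r["caller_id"] in vertices and r["callee_id"] in vertices
    ]
    return vertices, edges

persons = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
calls = [{"caller_id": 1, "callee_id": 2}, {"caller_id": 2, "callee_id": 99}]
vertices, edges = rows_to_graph(persons, calls)
```

Once relationships are edges rather than join keys, queries like "who is connected to whom, and through what" become single graph traversals.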

ETL and Database Developer

2014 - 2018
Turkish Department of Information Technologies
  • Gathered workflows and data from many different sources under a single ETL system.
  • Built an ETL system that was more manageable and monitorable and had a lower error rate.
  • Created and maintained a highly available ETL system that processed terabytes of real-time data daily.
Technologies: Python, Data Warehouse Design, Data Warehousing, Oracle, Informatica, Microsoft SQL Server, PostgreSQL, ETL Tools, Oracle GoldenGate, SQL, Teradata, Oracle Exadata, Oracle RDBMS, Data Engineering, GIS, ETL, SQL Stored Procedures

Business Intelligence Platform Developer

2012 - 2014
Turkish Department of Information Technologies
  • Designed and built an ETL system using SSIS and MSSQL CDC.
  • Created reports quickly using OLAP cubes prepared with SSAS.
  • Enabled users to quickly access requested reports and, via ad hoc reporting, to build the reports they wanted. Reports were presented through SSRS and Power BI.
Technologies: Microsoft SQL Server, ETL Tools, SQL, DAX, MDX, Microsoft Power BI, SharePoint, SQL Server Reporting Services (SSRS), SSAS, SQL Server Integration Services (SSIS), ETL, Data Warehousing

Property Graph Database System

Built a graph data environment on an HBase infrastructure by combining related datasets that had been collected from various sources and held in separate systems.

I worked as the ETL and database developer, and I managed all clusters, created Spark jobs, and prepared data pipelines with GoldenGate, Kafka, and Airflow. Also, I was responsible for designing the graph structure and improving the performance of Gremlin queries.
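Graph algorithms such as the PageRank runs mentioned in this project can be sketched in a few lines of pure Python. This is a toy power-iteration version for illustration only; the production runs executed on the cluster, and the example graph is hypothetical:

```python
# Toy power-iteration PageRank over an adjacency list. Illustrative only:
# no dangling-node handling, since the example graph has none.

def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each node to the list of nodes it links to."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Start each node at the teleport share, then add inbound mass.
        new_rank = {n: (1 - damping) / len(nodes) for n in nodes}
        for node, targets in graph.items():
            if not targets:
                continue
            share = damping * rank[node] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# A 3-cycle: by symmetry, every node should end up with rank 1/3.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})
```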

Exadata Migration

Multiple Oracle databases were merged and moved to the Exadata environment, and I made the necessary configurations to make this happen. I prepared dozens of tables, procedures, triggers, functions, and views required in the new database and made all the necessary performance improvements. The new database is about 40 TB in size and serves about 1,000 users.

Company ETL Project

The project's goal was to gather the company's data flows, spread across many technologies, under a single system. For this purpose, various mini-services were prepared, mostly in Python and Bash. Along with data sources such as flat files, MSSQL, and Oracle, images and unstructured data were also ingested. With Oracle GoldenGate at the center, I prepared a monitorable, manageable, and high-performance ETL system.
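The essence of such a unification project is normalizing rows from heterogeneous sources into one common record shape before loading. A minimal stdlib sketch, with hypothetical field names (the real mini-services fed GoldenGate, not an in-memory list):

```python
import csv
import io

# Illustrative mini-service: normalize a flat-file extract and a database
# extract into one unified, ordered feed. Field names are hypothetical.

FLAT_FILE = "id,amount\n1,10\n2,20\n"

def from_flat_file(text):
    """Parse a CSV extract into the common record shape."""
    return [{"id": int(r["id"]), "amount": float(r["amount"]), "src": "file"}
            for r in csv.DictReader(io.StringIO(text))]

def from_db(rows):
    """Map (id, amount) tuples from a database cursor into the same shape."""
    return [{"id": r[0], "amount": float(r[1]), "src": "db"} for r in rows]

def merge(*batches):
    """Combine all sources into a single feed, ordered by record id."""
    return sorted((rec for batch in batches for rec in batch), key=lambda r: r["id"])

unified = merge(from_flat_file(FLAT_FILE), from_db([(3, 30)]))
```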

Dynamic Machine Learning Pipeline

I created Kedro ETL and ML pipelines using Dataproc clusters on Google Cloud Platform, collected terabytes of data from various sources like sensors, RDBMS, and flat files, and loaded them into BigQuery. As a result, the data science team was able to gain insights from the data I prepared using machine learning algorithms.
Education

2016 - 2019

Bachelor's Degree in Computer Engineering

University of Ankara - Ankara, Turkey

2015 - 2019

Bachelor's Degree in Law

Anadolu University - Ankara, Turkey

2008 - 2012

Bachelor's Degree in Security Sciences

Police Academy - Ankara, Turkey

Certifications

JANUARY 2023 - JANUARY 2026

AWS Certified Database - Specialty (DBS)

Amazon Web Services

OCTOBER 2021 - PRESENT

AWS Certified Cloud Practitioner

Amazon Web Services

JUNE 2020 - PRESENT

Improving Deep Neural Networks: Hyperparameter Tuning, Regularization, and Optimization

Coursera

MAY 2020 - PRESENT

Neural Networks and Deep Learning by Deeplearning.ai

Coursera

MAY 2020 - PRESENT

Machine Learning by Stanford University

Coursera

MARCH 2020 - PRESENT

Deep Learning A-Z™: Hands-on Artificial Neural Networks

Udemy

DECEMBER 2019 - PRESENT

Cloudera Developer Training for Apache Spark and Hadoop

Cloudera, ExitCertified

JUNE 2017 - PRESENT

SQL Tuning for Oracle

Oracle University

MARCH 2017 - PRESENT

Oracle Database 12c: Introduction to SQL Ed 1.1

Oracle University

OCTOBER 2016 - PRESENT

Oracle Database 12c Administration Workshop Ed 2

Infopark

OCTOBER 2016 - PRESENT

Oracle Database 12c: Backup and Recovery Workshop

Infopark

SEPTEMBER 2016 - PRESENT

Oracle Database 12c Install and Upgrade Workshop Ed 1

Infopark

MAY 2015 - PRESENT

Predictive Modeling, Segmentation and Relational Rules, Time Series, Sequential Events with SPSS

AIMS

MARCH 2014 - PRESENT

Implementing a Data Warehouse with Microsoft SQL Server 2012

BNT Pro

Languages

Python, SQL, Gremlin, Bash, MDX

Tools

Oracle GoldenGate, GIS, PyCharm, Solr, Oracle Exadata, SSAS, Microsoft Power BI, Apache Airflow, Google Cloud Dataproc, Jira, Confluence, Amazon Simple Queue Service (SQS)

Paradigms

ETL, HIPAA Compliance

Storage

Data Pipelines, HBase, PostgreSQL, Elasticsearch, Microsoft SQL Server, SQL Stored Procedures, Apache Hive, Oracle RDBMS, Teradata, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), MySQL, Amazon Aurora, RDBMS, NoSQL, Redshift, Amazon DynamoDB

Other

ETL Tools, Data Warehousing, Data Warehouse Design, Informatica, GRAPH, Data Engineering, Big Data, DAX, Cloud, Kedro, Google BigQuery, Amazon Neptune, Amazon RDS

Frameworks

Spark, Hadoop

Libraries/APIs

PySpark, PyTorch, Keras, Pandas

Platforms

Oracle, Apache Kafka, Linux, SharePoint, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda
