Serkan Coskun, Developer in London, United Kingdom

Serkan Coskun

Verified Expert in Engineering

Data Engineering Developer

Location
London, United Kingdom
Toptal Member Since
July 23, 2020

Serkan is a professionally qualified, hands-on IT specialist with proven commercial experience as an AI/ML/data engineer in the banking (fintech), telco, energy, and healthcare sectors. He is a self-motivated engineer who quickly gets to grips with complex technical environments and technologies. Serkan is skilled in creating, maintaining, and operating data ingestion (ETL/ELT) pipelines into data lake layers and in delivering data, cloud, and ML capabilities for technology services.

Portfolio

Healthcare/Pharmaceutical Company
AutoML, Databricks, Amazon Web Services (AWS)...
Healthcare/Pharmaceutical Company
Python 3, Neo4j, Cypher, Redshift, Amazon Web Services (AWS), PostgreSQL
Yara International
Spark SQL, PySpark, Jupyter, Scikit-learn, NumPy, Pandas, REST APIs...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Machine Learning Operations (MLOps), Spark, Big Data, Databricks, Python 3, Redshift, Snowflake, Azure, Neo4j

The most amazing...

...was an AI prediction model and root cause analysis project that used an AI model and ML techniques to identify anomalies in the manufacturing line.

Work Experience

Senior AI/ML and Data Engineer

2022 - 2022
Healthcare/Pharmaceutical Company
  • Worked on machine learning models that analyzed insurance claims on the AWS Databricks platform using its AutoML component. These models estimated the probability that a given claim would be denied (see the sketch below).
  • Identified specific sections of a claim that might be problematic and recommended new values for improperly populated fields.
  • Proposed a new combination of codes, based on similar claims in the database that had been accepted by insurers, whenever a given combination was found to be invalid.
  • Suggested a new modifier, based on similar accepted claims in the database, whenever a modifier was found to be invalid.
  • Developed an EDI generator module for producing EDI 837 files; it accepts a JSON file with patient, provider, and encounter data as input and produces an EDI 837 claim file as output. Used the boto3, pyx12, and EDI-835-parser Python libraries.
  • Created the claim edit module for identifying flaws in claims and proposing fixes. The module applied its own proprietary editing rules as well as machine learning models created by the Machine Learning Engine.
Technologies: AutoML, Databricks, Amazon Web Services (AWS), Electronic Data Interchange (EDI), Boto 3
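
The denial-probability model in the first bullet can be illustrated with a short, hedged sketch. It assumes a Databricks ML runtime where `spark` is predefined; the table name, the `denied` label column, and the timeout are hypothetical.

```python
# Hedged sketch only: train an AutoML classifier that estimates the probability
# a claim will be denied. Assumes a Databricks ML runtime; "claims_training"
# and the "denied" label column are hypothetical names.
import databricks.automl
import mlflow.pyfunc

claims_df = spark.table("claims_training")        # historical claims with known outcomes

summary = databricks.automl.classify(
    dataset=claims_df,
    target_col="denied",                          # 1 if the insurer denied the claim
    timeout_minutes=60,
)

# AutoML logs every trial to MLflow; load the best model and score a sample of claims.
model = mlflow.pyfunc.load_model(summary.best_trial.model_path)
scored = model.predict(claims_df.limit(100).toPandas())
```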

Senior AI/ML and Data Engineer

2022 - 2022
Healthcare/Pharmaceutical Company
  • Used knowledge representation and semantic technologies (OWL, RDF, SWRL, SPARQL, and JSON-LD) for semantic modeling, data integration and unification, knowledge graph design, ontology, and taxonomy work.
  • Created the data and information architecture using modeling languages such as UML, ER, IDEF, and data flow diagrams. Worked with mapping techniques such as R2RML to transform relational/tabular datasets into triples.
  • Developed end-to-end data pipelines with Python and used Cypher to transform relational and tabular datasets into Neo4j. Used the py2neo and neo4j drivers to ingest data from AWS Redshift and PostgreSQL into the Neo4j database (see the ingestion sketch below).
  • Used arrows.app to create graph models for communicating the design to stakeholders.
Technologies: Python 3, Neo4j, Cypher, Redshift, Amazon Web Services (AWS), PostgreSQL
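
A minimal sketch of the Redshift/PostgreSQL-to-Neo4j ingestion referenced above, using psycopg2 and the official neo4j Python driver; the connection strings, the `patients` table, and the `Patient` node label are assumptions (Redshift exposes a PostgreSQL-compatible interface, so the same pattern applies).

```python
# Illustrative only: pull rows from PostgreSQL and MERGE them into Neo4j.
# Connection details, the "patients" table, and the Patient label are assumptions.
import psycopg2
from neo4j import GraphDatabase

PG_DSN = "host=localhost dbname=clinical user=etl password=secret"
NEO4J_URI = "bolt://localhost:7687"

CYPHER = """
MERGE (p:Patient {id: $id})
SET p.name = $name
"""

driver = GraphDatabase.driver(NEO4J_URI, auth=("neo4j", "secret"))

# Extract rows from the relational source.
with psycopg2.connect(PG_DSN) as pg:
    with pg.cursor() as cur:
        cur.execute("SELECT id, name FROM patients")
        rows = cur.fetchall()

# Upsert each row as a node in the graph.
with driver.session() as session:
    for row_id, name in rows:
        session.run(CYPHER, id=row_id, name=name)

driver.close()
```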

Senior AI/ML and Data Engineer

2020 - 2022
Yara International
  • Worked on the digital farming data platform project, a cloud-based platform built on AWS, as an AI/ML data engineer in the data science team. Created, maintained, and operated E2E data pipelines and built AI/ML models for agronomy use cases.
  • Connected to external data sources via push and pull APIs and stored the data in AWS S3 buckets. Developed Spark Streaming jobs that read data from the valid Kafka topic, transform it on the fly, and load it into a Hive table (sketched below).
  • Created a Lambda function that reads data from S3 buckets, validates it against the Kafka schema registry based on the registered format, and then pushes each message to the relevant valid or invalid Kafka topic.
  • Used an AWS scheduler to invoke Lambda functions hourly and daily. Defined Kafka schemas, converted them to Avro format, and uploaded them to the schema registry. Used AWS S3, Glue, and Redshift for end-to-end data pipelines and the data lake.
  • Developed read and get microservices using Spring Boot and KStreams to read invalid topic messages and produce exception reports. Loaded data from Hive to Cassandra to democratize the data for third-party users via API Gateway.
  • Used Python 3 and JupyterLab to develop use cases and AI/ML models and used Kubeflow for MLOps. Developed and built REST APIs with Python Flask for integration with third parties.
  • Created data lineage pipelines, data dictionaries, and the project ontology and used Apache Atlas for data governance and metadata management. Elaborated business glossaries, specified classifications, and added tags to the data.
  • Used Apache Ranger for central administration of security policies and monitoring of user access. Collected all logs in AWS CloudWatch and pushed them to Elasticsearch and Kibana for analysis.
  • Built, cloned, and dropped Snowflake databases and loaded data into them. Shared Snowflake tables among separate accounts and external users without creating a second copy of the table data. Set up and managed compute resources for loading and querying.
  • Loaded structured and semi-structured data and created views and materialized views. Designed external stages for Snowflake-managed and AWS storage. Used Time Travel to restore data-related objects and analyze data usage and manipulation.
Technologies: Spark SQL, PySpark, Jupyter, Scikit-learn, NumPy, Pandas, REST APIs, Flask-Marshmallow, Flask, Flask-RESTful, Amazon Web Services (AWS), Jupyter Notebook, Python, Spark, Amazon CloudWatch, Amazon Glacier, Grafana, Prometheus, Elasticsearch, Kibana, Kubeflow, Apache Ranger, Apache Atlas, Apache Cassandra, Apache Hive, APIs, Microservices, Job Schedulers, AWS Lambda, Apache Kafka, Amazon S3 (AWS S3), Redshift, AWS Glue, Amazon Elastic MapReduce (EMR), Amazon Athena, Snowflake, Machine Learning, Data Engineering
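
A hedged PySpark sketch of the "valid Kafka topic to Hive" flow mentioned above. The broker address, topic name, message schema, and output path are assumptions, and the stream lands Parquet files that a Hive external table can point at rather than writing to Hive directly.

```python
# Sketch under assumptions: read JSON messages from the valid Kafka topic,
# transform on the fly, and land them where a Hive external table can read them.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("valid-topic-to-hive").enableHiveSupport().getOrCreate()

schema = StructType([
    StructField("field_id", StringType()),
    StructField("measurement", DoubleType()),
    StructField("event_time", StringType()),
])

raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")      # assumption
       .option("subscribe", "agronomy-valid")                  # assumption
       .load())

# Parse the Kafka value payload into typed columns.
parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("r"))
          .select("r.*"))

# Continuously append Parquet files under a path a Hive external table points at.
(parsed.writeStream.format("parquet")
 .option("path", "s3a://datalake/agronomy/valid/")             # assumption
 .option("checkpointLocation", "s3a://datalake/checkpoints/agronomy-valid/")
 .start())
```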

AI-ML and Data Engineer

2018 - 2020
Healthcare/Pharmaceutical Company
  • Worked on the Safety Stock Optimizer project, applying AI/ML techniques to quantify demand and supply uncertainties and calculate a recommended safety stock level. Created solutions for both demand uncertainty and supply uncertainty.
  • Used Azure Data Factory for end-to-end data pipelines. Manipulated and transformed data in the ADLS (Azure Data Lake Storage) raw, foundation, trusted, and unified business logic layers.
  • Made intensive use of Delta functionality and combined batch and streaming workloads. Used Azure Databricks for AI/ML model development. Visualized the results via Power BI.
  • Worked on the AI prediction model and root cause analysis project, which used an AI model and ML techniques to identify critical anomalies in manufacturing performance.
  • Reduced downtime, rectified breakdowns, predicted failures, and enabled scheduling of maintenance activities and control strategies for machine parameters to avoid breakdowns.
  • Created holistic data infrastructure and data lake/marts for manufacturing lines. Ingested structured, semi-structured, and unstructured batch and real-time/stream data from numerous data sources into various data platforms using data flow/pipelines.
  • Automated data ingestion using Oozie and Airflow (see the DAG sketch below). Accessed and manipulated data with HCatalog, HiveServer and Hive queries, Apache Pig, shell scripts, Python, and PySpark.
  • Ensured high scalability and efficient computation across large-scale network servers and big data distributed systems using Go channels and goroutines. Developed REST APIs with Python Flask and Django for API integration with third parties.
Technologies: SOAP, REST APIs, Presto, Big Data, Apache Sqoop, Spark, Spark SQL, Cloudera, Matplotlib, Seaborn, Scikit-learn, SciPy, TensorFlow, Pandas, NumPy, Hortonworks Data Platform (HDP), Django, Go, Flask-Marshmallow, Flask-RESTful, Flask, MapReduce, Hadoop, Apache Airflow, Hibernate Query Language (HQL), SQL, PySpark, Python, Shell, Apache Pig, Apache Hive, Azure SQL, Azure Data Lake, Azure DevOps, Azure Data Factory, Databricks, Azure Cosmos DB, Microsoft Power BI, Machine Learning, Data Engineering
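
The Oozie/Airflow automation above can be sketched as a minimal Airflow 2.x DAG; the DAG ID, schedule, and shell commands are hypothetical.

```python
# Hedged sketch: an hourly ingestion DAG in the spirit of the Oozie/Airflow
# automation described above. DAG id, schedule, and commands are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="manufacturing_line_ingest",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract = BashOperator(
        task_id="sqoop_extract",
        bash_command="sqoop import --connect jdbc:oracle:thin:@mes-db:1521/PROD --table READINGS --target-dir /raw/readings",
    )
    transform = BashOperator(
        task_id="pyspark_transform",
        bash_command="spark-submit transform_readings.py --input /raw/readings --output /trusted/readings",
    )
    extract >> transform  # the transform runs only after extraction succeeds
```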

Senior Data Engineer and Analytics Expert

2017 - 2018
SSE
  • Worked on a smart meter transformation project and created/maintained optimal data lake and data pipeline architecture. Assembled large, complex data sets that met functional/non-functional business requirements.
  • Identified, designed, and implemented internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
  • Collected data from different data sources, ingested it in its native format, and transformed it when needed (ETL/ELT processes).
  • Optimized the reporting data warehouse for performance, access, and integration using the Kimball star schema methodology (illustrated below).
Technologies: Oozie, Spark, MapReduce, HBase, Apache Hive, Spark Streaming, Apache Kafka, Apache Sqoop, Hadoop, Data Engineering
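
An illustrative Spark SQL sketch of the Kimball star schema approach from the last bullet; the fact, dimension, and staging table names for the smart meter data are assumptions.

```python
# Sketch under assumed names: build a fact table of smart meter readings keyed
# to date and meter dimensions, Kimball star-schema style.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("smart-meter-star-schema").enableHiveSupport().getOrCreate()

spark.sql("""
    INSERT OVERWRITE TABLE dw.fact_meter_reading
    SELECT
        d.date_key,
        m.meter_key,
        r.reading_kwh,
        r.reading_ts
    FROM staging.readings r
    JOIN dw.dim_date  d ON to_date(r.reading_ts) = d.calendar_date
    JOIN dw.dim_meter m ON r.meter_serial        = m.meter_serial
""")
```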

Data Engineer

2016 - 2017
Heathrow Gatwick Airport
  • Analyzed and consolidated data from various sources (SQL Server, Oracle, MySQL, flat files, XML, web services, and CSV).
  • Collected data from different sources, manipulated it, and inserted it into the data warehouse (see the sketch below). Optimized the reporting data warehouse and data mart construction for performance, access, and integration. Solved ETL problems and tuned performance.
  • Performed data analysis and data profiling using SQL in standard database applications.
  • Used logical/physical database design techniques for both structured and unstructured data stores. Used business process modeling, data flow modeling, and data lineage modeling.
  • Created revenue dashboards and performance reports for CXO-level managers. Monitored the sales pipeline and provided useful reports to the sales team.
Technologies: Python, Oracle, SQL Server 2016, ETL Tools
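
A small, hedged Python sketch of the multi-source consolidation described above, using pandas and SQLAlchemy; the connection strings, file names, and table names are assumptions.

```python
# Illustrative only: read a CSV extract and an Oracle table, align the columns,
# and load the result into a SQL Server staging table. All names are assumptions.
import pandas as pd
from sqlalchemy import create_engine

oracle = create_engine("oracle+cx_oracle://etl:secret@ops-db:1521/?service_name=PROD")
dwh = create_engine("mssql+pyodbc://etl:secret@dwh-server/Reporting?driver=ODBC+Driver+17+for+SQL+Server")

# Two of the source feeds: a flat-file export and a relational table.
csv_sales = pd.read_csv("retail_sales.csv", parse_dates=["sale_date"])
db_sales = pd.read_sql("SELECT sale_date, terminal_id, amount FROM pos_sales", oracle)

# Align columns, combine, and land in the warehouse staging area.
combined = pd.concat([csv_sales[["sale_date", "terminal_id", "amount"]], db_sales])
combined.to_sql("stg_sales", dwh, if_exists="append", index=False)
```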

Graph DB Neo4j

• Used knowledge representation and semantic technologies (OWL, RDF, SWRL, SPARQL, and JSON-LD) for semantic modeling, data integration and unification, knowledge graph design, ontology, and taxonomy work.
• Created the data and information architecture using modeling languages such as UML, ER, IDEF, and data flow diagrams.
• Worked with mapping techniques such as R2RML to transform relational/tabular datasets into triples.
• Developed end-to-end data pipelines with Python and used Cypher to transform relational/tabular datasets into Neo4j.
• Used the py2neo and neo4j drivers to ingest data from AWS Redshift and PostgreSQL into the Neo4j database.
• Used arrows.app to create graph models for communicating the design to stakeholders.

ML Models and Pipelines for Agronomy

http://www.yara.com
Used Python 3 and JupyterLab for developing use cases and ML models and used Kubeflow for MLOps. Created data dictionaries and the ontology of the use cases and used Apache Atlas for data governance and metadata management. Developed business glossaries, specified the classifications, and added tags to the data. Used Apache Ranger for the central administration of security policies and monitoring of user access. In line with GDPR, tagged PII (personally identifiable information) fields via Atlas and applied data protection techniques such as tokenization, masking, and anonymization to the corresponding fields and tags via Ranger.
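
A hedged Kubeflow Pipelines (kfp v2) sketch of how a training step could be wrapped and compiled for the MLOps flow described above; the component logic, names, and parameters are purely illustrative.

```python
# Illustrative kfp v2 sketch only: wrap a toy training step as a component
# and compile a one-step pipeline. Names and logic are assumptions.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def train_yield_model(n_samples: int) -> float:
    # Placeholder training logic; a real component would pull features
    # from the data lake and log the model to a registry.
    return 0.87  # pretend validation score

@dsl.pipeline(name="agronomy-training")
def agronomy_pipeline(n_samples: int = 1000):
    train_yield_model(n_samples=n_samples)

if __name__ == "__main__":
    # Produces a pipeline spec that can be uploaded to a Kubeflow deployment.
    compiler.Compiler().compile(agronomy_pipeline, "agronomy_pipeline.yaml")
```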

AI Prediction Model and Root Cause Analysis

The AI Prediction Model and Root Cause Analysis project was built using an AI model and ML techniques to identify the key indicators of anomalies in manufacturing performance. Project benefits included reduced downtime, rectified breakdowns, predicted failures, scheduled maintenance activities, and control strategies for machine parameters to avoid breakdowns.
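
The project's exact modeling approach isn't detailed here, so the following is only a generic scikit-learn sketch of flagging anomalous manufacturing-line readings; the sensor features and values are made up.

```python
# Generic anomaly-detection sketch (not necessarily the project's actual model):
# flag unusual manufacturing-line readings with an Isolation Forest.
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical sensor features per time window.
readings = pd.DataFrame({
    "temperature": [71.2, 70.8, 71.0, 95.3, 70.9],
    "vibration":   [0.02, 0.03, 0.02, 0.31, 0.02],
    "throughput":  [118,  120,  119,  64,   121],
})

model = IsolationForest(contamination=0.05, random_state=0).fit(readings)
readings["anomaly"] = model.predict(readings)   # -1 marks a suspected anomaly
print(readings[readings["anomaly"] == -1])
```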

Safety Stock Optimizer

Worked on the Safety Stock Optimizer project. Uncertainty in demand and in supply may lead to shortfalls at the end of replenishment cycles, so an appropriate amount of safety stock needs to be carried to protect against them. The Safety Stock Optimizer applied AI/ML techniques to quantify demand and supply uncertainties and calculate a recommended safety stock level. The model created solutions for:
1. Demand uncertainty: uses a forecast error calculation, a more realistic representation of actual planning behavior, and a decision tree to recommend the optimum safety stock.
2. Supply uncertainty: the variation of lead time against the average is assumed to be representative of supply variation.
Used Azure Data Factory for end-to-end data pipelines. Made intensive use of Delta functionality and combined batch and streaming workloads. Manipulated and transformed data in the Microsoft Azure Data Lake Storage (ADLS) raw, foundation, trusted, and unified business logic layers. Used Azure Databricks for AI/ML model development.
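
A hedged sketch of the underlying idea: the textbook service-level safety stock formula combining demand and lead-time variability, plus a small decision tree regressor in the spirit of the recommendation step. The figures and feature names are illustrative, not the project's actual parameters.

```python
# Illustrative only: classic safety stock formula plus a decision-tree
# recommender in the spirit of the project. All numbers are made up.
import numpy as np
from scipy.stats import norm
from sklearn.tree import DecisionTreeRegressor

def safety_stock(service_level, demand_std, avg_demand, lead_time_mean, lead_time_std):
    """Textbook formula: z * sqrt(LT * sigma_d^2 + d^2 * sigma_LT^2)."""
    z = norm.ppf(service_level)
    return z * np.sqrt(lead_time_mean * demand_std**2 + avg_demand**2 * lead_time_std**2)

print(safety_stock(0.95, demand_std=40, avg_demand=200, lead_time_mean=2.0, lead_time_std=0.5))

# Decision tree that learns a safety stock recommendation from historical
# (forecast error, lead time variability) pairs; purely synthetic data here.
X = np.array([[10, 0.2], [40, 0.5], [80, 1.0], [25, 0.3], [60, 0.8]])
y = np.array([150, 420, 900, 260, 650])
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
print(tree.predict([[35, 0.4]]))
```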

Transformation Project for Smart Meters

http://www.sse.com
Worked on the Smart Meter Transformation project and created/maintained an optimal data lake and data pipeline architecture. Assembled large, complex data sets that met functional/non-functional business requirements. Identified, designed, and implemented internal process improvements: automating manual processes, optimizing data delivery, and redesigning infrastructure for greater scalability.

Languages

SQL, Python, Go, Bash, Hibernate Query Language (HQL), XML, Active Server Pages (ASP), PL/1, COBOL, Python 3, Snowflake, Cypher

Other

Big Data, ETL Tools, Data Warehousing, Data Engineering, Data Warehouse Design, Shell Scripting, Machine Learning, Machine Learning Operations (MLOps), Azure Data Lake, Microsoft OneNote, Web Services, CSV, VisualAge, Job Schedulers, APIs, Apache Cassandra, Apache Atlas, Prometheus, Amazon Glacier, SOAP, Data Modeling, Data Governance, Azure Data Factory, GraphDB, Electronic Data Interchange (EDI)

Frameworks

Hadoop, Spark, Apache Ranger, Flask, Django, Presto

Libraries/APIs

PySpark, Pandas, Flask-RESTful, REST APIs, Spark Streaming, Flask-Marshmallow, NumPy, TensorFlow, SciPy, Scikit-learn, Matplotlib

Tools

Apache Airflow, Apache Sqoop, PyCharm, Sublime Text, iTerm, Jupyter, Zsh, Shell, Oozie, IBM Cognos, Kibana, Grafana, Amazon CloudWatch, Seaborn, Cloudera, Spark SQL, Microsoft Power BI, AWS Glue, Amazon Elastic MapReduce (EMR), Amazon Athena, AutoML, Boto 3

Paradigms

Data Science, MapReduce, ETL, ITIL, OLAP, Business Intelligence (BI), Microservices, Azure DevOps

Platforms

Docker, Apache Pig, Databricks, macOS, Anaconda, Apache Kafka, Oracle, MapR, AWS Lambda, Kubeflow, Jupyter Notebook, Amazon Web Services (AWS), Hortonworks Data Platform (HDP), Azure

Storage

Apache Hive, HDFS, HBase, Microsoft SQL Server, MySQL, FlatFile, MongoDB, OLTP, SQL Server Integration Services (SSIS), IBM Db2, Amazon S3 (AWS S3), Elasticsearch, Azure SQL, Azure Cosmos DB, Redshift, Azure SQL Databases, SQL Server 2016, Neo4j, PostgreSQL

JUNE 2022 - PRESENT

Neo4j Certified Professional

Neo4j
