Serkan Coskun
Verified Expert in Engineering
Data Engineering Developer
Serkan is a professionally qualified, hands-on IT specialist with proven commercial experience as an AI/ML/data engineer in the banking (fintech), telco, energy, and healthcare sectors. He is a self-motivated engineer who quickly grasps complex technical environments and technologies. Serkan is skilled in creating, maintaining, and operating data ingestion and ETL/ELT pipelines into data lake layers and in providing extensive data, cloud, and ML functions for technology services.
Preferred Environment
Amazon Web Services (AWS), Machine Learning Operations (MLOps), Spark, Big Data, Databricks, Python 3, Redshift, Snowflake, Azure, Neo4j
The most amazing...
...was an AI prediction model and root cause analysis project that used an AI model and ML techniques to identify anomalies in the manufacturing line.
Work Experience
Senior AI/ML and Data Engineer
Healthcare/Pharmaceutical Company
- Worked on machine learning models analyzing insurance claims on the AWS Databricks platform with its AutoML component. These models estimated the probability that a given claim would be denied.
- Identified specific sections of a claim that may be problematic. Recommended new values for fields in a claim that were improperly populated.
- Proposed a new combination of codes, based on similar claims in the database that insurance had accepted, whenever a given combination of codes was found invalid.
- Suggested a new modifier, based on similar accepted claims in the database, whenever a modifier was found invalid.
- Developed an EDI generator module for producing EDI 837 files: it accepted a JSON file with patient, provider, and encounter data as input and produced an EDI 837 claim file as output. Used the boto3, pyx12, and EDI-835-parser Python libraries.
- Created the claim edit module for identifying flaws in claims and proposing fixes. The module applied both its own proprietary editing rules and machine learning models created by the machine learning engine.
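As an illustration of the EDI generation step, the sketch below renders a tiny subset of an EDI 837 claim from JSON-like input. It is a simplified assumption-laden example, not the actual module: the real work used pyx12, and the segment subset, field names, and delimiters here are chosen only for illustration.

```python
# Minimal, illustrative sketch of building a few EDI 837 segments from
# claim JSON. The real module used pyx12; the segment subset, field names,
# and separators here are simplifying assumptions.

def build_837_segments(claim: dict) -> str:
    """Render a tiny subset of an EDI 837 claim as *-delimited segments."""
    segments = [
        # ST: transaction set header (837 = health care claim)
        ["ST", "837", claim["control_number"]],
        # NM1: billing provider name (85 = billing provider entity code)
        ["NM1", "85", "2", claim["provider"]["name"]],
        # CLM: claim information (submitter claim ID and charge amount)
        ["CLM", claim["claim_id"], f'{claim["charge"]:.2f}'],
    ]
    # SE: transaction set trailer (segment count including SE + control number)
    segments.append(["SE", str(len(segments) + 1), claim["control_number"]])
    # EDI segments are element-delimited with "*" and terminated with "~"
    return "\n".join("*".join(seg) + "~" for seg in segments)

claim = {
    "control_number": "0001",
    "provider": {"name": "ACME CLINIC"},
    "claim_id": "CLM123",
    "charge": 150.0,
}
print(build_837_segments(claim))
```

A real 837 file adds envelope segments (ISA/GS), loops, and many more elements; the point here is only the segment/element shape.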
Senior AI/ML and Data Engineer
Healthcare/Pharmaceutical Company
- Used knowledge representation and semantic technologies (OWL, RDF, SWRL, SPARQL, and JSON-LD), including semantic modeling, data integration and unification, knowledge graph design, and ontology and taxonomy development.
- Created the data and information architecture using modeling languages such as UML, ER, IDEF, and data flow diagrams. Worked with mapping techniques such as R2RML for transforming relational/tabular datasets into triples.
- Developed end-to-end data pipelines with Python and used Cypher for transforming relational and tabular datasets into Neo4j. Used py2neo and neo4j drivers for ingesting data from AWS Redshift and PostgreSQL into Neo4j DB.
- Used arrows.app to create graph models for communicating and explaining the design to stakeholders.
Senior AI/ML and Data Engineer
Yara International
- Worked on the digital farming data platform project, a cloud-based platform built on AWS, as an AI/ML data engineer in the data science team. Created, maintained, and operated E2E data pipelines and built AI/ML models for agronomy use cases.
- Connected to external data sources via push and pull APIs and stored the data in AWS S3 buckets. Developed Spark Streaming jobs that read data from the valid Kafka topic, transform it on the fly, and load it into a Hive table.
- Created a Lambda function that reads data from S3 buckets, validates it against the Kafka schema registry based on the registered format, and then pushes each message to the relevant valid or invalid Kafka topic.
- Used an AWS scheduler to invoke Lambda functions periodically (hourly and daily). In Kafka, defined the schema, converted it to the Avro format, and uploaded it to the schema registry. Used AWS S3, Glue, and Redshift for end-to-end data pipelines and the data lake.
- Developed read and get microservices using Spring Boot and KStream to read invalid topic messages or produce an exception report. Loaded data from Hive into Cassandra DB to democratize the data for third-party users via API Gateway.
- Used Python 3 and JupyterLab for developing use cases and AI/ML models, and used Kubeflow for MLOps. Developed and built REST APIs with Python Flask for API integration with third parties.
- Created and developed data lineage pipelines, dictionaries, and the ontology of the project and used Apache Atlas for data governance and metadata management. Elaborated business glossaries, specified the classifications, and added tags on the data.
- Utilized Apache Ranger for the central administration of security policies and monitoring user access. Used AWS CloudWatch for all the logs and pushed to Elasticsearch and Kibana to read the log messages.
- Built, cloned, and dropped databases and loaded data into Snowflake. Shared Snowflake tables among separate accounts and external users without creating a second copy of the table data. Set up and managed compute resources for loading and querying.
- Loaded structured and semi-structured data and created views and materialized views. Designed both Snowflake-managed and AWS external stages. Used Time Travel for restoring data-related objects and analyzing data usage and manipulation.
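The validate-and-route step described in these bullets can be sketched in miniature. The schema check below is simplified to required fields and types, and the topic names are assumptions; the real pipeline validated Avro against the Kafka schema registry and produced messages via a Kafka client.

```python
# Minimal sketch of a Lambda-style validate-and-route step: each raw record
# is checked against a registered schema and routed to a "valid" or
# "invalid" topic. Schema, field names, and topic names are assumptions.

import json

REGISTERED_SCHEMA = {"device_id": str, "reading": float}  # assumed schema

def route_record(raw: str) -> str:
    """Return the target topic for one raw JSON record."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return "readings-invalid"
    for field, ftype in REGISTERED_SCHEMA.items():
        if not isinstance(record.get(field), ftype):
            return "readings-invalid"
    return "readings-valid"

print(route_record('{"device_id": "s1", "reading": 21.5}'))  # readings-valid
print(route_record('{"device_id": "s1"}'))                   # readings-invalid
```

In the real function the return value would become a produce call to the chosen Kafka topic instead of a string.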
AI-ML and Data Engineer
Healthcare/Pharmaceutical Company
- Worked on the Safety Stock Optimiser project. Built AI/ML techniques to quantify demand and supply uncertainties and calculate a recommended safety stock level. Created solutions for both demand uncertainty and supply uncertainty.
- Used Azure Data Factory for end-to-end data pipelines. Manipulated and transformed data across the Azure Data Lake Storage (ADLS) raw, foundation, trusted, and unified business logic layers.
- Made intensive use of Delta functionality and combined batch and streaming workloads. Used Azure Databricks for AI/ML model development. Visualized the results via Power BI.
- Worked on the AI prediction model and root cause analysis project, building an AI model and ML techniques to identify critical anomalies in manufacturing performance.
- Reduced downtime, rectified breakdowns, predicted failures, and allowed scheduling of maintenance activities and control strategies for machine parameters to avoid breakdowns.
- Created holistic data infrastructure and data lake/marts for manufacturing lines. Ingested structured, semi-structured, and unstructured batch and real-time/stream data from numerous data sources into various data platforms using data flow/pipelines.
- Automated data ingestion using Oozie and Airflow. Accessed and manipulated data with HCatalog, Hive server and Query, Apache Pig, Shell scripts, Python, and PySpark.
- Ensured high scalability, efficiency, and computation across large-scale network servers and big data distributed systems with Go channels and goroutines. Developed REST APIs with Python Flask and Django for API integration with third parties.
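To illustrate the kind of anomaly flagging behind the root cause analysis work above, here is a deliberately simple rolling z-score over sensor readings. The window size, threshold, and sample data are assumptions; the production models were built on Databricks with richer ML techniques.

```python
# Illustrative sketch only: flag manufacturing sensor readings that deviate
# more than `threshold` standard deviations from the trailing window mean.
# Window size, threshold, and data are assumptions, not the project's model.

from statistics import mean, stdev

def flag_anomalies(readings, window=10, threshold=3.0):
    """Return indices of points far outside their trailing window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma and abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

readings = [100.0, 101.0, 99.5, 100.2, 100.8, 99.9, 100.1, 100.4, 99.7, 100.3,
            100.2, 135.0, 100.1]  # index 11 is a spike
print(flag_anomalies(readings))  # [11]
```

Flagged indices would feed downstream root cause analysis, e.g. correlating spikes with machine parameters before scheduling maintenance.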
Senior Data Engineer and Analytics Expert
SSE
- Worked on a smart meter transformation project and created/maintained optimal data lake and data pipeline architecture. Assembled large, complex data sets that met functional/non-functional business requirements.
- Identified, designed, and implemented internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Collected data from different data sources, inserted it in its native format, and transformed it when needed (ETL/ELT processes).
- Optimized the report data warehouse for performance, access, and integration using Kimball-Star Schema Methodology.
Data Engineer
Heathrow Gatwick Airport
- Analyzed and consolidated data from various sources (SQL Server, Oracle, MySQL, flat files, XML, web services, and CSV).
- Collected data from different data sources, manipulated it, and inserted it into the DWH. Optimized the report data warehouse and data mart construction for performance, access, and integration. Solved ETL problems and performed performance tuning.
- Analyzed and profiled data using SQL queries in standard database applications.
- Used logical/physical database design techniques for both structured and unstructured data stores. Used business process modeling, data flow modeling, and data lineage modeling.
- Created revenue dashboards and performance reports for C-level managers. Monitored the sales pipeline and provided useful reports to the sales team.
Experience
Graph DB Neo4j
• Created the data/information architecture using modeling languages such as UML, ER, IDEF, and data flow diagrams
• Worked with mapping techniques, R2RML, for transforming relational/tabular datasets into triples
• Developed end-to-end data pipelines with Python and used Cypher for transforming relational/tabular datasets into Neo4j
• Used py2neo and neo4j drivers for ingesting data from AWS Redshift and PostgreSQL into Neo4j DB.
• Used arrows.app to create graph models for communicating and explaining the design to stakeholders
ML Models and Pipelines for Agronomy
http://www.yara.com
AI Prediction Model and Root Cause Analysis
Safety Stock Optimizer
1. Demand uncertainty: uses a forecast error calculation, a more realistic representation of actual planning behavior. Used a decision tree to recommend the optimum safety stock.
2. Supply uncertainty: the variation of lead time against the average, assumed to be representative of supply variability.
Used Azure Data Factory for end-to-end data pipelines. Made intensive use of delta functionality and combined batch and streaming workloads. Manipulated and transformed data in Microsoft Azure Data Lake Storage (ADLS) raw, foundation, trusted, and unified business logic layer. Used Azure Databricks for AI/ML model development.
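One common textbook way to combine the two uncertainty sources above (demand variability and lead-time variability) into a single safety stock number is sketched below. This is a generic formulation, not the project's exact forecast-error-based method; the service-level z value and the units are assumptions.

```python
# Generic safety stock sketch combining demand and lead-time uncertainty:
#   SS = z * sqrt(avg_lead_time * sigma_demand^2 + avg_demand^2 * sigma_lt^2)
# The z value, units, and inputs below are illustrative assumptions.

from math import sqrt

def safety_stock(z, avg_lead_time, demand_std, avg_demand, lead_time_std):
    """Safety stock covering demand and supply (lead-time) uncertainty."""
    demand_var_term = avg_lead_time * demand_std ** 2      # demand uncertainty
    supply_var_term = (avg_demand ** 2) * lead_time_std ** 2  # supply uncertainty
    return z * sqrt(demand_var_term + supply_var_term)

# z = 1.65 ~ 95% service level; lead time in weeks, demand in units/week
ss = safety_stock(z=1.65, avg_lead_time=4, demand_std=50, avg_demand=200,
                  lead_time_std=0.5)
print(round(ss, 1))  # 233.3
```

Each term maps to one of the numbered uncertainty sources, which is why quantifying both was needed before a recommendation could be made.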
Transformation Project for Smart Meters
http://www.sse.com
Skills
Languages
SQL, Python, Go, Bash, Hibernate Query Language (HQL), XML, Active Server Pages (ASP), PL/1, COBOL, Python 3, Snowflake, Cypher
Other
Big Data, ETL Tools, Data Warehousing, Data Engineering, Data Warehouse Design, Shell Scripting, Machine Learning, Machine Learning Operations (MLOps), Azure Data Lake, Microsoft OneNote, Web Services, CSV, VisualAge, Job Schedulers, APIs, Apache Cassandra, Apache Atlas, Prometheus, Amazon Glacier, SOAP, Data Modeling, Data Governance, Azure Data Factory, GraphDB, Electronic Data Interchange (EDI)
Frameworks
Hadoop, Spark, Apache Ranger, Flask, Django, Presto
Libraries/APIs
PySpark, Pandas, Flask-RESTful, REST APIs, Spark Streaming, Flask-Marshmallow, NumPy, TensorFlow, SciPy, Scikit-learn, Matplotlib
Tools
Apache Airflow, Apache Sqoop, PyCharm, Sublime Text, ITerm, Jupyter, Zsh, Shell, Oozie, IBM Cognos, Kibana, Grafana, Amazon CloudWatch, Seaborn, Cloudera, Spark SQL, Microsoft Power BI, AWS Glue, Amazon Elastic MapReduce (EMR), Amazon Athena, AutoML, Boto 3
Paradigms
Data Science, MapReduce, ETL, ITIL, OLAP, Business Intelligence (BI), Microservices, Azure DevOps
Platforms
Docker, Apache Pig, Databricks, MacOS, Anaconda, Apache Kafka, Oracle, MapR, AWS Lambda, Kubeflow, Jupyter Notebook, Amazon Web Services (AWS), Hortonworks Data Platform (HDP), Azure
Storage
Apache Hive, HDFS, HBase, Microsoft SQL Server, MySQL, FlatFile, MongoDB, OLTP, SQL Server Integration Services (SSIS), IBM Db2, Amazon S3 (AWS S3), Elasticsearch, Azure SQL, Azure Cosmos DB, Redshift, Azure SQL Databases, SQL Server 2016, Neo4j, PostgreSQL
Certifications
Neo4j Certified Professional
Neo4j