Dogugun Ozkaya
Verified Expert in Engineering
Data Scientist and Software Developer
İstanbul, Turkey
Toptal member since November 10, 2021
Dogugun is a skilled data scientist with expertise in Machine Learning and data-centric solutions. Proficient in Python and SQL, he drives data-driven decisions and delivers innovative AI solutions. His strengths lie in B2B ML projects, automated pipelines, and real-time APIs. Dogugun's predictive modeling and visualization dashboards empower businesses to optimize processes and gain strategic insights. Dogugun is a valuable asset in advancing businesses with ML-driven solutions.
Preferred Environment
PyCharm, Jupyter, Amazon Web Services (AWS), Megalodon
The most amazing...
...project was a B2B ML project for a data-driven decision making tool in an enterprise consultation company. I also took part in its automation and deployment.
Work Experience
Lead Data Scientist
Amperecloud
- Created time series analytical modules to enable customers to monitor the performance of their PV-energy facilities.
- Designed and developed time series forecasting models for both power generation and power loss based on seasonal and irradiation data.
- Developed a comprehensive loss forecast model, coupled with a shading detection algorithm, to detect and classify loss amounts effectively.
- Established a streamlined data pipeline, optimizing data collection from MongoDB and VictoriaMetrics into Redis for efficient utilization in analytical tasks.
Data Scientist
Big Consultancy Company
- Developed a B2B machine learning project, complete with an automation pipeline, to cater to diverse internal teams within the organization. Delivered a versatile API serving multiple stakeholders.
- Created automated Jupyter notebooks tailored to business stakeholders' needs.
- Introduced coverage metrics to assess the data pipeline's effectiveness and successfully unified disparate data sources.
- Orchestrated the automation of training, deployment, and model scoring processes in a containerized cloud environment, streamlining operations for increased efficiency.
Data Scientist
The Conti Group, LLC
- Developed an analysis project powered by machine learning to estimate urban development and its effect on real estate at macro and micro levels.
- Augmented the dataset by collecting data from online resources, seamlessly integrating them into our ML model.
- Developed web scraping bots to collect data from online resources and websites efficiently and in an automated way.
- Developed user-friendly Jupyter notebooks and PowerBI dashboards to convey model outcomes and their business implications.
Data Scientist
Amadeus
- Implemented a customer lifetime value (CLV) prediction model based on loyalty points, contributing to the improvement in customer segmentation products.
- Worked on extracting loyalty KPIs and creating visualizations using Qlik Sense, collaborating closely with the business intelligence team. We built a data pipeline from the ground up using PySpark tailored specifically for analytical use cases.
- Collaborated with a consultancy team to deliver simulation projects and conduct in-depth analyses focused on exploring alternative strategies for increasing engagement.
- Developed, during the COVID-19 crisis, a recommendation tool geared toward increasing passenger engagement. This tool was centered around non-air loyalty items and utilized the ALS Library.
- Built a profile update-based fraud prediction and monitoring tool for loyalty.
Data Scientist
Enerjisa Uretim A.S. — E.ON Energy
- Developed predictive maintenance capabilities for thermal plants through a time series forecasting project focusing on FID fans within combustion engines.
- Collaborated closely with the engineering team at a thermal plant and improved coal calorie prediction within the designated mining zone, using the SGeMS tool and ordinary kriging.
- Developed, in collaboration with the trading team, a model for forecasting the electricity market-clearing price. The outputs were instrumental in guiding the trading team's decisions regarding surplus and shortage pricing strategies.
- Created near real-time dashboards at both plant and portfolio levels within Grafana.
Software Engineer and Data Scientist
Mavi Jeans
- Took part in developing and deploying an online store on AWS by integrating the ERP services with the eCommerce platform.
- Undertook the implementation of a recommendation engine utilizing an ALS model. This innovative solution found its home on an AWS EMR instance.
- Collaborated with the eCommerce team to develop ad-hoc propensity models geared toward bolstering targeted marketing capabilities.
- Ventured into the development of back-end services for an in-house CRM tool. This endeavor allowed me to work extensively with NoSQL, particularly with Couchbase.
Experience
B2B Consultancy
My role encompassed the creation of the ML model and an API to serve real-time needs within the organization. Additionally, I visualized coverage metrics within the data pipeline, optimizing the process. In the final stages of the project, I worked on visualizing the data model outputs and SHAP values. I contributed by building an MLOps pipeline and automating the training, deployment, and scoring processes of the ML application within a Dockerized environment on AWS.
Urban Development Analysis
Drug Information Chatbot
https://github.com/dogugun/drug_chatbot_rag
We captured drug labels from the FDA's web resources in XML format and converted them to PDF. Then, we used LangChain and HF embeddings to convert them to vectors and uploaded them to the designated Pinecone instance.
In the final phase, we employed OpenOrca's Mistral model to generate answers from the list of documents we captured from our vector database.
The end solution is deployed as an API with Flask to be served on an AWS EC2 instance.
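The retrieval step described above can be illustrated with a minimal sketch. It is not the project's code: the three-dimensional vectors, the document ids, and the `top_k` helper are hypothetical stand-ins for the HF embeddings and the Pinecone query, and the similarity search is done in plain Python so the idea can be followed without external services.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k document ids most similar to the query, best first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical ids and made-up vectors, not real embeddings.
index = {
    "label_a": [1.0, 0.0, 0.0],
    "label_b": [0.0, 1.0, 0.0],
    "label_c": [0.7, 0.7, 0.0],
}
hits = top_k([1.0, 0.1, 0.0], index, k=2)
print([doc_id for doc_id, _ in hits])
```

In a setup like the one described, the retrieved documents would then be passed to the language model as context for answer generation.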
Predictive Maintenance for CID Fans in Thermal Power Plant
https://github.com/dogugun/industrial_tsp
In the original project, the goal was to estimate the vibration of the combustion fan. The input features were temperature, dust collection, and airstream sensor readings. The target is the vibration frequency at prediction horizons of 6, 9, and 12 hours.
The input data is collected in SQL Server from Osisoft PI's IoT data. The model is deployed to the on-premise server, and the outputs are visualized in Grafana, along with other input feature values.
In the original project, a GBM was used because it is easy to retrain and retune. This demo version uses an LSTM as an alternative approach.
Finally, the deployed model and predictive monitoring dashboard are used by shift supervisors to plan their maintenance program proactively.
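As a rough illustration of the multi-horizon setup, the sketch below builds one target per prediction horizon from a single vibration series. It assumes one reading per hour, which the description does not state, and the function name `make_horizon_targets` and the synthetic series are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def make_horizon_targets(
    vibration: Sequence[float],
    horizons: Sequence[int] = (6, 9, 12),
) -> List[Tuple[int, Dict[int, float]]]:
    """Pair each time index with the vibration value at each future horizon.

    Assumes one reading per hour (an assumption, not stated in the source).
    Indices that lack a value for the longest horizon are dropped.
    """
    max_horizon = max(horizons)
    rows = []
    for t in range(len(vibration) - max_horizon):
        rows.append((t, {h: vibration[t + h] for h in horizons}))
    return rows


# Synthetic placeholder series: 15 hourly readings, not real sensor data.
hourly_vibration = [float(i) for i in range(15)]
rows = make_horizon_targets(hourly_vibration)
print(rows[0])
```

Each row can then be joined with the sensor features observed at the same time index to train either one model per horizon or a single multi-output model.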
Market-clearing Price Estimation
The model data consisted of renewable forecasts, demand forecasts, plant availability declarations, and natural gas prices. The data is collected in a DWH model in SQL Server from APIs, web scraping, and Excel file reports.
The model was developed with the gradient-boosting regressor from the scikit-learn library.
Recommendation Engine for Online Retail Store
Comparison of Non-parametric Models and Neural Networks in Blood Glucose Prediction
https://tez.yok.gov.tr/UlusalTezMerkezi/tezDetay.jsp?id=7Xp9BRBO0SOctQECK7M1rw&no=KIdCcRP4-ehNYHHpxNxdRw
Within the non-parametric category, we compared Random Forest Regression (RFR), Gradient Boosting Machines (GBM), and Support Vector Machines (SVM). In the neural network category, we compared Long Short-Term Memory Networks (LSTMs) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS).
The study found that SVM achieved the best Root Mean Square Error (RMSE) among the non-parametric models, while ANFIS performed best among the neural networks and also surpassed SVM.
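For reference, an RMSE-based comparison of this kind can be sketched as follows. The glucose values and the names `model_a` and `model_b` are synthetic placeholders, not results from the thesis.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long, non-empty sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic placeholder values; they do not come from the study.
actual = [100.0, 110.0, 120.0, 130.0]
predictions = {
    "model_a": [102.0, 108.0, 121.0, 127.0],
    "model_b": [100.0, 115.0, 110.0, 135.0],
}
scores = {name: rmse(actual, pred) for name, pred in predictions.items()}
best = min(scores, key=scores.get)  # lower RMSE is better
print(best, round(scores[best], 3))
```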
Web Scraping and Data Pipeline
The automated scraper is triggered daily by cron jobs, and the collected data is saved to a PostgreSQL database on AWS.
Stock Market Data Extracting
Technically, the script runs on my personal AWS EC2 cluster. A Lambda service initiates the EC2 instance, and once the script completes its run, it shuts down the instance. The resulting files are archived in an S3 bucket.
Education
Master's Degree in Biomedical Engineering
Bogazici University - Istanbul, Turkey
Bachelor's Degree in Computer Engineering
Bogazici University - Istanbul, Turkey
Certifications
Learn LangChain, Pinecone & OpenAI: Build Next-Gen LLM Apps
Udemy
CS110x: Big Data Analysis with Apache Spark
edX
Skills
Libraries/APIs
LSTM, NumPy, Pandas, PySpark, Beautiful Soup, CatBoost, Keras, Scikit-learn
Tools
PyCharm, Jupyter, Qlik Sense, Grafana, Amazon Elastic MapReduce (EMR), Microsoft Power BI, Apache Airflow, Apache NiFi, LaTeX, VictoriaMetrics, Tableau
Languages
Python, SQL, Snowflake
Platforms
Amazon EC2, Amazon Web Services (AWS), Docker, Jupyter Notebook, AWS Lambda
Industry Expertise
Bioinformatics
Frameworks
Selenium
Storage
Amazon S3 (AWS S3), PostgreSQL, Redis
Other
Computer Science, Statistics, Recommendation Systems, Machine Learning, Data Science, Data Analytics, Dashboards, Data Analysis, Data Engineering, Software Development, Biomedical Skills, Big Data, Deep Learning, Data Scraping, Megalodon, OSIsoft PI, Neural Networks, Long Short-term Memory (LSTM), Web Scraping, Stock Exchange, Predictive Modeling, Machine Learning Operations (MLOps), Data Reporting, Time Series Analysis, FastAPI, Geospatial Analytics, Geospatial Data, Recurrent Neural Networks (RNNs), Time Series, Biostatistics, Random Forests, Random Forest Regression, Support Vector Regression, Support Vector Machines (SVM), Adaptive Neuro-fuzzy Inference System (ANFIS), Scraping, Financial Data, Gradient Boosting, Linear Regression, Feature Engineering, Artificial Intelligence (AI), Large Language Models (LLMs), Pinecone, OpenAI GPT-3 API, OpenAI GPT-4 API, LangChain, Chatbots, Natural Language Processing (NLP), Vector Data, Vector Databases, Hugging Face, Mistral AI, Scalable Vector Databases, AI Chatbots, Retrieval-augmented Generation (RAG)