Burcin Sarac
Verified Expert in Engineering
Data Scientist and Software Developer
Istanbul, Turkey
Toptal member since August 13, 2021
Burcin is a seasoned data scientist and AI developer with a master's degree in the field and certifications in ML and AI. With a strong command of Python and its ecosystem, he has extensive hands-on experience across a range of AI and ML technologies. His current focus is on large language models (LLMs), specializing in task automation and the development and deployment of AI products in cloud environments, particularly on Google Cloud Platform (GCP).
Portfolio
Experience
- Data Science - 7 years
- Data Analytics - 7 years
- Python 3 - 7 years
- Statistics - 7 years
- SQL - 6 years
- Natural Language Processing (NLP) - 4 years
- Data Engineering - 4 years
- Google Cloud Platform (GCP) - 3 years
Availability
Preferred Environment
Python 3, Jupyter Notebook, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Google Cloud Platform (GCP), Amazon Web Services (AWS), Visual Studio Code (VS Code), Ubuntu, Large Language Models (LLMs)
The most amazing...
...thing I've built is a versatile bot using LLMs for automated, context-aware interactions on social media, adaptable to any input and deployed on GCP.
Work Experience
LLM Developer
AB-InBev - Gen AI - India
- Improved the in-house chatbot's responses by optimizing prompts for token efficiency and relevance using tools like DSPy and TextGrad.
- Designed and implemented a robust system to generate synthetic question-answer pairs for product performance testing.
- Developed scripts to dynamically generate questions and complex SQL queries involving multiple tables (AzureSQL and PostgreSQL) to test database integrations.
- Leveraged FAISS, LangChain, and GPT-4o API to process and embed unstructured documents (PDFs, presentations, etc.). Generated questions and answers based on similar document pairs using LLMs.
- Combined structured and unstructured datasets into a unified pipeline to generate complex, multi-layered questions and answers. Used embeddings and AI-generated QA pairs to create advanced compound queries.
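The bullets above describe a document-embedding and synthetic QA-generation pipeline built on FAISS, LangChain, and GPT-4o. Below is a minimal sketch of how such a pipeline might look, assuming LangChain's FAISS integration and the OpenAI embedding and chat APIs; the file name, model choices, and prompt wording are illustrative assumptions, not the production code.

```python
# Illustrative sketch: embed unstructured PDFs with FAISS via LangChain and ask
# GPT-4o to produce synthetic question-answer pairs from related chunks.
# The document path, models, and prompt are assumptions for the example.
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(
    PyPDFLoader("product_manual.pdf").load()  # hypothetical source document
)
store = FAISS.from_documents(chunks, OpenAIEmbeddings(model="text-embedding-3-small"))

llm = ChatOpenAI(model="gpt-4o", temperature=0.7)
for chunk in chunks[:5]:
    # Pair each chunk with its nearest neighbors so the QA pair spans related passages.
    neighbours = store.similarity_search(chunk.page_content, k=3)
    context = "\n\n".join(doc.page_content for doc in neighbours)
    qa = llm.invoke(
        "Write one question and its answer that can only be solved by combining "
        f"the following passages:\n\n{context}"
    )
    print(qa.content)
```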
AI Developer
Specc AS
- Developed an advanced chatbot using LangGraph and a RAG architecture to process and interact with unstructured API documentation data, improving user experience and guidance on the platform.
- Created and managed a ChromaDB vector database, leveraging Google and Hugging Face embedding models. Conducted extensive testing with various metadata and data-cleaning strategies to enhance the accuracy and relevance of the chatbot's responses.
- Structured the RAG system as a directed acyclic graph (DAG) within LangGraph, where each node represents a specific operation (retrieval, generation, verification, hallucination control), and edges define the data flow between these operations.
- Deployed the AI model as an API on Google Cloud Platform (GCP) Cloud Run, enabling seamless integration across various platforms.
- Set up automated triggers using Google Cloud Platform (GCP) Cloud Functions to update the vector database when new documents are uploaded, ensuring that the AI remains up-to-date with the latest data.
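The bullets above describe a RAG system structured as a graph of retrieval, generation, verification, and hallucination-control steps. Below is a minimal sketch of how such a graph can be wired in LangGraph; the node bodies are stubs and the names are hypothetical, while the real system used ChromaDB retrieval with Google and Hugging Face embeddings.

```python
# Illustrative sketch of a LangGraph RAG graph: retrieval, generation, and a
# grounding check as separate nodes, kept acyclic as described above.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    documents: list[str]
    answer: str
    grounded: bool

def retrieve(state: RAGState) -> dict:
    # In production this queried a ChromaDB collection built from API documentation.
    return {"documents": ["...retrieved chunks..."]}

def generate(state: RAGState) -> dict:
    # Call an LLM with the question plus the retrieved context.
    return {"answer": "...generated answer..."}

def check_grounding(state: RAGState) -> dict:
    # Ask a grader LLM whether the answer is supported by the retrieved documents.
    return {"grounded": True}

def flag_answer(state: RAGState) -> dict:
    # Append a caveat when the answer could not be verified against the docs.
    return {"answer": state["answer"] + "\n[Unverified against the documentation.]"}

def route(state: RAGState) -> str:
    return "grounded" if state["grounded"] else "flag"

graph = StateGraph(RAGState)
graph.add_node("retrieve", retrieve)
graph.add_node("generate", generate)
graph.add_node("check", check_grounding)
graph.add_node("flag", flag_answer)
graph.add_edge(START, "retrieve")
graph.add_edge("retrieve", "generate")
graph.add_edge("generate", "check")
graph.add_conditional_edges("check", route, {"grounded": END, "flag": "flag"})
graph.add_edge("flag", END)

app = graph.compile()
print(app.invoke({"question": "How do I authenticate against the API?"})["answer"])
```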
AI Developer (via Toptal)
Onyx Relations Corp
- Developed a versatile bot capable of posting about specific topics and press releases and of engaging with users on social media platforms, including Twitter and Reddit.
- Integrated and leveraged state-of-the-art LLM/GPT technologies, including OpenAI API and Gemini Pro, to enable organic and contextually relevant responses to user interactions.
- Implemented functionalities to detect and respond to relevant threads, discussions, and trends across multiple platforms.
- Enhanced the bot's adaptability to any input stock symbol, fetching news data from at least 50 news sources using APIs, RSS feeds, and webpage parsing techniques.
- Summarized news data using the latest LLM models to provide concise and informative content.
- Dockerized the entire app as a service and deployed all processes to the Google Cloud Platform using various technologies, such as Cloud Run, Cloud Functions, BigQuery, and Cloud Scheduler, ensuring efficient and scalable operations.
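The bullets above describe fetching news for a stock symbol and summarizing it with an LLM before posting. Below is a minimal sketch of that step, assuming the feedparser library and the OpenAI Python client; the feed URL, model name, and prompt are illustrative assumptions rather than the deployed code.

```python
# Illustrative sketch: pull recent headlines for a stock symbol from an RSS feed
# and summarize them with an LLM. The production bot combined ~50 sources via
# APIs, RSS feeds, and page parsing; this shows a single RSS source.
import feedparser
from openai import OpenAI

def summarize_symbol_news(symbol: str, max_items: int = 10) -> str:
    feed = feedparser.parse(
        f"https://feeds.finance.yahoo.com/rss/2.0/headline?s={symbol}"  # assumed feed
    )
    headlines = "\n".join(f"- {entry.title}" for entry in feed.entries[:max_items])

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize financial news concisely."},
            {"role": "user", "content": f"Headlines for {symbol}:\n{headlines}"},
        ],
    )
    return response.choices[0].message.content

print(summarize_symbol_news("AAPL"))
```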
Senior Data Scientist
n11.com
- Constructed customer data pipelines for daily, weekly, and monthly features derived from customer transactions, scheduling Python jobs to generate the tables in BigQuery.
- Redesigned and improved a churn model to detect churners and calculate customer lifetime values using customer transactions as raw data, deployed as a Kubeflow service; fetched and processed data from BigQuery, all orchestrated with Cloud Scheduler.
- Segmented customers based on behaviors using platform logs and transactions, deployed as a Kubeflow service; fetched and processed data from BigQuery, created segments, and wrote to another table, all orchestrated with Cloud Scheduler.
- Developed and deployed a custom chatbot using customer interaction data, with the model deployed as a custom prediction routine endpoint in Vertex AI. The pipeline was Dockerized and deployed as an API on Cloud Run.
- Designed an HTML page for in-office screens to track real-time order amounts, with animations that celebrate whenever a target is hit, built with HTML, CSS, and JavaScript and a FastAPI back end.
- Worked as part of a team on a custom in-house recommender system project and contributed to the design of the whole project lifecycle, including the API design. Integrated Gradio to create web interfaces that testers could use to interact with the model.
- Designed and developed fraud and counterfeit product detection approaches, including image recognition, TF-IDF, lemmatization, stemming, and text embedding generation.
- Developed and deployed an advanced image processing model on the Google Cloud Platform using Vertex AI to provide real-time predictions. The model was integrated with a Dataflow pipeline that generated and stored image embeddings in a BigQuery table.
- Designed and developed a Kubernetes-managed service to retrieve image embeddings from BigQuery, standardize and rotate images for data augmentation, and optimize image data for further processing and analysis.
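The first bullet in this list describes scheduled feature pipelines that write tables to BigQuery from Python. Below is a minimal sketch of such a daily job using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical.

```python
# Illustrative sketch of a daily feature job: run a SQL aggregation over
# transactions in BigQuery and write the result to a feature table.
from google.cloud import bigquery

def build_daily_features(run_date: str) -> None:
    client = bigquery.Client()
    sql = f"""
        SELECT customer_id,
               COUNT(*)    AS orders_last_30d,
               SUM(amount) AS spend_last_30d
        FROM `my-project.sales.transactions`
        WHERE order_date BETWEEN DATE_SUB(DATE('{run_date}'), INTERVAL 30 DAY)
                             AND DATE('{run_date}')
        GROUP BY customer_id
    """
    job_config = bigquery.QueryJobConfig(
        destination="my-project.features.customer_daily",
        write_disposition="WRITE_TRUNCATE",  # rebuild the snapshot on each run
    )
    client.query(sql, job_config=job_config).result()  # wait for completion

# A Cloud Scheduler trigger (or cron) would call this once per day.
build_daily_features("2024-01-31")
```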
AI Developer
Onyx Relations Corp
- Developed a bot capable of posting about specific topics and press releases and of engaging with users on social media platforms.
- Integrated and leveraged LLM/GPT technologies to enable organic and contextually relevant responses to user interactions.
- Implemented functionalities to detect and respond to relevant threads, discussions, and trends across Twitter and Reddit.
- Deployed all the processes to Google Cloud Platform using various technologies, such as Cloud Run, Cloud Functions, BigQuery, and Cloud Scheduler, among others.
Data Scientist | AI Developer
Sole Entrepreneurship in the US
- Developed and backtested trend-following strategies on price data in the US stock market.
- Automated the strategies that performed well in backtesting using Python, connecting to stock market APIs.
- Deployed the fully automated trading bots to the cloud, allowing the user to change parameters and start or stop them through a clean front-end screen.
- Created separate BigQuery tables to record closed trades of each trading bot and visualized the trading results with filtering options to let the user analyze the bot performance using Looker Studio.
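The bullets above describe backtesting trend-following strategies on price data. Below is a minimal sketch of such a backtest, assuming a moving-average crossover and the yfinance package for price history; the ticker, window lengths, and data source are assumptions for the example.

```python
# Illustrative sketch: a simple moving-average crossover backtest on daily prices.
import yfinance as yf

data = yf.download("SPY", start="2018-01-01", end="2023-12-31", progress=False)
prices = data["Close"].squeeze()  # Series of daily closing prices

fast = prices.rolling(50).mean()
slow = prices.rolling(200).mean()

# Long when the fast average is above the slow one; flat otherwise.
position = (fast > slow).astype(int).shift(1)  # act on the next bar
strategy_returns = prices.pct_change() * position

equity = (1 + strategy_returns.fillna(0)).cumprod()
print(f"Strategy total return: {float(equity.iloc[-1]) - 1:.1%}")
```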
Senior Applied Scientist
Magnify
- Acted as an ML model developer in a post-sales automation and orchestration platform development project. Segmented customers based on Salesforce platform usage attributes.
- Gathered, transformed, and summarized features to define a rule-based churn algorithm to detect possible churners among customers.
- Connected to an AWS VM instance over SSH from a local machine, set up MLflow experiment tracking with records stored in an AWS S3 bucket, and generated experiment tracking reports using Prefect.
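The last bullet above describes MLflow experiment tracking backed by an S3 artifact store on an AWS VM. Below is a minimal sketch of the client side of that setup; the host address, bucket, experiment name, and logged values are hypothetical.

```python
# Illustrative sketch: log runs to a remote MLflow tracking server whose
# artifact store is an S3 bucket. The server on the VM would be started with
# something like:
#   mlflow server --backend-store-uri sqlite:///mlflow.db \
#                 --default-artifact-root s3://my-mlflow-artifacts
import mlflow

mlflow.set_tracking_uri("http://10.0.0.12:5000")   # hypothetical VM address
mlflow.set_experiment("churn-rule-tuning")

with mlflow.start_run(run_name="threshold-sweep"):
    mlflow.log_param("inactivity_days_threshold", 45)   # example rule parameter
    mlflow.log_metric("flagged_customers", 1280)         # example run outputs
    mlflow.log_metric("precision_vs_labels", 0.71)
```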
Senior Data Scientist
Intertech (Emirates NBD Bank)
- Developed an NLP model to summarize claim documents, classify customer requests, and forward them to the relevant department.
- Collected and summarized employee effort logs as time series data, then estimated future effort to plan employee capacity requirements.
- Built an anomaly detection model to detect anomalies in invoice payments and then implemented an email alert system for immediate intervention by the relevant teams.
- Constructed pipelines for gathering data from various sources such as relational databases and HTML or Excel files to generate reports; these were published via Power BI.
Senior Data Scientist
Sekerbank (Samruk-Kazyna Invest LLP)
- Built and presented propensity models for retail loan products and loan accounts to determine the tendency of customers to purchase these products.
- Developed and implemented a clustering algorithm to segment retail customers based on their assets, liabilities, and product ownership.
- Cleaned and classified texts from customer complaints about products and services to generate weekly reports.
- Developed a market-basket analysis project based on customer product ownership to improve marketing activities.
- Constructed pipelines for parsing and analyzing customer data for daily, weekly, and monthly executive reports to automate report preparation.
Data Scientist
Vakifbank
- Developed and deployed product propensity models for retail and SME customers to detect whether a customer was likely to buy and to improve customer targeting in marketing initiatives.
- Constructed a customer segmentation model based on the customer's balance account, transactions, credit cards, and loan usage behaviors.
- Investigated and updated the prediction models currently in use to improve prediction performance and simplify the results.
- Improved report generation pipelines to automate preparation processes based on customer data.
Experience
Lyrics Generator | A Web Scraping and Lyric Generation Project
https://github.com/burcins/LyricsGenerator
In the first step, I parsed lyrics from a web page using the Beautiful Soup package, then cleaned and prepared them for model development. Next, I created a bidirectional LSTM model with a couple of layers and trained it for a hundred iterations. Finally, I provided initial words to the trained model, and it predicted an additional 100 words.
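For illustration, a minimal sketch of the kind of bidirectional LSTM architecture described above, in Keras; the vocabulary size, sequence length, and layer widths are placeholders, and the repository contains the actual preprocessing and training code.

```python
# Illustrative sketch: a small bidirectional LSTM that predicts the next word
# from a fixed-length window of preceding words.
from tensorflow.keras import Input
from tensorflow.keras.layers import LSTM, Bidirectional, Dense, Embedding
from tensorflow.keras.models import Sequential

VOCAB_SIZE = 5000   # distinct tokens after cleaning (placeholder)
SEQ_LEN = 20        # words of context used to predict the next word (placeholder)

model = Sequential([
    Input(shape=(SEQ_LEN,)),
    Embedding(VOCAB_SIZE, 100),
    Bidirectional(LSTM(128, return_sequences=True)),
    Bidirectional(LSTM(64)),
    Dense(VOCAB_SIZE, activation="softmax"),  # probability distribution over the next word
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.summary()
```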
Twitter Sentiment Analysis
https://github.com/burcins/Twitter-Sentiment-Analysis
ATM Cash Demand Forecasting
https://github.com/burcins/Time-Series-Forecasting
The dataset included three features: Cash In, Cash Out, and Date. It contained 1,186 observations, corresponding to 1,186 days from 01/01/2016 to 03/31/2019. The task was to forecast the Cash In and Cash Out values for 04/01/2019 through 04/30/2019 separately.
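One plausible way to produce the 30-day forecast described above is a seasonal ARIMA model with weekly seasonality; the sketch below uses statsmodels, and the file name, column names, and model orders are assumptions rather than the repository's exact approach.

```python
# Illustrative sketch: forecast 30 days of ATM cash demand with SARIMA.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

df = pd.read_csv("atm_cash.csv", parse_dates=["Date"], index_col="Date")  # assumed file
series = df["CashIn"].asfreq("D")  # daily series, 2016-01-01 to 2019-03-31

model = SARIMAX(series, order=(1, 1, 1), seasonal_order=(1, 1, 1, 7))  # weekly seasonality
fit = model.fit(disp=False)

# Forecast April 2019 (30 days); the same recipe is repeated for the Cash Out series.
forecast = fit.forecast(steps=30)
print(forecast.head())
```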
Term Deposit Propensity Prediction
https://github.com/burcins/Term-Deposit-Propensity-Prediction
The dataset contained 40,000 customer records with 14 features, including term deposit ownership.
Text Summarizer
https://huggingface.co/spaces/Burcin/ExtractiveSummarizer
Multiclass Classification Development and Deployment (MLOps)
https://github.com/burcins/mlops-zoomcamp-main-project
For this project, publicly available wine data was used to develop a simple multiclass classification model that predicts wine quality, assigning a rating between 3 and 9 based on the product's ingredients as predictors.
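A minimal sketch of the modeling step described above, using scikit-learn; the CSV path mirrors the public UCI wine-quality file, the random-forest classifier is an assumption, and the repository adds the surrounding MLOps tooling.

```python
# Illustrative sketch: multiclass classification of wine quality (3-9) from ingredients.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("winequality-red.csv", sep=";")  # UCI file uses ';' separators
X, y = df.drop(columns="quality"), df["quality"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test), zero_division=0))
```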
Education
Master's Degree in Business Analytics
Athens University of Economics and Business - Athens, Greece
Master's Degree in Capital Markets
Marmara University - Istanbul, Turkey
Certifications
MLOps Zoomcamp
DataTalks.Club
Natural Language Processing Specialization
Coursera
Skills
Libraries/APIs
Pandas, Scikit-learn, X (formerly Twitter) API, NumPy, XGBoost, OpenAI API, TensorFlow, Beautiful Soup, Natural Language Toolkit (NLTK), SpaCy, Reddit API, OpenCV
Tools
BigQuery, ChatGPT, PyCharm, Microsoft Power BI, Amazon SageMaker, Yahoo! Finance, Azure Machine Learning, Apache Airflow, Cron, Cloud Dataflow, Prefect, Grafana, Looker, Apache Beam
Languages
Python 3, Python, SQL, SAS, R, HTML, CSS, JavaScript
Platforms
Jupyter Notebook, Vertex AI, Ubuntu 20.04, Google Cloud Platform (GCP), Docker, Kubeflow, Cloud Run, Amazon Web Services (AWS), Visual Studio Code (VS Code), Ubuntu, Kubernetes
Storage
Google Cloud, Microsoft SQL Server, Oracle SQL, MySQL, PostgreSQL, Data Pipelines, MongoDB, Cassandra, Redis, NoSQL
Frameworks
Flask, Streamlit, Django, LangGraph, DSPy
Paradigms
ETL, REST, Automation
Other
Data Science, Machine Learning, Natural Language Processing (NLP), Time Series, Classification, Clustering, Unsupervised Learning, Supervised Machine Learning, Data Analysis, Supervised Learning, Artificial Intelligence (AI), Data Analytics, Regression, Google BigQuery, Data Processing Automation, API Integration, Google Cloud Functions, Finance, APIs, AI Chatbots, Llama, ChatGPT API, FastAPI, Vector Databases, Deep Learning, Statistics, Text Classification, Web Scraping, Machine Learning Operations (MLOps), Time Series Analysis, Financial Modeling, Trend Forecasting, Microsoft Azure, Data Visualization, Data Engineering, Trading, Algorithmic Trading, Financial Markets, Capital Markets, Stock Market, Stock Trading, Stock Exchange, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Long Short-term Memory (LSTM), Text Categorization, OpenAI, OpenAI GPT-4 API, Finance APIs, Chatbots, Large Language Models (LLMs), Prompt Engineering, LangChain, Retrieval-augmented Generation (RAG), Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, Gemini, Anthropic, Claude, Gemini API, Generative Artificial Intelligence (GenAI), Llama 2, Gunicorn, Predictive Modeling, Natural Language Understanding (NLU), Forecasting, Stock Price Analysis, Stock Market Technical Analysis, Financial Marketing, Big Data, Social Media Analytics, Sequence Models, Data Cleaning, Google Cloud ML, Customer Segmentation, MLflow, Trend Analysis, Generative Pre-trained Transformers (GPT), Open-source LLMs, HTML Parsing, Job Schedulers, BERT, Software Development, Vector Stores, Vector Search, ChromaDB, Word Embedding, Llama 3, Generative Pre-trained Transformer 4 (GPT-4), FAISS, TextGrad