Daniel Bredun, Developer in Rzeszow, Poland

Daniel Bredun

Verified Expert in Engineering

Bio

Daniel is a data scientist and engineer with a command of the entire data lifecycle. He excels at crafting efficient data pipelines, designing databases, conducting advanced analyses, and harnessing machine learning. Combined with his proficiency in cloud storage systems, these skills have let Daniel consistently drive business success. Even in the face of challenging constraints, his passion for problem-solving ensures top-tier, long-term solutions.

Portfolio

Geoeconomics AI
Python, Data Engineering, Large Language Models (LLMs), Pandas, NumPy...
StubHub
SQL, T-SQL (Transact-SQL), Microsoft SQL Server, Snowflake...
New Columbia Solar
Salesforce, Salesforce API, Salesforce Object Query Language (SOQL)...

Experience

  • SQL - 5 years
  • Python - 5 years
  • Amazon Web Services (AWS) - 3 years
  • Bash - 3 years
  • TensorFlow - 3 years
  • Data Mapping - 3 years
  • Apache Kafka - 2 years
  • PySpark - 2 years

Availability

Full-time

Preferred Environment

PyCharm, macOS

The most amazing...

...data collection I've done was from an ancient public API: I boosted its throughput from 10 to 60,000 data points per minute by reverse-engineering the requests made by its web portal.
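
For flavor, a minimal sketch of that technique: replay the bulk requests the portal itself makes instead of polling the public endpoint one record at a time. The URL, parameters, and page size below are hypothetical stand-ins.

    import requests
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical internal endpoint spotted in the portal's network traffic;
    # the real URL, parameters, and headers are placeholders.
    URL = "https://portal.example.gov/api/internal/records"

    def fetch_page(page: int) -> list[dict]:
        resp = requests.get(URL, params={"page": page, "size": 1000}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def fetch_many(pages: range) -> list[dict]:
        # Parallel bulk requests replace one-at-a-time calls to the public API.
        with ThreadPoolExecutor(max_workers=16) as pool:
            return [row for batch in pool.map(fetch_page, pages) for row in batch]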

Work Experience

Senior Data Engineer

2024 - 2025
Geoeconomics AI
  • Supported an early-stage AI startup by developing code and infrastructure for its LLM, including web scraping of input data, applying RAG to process it, and crafting an LLM-powered feature extraction pipeline (a minimal sketch follows this role's technology list).
  • Deployed the infrastructure to AWS using Lambda and Athena, providing high fault tolerance and scaling potential.
  • Designed a live geospatial data visualization for the web application using D3.js, with database-side architecture and indexing that keep it smooth regardless of the number of data points.
Technologies: Python, Data Engineering, Large Language Models (LLMs), Pandas, NumPy, Matplotlib, Artificial Intelligence (AI), Retrieval-augmented Generation (RAG), React, TypeScript, Apache Spark, Apache Airflow, Data Pipelines
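
A hedged sketch of the retrieval-augmented extraction step mentioned above, assuming input documents are already scraped and chunked. TF-IDF similarity stands in for the production embedding model, and call_llm is a placeholder for whatever LLM client is used:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
        # Rank chunks by similarity to the query and keep the top k.
        vectorizer = TfidfVectorizer().fit(chunks + [query])
        scores = cosine_similarity(vectorizer.transform([query]),
                                   vectorizer.transform(chunks)).ravel()
        return [chunks[i] for i in scores.argsort()[::-1][:k]]

    def extract_features(chunks: list[str], query: str, call_llm) -> str:
        # Ground the prompt in retrieved context before extraction.
        context = "\n---\n".join(retrieve(chunks, query))
        prompt = f"Using only the context below, answer: {query}\n\n{context}"
        return call_llm(prompt)  # placeholder for the real LLM client call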

SQL Data Engineer

2024 - 2024
StubHub
  • Co-led the migration of an ERP system from SQL Server to Snowflake and dbt, speeding up journal generation 10-fold while significantly improving the development experience.
  • Led a major internal SQL Server database refactoring project, reducing system issues by 50% and saving 20 hours of employee time per month.
  • Instituted documentation of critical processes, previously shared informally, significantly speeding up new joiners' time to autonomy.
Technologies: SQL, T-SQL (Transact-SQL), Microsoft SQL Server, Snowflake, Data Build Tool (dbt), Data Migration, Data Classification, Azure, B2C, Big Data, ETL Tools, Functional Programming

Senior Integration Engineer

2022 - 2024
New Columbia Solar
  • Led a series of comprehensive integration projects to connect five internal software tools (Salesforce, an AWS RDS database, Excel spreadsheets, Contract Logix, and Intacct), saving 500+ hours of manual work monthly (a minimal sync sketch follows this role's technology list).
  • Worked closely with the COO to migrate nuanced business logic from Excel spreadsheets to Salesforce, including finances, inventory management, budget forecasting, asset management, and sales, reducing data quality issues across the organization by 85%.
  • Designed and deployed a RAG-enhanced LLM extraction pipeline to categorize and assign asset maintenance requests, cutting time to resolution by 55%.
Technologies: Salesforce, Salesforce API, Salesforce Object Query Language (SOQL), HubSpot, Microsoft Graph API, Google APIs, REST APIs, CRM APIs, Apex, Batch Apex, Apex Classes, Apex Triggers, B2B, Cloud, Cloud Platforms, Amazon Athena, ETL Tools, Functional Programming, Amazon CloudWatch, ECS
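
One plausible shape of a single Salesforce-to-RDS sync step, assuming the simple-salesforce client and psycopg2; the credentials, object, and table names are illustrative:

    import psycopg2
    from simple_salesforce import Salesforce

    sf = Salesforce(username="user@example.com", password="...",
                    security_token="...")
    rows = sf.query_all("SELECT Id, Name, Status__c FROM Asset__c")["records"]

    # Upsert into the warehouse table, assuming a unique index on sf_id.
    with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
        for r in rows:
            cur.execute(
                """INSERT INTO assets (sf_id, name, status)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (sf_id) DO UPDATE
                   SET name = EXCLUDED.name, status = EXCLUDED.status""",
                (r["Id"], r["Name"], r["Status__c"]),
            )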

Data Analyst

2023 - 2023
Movement of Mothers
  • Reconciled and analyzed court case data from multiple sources to inform the California legislature.
  • Designed and executed a systematic, unbiased survey to gather critical data, facilitating insightful analysis and decision-making.
  • Worked with stakeholders across multiple nonprofit organizations to gather and understand the data in question.
Technologies: Data Analysis, SQL, Data Visualization, Data Reporting, Data Classification, Data Cleaning, B2C

Data Science Research Assistant

2022 - 2023
The University of Chicago
  • Deployed machine learning (ML) models using free and proprietary tools, such as Kubernetes and funcX, for scalable use by the scientific community.
  • Collaborated on developing a platform for publishing and sharing AI models for research purposes.
  • Authored ML models predicting the physical properties of new compounds based on their chemical composition.
Technologies: Data Science, Kubernetes, Neural Networks, PyTorch, Docker, Anaconda, PyCharm, Statistics, Machine Learning Operations (MLOps), Git, Ubuntu, Data Modeling, Machine Learning, Python, Scikit-learn, Jupyter, Data Scientist, Microservices, Leadership, Data Classification, Data Cleaning, Artificial Intelligence (AI), Analytics, Cloud, Cloud Platforms, Functional Programming, Amazon CloudWatch, ECS

Senior Data Science and Engineering Consultant

2019 - 2023
New Columbia Solar
  • Designed and deployed a relational data warehouse and object-oriented data pipeline for asset management data on AWS.
  • Saved over $40,000 monthly in lost profits through an automated predictive model for prompt anomaly detection (a minimal sketch follows this role's technology list).
  • Achieved a 9% revenue increase from new assets by identifying performance factors in existing ones.
  • Reduced maintenance time from nine to three days by building a custom web application for asset monitoring, contributing to a 10% efficiency increase.
  • Led a team of three to automate investor reporting, saving over 100 hours of manual work monthly and reducing costs by 12%.
Technologies: Apache Airflow, PostgreSQL, Amazon Web Services (AWS), Python, Statistics, Data Warehouse Design, Time Series Analysis, Pandas, Google Sheets API, RESTful Services, Google Cloud Platform (GCP), Google Sheets, Dashboard Design, Dashboards, Data Modeling, REST APIs, Databases, PL/SQL, Data Warehousing, Business Intelligence (BI), Machine Learning, Data Engineering, Data Science, PyCharm, Amazon RDS, Amazon EC2, Redshift, Data Pipelines, AWS IAM, Amazon S3 (AWS S3), ECharts, Vue, DevOps, APIs, NumPy, Django, Jupyter, Database Administration (DBA), SQL, Microsoft Excel, JavaScript, GitHub, ETL, Amazon Athena, Data Scientist, Data Build Tool (dbt), CI/CD Pipelines, Node.js, Microservices, Proof of Concept (POC), Jira, Performance Optimization, Data Architecture, Leadership, Data Quality Analysis, Data Cleansing, Data Reporting, Database Migration, Firebase, Amazon Aurora, Database Optimization, Terraform, Data Mapping, AWS Glue, AWS Lambda, Linux, Salesforce Object Query Language (SOQL), Salesforce API, Data Migration, Salesforce, Data Classification, Excel 365, Data Cleaning, Artificial Intelligence (AI), Analytics, B2B, Cloud, Cloud Platforms, ETL Tools, Functional Programming, Amazon CloudWatch, Amazon Elastic Container Registry (ECR), ECS
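
A minimal sketch of the anomaly detection idea: flag readings that deviate sharply from a rolling baseline of recent production. The column name, window, and threshold are illustrative, not the production values:

    import pandas as pd

    def flag_anomalies(df: pd.DataFrame, window: int = 96, z: float = 4.0) -> pd.DataFrame:
        # A rolling mean/std of recent output forms the baseline.
        baseline = df["output_kw"].rolling(window, min_periods=window // 2)
        scores = (df["output_kw"] - baseline.mean()) / baseline.std()
        return df[scores.abs() > z]  # rows worth an immediate alert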

Data Analytics and Engineering

2022 - 2022
Tesla
  • Reduced data storage costs by migrating from Vertica to a data lake of Parquet files on Amazon S3, using Apache Hudi on Apache Spark.
  • Diagnosed and resolved an inefficiency in data replication by automating table schema synchronization.
  • Sped up PostgreSQL data replication by 300% by migrating it from batch ETL to Apache Kafka streaming (a minimal sketch follows this role's technology list).
Technologies: Spark, PySpark, MySQL, Apache Kafka, Amazon S3 (AWS S3), Apache Hudi, Data Lakes, Parquet, Database Replication, Kubernetes, Docker, Vertica, InfluxDB, Presto, Pandas, PyCharm, Git, Bash, Data Engineering, Ubuntu, REST APIs, Databases, PL/SQL, Oracle, Data Warehousing, Python, Data Pipelines, Test-driven Development, Protobuf, NumPy, SQL, GitHub, ETL, Message Queues, CI/CD Pipelines, Microservices, RabbitMQ, Jira, Performance Optimization, BigQuery, Snowflake, Data Reporting, Databricks, Database Migration, NoSQL, Cloud Firestore, Database Optimization, Scala, Data Mapping, Data Cleaning, Hadoop, Big Data, Cloud Platforms, Distributed Systems, ETL Tools, Functional Programming, AWS Lake Formation
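
A rough sketch of the streaming side of that migration, assuming row-change events already land in a Kafka topic (e.g., from a CDC producer). The topic name, message shape, and connection strings are illustrative:

    import json
    import psycopg2
    from kafka import KafkaConsumer

    consumer = KafkaConsumer("postgres.public.readings",
                             value_deserializer=lambda v: json.loads(v))
    conn = psycopg2.connect("dbname=replica")

    # Apply each change as it arrives instead of waiting for a batch ETL run.
    for msg in consumer:
        change = msg.value
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO readings (id, value) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value",
                (change["id"], change["value"]),
            )
        conn.commit()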

Junior Data Analyst

2019 - 2019
Prodigal Sun Solar
  • Increased the client's revenue by 5% through a hierarchical statistical hypothesis test comparing solar panel manufacturers.
  • Devised a creative optimization of the API-calling procedure, cutting its runtime from 3.65 days to 53 seconds (a minimal sketch follows this role's technology list).
  • Built an automated ETL system in Python for processing XML, JSON, and CSV data from solar APIs.
Technologies: Data Analysis, R, Pandas, NumPy, Scikit-learn, Hypothesis Testing, Git, PostgreSQL, Data Visualization, Matplotlib, RESTful Services, Dashboard Design, Tableau, Dashboards, Data Modeling, REST APIs, Databases, Data Analytics, Business Intelligence (BI), PyCharm, Python, APIs, GitHub, MongoDB, Leadership, Data Quality Analysis, Data Cleansing, Data Mapping, Data Cleaning, Artificial Intelligence (AI), Analytics, B2B, Cloud, Cloud Platforms, Functional Programming, Amazon CloudWatch, ECS
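
One plausible shape of that optimization, assuming the vendor API accepts date ranges: collapse one call per reading into one call per asset. The endpoint, parameters, and names are hypothetical:

    import requests

    def fetch_readings(asset_id: str, start: str, end: str) -> list[dict]:
        # One request covers a whole date range instead of a single reading.
        resp = requests.get(
            f"https://api.example-solar.com/assets/{asset_id}/readings",
            params={"from": start, "to": end},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    asset_ids = ["site-001", "site-002"]  # illustrative
    readings = [r for a in asset_ids
                for r in fetch_readings(a, "2019-01-01", "2019-12-31")]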

Experience

HEAReader: Sync-reading Books Voiced by Real Humans

https://github.com/Breedoon/BookSync
I developed HEAReader, a solution for sync-reading books voiced by real humans. It uses a TensorFlow-based algorithm to match an audiobook word for word with its book text, enabling synchronous reading. I also learned Swift and created an iOS app as a proof of concept (POC) for the algorithm.
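
A toy version of the alignment idea: transcribe the audiobook, then align transcript words against book words so each book word maps to a timestamp. Here difflib stands in for the TensorFlow-based matcher:

    from difflib import SequenceMatcher

    def align(book_words: list[str], asr_words: list[str]) -> list[tuple[int, int]]:
        # Pair up indices of matching words between book text and transcript.
        matcher = SequenceMatcher(a=book_words, b=asr_words, autojunk=False)
        pairs = []
        for block in matcher.get_matching_blocks():
            pairs += [(block.a + i, block.b + i) for i in range(block.size)]
        return pairs  # (book word index, transcript word index)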

MDtoLongPDF: Converting Markdown to Pageless PDFs

https://github.com/Breedoon/MDtoLongPDF
Pagination in PDFs has become largely irrelevant, as most documents are never printed. Yet page breaks still disrupt the content flow: they split sections, break tables, and move figures around, wasting space to serve a function that is no longer needed.

MDtoLongPDF solves this by converting unpaginated formats like Markdown and HTML into a single long PDF page, eliminating unnecessary page breaks so content renders seamlessly. I personally rely on it for creating documents and resumes.
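
The core trick, sketched with Pandoc and Prince (both listed under Tools below); the fixed page height here is a stand-in for however the real tool sizes the page to its content:

    import subprocess

    # Markdown -> standalone HTML.
    subprocess.run(["pandoc", "doc.md", "--standalone", "-o", "doc.html"],
                   check=True)

    # One very tall page instead of many letter-sized ones.
    with open("long.css", "w") as f:
        f.write("@page { size: 210mm 2000mm; margin: 10mm }")

    subprocess.run(["prince", "doc.html", "--style=long.css", "-o", "doc.pdf"],
                   check=True)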

AdmitMe

I worked on AdmitMe, an app that helped 300+ high school graduates in Ukraine find the colleges they were most likely to get into, based on their exam scores and historical admissions data scraped from the government website. Its predictions achieved 89% accuracy.
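
A minimal sketch of the kind of model behind such predictions: a per-college classifier fit on historical scores. The data here is synthetic and illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in for scraped history: exam score vs. admitted (1) or not (0).
    scores = np.array([[140], [152], [160], [171], [185], [194]])
    admitted = np.array([0, 0, 0, 1, 1, 1])

    model = LogisticRegression().fit(scores, admitted)
    print(model.predict_proba([[168]])[:, 1])  # P(admission) for a new score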

Certifications

DECEMBER 2018 - PRESENT

MTA: SQL Development

Microsoft

DECEMBER 2018 - PRESENT

MTA: Python Development

Microsoft

Skills

Libraries/APIs

Pandas, NumPy, Matplotlib, PySpark, PyTorch, TensorFlow, Scikit-learn, Google Sheets API, REST APIs, DeepSpeech, Vue, Protobuf, Node.js, Salesforce API, Google APIs, React

Tools

PyCharm, Git, GitHub, Apache Airflow, Jupyter, Google Sheets, Tableau, Jira, Amazon CloudWatch, AWS IAM, Prince XML, Pandoc, Microsoft Excel, Amazon Athena, RabbitMQ, BigQuery, Terraform, AWS Glue, Batch Apex, Amazon Elastic Container Registry (ECR)

Languages

Python, SQL, R, Bash, JavaScript, Java, Markdown, HTML, Swift 5, C++, GraphQL, Snowflake, Scala, T-SQL (Transact-SQL), Salesforce Object Query Language (SOQL), Apex, TypeScript

Paradigms

ETL, Functional Programming, Test-driven Development, DevOps, Business Intelligence (BI), Microservices, B2B, B2C

Platforms

macOS, Amazon Web Services (AWS), Salesforce, Amazon EC2, Apache Kafka, Docker, Ubuntu, Apache Hudi, Kubernetes, Anaconda, Google Cloud Platform (GCP), Oracle, Databricks, Firebase, AWS Lambda, Linux, HubSpot, Azure

Storage

PostgreSQL, Amazon S3 (AWS S3), Database Administration (DBA), Databases, Database Migration, Data Pipelines, Data Lakes, PL/SQL, NoSQL, Amazon Aurora, Redshift, MySQL, Database Replication, Vertica, InfluxDB, MongoDB, Cloud Firestore, Microsoft SQL Server

Frameworks

Apache Spark, Hadoop, Presto, Django

Other

Data Engineering, Data Analysis, Data Science, Data Visualization, Data Warehousing, Data Reporting, Database Optimization, Data Mapping, Cloud, ETL Tools, ECS, Machine Learning, Amazon RDS, Data Warehouse Design, Neural Networks, Time Series Analysis, APIs, Hypothesis Testing, RESTful Services, Dashboard Design, Dashboards, Data Modeling, Data Analytics, Message Queues, Data Scientist, CI/CD Pipelines, Proof of Concept (POC), Performance Optimization, Data Architecture, Leadership, Data Cleansing, Data Migration, Data Classification, Excel 365, Data Cleaning, Artificial Intelligence (AI), Large Language Models (LLMs), Analytics, Big Data, Cloud Platforms, Parquet, Deep Learning, Web Scraping, Modeling, Statistics, ECharts, Machine Learning Operations (MLOps), Data Build Tool (dbt), Data Quality Analysis, Geotechnical Engineering, Microsoft Graph API, CRM APIs, Apex Classes, Apex Triggers, Distributed Systems, AWS Lake Formation, Retrieval-augmented Generation (RAG)
