Daniel Bredun, Developer in Rzeszow, Poland

Daniel Bredun

Verified Expert in Engineering

Bio

Daniel is a data scientist and engineer with a command of the entire data lifecycle. He excels at crafting efficient data pipelines, designing databases, conducting advanced analyses, and harnessing machine learning. Combined with his proficiency in cloud storage systems, these skills have let Daniel consistently drive business success. Even in the face of challenging constraints, his passion for problem-solving ensures top-tier, long-term solutions.

Portfolio

Geoeconomics AI
Python, Data Engineering, Large Language Models (LLMs), Pandas, NumPy...
StubHub
SQL, T-SQL (Transact-SQL), Microsoft SQL Server, Snowflake...
New Columbia Solar
Salesforce, Salesforce API, Salesforce Object Query Language (SOQL)...

Experience

  • SQL - 5 years
  • Python - 5 years
  • Amazon Web Services (AWS) - 3 years
  • Bash - 3 years
  • TensorFlow - 3 years
  • Data Mapping - 3 years
  • Apache Kafka - 2 years
  • PySpark - 2 years

Availability

Full-time

Preferred Environment

PyCharm, macOS

The most amazing...

...data collection I've done was from an ancient public API: I boosted its throughput from 10 to 60,000 data points per minute by reverse-engineering the requests made by its web portal.
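
For flavor, a minimal sketch of that technique: replay the bulk requests the portal itself makes instead of polling the public endpoint one record at a time. The URL, parameters, and page size below are hypothetical stand-ins.

    import requests
    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical internal endpoint spotted in the portal's network traffic;
    # the real URL, parameters, and headers are placeholders.
    URL = "https://portal.example.gov/api/internal/records"

    def fetch_page(page: int) -> list[dict]:
        resp = requests.get(URL, params={"page": page, "size": 1000}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    def fetch_many(pages: range) -> list[dict]:
        # Parallel bulk requests replace one-at-a-time calls to the public API.
        with ThreadPoolExecutor(max_workers=16) as pool:
            return [row for batch in pool.map(fetch_page, pages) for row in batch]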

Work Experience

Senior Data Engineer

2024 - 2025
Geoeconomics AI
  • Supported an early-stage AI startup by developing code and infrastructure for its LLM, including web scraping of input data, applying RAG to process it, and crafting an LLM-powered feature extraction pipeline (a minimal sketch follows this role's technology list).
  • Deployed the infrastructure to AWS using Lambda and Athena, providing high fault tolerance and scaling potential.
  • Designed a live geospatial data visualization for the web application using D3.js, with database-side architecture and indexing that keep it smooth regardless of the number of data points.
Technologies: Python, Data Engineering, Large Language Models (LLMs), Pandas, NumPy, Matplotlib, Artificial Intelligence (AI), Retrieval-augmented Generation (RAG), React, TypeScript, Apache Spark, Apache Airflow, Data Pipelines
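
A hedged sketch of the retrieval-augmented extraction step mentioned above, assuming input documents are already scraped and chunked. TF-IDF similarity stands in for the production embedding model, and call_llm is a placeholder for whatever LLM client is used:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def retrieve(chunks: list[str], query: str, k: int = 3) -> list[str]:
        # Rank chunks by similarity to the query and keep the top k.
        vectorizer = TfidfVectorizer().fit(chunks + [query])
        scores = cosine_similarity(vectorizer.transform([query]),
                                   vectorizer.transform(chunks)).ravel()
        return [chunks[i] for i in scores.argsort()[::-1][:k]]

    def extract_features(chunks: list[str], query: str, call_llm) -> str:
        # Ground the prompt in retrieved context before extraction.
        context = "\n---\n".join(retrieve(chunks, query))
        prompt = f"Using only the context below, answer: {query}\n\n{context}"
        return call_llm(prompt)  # placeholder for the real LLM client call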

SQL Data Engineer

2024 - 2024
StubHub
  • Co-led the migration of an ERP system from SQL Server to Snowflake and dbt, speeding up journal generation 10-fold while significantly improving the development experience.
  • Led a major internal SQL Server database refactoring project, reducing system issues by 50% and saving 20 hours of employee time per month.
  • Instituted documentation of critical processes, previously shared informally, significantly speeding up new joiners' time to autonomy.
Technologies: SQL, T-SQL (Transact-SQL), Microsoft SQL Server, Snowflake, Data Build Tool (dbt), Data Migration, Data Classification, Azure, B2C, Big Data, ETL Tools, Functional Programming

Senior Integration Engineer

2022 - 2024
New Columbia Solar
  • Led a series of comprehensive integration projects to connect five internal software tools (Salesforce, an AWS RDS database, Excel spreadsheets, Contract Logix, and Intacct), saving 500+ hours of manual work monthly (a minimal sync sketch follows this role's technology list).
  • Worked closely with the COO to migrate nuanced business logic from Excel spreadsheets to Salesforce, including finances, inventory management, budget forecasting, asset management, and sales, reducing data quality issues across the organization by 85%.
  • Designed and deployed a RAG-enhanced LLM extraction pipeline to categorize and assign asset maintenance requests, cutting time to resolution by 55%.
Technologies: Salesforce, Salesforce API, Salesforce Object Query Language (SOQL), HubSpot, Microsoft Graph API, Google APIs, REST APIs, CRM APIs, Apex, Batch Apex, Apex Classes, Apex Triggers, B2B, Cloud, Cloud Platforms, Amazon Athena, ETL Tools, Functional Programming, Amazon CloudWatch, ECS
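
One plausible shape of a single Salesforce-to-RDS sync step, assuming the simple-salesforce client and psycopg2; the credentials, object, and table names are illustrative:

    import psycopg2
    from simple_salesforce import Salesforce

    sf = Salesforce(username="user@example.com", password="...",
                    security_token="...")
    rows = sf.query_all("SELECT Id, Name, Status__c FROM Asset__c")["records"]

    # Upsert into the warehouse table, assuming a unique index on sf_id.
    with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
        for r in rows:
            cur.execute(
                """INSERT INTO assets (sf_id, name, status)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (sf_id) DO UPDATE
                   SET name = EXCLUDED.name, status = EXCLUDED.status""",
                (r["Id"], r["Name"], r["Status__c"]),
            )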

Data Analyst

2023 - 2023
Movement of Mothers
  • Reconciled and analyzed court case data from multiple sources to inform the California legislature.
  • Designed and executed a systematic, unbiased survey to gather critical data, facilitating insightful analysis and decision-making.
  • Worked with stakeholders across multiple nonprofit organizations to gather and understand the data in question.
Technologies: Data Analysis, SQL, Data Visualization, Data Reporting, Data Classification, Data Cleaning, B2C

Data Science Research Assistant

2022 - 2023
The University of Chicago
  • Deployed machine learning (ML) models using free and proprietary tools, such as Kubernetes and funcX, for scalable use by the scientific community.
  • Collaborated on developing a platform for publishing and sharing AI models for research purposes.
  • Authored ML models predicting the physical properties of new compounds based on their chemical composition.
Technologies: Data Science, Kubernetes, Neural Networks, PyTorch, Docker, Anaconda, PyCharm, Statistics, Machine Learning Operations (MLOps), Git, Ubuntu, Data Modeling, Machine Learning, Python, Scikit-learn, Jupyter, Data Scientist, Microservices, Leadership, Data Classification, Data Cleaning, Artificial Intelligence (AI), Analytics, Cloud, Cloud Platforms, Functional Programming, Amazon CloudWatch, ECS

Senior Data Science and Engineering Consultant

2019 - 2023
New Columbia Solar
  • Designed and deployed a relational data warehouse and object-oriented data pipeline for asset management data on AWS.
  • Saved over $40,000 monthly in lost profits through an automated predictive model for prompt anomaly detection (a minimal sketch follows this role's technology list).
  • Achieved a 9% revenue increase from new assets by identifying performance factors in existing ones.
  • Reduced maintenance time from nine to three days by building a custom web application for asset monitoring, contributing to a 10% efficiency increase.
  • Led a team of three to automate investor reporting, saving over 100 hours of manual work monthly and reducing costs by 12%.
Technologies: Apache Airflow, PostgreSQL, Amazon Web Services (AWS), Python, Statistics, Data Warehouse Design, Time Series Analysis, Pandas, Google Sheets API, RESTful Services, Google Cloud Platform (GCP), Google Sheets, Dashboard Design, Dashboards, Data Modeling, REST APIs, Databases, PL/SQL, Data Warehousing, Business Intelligence (BI), Machine Learning, Data Engineering, Data Science, PyCharm, Amazon RDS, Amazon EC2, Redshift, Data Pipelines, AWS IAM, Amazon S3 (AWS S3), ECharts, Vue, DevOps, APIs, NumPy, Django, Jupyter, Database Administration (DBA), SQL, Microsoft Excel, JavaScript, GitHub, ETL, Amazon Athena, Data Scientist, Data Build Tool (dbt), CI/CD Pipelines, Node.js, Microservices, Proof of Concept (POC), Jira, Performance Optimization, Data Architecture, Leadership, Data Quality Analysis, Data Cleansing, Data Reporting, Database Migration, Firebase, Amazon Aurora, Database Optimization, Terraform, Data Mapping, AWS Glue, AWS Lambda, Linux, Salesforce Object Query Language (SOQL), Salesforce API, Data Migration, Salesforce, Data Classification, Excel 365, Data Cleaning, Artificial Intelligence (AI), Analytics, B2B, Cloud, Cloud Platforms, ETL Tools, Functional Programming, Amazon CloudWatch, Amazon Elastic Container Registry (ECR), ECS
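
A minimal sketch of the anomaly detection idea: flag readings that deviate sharply from a rolling baseline of recent production. The column name, window, and threshold are illustrative, not the production values:

    import pandas as pd

    def flag_anomalies(df: pd.DataFrame, window: int = 96, z: float = 4.0) -> pd.DataFrame:
        # A rolling mean/std of recent output forms the baseline.
        baseline = df["output_kw"].rolling(window, min_periods=window // 2)
        scores = (df["output_kw"] - baseline.mean()) / baseline.std()
        return df[scores.abs() > z]  # rows worth an immediate alert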

Data Analytics and Engineering

2022 - 2022
Tesla
  • Reduced data storage costs by migrating from Vertica to a data lake of Parquet files on Amazon S3, using Apache Hudi on Apache Spark.
  • Diagnosed and resolved an inefficiency in data replication by automating table schema synchronization.
  • Sped up PostgreSQL data replication by 300% by migrating it from batch ETL to Apache Kafka streaming (a minimal sketch follows this role's technology list).
Technologies: Spark, PySpark, MySQL, Apache Kafka, Amazon S3 (AWS S3), Apache Hudi, Data Lakes, Parquet, Database Replication, Kubernetes, Docker, Vertica, InfluxDB, Presto, Pandas, PyCharm, Git, Bash, Data Engineering, Ubuntu, REST APIs, Databases, PL/SQL, Oracle, Data Warehousing, Python, Data Pipelines, Test-driven Development, Protobuf, NumPy, SQL, GitHub, ETL, Message Queues, CI/CD Pipelines, Microservices, RabbitMQ, Jira, Performance Optimization, BigQuery, Snowflake, Data Reporting, Databricks, Database Migration, NoSQL, Cloud Firestore, Database Optimization, Scala, Data Mapping, Data Cleaning, Hadoop, Big Data, Cloud Platforms, Distributed Systems, ETL Tools, Functional Programming, AWS Lake Formation
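
A rough sketch of the streaming side of that migration, assuming row-change events already land in a Kafka topic (e.g., from a CDC producer). The topic name, message shape, and connection strings are illustrative:

    import json
    import psycopg2
    from kafka import KafkaConsumer

    consumer = KafkaConsumer("postgres.public.readings",
                             value_deserializer=lambda v: json.loads(v))
    conn = psycopg2.connect("dbname=replica")

    # Apply each change as it arrives instead of waiting for a batch ETL run.
    for msg in consumer:
        change = msg.value
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO readings (id, value) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value",
                (change["id"], change["value"]),
            )
        conn.commit()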

Junior Data Analyst

2019 - 2019
Prodigal Sun Solar
  • Increased the client's revenue by 5% through a hierarchical statistical hypothesis test comparing solar panel manufacturers.
  • Devised a creative optimization of the API-calling procedure, cutting its runtime from 3.65 days to 53 seconds (a minimal sketch follows this role's technology list).
  • Built an automated ETL system in Python for processing XML, JSON, and CSV data from solar APIs.
Technologies: Data Analysis, R, Pandas, NumPy, Scikit-learn, Hypothesis Testing, Git, PostgreSQL, Data Visualization, Matplotlib, RESTful Services, Dashboard Design, Tableau, Dashboards, Data Modeling, REST APIs, Databases, Data Analytics, Business Intelligence (BI), PyCharm, Python, APIs, GitHub, MongoDB, Leadership, Data Quality Analysis, Data Cleansing, Data Mapping, Data Cleaning, Artificial Intelligence (AI), Analytics, B2B, Cloud, Cloud Platforms, Functional Programming, Amazon CloudWatch, ECS
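
One plausible shape of that optimization, assuming the vendor API accepts date ranges: collapse one call per reading into one call per asset. The endpoint, parameters, and names are hypothetical:

    import requests

    def fetch_readings(asset_id: str, start: str, end: str) -> list[dict]:
        # One request covers a whole date range instead of a single reading.
        resp = requests.get(
            f"https://api.example-solar.com/assets/{asset_id}/readings",
            params={"from": start, "to": end},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()

    asset_ids = ["site-001", "site-002"]  # illustrative
    readings = [r for a in asset_ids
                for r in fetch_readings(a, "2019-01-01", "2019-12-31")]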

Experience

HEAReader: Sync-reading Books Voiced by Real Humans

https://github.com/Breedoon/BookSync
I developed HEAReader, a solution for sync-reading books voiced by real humans. It uses a TensorFlow-based algorithm to match an audiobook word for word with its book text, enabling synchronous reading. I also learned Swift and created an iOS app as a proof of concept (POC) for the algorithm.
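
A toy version of the alignment idea: transcribe the audiobook, then align transcript words against book words so each book word maps to a timestamp. Here difflib stands in for the TensorFlow-based matcher:

    from difflib import SequenceMatcher

    def align(book_words: list[str], asr_words: list[str]) -> list[tuple[int, int]]:
        # Pair up indices of matching words between book text and transcript.
        matcher = SequenceMatcher(a=book_words, b=asr_words, autojunk=False)
        pairs = []
        for block in matcher.get_matching_blocks():
            pairs += [(block.a + i, block.b + i) for i in range(block.size)]
        return pairs  # (book word index, transcript word index)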

MDtoLongPDF: Converting Markdown to Pageless PDFs

https://github.com/Breedoon/MDtoLongPDF
Pagination in PDFs has become largely irrelevant, as most documents are never printed. Yet page breaks still disrupt the content flow: they split sections, break tables, and move figures around, wasting space to serve a function that is no longer needed.

MDtoLongPDF solves this by converting unpaginated formats like Markdown and HTML into a single long PDF page, eliminating unnecessary page breaks so content renders seamlessly. I personally rely on it for creating documents and resumes.
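
The core trick, sketched with Pandoc and Prince (both listed under Tools below); the fixed page height here is a stand-in for however the real tool sizes the page to its content:

    import subprocess

    # Markdown -> standalone HTML.
    subprocess.run(["pandoc", "doc.md", "--standalone", "-o", "doc.html"],
                   check=True)

    # One very tall page instead of many letter-sized ones.
    with open("long.css", "w") as f:
        f.write("@page { size: 210mm 2000mm; margin: 10mm }")

    subprocess.run(["prince", "doc.html", "--style=long.css", "-o", "doc.pdf"],
                   check=True)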

AdmitMe

I worked on AdmitMe, an app that helped 300+ high school graduates in Ukraine find the colleges they were most likely to get into, based on their exam scores and historical admissions data scraped from the government website. Its predictions achieved 89% accuracy.
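
A minimal sketch of the kind of model behind such predictions: a per-college classifier fit on historical scores. The data here is synthetic and illustrative:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Stand-in for scraped history: exam score vs. admitted (1) or not (0).
    scores = np.array([[140], [152], [160], [171], [185], [194]])
    admitted = np.array([0, 0, 0, 1, 1, 1])

    model = LogisticRegression().fit(scores, admitted)
    print(model.predict_proba([[168]])[:, 1])  # P(admission) for a new score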

Certifications

DECEMBER 2018 - PRESENT

MTA: SQL Development

Microsoft

DECEMBER 2018 - PRESENT

MTA: Python Development

Microsoft

Skills

Libraries/APIs

Pandas, NumPy, Matplotlib, PySpark, PyTorch, TensorFlow, Scikit-learn, Google Sheets API, REST APIs, DeepSpeech, Vue, Protobuf, Node.js, Salesforce API, Google APIs, React

Tools

PyCharm, Git, GitHub, Apache Airflow, Jupyter, Google Sheets, Tableau, Jira, Amazon CloudWatch, AWS IAM, Prince XML, Pandoc, Microsoft Excel, Amazon Athena, RabbitMQ, BigQuery, Terraform, AWS Glue, Batch Apex, Amazon Elastic Container Registry (ECR)

Languages

Python, SQL, R, Bash, JavaScript, Java, Markdown, HTML, Swift 5, C++, GraphQL, Snowflake, Scala, T-SQL (Transact-SQL), Salesforce Object Query Language (SOQL), Apex, TypeScript

Paradigms

ETL, Functional Programming, Test-driven Development, DevOps, Business Intelligence (BI), Microservices, B2B, B2C

Platforms

macOS, Amazon Web Services (AWS), Salesforce, Amazon EC2, Apache Kafka, Docker, Ubuntu, Apache Hudi, Kubernetes, Anaconda, Google Cloud Platform (GCP), Oracle, Databricks, Firebase, AWS Lambda, Linux, HubSpot, Azure

Storage

PostgreSQL, Amazon S3 (AWS S3), Database Administration (DBA), Databases, Database Migration, Data Pipelines, Data Lakes, PL/SQL, NoSQL, Amazon Aurora, Redshift, MySQL, Database Replication, Vertica, InfluxDB, MongoDB, Cloud Firestore, Microsoft SQL Server

Frameworks

Apache Spark, Hadoop, Presto, Django

Other

Data Engineering, Data Analysis, Data Science, Data Visualization, Data Warehousing, Data Reporting, Database Optimization, Data Mapping, Cloud, ETL Tools, ECS, Machine Learning, Amazon RDS, Data Warehouse Design, Neural Networks, Time Series Analysis, APIs, Hypothesis Testing, RESTful Services, Dashboard Design, Dashboards, Data Modeling, Data Analytics, Message Queues, Data Scientist, CI/CD Pipelines, Proof of Concept (POC), Performance Optimization, Data Architecture, Leadership, Data Cleansing, Data Migration, Data Classification, Excel 365, Data Cleaning, Artificial Intelligence (AI), Large Language Models (LLMs), Analytics, Big Data, Cloud Platforms, Parquet, Deep Learning, Web Scraping, Modeling, Statistics, ECharts, Machine Learning Operations (MLOps), Data Build Tool (dbt), Data Quality Analysis, Geotechnical Engineering, Microsoft Graph API, CRM APIs, Apex Classes, Apex Triggers, Distributed Systems, AWS Lake Formation, Retrieval-augmented Generation (RAG)
