
Maksym Karashchuk

Verified Expert in Engineering

Data Engineer and Software Developer

Location
Warsaw, Poland
Toptal Member Since
October 14, 2022

Max is a professional data architect and engineer with almost seven years of experience. With solid programming and excellent communication skills, he has successfully completed multiple long-term projects as a tech lead or senior data engineer. Max is passionate about warehouse automation using metadata-driven ETL, and he is a people person who always finds time to guide and advise colleagues.

Portfolio

Predica
Azure, Data Architecture, Azure Databricks, ETL, Python...
PwC Poland
Azure, Azure SQL Databases, Draw.io, Azure DevOps, CI/CD Pipelines...
Lingaro
Data Quality, Enterprise Architecture, REST APIs, Linux, Java, Python, UML

Experience

Availability

Part-time

Preferred Environment

Azure, Microsoft, Synapse, Python, Azure Data Factory

The most amazing...

...solution I've developed was an automated ETL process that pulls data from five different types of sources, handling a daily workload of over one million records.

Work Experience

Data Engineer and Data Architect

2020 - PRESENT
Predica
  • Architected a brand-new solution applying cloud security and networking best practices.
  • Delivered more than 20 small and medium-sized projects for over ten clients in a little over a year.
  • Developed a fully isolated network environment for a data warehouse solution to prevent data leakage and improve data security.
  • Successfully managed a small team as a tech lead.
  • Migrated a full-scale data warehouse from Oracle to Synapse, including all ETL processes.
Technologies: Azure, Data Architecture, Azure Databricks, ETL, Python, Artificial Intelligence (AI), Cloud, Enterprise Architecture, Data Migration, Cloud Migration, System Migration, UML

Senior Data Engineer

2020 - 2020
PwC Poland
  • Designed and developed solutions for dynamically loading file-based sources such as Excel, CSV, and TXT.
  • Gathered and structured all client requirements in one place for future reference.
  • Translated a real production process into an ETL process for better near-real-time analysis.
Technologies: Azure, Azure SQL Databases, Draw.io, Azure DevOps, CI/CD Pipelines, Azure Data Factory, UML

Senior Data Quality Analyst

2019 - 2020
Lingaro
  • Built a solution analyzing data discrepancies and data flow between 10 and 15 interconnected applications.
  • Created a monitoring solution for daily data quality verification.
  • Created a tracker of all known data quality problems and environmental issues.
  • Documented and visualized the client's environment with an enterprise architect tool.
  • Managed roles and responsibilities of a small team of data quality experts.
Technologies: Data Quality, Enterprise Architecture, REST APIs, Linux, Java, Python, UML

Data Engineer

2017 - 2019
Lingaro
  • Constructed a flexible and fully metadata-driven solution from scratch.
  • Analyzed, cleaned, and improved legacy code with a custom-built code parser that identifies all inter-object relationships.
  • Created a step-by-step migration process, significantly improving the speed of releases.
  • Designed a well-documented solution using an enterprise architect tool.
  • Established well-defined communication policies between the environment's application teams, enabling faster reactions to upcoming changes.
  • Scaled the solution in line with client requirements.
Technologies: Azure, Azure SQL Databases, Oracle, Azure Data Factory, Java, Linux, Enterprise Architecture, Microsoft Graph API, Informatica Cloud, Python, UML

Operations Team Member

2016 - 2017
Lingaro
  • Provided and automated support for a large data warehouse, processing around 500,000 records daily.
  • Created training materials and performed training sessions for newcomers.
  • Documented incidents and problems for further resolution by a development team.
Technologies: Oracle, ETL, Informatica Cloud, Linux, TIBCO, Oracle Business Intelligence Enterprise Edition 11g (OBIEE)

Projects

Data Lake Analytics

As a data architect, designed and assisted in implementing a cloud-based data unification application in Azure. I gathered all client requirements regarding networking, security, roles and responsibilities, and backup and recovery strategies. These details were documented and visualized with unified modeling language (UML) tools, so the more than 12 data engineers on the project knew exactly what was expected to be built in each phase. In the end, the solution unified data from around 13 differently structured sources into one Azure Synapse data warehouse, ready for further analysis.

Deep Data Quality Analysis in Distributed Environment

Built an advanced Python application for monitoring data flow between more than ten distributed identity management systems across different regions. The main difficulty was dealing with various source formats while staying compliant with country- and region-specific data processing laws, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act of 2018 (CCPA). Furthermore, the whole distributed environment was visualized for the first time during this project, which brought significant value to the client.
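A minimal Python sketch of the core comparison step, assuming each system exposes record extracts as lists of dicts; the names, the salt handling, and the key field are illustrative, not the production implementation:

    import hashlib

    def pseudonymize(value: str, salt: str = "example-salt") -> str:
        """Hash PII before comparison so raw identifiers never cross regions
        (a common GDPR-friendly pattern; real salt management is more careful)."""
        return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

    def missing_records(source, target, key="email"):
        """Return hashed keys present in the source system but absent from the target."""
        src = {pseudonymize(r[key]) for r in source}
        tgt = {pseudonymize(r[key]) for r in target}
        return src - tgt

    # Illustrative usage: compare extracts from two identity management systems.
    eu_idm = [{"email": "a@example.com"}, {"email": "b@example.com"}]
    us_idm = [{"email": "a@example.com"}]
    print(missing_records(eu_idm, us_idm))  # one hashed key flagged as missing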

Support Shift Automation

Built a Java application that automatically translates support shift information from an Excel file on SharePoint into Outlook calendar events. The application resolved many shift attendance problems among support team members and improved the project's performance. The solution is flexible enough to recognize new team members and add them to the support roster automatically.
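The production tool was written in Java; the hedged Python sketch below illustrates just the final step, creating one calendar event through the Microsoft Graph API. Token acquisition (e.g., via MSAL) and the SharePoint/Excel parsing are omitted, and the shift record layout is an assumption:

    import requests

    GRAPH = "https://graph.microsoft.com/v1.0"

    def create_shift_event(token: str, user_upn: str, shift: dict) -> None:
        """Create an Outlook calendar event for one support shift.

        `shift` is assumed to look like:
        {"person": "jan@contoso.com",
         "start": "2022-10-03T08:00:00", "end": "2022-10-03T16:00:00"}
        """
        event = {
            "subject": "Support shift",
            "start": {"dateTime": shift["start"], "timeZone": "Central European Standard Time"},
            "end": {"dateTime": shift["end"], "timeZone": "Central European Standard Time"},
            "attendees": [
                {"emailAddress": {"address": shift["person"]}, "type": "required"}
            ],
        }
        resp = requests.post(
            f"{GRAPH}/users/{user_upn}/events",
            headers={"Authorization": f"Bearer {token}"},
            json=event,
            timeout=30,
        )
        resp.raise_for_status()  # surface Graph errors instead of failing silently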

Real-time Road Sign Recognition | Master Thesis

Implemented a Python application for real-time road sign recognition. The solution recognizes the sign type at up to 24 frames per second. The approach transforms an incoming image with a signaling gateway (SGW) method and then applies the maximally stable extremal regions (MSER) algorithm to identify areas of interest. Each selected region is sent to a trained model, which returns the road sign's name and a probability.
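A short OpenCV sketch of the MSER region-proposal stage described above; the preprocessing step and the classifier are reduced to placeholders, and the size threshold is illustrative:

    import cv2

    def candidate_sign_regions(frame):
        """Propose candidate road sign regions in one video frame using MSER."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        mser = cv2.MSER_create()
        _, bboxes = mser.detectRegions(gray)  # stable extremal regions -> bounding boxes
        crops = []
        for (x, y, w, h) in bboxes:
            if w * h > 400:  # skip tiny regions unlikely to be signs
                crops.append(frame[y:y + h, x:x + w])
        return crops  # each crop would then go to the trained classifier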

Metadata-driven Analytics Platform

Implemented a metadata-driven solution for the dynamic transformation of hundreds of different file sources for daily, monthly, and yearly reports, cutting the client's data analysis turnaround from days to hours. The challenge was that each file had a completely different format and required extensive cleansing before being loaded into the core of the data warehouse (DWH). Furthermore, since the files came directly from business users, the solution had to be resilient enough to tolerate minor changes in any file without failing.
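A simplified Python illustration of the metadata-driven idea, with hypothetical metadata entries and pandas as the parsing layer; the real solution ran on Azure, but the principle is the same: the parser for each file is chosen from metadata, never hardcoded:

    import pandas as pd

    # Hypothetical metadata, one entry per source file pattern; kept outside the code
    # so new sources can be onboarded without redeploying the pipeline.
    METADATA = [
        {"pattern": "sales_*.csv",   "loader": "csv",   "sep": ";",      "target": "stg.sales"},
        {"pattern": "budget_*.xlsx", "loader": "excel", "sheet": "Data", "target": "stg.budget"},
    ]

    def load_file(path: str, meta: dict) -> pd.DataFrame:
        """Parse one file according to its metadata entry and apply basic cleansing."""
        if meta["loader"] == "csv":
            df = pd.read_csv(path, sep=meta.get("sep", ","))
        elif meta["loader"] == "excel":
            df = pd.read_excel(path, sheet_name=meta.get("sheet", 0))
        else:
            raise ValueError(f"Unknown loader: {meta['loader']}")
        df.columns = [c.strip().lower() for c in df.columns]  # tolerate header drift
        return df  # next: validate against the target schema, then load into the DWH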

Real-time Data Replication from Oracle to Synapse

A proof of concept for real-time data transfer from a source Oracle database (DB) to a Synapse DWH for further advanced data analysis. I participated as the primary data engineer and built the main data flow across the platform, up to the Power BI reporting. Real-time replication from the Oracle DB was achieved with Oracle GoldenGate, which was connected to Azure Event Hubs and continuously propagated changes from the source DB. The data was then read from Event Hubs, joined with reference data in Databricks, and written to Synapse using streaming datasets. Along the way, performance had to be measured and optimized to achieve the best possible cost-to-value ratio.
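A condensed PySpark sketch of that flow, assuming a Databricks notebook (where spark is predefined) with the Azure Event Hubs and Azure Synapse connectors installed; every connection string, path, schema, and table name below is a placeholder:

    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Placeholder schema for the JSON change records GoldenGate publishes to Event Hubs.
    change_schema = StructType([
        StructField("customer_id", StringType()),
        StructField("op_type", StringType()),    # I/U/D operation flag
        StructField("op_ts", TimestampType()),
    ])

    ehConf = {
        # On Databricks, this value must be encrypted with EventHubsUtils.encrypt.
        "eventhubs.connectionString": "<encrypted-event-hubs-connection-string>",
    }

    changes = (
        spark.readStream.format("eventhubs").options(**ehConf).load()
        .select(F.from_json(F.col("body").cast("string"), change_schema).alias("c"))
        .select("c.*")
    )

    reference = spark.read.table("ref.customers")              # small, static reference data
    enriched = changes.join(reference, "customer_id", "left")  # stream-static join

    (
        enriched.writeStream
        .format("com.databricks.spark.sqldw")                  # Azure Synapse connector
        .option("url", "<synapse-jdbc-url>")
        .option("tempDir", "abfss://tmp@<storage>.dfs.core.windows.net/stream")
        .option("forwardSparkAzureStorageCredentials", "true")
        .option("dbTable", "dw.customer_changes")
        .option("checkpointLocation", "/checkpoints/customer_changes")
        .start()
    )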

Languages

SQL, Python, UML, Java, Data Control Language (DCL), SQL DDL, SQL DML

Tools

Synapse, Microsoft Power BI, Azure App Service, Oracle Business Intelligence Enterprise Edition 11g (OBIEE), Draw.io, Jira, Azure Key Vault, Oracle GoldenGate

Paradigms

ETL, Data Science, Azure DevOps, REST

Platforms

Azure, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Azure Synapse, Oracle, Linux, SharePoint, Databricks, Azure Event Hubs

Other

Data Warehousing, Data Architecture, Cloud Migration, Networking, Data Quality, Solution Architecture, Web Security, Artificial Intelligence (AI), Azure Data Lake Analytics, Cloud, Data Analytics, Data Processing, Data Visualization, Virtualization, Cloud Security, Cloud Services, Cloud Storage, Azure Databricks, Data Engineering, Azure Data Factory, Azure Data Lake, Azure Stream Analytics, Big Data, Azure Administrator, Load Balancers, Azure Virtual Machines, Azure Virtual Networks, Informatica Cloud, TIBCO, Enterprise Architecture, Microsoft Graph API, CI/CD Pipelines, Data Migration, System Migration, Backup & Recovery, Streaming

Frameworks

.NET

Libraries/APIs

REST APIs

Storage

Azure SQL Databases, MariaDB, MySQLdb, Database Administration (DBA), Oracle DBA, Database Security, Azure Active Directory, Azure Cosmos DB, MongoDB, Teradata, Azure SQL, DB

2018 - 2020

Master's Degree in Data Science

Polish-Japanese Academy of Information Technology - Warsaw, Poland

2014 - 2018

Bachelor's Degree in Software Engineering

Polish-Japanese Academy of Information Technology - Warsaw, Poland

DECEMBER 2021 - DECEMBER 2022

Microsoft Certified: Azure Data Engineer Associate

Microsoft

JUNE 2021 - PRESENT

Microsoft Certified: Azure Data Fundamentals

Microsoft

JUNE 2020 - JUNE 2023

Microsoft Azure Administrator Associate

Microsoft

JUNE 2020 - PRESENT

Microsoft Certified: Azure Fundamentals

Microsoft

NOVEMBER 2019 - PRESENT

Oracle Database SQL Certified Associate

Oracle
