Jakub Kaczanowski
Verified Expert in Engineering
Data Engineering Developer
Adelaide, South Australia, Australia
Toptal member since June 18, 2020
Jakub started taming data nearly 20 years ago, building Access databases in the oil and gas sector. Since then, he's built data solutions for various Australian financial and government clients, co-founded a fintech startup, and freelanced for US and EU multinationals. An expert in BI, analytics, and data warehouse architecture and development, Jakub is much more than a technical resource; he has a sound understanding of the role of insightful data and its commercial application.
Preferred Environment
Amazon Web Services (AWS), Tableau, Microsoft SQL Server, Visual Studio, Azure, Microsoft Power BI
The most amazing...
...thing I've built is a railway crossing monitoring tool that identifies unsafe crossings and proposes a safe train speed restriction based on data feeds
Work Experience
Business Intelligence Consultant
Department of Human Services SA
- Worked directly with the director of finance to design and build salary and workforce planning models and reports in SSRS using agile methodology to iterate features into models and reports quickly.
- Built the finance reports to combine data from the budgeting and HR systems to easily visualize historical and current budgeted positions and FTEs vs actual staffing, contracts, and payroll.
- Architected, documented, and built the National Disability Insurance Scheme (NDIS) analysis platform on Microsoft SQL Server, Analysis Services, and Power BI.
- Worked on NDIS analysis platform dashboards that enabled high-level actuals vs budget, plan utilization, and YoY change management reporting with data journey drill-down capabilities down to individual participant, plan, invoice, and provider levels.
- Worked with the DHS equipment program team to architect, build, and deliver galaxy and star schema data marts to enable fast, accurate, and easy-to-use analytics over equipment provision data.
- Replaced existing manual reporting processes with an automated real-time solution that expanded on previous functionality to allow for deeper and more intuitive analysis.
- Mentored and advised the team on the latest data analytics architecture patterns regarding cloud solutions (Databricks, Azure Synapse/Fabric, and dbt), data lakehouses, medallion architecture, virtual data marts, and ELT vs. ETL.
- Mentored and advised the team on their journey from an on-premises environment to a cloud-based agile/DevOps-driven approach.
- Migrated existing data projects and solutions into Azure DevOps.
Data Engineer
REST Super
- Worked with the insurance data uplift team to provide advice, design, development, and technical guidance on implementing an insurance analytics data platform that utilizes ELT best practices optimized for cloud data warehouses.
- Migrated and refactored existing Talend ETLs into modular dbt models, identifying and fixing bugs and issues.
- Implemented automated data quality and deployment tests within the dbt test framework.
- Designed and built CI/CD and AD workflows within GitHub and dbt Cloud to automate releases.
- Mentored junior staff to uplift data engineering capabilities.
- Showcased the demo solution and progress to clients and business managers.
Data Warehouse Engineer
Stanford University
- Worked with the Stanford Cancer Institute to design and build a modular data platform, conformed genomics data models, and a generic data framework using Azure, dbt, and Databricks.
- Designed, developed, and deployed an Azure data lake and generic metadata-driven automated data factory pipelines to ingest public and private genomics data sources from various platforms (web, FTP, complex flat file, CDM, database).
- Designed, developed, and deployed a Databricks data lakehouse (Unity Catalog) and Python libraries to ingest the data lake into the Unity Catalog. Incorporated automated execution into metadata-driven Azure pipelines.
- Developed Python libraries to ingest and convert complex genomics data types like VCF (via Glow) and to handle unusually shaped big data (datasets with millions of columns) into Delta Lake.
- Incorporated dbt (data build tool) to handle modeling data lakehouse raw data into useful data models. Automated CI/CD model deployment and execution via GitHub and Databricks workflow jobs.
- Incorporated custom 'time travel' logic into dbt models, allowing researchers to retrieve data as of a specific release, a point in time, or the current version so that research and experiments can be replicated (see the sketch after this role).
- Automated data dictionary generation and deployment into Confluence.
- Worked closely with data scientists and research administrators to design enhanced data models to enable the TOBIAS (test of bias) application and other research projects.
- Documented the entire solution within Confluence, including technical design documentation, solution design, deployment configuration, user onboarding, and developer onboarding.
- Designed and developed Power BI operational dashboards to track the data platform, pipelines, and jobs.
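The custom dbt logic itself isn't shown here; as a minimal sketch, this is the Delta Lake time travel capability such logic can build on. The table name, version number, and timestamp are hypothetical, not the actual Stanford objects:

```python
# Minimal sketch of Delta Lake time travel; names and values are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Current state of a (hypothetical) genomics table in Unity Catalog.
current = spark.read.table("genomics.variants")

# The same table exactly as it was at a named release version...
release = spark.read.option("versionAsOf", 12).table("genomics.variants")

# ...or as of a point in time, so an experiment can be rerun against the
# very same inputs it originally saw.
snapshot = (
    spark.read
    .option("timestampAsOf", "2023-06-01")
    .table("genomics.variants")
)
```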
Senior Data Engineer - Global Laboratory & Analytics Company
Tentamus Group GmbH
- Designed and developed a data mart to allow faster and deeper reporting for internal and external stakeholders by combining multiple data sources from the various international labs run by Tentamus.
- Ran training sessions for the data science team on the administration and management of the data platform.
- Scaled platform by onboarding additional datasets from existing data sources and provisioning new data sources.
- Provided additional support and feature enhancement for the data platform.
Senior Data Consultant (Analytics Engineer)
Discovery Parks
- Designed a data platform architectural strategy to enable deep analytics of customer data, marketing automation, and the attribution of online and offline activity, presented to and accepted by the CTO and senior managers.
- Documented, designed, and developed a tactical platform based on my proposal using Azure Data Lakes, Synapse serverless, and virtual data marts to enable rapid development and prototyping of data models via dbt and visualization in Power BI.
- Iterated the tactical model into an enterprise solution underpinned by a Databricks data lakehouse (Delta Lake), with dbt used to deploy the now more mature data models.
Data and Analytics Developer
Elders Rural
- Provided expertise and advice on an Azure cloud PoC data lake implementation project to support greater analytics by integrating Elders Rural core financial data with that of other business entities acquired by Elders.
- Built integration pipelines in just a few days to replace non-functional pipelines that had consumed months of work.
- Created certificate-based JWT authentication to allow seamless integration with a data source (see the sketch after this role).
- Developed, designed, and implemented solutions using Azure Synapse (pipelines, data flows, serverless) and Databricks.
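For illustration, a minimal sketch of certificate-based JWT authentication in Python using PyJWT: sign a short-lived token with a private key so the data source can verify it against the registered certificate. The issuer, audience, endpoint, and key path are placeholders, not the actual Elders integration:

```python
# Hypothetical sketch: all names, URLs, and paths are placeholders.
import time

import jwt        # PyJWT
import requests

with open("client_private_key.pem", "rb") as f:
    private_key = f.read()

now = int(time.time())
claims = {
    "iss": "data-platform-client",            # placeholder issuer
    "aud": "https://api.example-source.com",  # placeholder audience
    "iat": now,
    "exp": now + 300,                         # short-lived: five minutes
}
token = jwt.encode(claims, private_key, algorithm="RS256")

resp = requests.get(
    "https://api.example-source.com/v1/data",  # placeholder endpoint
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
```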
Senior Data Engineer
Tentamus
- Analyzed the company's existing data warehousing solution architecture, provided feedback, and wrote a proposal for a new solution and approach.
- Designed and developed a PoC data mart based on the proposal to allow faster and deeper reporting for internal and external stakeholders by combining multiple data sources from the various international labs run by Tentamus.
- Wrote and ran training sessions for the data science team on how to use and add new data sources to the data mart.
Data and Analytics Developer
Australian Rail Transport Corporation
- Provided support in integrating and implementing ARTC's asset management support capabilities, such as improved work planning and evidence-driven proactive asset maintenance based on exceptions, trends, and predictions.
- Designed, developed, and migrated integration data models using Azure Synapse and Parallel Data Warehouse, Azure Databricks, and Azure Data Factory.
- Provided capability to the corporate services team to migrate a traditional ETL-driven data architecture to an Azure hybrid cloud solution.
Data and Analytics Developer and Architect
Aircraft Hardware West
- Collaborated with senior management to architect and develop inventory management analytics to streamline analysis and forecasting of demand and stock levels to allow AHW to meet contractual SLAs and required stock on hand levels.
- Loaded, integrated, and modeled disparate data sources and reference data using ELT and a relational data lake approach using Pentaho and SQL Server.
- Designed and built a suite of dashboards in Power BI to visualize findings, forecasts, and historical trends and allow further what-if analysis.
- Implemented statistical models to surface complex domain-specific attributes like safety stock and economic order quantity (see the sketch after this role).
- Designed, documented, and built dimensional star schema models to enable complex analysis.
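For illustration, minimal implementations of two standard formulas behind those attributes: the classic Wilson EOQ and a common safety stock approximation. The example figures are invented, not AHW data:

```python
# Standard textbook formulas; all numbers below are made up for illustration.
from math import sqrt

def economic_order_quantity(annual_demand: float, order_cost: float,
                            holding_cost_per_unit: float) -> float:
    """Classic Wilson EOQ: the order size minimizing ordering + holding cost."""
    return sqrt(2 * annual_demand * order_cost / holding_cost_per_unit)

def safety_stock(z_service_level: float, demand_std_per_day: float,
                 lead_time_days: float) -> float:
    """Buffer stock covering demand variability over the supplier lead time."""
    return z_service_level * demand_std_per_day * sqrt(lead_time_days)

# Example: 12,000 units/year, $50 per order, $2.40/unit/year holding cost.
print(round(economic_order_quantity(12_000, 50, 2.40)))  # ~707 units
# 95% service level (z ≈ 1.65), daily demand std of 8 units, 10-day lead time.
print(round(safety_stock(1.65, 8, 10)))                  # ~42 units
```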
BI Developer and Architect
Department of Health (South Australia)
- Completed a short-term contract within the system performance and service delivery team, providing ETL design and development, solution architecture, and data warehouse architecture capability.
- Acted as the architect and built a modular ELT framework using SSIS and SQL to productionize a Qlik Sense-based proof of concept solution.
- Reverse-engineered the Qlik script, migrating it into the ELT framework and interfacing it with the enterprise data warehouse.
Head of Technology, Architect, and Business Intelligence Consultant
Laneway Analytics
- Served as the project owner for Luci, our data analytics platform that Laneway built to help users engage with analytics more quickly and naturally. Used Agile and Scrum.
- Developed Luci, a single-page application (SPA) running on AWS with a React front end, .NET Core, PostgreSQL, and Tableau as the data visualization layer. The analytics component was built using S3, Redshift, Redshift Spectrum, and columnstore SQL Server.
- Acted as the principal data architect at Laneway for our most important clients. Using my Agile approach to data integration and data modeling, I proved that a data warehouse could be functional within weeks, not months or years.
- Developed a complex financial analysis tool utilizing an SQL Server OLAP cube as a calculation engine and Tableau as the visualization layer.
- Delivered rapid architecture, design, and development of a number of tactical data marts built in SQL Server and Redshift for large clients in logistics, manufacturing, and finance. Integrated and modeled data with a logical-view, ELT-based approach.
Senior Business Intelligence Consultant and Data Architect
Chamonix
- Designed and developed enterprise data warehouse solutions that spanned the full Microsoft BI stack, including data store, integration, data models, and visualizations for numerous clients across the government, health, education, and utilities sectors.
- Mapped out and created numerous complex SSAS OLAP solutions utilizing multidimensional and tabular models, including Power Pivot.
- Planned and developed an SSIS-based modular ETL framework that integrated data from various RDBMS (Oracle and SQL Server), flat file, and web-based data sources. The framework was successfully deployed to numerous clients.
- Outlined and developed a SQL Server data warehouse for a financial reporting and forecasting tool for a major utility provider in Australia.
- Designed and developed a SQL Server data warehouse featuring a real-time, shift, and historical reporting dashboard solution for ambulance emergency response. Near-real-time data is visualized for the commanders at the incident response center.
- Developed a suite of 30+ complex KPI, financial, and operational reports for a public utility company. Completed in under a month, allowing the project to go live on schedule after IBM had de-scoped the vital reporting assets.
- Provided pre-sales, proof-of-concept, and prototyping support to the sales team to win new business.
- Worked closely with business stakeholders to analyze requirements and provide sound solutions.
- Developed and deployed a number of analytics PoC solutions in Azure for client projects using Azure SQL and Power BI.
Business Intelligence Developer and Data Architect
Department of Business and Innovation (Victoria)
- Created SSIS-driven ETL framework for reporting Australian Vocational Education and Training Management Information Statistical Standard (AVTEMISS) compliant training activity data to the National Centre for Vocational Education Research (NCVER).
- Built metadata-driven ETL (SSIS) and reporting suite (SSRS) for data reconciliation reporting between source systems, data warehouse (SQL Server), and OLAP cubes (SSAS).
- Developed a lightweight, optimized Analysis Services multidimensional OLAP cube for training activity, enabling faster reporting of commonly used measures via Excel reporting packs and self-service.
Analyst, Programmer, and Data Architect
Department of Education (Victoria)
- Developed and deployed a centralized single source of truth for school-based reference data for application use.
- Designed and built a multi-faceted data integration process, including change data capture, SQL replication from Oracle and old versions of SQL Server to a centralized SQL Server, and integration services.
- Oversaw the analysis and architecture of the centralized SQL Server database, including all architectural documentation and code implementation, and consolidation of existing reference databases.
- Implemented SQL Server schema-based security to lock down student reference data.
Data Warehouse Developer
Link Group
- Migrated superannuation customer data from an AS/400 source to a SQL Server data warehouse.
- Designed and developed a generic SSIS ETL metadata-driven framework for generating extracts and delivering via SFTP B2B link to AMP.
- Fixed, enhanced, and created new dashboards and reports for the SQL Server Reporting Services report pack sent out to customers.
Database Administrator
Building and Plumbing Commission (Victoria)
- Managed the SQL Server environment at the Building and Plumbing Commission (Victoria) as the DBA, including backup and restore, performance tuning, and, on a few occasions, disaster recovery.
- Identified timing issues with existing backup plans that had invalidated all point-in-time backups; created a new backup process and ran restore scenarios weekly.
- Debugged, profiled, and optimized queries for database applications that used SQL Server as their data tier.
- Wrote a database consolidation plan for management.
- Advised management of risks, data costs, and feasibility of proposed third-party vendor applications in relation to SQL Server.
Analyst and Programmer | Data Migration
National Australia Bank
- Contributed to a large project to roll out a third-party solution to replace a legacy in-house product used for managing unit pricing for superannuation products.
- Migrated historical data and implemented the new solution within the existing National Custodial Services (NCS) framework.
- Updated the profile data in the Oracle database and wrote SQL migration scripts.
- Used DataStage to create and orchestrate data migration processes.
Senior Analyst and Programmer | Database
Department of Health (South Australia)
- Developed a 3-tier enterprise-level community healthcare application using Visual FoxPro and SQL Server. I was involved in the entire lifecycle of the product, including front-end and database analysis, development, and optimization.
- Designed and developed a scheduling module using Visual FoxPro and SQL Server.
- Rewrote the address lookup used when a patient presented at a hospital. The original screen had 30+ text boxes; the new version had a single text box and used custom fuzzy logic to create a ranked list of possible matches from an address lookup whitelist (see the sketch after this role).
- Performed database and query optimization using SQL Query profiling and index optimization.
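For illustration, a toy version of the single-box fuzzy lookup idea using Python's difflib. The production version used custom logic and a curated whitelist, so treat this only as a sketch of the ranking concept; the addresses are invented:

```python
# Toy fuzzy-ranking sketch; the whitelist entries are invented examples.
from difflib import SequenceMatcher

ADDRESS_WHITELIST = [
    "12 North Terrace, Adelaide SA 5000",
    "1 Flinders Street, Adelaide SA 5000",
    "30 Frome Road, Adelaide SA 5000",
]

def rank_addresses(query: str, limit: int = 5) -> list[tuple[float, str]]:
    """Return the whitelist addresses most similar to the free-text query."""
    query = query.strip().lower()
    scored = [
        (SequenceMatcher(None, query, address.lower()).ratio(), address)
        for address in ADDRESS_WHITELIST
    ]
    return sorted(scored, reverse=True)[:limit]

print(rank_addresses("12 nth terrace adelaide"))
```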
Database Conversion Analyst
Department of Further Education, Employment, Science and Training (South Australia)
- Served as the technical lead on data migration from a legacy system to a 3-tier distributed architecture.
- Performed a gap analysis of before and after data models and created data mapping tables and complex SQL data migration procedures.
- Created orchestration ETL process in SSIS to perform the data migration.
Senior Analyst and Programmer | Data Migration
Department of Health (South Australia)
- Acted as the technical lead on multi-site data migration from a legacy FoxPro system into a centrally hosted 3-tier enterprise-level solution.
- Developed and executed the migration plans and tools.
- Created a generic repeatable ETL process in DTS/SSIS to perform the data migration.
- Oversaw the data mapping, validation, gap analysis, and cleansing using SQL Server.
- Liaised with each hospital administration team to write test plans, validate trial migrations, and organize the production migrations.
- Trained hospital staff on the use of the new system.
- Rewrote and refactored Crystal Reports and SQL data exports.
Programmer
Steadfast Australia
- Developed and implemented a middleware solution that integrated orders on the web portal with the in-house FoxPro-based tracking system and external transport companies to generate pack orders and print shipping labels.
- Created and supported a web-based portal for customer orders using PHP and MySQL.
- Built financial and operational reporting solutions within TransLogix, the warehouse stock management system.
- Developed end-to-end integration between the order entered on the web portal and the products picked for shipping. Created custom barcodes so workers could use handheld scanners to update workflow status without typing.
Database Developer
Origin Energy
- Developed an MS Access-based application to create a searchable index of technical drawings of all assets at a number of power plants.
Experience
LUCI (Laneway Analytics)
It is a single-page application built with a React front end; .NET Core and PostgreSQL on the back end; and Tableau as the visualization layer. The analytics layer is a combination of an S3 data lake, AWS Redshift, and columnstore Microsoft SQL Server.
I was responsible for all aspects of our stack, including choosing technology products and solutions, and I managed our technology partnerships and start-up programs with Microsoft and AWS.
Python-driven Web Classified Scraper
I used Angular and .NET Core (IIS) with SQL Server to create a front end and back end for data management, which let me register and manage each URL to ping or scrape. I used the Pushbullet API with the Pushbullet iOS app to get notifications on my iPhone.
I scheduled Python scripts to do the following (a condensed sketch follows below):
- connect to the SQL back end to pull the URLs to scrape
- use Beautiful Soup to scrape and parse each site
- save the scraped details back to the SQL database
- identify new ads and send them to the Pushbullet API, which forwarded them to my phone
It worked great and found some excellent bargains.
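A condensed, hypothetical sketch of the scheduled loop: the DSN, table names, and CSS selector are placeholders, and the Pushbullet token is a dummy value.

```python
# Hypothetical scraper loop; DSN, tables, selectors, and token are placeholders.
import pyodbc
import requests
from bs4 import BeautifulSoup

PUSHBULLET_TOKEN = "o.xxxxxxxx"  # placeholder token

conn = pyodbc.connect("DSN=ScraperDb")  # placeholder DSN for the SQL back end
targets = conn.execute(
    "SELECT Id, Url FROM ScrapeTarget WHERE IsActive = 1"
).fetchall()

for target_id, url in targets:
    soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
    for ad in soup.select("div.ad-listing"):  # placeholder selector
        title = ad.select_one("h2").get_text(strip=True)
        link = ad.select_one("a")["href"]
        # Insert only if unseen, so reruns don't re-notify the same ad.
        inserted = conn.execute(
            "INSERT INTO Ad (TargetId, Title, Url) "
            "SELECT ?, ?, ? WHERE NOT EXISTS (SELECT 1 FROM Ad WHERE Url = ?)",
            target_id, title, link, link,
        ).rowcount
        conn.commit()
        if inserted:  # brand-new ad: push it to the phone
            requests.post(
                "https://api.pushbullet.com/v2/pushes",
                headers={"Access-Token": PUSHBULLET_TOKEN},
                json={"type": "link", "title": title, "url": link},
                timeout=30,
            )
```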
ADA (HESTA/Laneway Analytics)
I architected and began the build of the data lake, dimensional model, and dashboards, and extended the functionality of Tableau via an embedded React wrapper that overcame some of Tableau's limitations and enabled us to tell a more compelling data story.
The solution focused on member engagement and management KPIs, then evolved to provide analytics for churn rate, competition, financial adviser performance, and call center workflows. Alteryx was trialed for a time as an enterprise ETL tool.
The project gave me a platform to use and evolve my Agile approach to data modeling and analytics.
AWS Redshift + S3 + Parquet + Redshift Spectrum + Tableau + Confluence + Alteryx
Gross Profit Analytics (Laneway Analytics)
I built the relational back end (SQL Server) and multidimensional calculation engine (Microsoft SSAS) that sits at the core of the solution and allows us to compare and contrast a portfolio/hierarchy against:
* prior performance (prior year/half/quarter/month adjusted for seasonality and year to date)
* industry benchmarks
* forecasts
* budgets
* any other object in the portfolio (e.g., compare business units and regions to each other)
* performance management (e.g., individual performance vs. the actual average performance of all staff for the period)
* impact of FX (foreign exchange)
Tableau dashboards act as the visualization layer, with an AWS API Gateway and Lambdas sitting between Tableau and the SSAS cube (a sketch of the Lambda shape follows below).
Another feature is an SSIS-driven export module that shards and parallelizes large export tasks.
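For illustration, the general shape such an API Gateway Lambda proxy can take in Python. The internal cube endpoint and request payload are hypothetical, not the actual implementation:

```python
# Hypothetical Lambda proxy shape; the endpoint and payload are placeholders.
import json

import requests

CUBE_ENDPOINT = "http://internal-cube-service/query"  # placeholder

def handler(event, context):
    # API Gateway proxy integration: the request body arrives as a string.
    body = json.loads(event.get("body") or "{}")
    # Forward the query to the internal cube-facing service.
    resp = requests.post(CUBE_ENDPOINT, json={"mdx": body.get("mdx")}, timeout=30)
    return {
        "statusCode": resp.status_code,
        "headers": {"Content-Type": "application/json"},
        "body": resp.text,
    }
```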
Australian Super Member Insurance Analytics Pilot (Laneway Analytics)
I was the data engineer and architect and worked with my team and Australian Super to design and build the integration and data models using agile methodologies. I used my framework for rapidly developing analytics solutions for the first time at a large customer. It allowed us to iterate our solution at an incredible rate to deliver relevant deep insights. The high speed of the deployment and delivery significantly improved user engagement with data.
CSV + SQL Server + Tableau
Private Equity Partners - Whiteroom for Takeover Bid (Laneway Analytics)
As the data architect, I integrated and modeled the data, enabling a very thorough analysis.
AWS + CSV + SQL Server + Tableau
Simplify Spatial (Personal Project)
I was very disappointed by the processing speed of geospatial tagging queries in relational database engines on a particular project. After researching the general approaches of the major database engines and leveraging my deep understanding of database principles, I saw a novel approach to the problem. I built my libraries in 2013 (for SQL Server 2008 R2); in 2017, they were still many times faster than the default functionality on the then-latest SQL Server 2016. (A sketch of the general prefiltering idea, not my approach, follows below.)
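To make the performance problem concrete, here is the common prefiltering idea in Python with Shapely: a cheap bounding-box test discards most rows before the exact (and much slower) point-in-polygon test runs. This is a standard technique for illustration only, not the novel approach behind Simplify Spatial:

```python
# Common bounding-box prefilter illustration; data points are invented.
from shapely.geometry import Point, Polygon

region = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)])
minx, miny, maxx, maxy = region.bounds

points = [(3.2, 4.1), (42.0, 7.7), (9.9, 9.9), (-5.0, 2.0)]

tagged = [
    p for p in points
    # Cheap rectangle test first; exact containment only for survivors.
    if minx <= p[0] <= maxx and miny <= p[1] <= maxy
    and region.contains(Point(p))
]
print(tagged)  # [(3.2, 4.1), (9.9, 9.9)]
```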
National Accruals Platform (APA/Chamonix)
The tool was built on a Microsoft stack, with SSIS used to integrate data from disparate Oracle, SQL Server, and web data sources; SQL Server as the data warehouse; Analysis Services as the modeling tool; and an Excel workbook dashboard report pack. The model performs complex calculations, taking prior usage, seasonality, and actual and forecast weather conditions (sunshine hours, wind speed, temperatures) as inputs to accrue the predicted income for the missing days of usage. The income normalizes over time as actual data becomes available (a toy version of the accrual idea follows below).
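A toy Python version of the accrual idea: predict usage for days without actuals, then let actuals replace predictions as they arrive. The stand-in linear model and all coefficients are invented for illustration; the real model is not public:

```python
# Toy accrual sketch; the model form, coefficients, and figures are invented.
def predicted_usage(prior_usage, seasonality, sunshine_hours,
                    wind_speed, temperature):
    # Placeholder linear model: prior usage weighted by seasonality,
    # adjusted for weather conditions.
    return (prior_usage * seasonality
            + 0.8 * sunshine_hours
            - 0.3 * wind_speed
            + 1.2 * temperature)

def accrued_income(daily_usage, price_per_unit):
    """Sum actual usage where known; fall back to the prediction otherwise."""
    total = sum(
        day["actual"] if day["actual"] is not None
        else predicted_usage(**day["model_inputs"])
        for day in daily_usage
    )
    return total * price_per_unit

# Two days with actuals, one still-missing day that gets accrued.
days = [
    {"actual": 410.0, "model_inputs": None},
    {"actual": 395.5, "model_inputs": None},
    {"actual": None, "model_inputs": dict(prior_usage=400, seasonality=1.05,
                                          sunshine_hours=6, wind_speed=12,
                                          temperature=18)},
]
print(accrued_income(days, price_per_unit=0.28))
```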
South Australian Ambulance Service Real Time Ops dashboards (Chamonix)
I was the data architect and data warehouse developer. I built a robust solution to quickly load data as it arrived, lightweight dashboards to render it quickly, and a framework to deal gracefully with deadlocks, using complex SQL Server concepts like partition switching, deadlock priority, lock escalation handling, and schema locks (a retry sketch follows below).
SQL Server, SSIS, Reporting Services, SharePoint
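For illustration, one defensive pattern for that load path, sketched in Python with pyodbc: retry a transaction when SQL Server picks it as the deadlock victim (SQLSTATE 40001, error 1205). The connection string and stored procedure are placeholders:

```python
# Deadlock-retry sketch; the DSN and procedure name are placeholders.
import time

import pyodbc

def run_with_deadlock_retry(conn, sql, params=(), retries=3):
    for attempt in range(1, retries + 1):
        try:
            cursor = conn.execute(sql, *params)
            conn.commit()
            return cursor
        except pyodbc.Error as exc:
            conn.rollback()
            # SQLSTATE 40001 = transaction chosen as the deadlock victim.
            if "40001" in str(exc) and attempt < retries:
                time.sleep(0.5 * attempt)  # brief, growing backoff
                continue
            raise

conn = pyodbc.connect("DSN=AmbulanceDW")  # placeholder DSN
run_with_deadlock_retry(conn, "EXEC dbo.LoadIncomingBatch ?", ("batch-42",))
```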
Education
Bachelor's Degree in Computer Science
Flinders University - South Australia
Certifications
Exam 467: Designing Business Intelligence Solutions with Microsoft SQL Server
Microsoft
Exam 466: Implementing Data Models and Reports with Microsoft SQL Server
Microsoft
Skills
Tools
DataViz, SSAS, Tableau Development, Business Intelligence Development, Excel Development, Microsoft Development, Pentaho Data Integration (Kettle), Visual Studio Development, Microsoft Access Development, Crystal Reports, Data Science, AWS, GitHub, Cognos Analytics 11
Languages
T-SQL, SQL, MDX, Visual Basic, PHP, Visual FoxPro, XML, C#, Python
Frameworks
Data Lakehouse, .NET
Paradigms
Dimensional Modeling, OLAP, ETL, Business Intelligence Development, Database Design, Kimball Methodology, Dataflow Programming, DevOps
Platforms
Windows Development, Azure Design, Databricks, Azure Synapse, BIRT, Pentaho, Oracle Development, AWS, Alteryx, Microsoft Fabric
Storage
SQL Server, Data Integration, SSIS, SSAS Tabular, Databases, Azure, Redshift, SQL, Amazon S3, Database Administration (DBA), MySQL, DataStage, PostgreSQL, Data Lakes
Other
Data Engineering, Data Marts, Data Migration, Data Aggregation, Data Modeling, Business Intelligence Development, Data Warehouse, SSRS Reports, Software Development, Data Visualization, Data Architecture, Dashboard Design, Dashboard Development, CSV, Dashboard, Reports, Reporting, Query Optimization, Star Schema, ELT, Business Intelligence (BI) Platforms, ETL Tools, Data Science, Data Analysis, Azure, Azure Data Factory, Big Data Architecture, Embedded Business Intelligence, Multidimensional Expressions (MDX), Analytics Development, DAX, Data Build Tool (dbt), Delta Lake, CDC, SSIS Custom Components, Data Mining, CI/CD Pipelines, MDM, Slowly Changing Dimensions (SCD), Computer Science