Innocent Musanzikwa, Data Engineer and Developer in Calgary, AB, Canada
Member since August 10, 2021
Inno is a seasoned data engineer and developer who has worked for the past decade in Africa and North America at IRI, a top retail data analytics company. As a SQL and ETL developer, he has built data warehouses using industry-standard methodologies such as Kimball and Data Vault. As a data engineer, Inno has built robust, scalable data pipelines both on-premises and in the cloud using current technologies.

Location

Calgary, AB, Canada

Availability

Part-time

Preferred Environment

SQL, PySpark, Python, Hadoop, Apache Hive, Azure Synapse, Oracle, SQL Server Integration Services (SSIS), Azure Data Factory, Data Warehousing

The most amazing...

...big data warehousing and data integration solution I've designed—using Python, SQL, ADF, Hadoop, Hive, and Spark—won an RFP in Canada out of six competitors.

Employment

  • Data Engineer

    2022 - 2022
    SFL Scientific LLC
    • Consulted on an existing, poorly designed SSIS data integration project and helped identify bottlenecks and inefficiencies.
    • Redesigned the existing SSIS data pipeline to be more efficient and scalable.
    • Performed SQL tuning and SQL code review for process efficiencies.
    Technologies: SQL, SQL Server Integration Services (SSIS), SSIS, MariaDB, Microsoft SQL Server, Data Transformation, Python, Database Schema Design, iPaaS, CI/CD Pipelines, Relational Databases, Stored Procedure, Data Analyst, Transact-SQL, SQL DML
  • BI and Datawarehouse Expert to Establish Infrastructure

    2021 - 2022
    Airiam Holdings, LLC
    • Designed and developed data pipelines to integrate data from the QuickBooks API, the Sage Intacct API, and spreadsheets into Azure SQL.
    • Designed and developed a data warehouse in Azure SQL.
    • Designed and created business reports and KPI dashboards using Power BI.
    • Developed complex SQL scripts to manage data transformations and speed up integration.
    Technologies: Business Intelligence (BI), SQL, APIs, SQL Server DBA, Dimensional Modeling, Relational Databases, Microsoft Power BI, Cloud, Git, REST APIs, Synapse, DAX, Dashboard Design, Dashboards, Stored Procedure, Tableau, Data Analyst, Transact-SQL, SQL DML
  • Data Analyst for Migration Project

    2021 - 2021
    JLL - JLLT Data
    • Developed the data pipeline to integrate data from Salesforce to Microsoft SQL.
    • Wrote advanced SQL code (e.g., CTEs, stored procedures, and functions) to manage data transformations.
    • Performed SQL tuning to improve ETL efficiency and process scalability.
    • Consulted on standard operating procedures and best practices.
    Technologies: SQL, T-SQL, ETL, Salesforce, Data Migration, Relational Databases, Microsoft Power BI, SSRS, Stored Procedure, Data Analyst, Google Sheets, Transact-SQL, SQL DML
  • Director | Data Engineering

    2019 - 2021
    IRI
    • Developed Azure Data Factory pipelines to integrate data from Apache Hive, HDFS, OAuth 2 APIs, and various flat-file types into Azure SQL.
    • Managed a team of onshore and offshore big data developers, assigning tasks and tracking progress in Jira.
    • Oversaw data strategy and recommendations for new data sources and ongoing projects.
    • Mentored big data engineers to help them develop their skills.
    • Architected new data models and upgraded old data warehouses in response to client requests or technology changes.
    Technologies: Python, Apache Hive, Hadoop, Azure Synapse, Azure Data Factory, Spark SQL, Bash Script, SQL, Azure SQL, Databricks, Data Engineering, ETL, Data Modeling, Databases, Azure, Data, Data Architecture, Business Intelligence (BI), Data Pipelines, Apache Airflow, Data Integration, Big Data, T-SQL, Data Migration, Snowflake, Data Building Tool (DBT), Apache Kafka, ELT, SSIS, Data Transformation, Dimensional Modeling, Relational Databases, Microsoft Power BI, Cloud, Transact-SQL, SQL DML
  • ETL Architect

    2016 - 2019
    IRI
    • Developed SQL-based data warehouses on-premises and in the cloud.
    • Integrated various data sources, ranging from flat files to cloud-based sources such as Snowflake, AWS, and data lakes, into Azure Data Warehouse and Apache Hive on Hadoop.
    • Created scalable data pipelines and improved the efficiency of existing ones.
    • Trained and upskilled new data developers and participated in code reviews.
    • Maintained system documentation of all business data components and strategies.
    Technologies: SQL Server Integration Services (SSIS), Azure Synapse, Azure Data Factory, Databricks, PySpark, SQL, Oracle, Apache Hive, Hadoop, Data Warehouse Design, Data Engineering, ETL, Data Modeling, SQL Stored Procedures, Databases, Data, Data Architecture, Business Intelligence (BI), Data Pipelines, Data Integration, Big Data, BigQuery, JavaScript, T-SQL, Data Migration, Snowflake, AWS, AWS EMR, ELT, SSIS, APIs, Data Transformation, MariaDB, SQL Server DBA, Dimensional Modeling, Relational Databases, Microsoft Power BI, Cloud, REST APIs, Transact-SQL, SQL DML
  • SQL Lead Developer

    2012 - 2016
    IRI
    • Developed SQL-based data warehouses and data marts.
    • Wrote SQL queries to provide data for SSRS reports.
    • Used SSIS, Talend, and DataStage for ETL processes depending on the client's requirements.
    • Created custom business reports using SQL Server Reporting Services (SSRS).
    • Managed junior developers and ran stand-up development meetings.
    Technologies: SQL, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), PSQL, MySQL, Talend ETL, IBM InfoSphere (DataStage), Data Warehousing, Data Engineering, ETL, Data Modeling, SQL Stored Procedures, Databases, Data, Data Architecture, Business Intelligence (BI), Data Pipelines, Data Integration, Big Data, T-SQL, Data Migration, ELT, SSIS, Data Transformation, Dimensional Modeling, Relational Databases, Microsoft Power BI, REST APIs, SSAS, SSRS, Dashboard Design, Dashboards, Transact-SQL, SQL DML
  • SQL/ETL Developer and Consultant

    2010 - 2012
    Mi9 Retail (formerly JustEnough Software Corporation)
    • Managed SQL replication between mobile devices and SQL Server.
    • Created SQL data warehouses using the Kimball methodology for reporting purposes.
    • Designed and developed ETL packages using SQL Server Integration Services (SSIS).
    • Designed and developed reports in SQL Server Reporting Services (SSRS).
    • Performed database tuning and code reviews for any code being deployed to production.
    Technologies: SQL, SQL CE, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Microsoft SQL Server, Data Engineering, ETL, Data Modeling, SQL Stored Procedures, Databases, Data, Data Architecture, Business Intelligence (BI), Data Pipelines, Data Integration, Big Data, T-SQL, Data Migration, SSIS, Data Transformation, Relational Databases, Microsoft Power BI, SSAS, SSRS, Transact-SQL, SQL DML

Experience

  • Data Migration from Azure SQL to Snowflake
    https://github.com/innowarue/ADF

    This project involved migrating data from an Azure SQL database to a Snowflake data warehouse using an Azure Data Factory data pipeline. Thanks to my proficiency in Data Factory, it took me only minutes to create.

    I replaced the original data sources with my own Azure and Snowflake accounts to make the project publicly available without compromising confidentiality.

  • Data Integration from OAuth2 API

    I created an automated data pipeline to integrate data accessible via an OAuth2-based API in JSON format into a cloud-based data warehouse solution. The solution used Python and Spark on Databricks integrated into an Azure Data Factory pipeline.
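
    As a rough illustration of the extraction step only, here is a minimal Python sketch that uses just the standard library. It assumes the OAuth2 client-credentials grant, which the project description does not specify, and every name in it (build_token_request, fetch_json, flatten, the URLs and credentials) is hypothetical rather than taken from the actual pipeline. The Spark, Databricks, and Data Factory parts are not shown.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth2 client-credentials token request.

    Assumption: the API uses the client-credentials grant with form-encoded
    credentials; a real API may expect a different grant or auth style.
    """
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("utf-8")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_json(url, access_token):
    """GET a JSON document with a bearer token (performs a network call)."""
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def flatten(record, prefix=""):
    """Flatten nested JSON objects into a single-level dict of column -> value."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested objects become prefixed columns, e.g. store_city.
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row
```

    Flattening nested objects into prefixed columns is one simple way to make JSON payloads loadable into a tabular warehouse; arrays are left as-is here and would need their own handling.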

  • SQL Server Replication to Mobile Devices

    I created a replication system that synced data between mobile devices and Microsoft SQL Server. Field sales representatives collected information in the field, uploaded it to SQL Server using SQL CE, and downloaded any updates from SQL Server via the mobile replication I set up.

  • In-place Data Integration for an Acquisition

    I created an in-place ETL integration for a company acquisition and merger, bringing the two companies' data into a single warehouse while continuously delivering weekly reports to the client services and retail service teams.

  • Kafka Streaming and Data Integration

    I created an automated data pipeline that consumed data from a Kafka stream with Spark Streaming (using Spark and Python) and loaded it into a Cloudera Hadoop file system, where it was accessible through a Hive data warehouse.
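
    A pipeline of this shape could be sketched in PySpark roughly as follows. This is not the project's code: it uses Spark's Structured Streaming API, whereas the original may have used the older DStream-based Spark Streaming, and it assumes Parquet output to a directory that an external Hive table would point at. The function name and all arguments are hypothetical, and `spark` is expected to be an existing SparkSession with the Kafka connector package available.

```python
def start_kafka_to_hdfs_stream(spark, bootstrap_servers, topic,
                               output_path, checkpoint_path):
    """Start a streaming query that copies a Kafka topic to Parquet files.

    `spark` is an existing pyspark.sql.SparkSession; the remaining arguments
    are placeholders for the broker list, topic, and HDFS locations.
    """
    raw = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", topic)
        .option("startingOffsets", "latest")
        .load()
    )
    # Kafka delivers the message value as binary; cast it to a string column.
    messages = raw.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
    return (
        messages.writeStream.format("parquet")
        .option("path", output_path)
        # The checkpoint directory lets the query resume after a restart.
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start()
    )
```

    Calling the function returns the running query handle, so the caller decides whether to block on it or manage it alongside other jobs.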

Skills

  • Languages

    SQL, Python, Bash Script, T-SQL, Snowflake, Stored Procedure, Transact-SQL, SQL DML, Scala, JavaScript
  • Frameworks

    Hadoop, Spark, AWS EMR
  • Libraries/APIs

    PySpark, REST APIs, Spark Streaming
  • Tools

    Microsoft Power BI, SSRS, BigQuery, Synapse, SSAS, Tableau, Apache Airflow, Git, Google Sheets
  • Paradigms

    ETL, Business Intelligence (BI), Dimensional Modeling
  • Storage

    Apache Hive, SQL Server Integration Services (SSIS), PSQL, Microsoft SQL Server, SQL Stored Procedures, Databases, Data Pipelines, Data Integration, Relational Databases, SQL Server Reporting Services (SSRS), SQL Server DBA, MySQL, Database Replication, PostgreSQL, NoSQL, Azure SQL, MariaDB
  • Other

    Azure Data Factory, Data Warehousing, Data Analysis, Data Engineering, Data, Data Architecture, Big Data, Data Migration, ELT, Data Warehouse Design, SSIS, Data Transformation, Database Schema Design, Data Analyst, Data Modeling, Cloud, APIs, Dashboard Design, Dashboards, Azure Synapse, Web Scraping, Data Building Tool (DBT), AWS, iPaaS, CI/CD Pipelines, DAX
  • Platforms

    Azure, Oracle, Databricks, Apache Kafka, Salesforce

Education

  • Bachelor's Degree in Information Technology
    2013 - 2015
    University of South Africa - Pretoria, South Africa

Certifications

  • Certified Apache Spark and Hadoop Developer
    DECEMBER 2020 - DECEMBER 2022
    Cloudera
  • Analyzing Big Data with Hive
    DECEMBER 2019 - PRESENT
    LinkedIn Learning
  • Advanced NoSQL for Data Science
    DECEMBER 2019 - PRESENT
    LinkedIn Learning
