Goutham Kumar, Developer in Ajax, ON, Canada

Verified Expert in Engineering

Bio

With 10+ years of data engineering experience, Goutham has built scalable solutions on AWS and Azure. He developed Azure-based ETL pipelines with Power BI for Bell Canada and implemented an ETL pipeline with Azure Data Factory, Databricks, Snowflake, and Power BI for Walmart. Goutham also used Azure ML to build predictive models for the National Bank of Canada, optimized data workloads with Azure SQL and .NET, and created AWS data lake architectures with RDS, S3, Glue, and Tableau at HSBC.

Portfolio

National Bank of Canada
SQL, Python, Microsoft Power BI, Snowflake, ADF, Azure Databricks...
HSBC Bank Canada
Azure Data Factory, Azure Data Lake, Azure SQL, Azure Databricks, PySpark...
Walmart
ETL, Azure Blob Storage, Apache Kafka, Apache Spark Clusters, SQL, ADF, Python...

Experience

  • Microsoft Power BI - 10 years
  • SQL - 10 years
  • Python - 10 years
  • Azure Databricks - 10 years
  • Azure Data Factory - 10 years
  • Data Engineering - 10 years
  • Data Analytics - 10 years
  • Tableau Desktop - 10 years

Availability

Part-time

Preferred Environment

SQL, Azure Data Factory, Azure Databricks, Python, Snowflake, Microsoft Power BI, Azure SQL, Tableau, Data Warehousing, Data Analytics

The most amazing...

...thing I've implemented is a set of use cases for data-driven product improvement and quality using Python, Spark, SQL, Tableau, Power BI, Snowflake, and cloud technologies.

Work Experience

Senior Azure Data Engineer

2022 - 2024
National Bank of Canada
  • Collaborated with product owners to define and translate business requirements into technical specifications. Developed solutions using ADF and Databricks, including migrating ETL pipelines to Databricks, and built dashboards in Power BI.
  • Used Git for version control and Jira to track issues and bugs, resolving data and ETL pipeline bugs in ADF. Gained a basic familiarity with large language models and OpenAI.
  • Built robust data pipelines, optimizing Spark applications and implementing distributed computing systems in the banking sector.
  • Used Power Query, M language, and DAX to create insight-driven dashboards in Power BI.
  • Migrated existing SSIS/Alteryx data pipelines to ADF pipelines using Databricks and built new ones from scratch; a minimal sketch of this kind of migration appears after the technology list below.
Technologies: SQL, Python, Microsoft Power BI, Snowflake, ADF, Azure Databricks, Azure Data Lake, Azure Synapse, Azure SQL, Azure Storage, PySpark, Azure Blob Storage, Apache Kafka, Apache Spark Clusters, Apache Airflow, Azure Synapse Analytics, Azure Cosmos DB, Data Warehouse Design, Data Engineering, PostgreSQL, Data Pipelines, Large Language Models (LLMs), Agile, MongoDB, Scala, Jira, Jenkins, Docker, NoSQL, Spark, Hadoop, CI/CD Pipelines, Microsoft Azure, Amazon EC2, AWS Glue, HDFS, Apache Hive, Data Modeling, Data Visualization, Fivetran, Data Build Tool (dbt), BigQuery, Google Cloud Platform (GCP), Tableau Desktop, Data Analytics, Tableau Server, Tableau Desktop Pro, Power BI Desktop, Table Calculations, Tableau, BI Reporting, SAP BusinessObjects (BO), Microsoft Fabric, DAX, Power Query, M Language, Azure Event Hubs, OpenAI, Data Migration, Data Science, Data Governance, Data Management, MDM, Big Data, SSAS, SSIS Custom Components, User Experience (UX), User Interface (UI), Databricks, Microsoft SQL Server, Database Architecture, Data Architecture, Database Design, Synapse, ETL Development, Talend ETL, MySQL
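
To make the pipeline migration above concrete, here is a minimal PySpark sketch of the kind of notebook logic involved when rebuilding an SSIS-style ETL step in Databricks. The storage account, paths, column names, and table names are hypothetical placeholders, not the bank's actual schema.

  # Minimal PySpark sketch of an SSIS-style ETL step rebuilt in Databricks.
  # All paths, columns, and table names are hypothetical placeholders.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks

  # Extract: read raw transaction files landed in the data lake
  raw = (spark.read.format("csv")
         .option("header", "true")
         .option("inferSchema", "true")
         .load("abfss://raw@examplelake.dfs.core.windows.net/transactions/"))

  # Transform: type-cast, derive a business date, and deduplicate
  clean = (raw
           .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
           .withColumn("txn_date", F.to_date("txn_timestamp"))
           .dropDuplicates(["txn_id"]))

  # Load: write a Delta table that Power BI and downstream jobs can query
  (clean.write.format("delta")
        .mode("overwrite")
        .saveAsTable("curated.transactions"))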

Data Engineering Consultant

2019 - 2022
HSBC Bank Canada
  • Saved $10,000 by reconfiguring Azure Blob Storage from the hot to the cold tier. Diagnosed and fixed failures in ETL pipelines, ensuring accurate data ingestion. Created a data model and applied Power Query, DAX, and M language, with row-level security (RLS), in Power BI dashboards.
  • Optimized SQL queries and Spark jobs to reduce processing times, resolving delays in report generation and data processing workflows.
  • Reviewed and refactored code regularly to improve readability and efficiency. Utilized autoscaling in Azure Databricks to dynamically adjust worker nodes based on workload demands, optimizing cost and performance (see the configuration sketch after the technology list below).
  • Built variables and new measures/columns using time intelligence and conditional DAX to meet business requirements.
Technologies: Azure Data Factory, Azure Data Lake, Azure SQL, Azure Databricks, PySpark, Azure Synapse Analytics, Python, Microsoft Power BI, Apache Spark Clusters, Data Engineering, ETL, Data Warehousing, Query Optimization, Azure Blob Storage, Apache Kafka, Apache Airflow, SQL, Azure Synapse, Azure Storage, Snowflake, Azure Cosmos DB, PostgreSQL, Data Pipelines, Agile, MongoDB, Scala, HBase, Jira, Jenkins, Docker, NoSQL, Spark, Hadoop, CI/CD Pipelines, Microsoft Azure, Amazon EC2, AWS Glue, HDFS, Apache Hive, Data Modeling, Data Visualization, Fivetran, Data Build Tool (dbt), BigQuery, Google Cloud Platform (GCP), Tableau Desktop, Data Analytics, Tableau Server, Tableau Desktop Pro, Power BI Desktop, Table Calculations, Tableau, BI Reporting, SAP BusinessObjects (BO), Microsoft Fabric, Oracle, DAX, Power Query, M Language, Azure Event Hubs, Data Analysis, Amazon Web Services (AWS), Data Migration, Data Science, Data Governance, Data Management, MDM, Big Data, SSAS, SSIS Custom Components, User Experience (UX), User Interface (UI), Databricks, Microsoft SQL Server, Database Architecture, Data Architecture, Database Design, Synapse, ETL Development, Talend ETL, MySQL
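
As a hedged illustration of the autoscaling mentioned above, the snippet below builds a cluster specification of the shape accepted by the Databricks Clusters REST API. The cluster name, node type, worker bounds, and runtime version are illustrative assumptions, not HSBC's actual configuration.

  # Hedged sketch: a Databricks cluster spec with autoscaling enabled, of the
  # shape accepted by the Clusters REST API (POST /api/2.0/clusters/create).
  # Name, node type, worker bounds, and runtime version are assumptions.
  import json

  cluster_spec = {
      "cluster_name": "etl-autoscaling-example",  # hypothetical name
      "spark_version": "13.3.x-scala2.12",        # any supported LTS runtime
      "node_type_id": "Standard_DS3_v2",          # example Azure VM size
      "autoscale": {
          "min_workers": 2,   # floor keeps small jobs cheap
          "max_workers": 8,   # ceiling caps cost during peak loads
      },
      "autotermination_minutes": 30,  # shut down idle clusters to save cost
  }

  print(json.dumps(cluster_spec, indent=2))

Letting Databricks scale between the floor and ceiling avoids paying for a fixed-size cluster that is only fully used at peak, which is where the cost savings described above typically come from.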

Data Platform Engineer

2016 - 2019
Walmart
  • Reduced AWS costs by identifying and eliminating underutilized resources, such as Redshift clusters and Kinesis streams, and implementing cost-saving measures.
  • Resolved issues with AWS Lambda functions and AWS Glue jobs that failed to ingest data from sources such as S3, Kinesis, or external APIs.
  • Developed and validated new data sources in a staging environment, promoting them through user acceptance testing to production as part of the release process.
  • Fixed errors in Amazon Athena SQL queries caused by incorrect syntax or data schema mismatches. Improved query performance by optimizing partitioning and using appropriate data formats (see the sketch after the technology list below).
  • Created interactive Tableau dashboards for strategic decision-making, with underlying data pipelines leveraging AWS Glue alongside Snowflake.
  • Collaborated with product owners to define and translate business requirements into technical specifications. Published Tableau data sources from Snowflake tables and consumed them to build interactive dashboards using parameter/filter actions and level-of-detail (LOD) expressions.
  • Used Jira to track issues and bugs, resolving bugs related to data and the user story as part of Agile scrums.
  • Built summary-to-detail interactivity that drove high usage in a short span of time, with dynamic insights that helped stakeholders make informed decisions.
  • Analyzed existing .unx/.unv files containing data models and relationships, as well as BO reports and Xcelsius dashboards. Created corresponding Tableau dashboards using custom SQL from Oracle and useful KPIs leveraging functions such as DATETRUNC, DATEPART, USERNAME, and CONTAINS.
  • Used Tableau performance recording for dashboard optimization and custom SQL optimization for smoother data extraction.
Technologies: ETL, Azure Blob Storage, Apache Kafka, Apache Spark Clusters, SQL, ADF, Python, Azure Data Lake, Azure Databricks, Azure Synapse, Azure SQL, Microsoft Power BI, Azure, Azure Storage, PySpark, Azure Synapse Analytics, Snowflake, Azure Cosmos DB, Data Warehouse Design, Apache Airflow, Data Engineering, PostgreSQL, Data Pipelines, Tableau, Big Data, MongoDB, Scala, HBase, Jira, Jenkins, Docker, NoSQL, Spark, Hadoop, CI/CD Pipelines, Microsoft Azure, Amazon EC2, AWS Glue, HDFS, Apache Hive, Data Modeling, Data Visualization, Fivetran, Data Build Tool (dbt), BigQuery, Google Cloud Platform (GCP), Tableau Desktop, Data Analytics, Tableau Server, Tableau Desktop Pro, Power BI Desktop, Table Calculations, BI Reporting, SAP BusinessObjects (BO), Microsoft Fabric, Oracle, DAX, Power Query, M Language, Azure Event Hubs, Data Analysis, Amazon Web Services (AWS), Data Migration, Data Governance, Data Management, MDM, SSAS, SSIS Custom Components, User Experience (UX), User Interface (UI), Databricks, SAP, Microsoft SQL Server, Database Architecture, Data Architecture, Database Design, Synapse, ETL Development, Talend ETL, MySQL
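
A minimal sketch of the Athena tuning described above, assuming a hypothetical Parquet table partitioned by a dt column; the region, database, table, bucket, and column names are placeholders.

  # Hedged sketch: running a partition-pruned Athena query via boto3.
  # Region, database, table, bucket, and column names are hypothetical.
  import boto3

  athena = boto3.client("athena", region_name="us-east-1")

  # Filtering on the partition column (dt) lets Athena skip irrelevant S3
  # prefixes; storing the table as Parquet further cuts scanned bytes.
  query = """
  SELECT store_id, SUM(units_sold) AS total_units
  FROM inventory_db.daily_inventory
  WHERE dt BETWEEN '2018-01-01' AND '2018-01-31'
  GROUP BY store_id
  """

  response = athena.start_query_execution(
      QueryString=query,
      QueryExecutionContext={"Database": "inventory_db"},
      ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
  )
  print(response["QueryExecutionId"])

Partition pruning limits which S3 prefixes Athena scans, which is typically where both the cost and the latency improvements come from.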

Data Engineer

2014 - 2016
Bell Canada
  • Handled big data platform administration and engineering on multiple Hadoop, Kafka, HBase, and Spark clusters. Containerized nodes using Docker and managed deployment through Kubernetes.
  • Resolved a major challenge in data ingestion into HDFS, Azure Storage, and Azure Data Lake. Created monitoring and alerting systems and worked with data source teams to improve data quality and availability, fixing inconsistent data in Hive tables.
  • Created Hive queries and functions for evaluating, filtering, loading, and storing data. Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, loading the final data into HDFS (see the Spark SQL sketch after the technology list below).
  • Used Power BI dataflows and datasets to consolidate data sources, implementing business requirements in dataflows and Power Query, with some logic written in M language.
Technologies: SQL, Data Pipelines, ETL, Data Warehousing, Query Optimization, Azure Blob Storage, Apache Kafka, Apache Airflow, Apache Spark Clusters, ADF, Azure Databricks, Azure Data Lake, Azure Storage, Azure Synapse, Azure SQL, PySpark, Snowflake, Azure Cosmos DB, Data Engineering, PostgreSQL, Python, Big Data, MongoDB, HBase, Jira, Jenkins, Docker, NoSQL, Spark, Hadoop, CI/CD Pipelines, Microsoft Azure, Amazon EC2, AWS Glue, HDFS, Apache Hive, Data Modeling, Data Visualization, Fivetran, Data Build Tool (dbt), BigQuery, Google Cloud Platform (GCP), Tableau Desktop, Data Analytics, Tableau Server, Tableau Desktop Pro, Power BI Desktop, Table Calculations, Tableau, BI Reporting, SAP BusinessObjects (BO), Microsoft Fabric, Oracle, DAX, Power Query, M Language, Azure Event Hubs, Data Analysis, Amazon Web Services (AWS), Data Migration, Data Governance, Data Management, MDM, SSAS, SSIS Custom Components, User Experience (UX), User Interface (UI), Databricks, SAP, Microsoft SQL Server, Database Architecture, Data Architecture, Synapse, ETL Development, Talend ETL, MySQL
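
A minimal sketch, assuming a hypothetical Hive table, of the evaluate-filter-load pattern described above implemented with Spark SQL; the table, column, and path names are placeholders, not Bell Canada's actual schema.

  # Hedged sketch: Hive-style filtering and aggregation via Spark SQL, then
  # loading the result back to HDFS. Table, column, and path names are
  # hypothetical placeholders.
  from pyspark.sql import SparkSession

  spark = (SparkSession.builder
           .appName("hive-cleanup-example")
           .enableHiveSupport()   # lets Spark read/write Hive metastore tables
           .getOrCreate())

  # Evaluate and filter: drop malformed rows before aggregating
  daily_usage = spark.sql("""
      SELECT subscriber_id,
             to_date(event_ts)      AS usage_date,
             SUM(bytes_transferred) AS total_bytes
      FROM raw.network_events
      WHERE bytes_transferred IS NOT NULL
        AND bytes_transferred >= 0
      GROUP BY subscriber_id, to_date(event_ts)
  """)

  # Load: store the cleaned aggregate on HDFS as Parquet
  daily_usage.write.mode("overwrite").parquet("hdfs:///curated/daily_usage")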

Projects

Optimizing Transaction Processing with Streamlined ETL Pipelines

The project aimed to enhance the National Bank of Canada's data processing capabilities, improve data reliability, and optimize cost efficiency by leveraging Microsoft Azure's cloud platform. The primary focus was transforming the existing data pipeline to support real-time analytics, scalable data processing, and efficient resource utilization. Insightful Power BI KPIs and dashboards supported stakeholders' data-driven decision-making.

Autoscaling for Banking Data Processing Workloads

Implemented autoscaling in Azure Databricks to dynamically adjust worker nodes based on workload demand for banking data processing. I optimized cost and performance, ensuring scalable and efficient data handling. Power BI's Performance Analyzer helped optimize dashboard performance, while the DAX functions SAMEPERIODLASTYEAR and USERELATIONSHIP supported time-intelligence KPIs and activated inactive relationships, respectively.

Data Processing Pipeline for Retail Inventory Management

I addressed and resolved data ingestion issues in AWS Glue and Lambda functions to improve the reliability of retail inventory data pipelines. I implemented solutions to ensure accurate and timely data for Walmart’s inventory management systems.
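
As a hedged sketch of the kind of Glue ingestion fix described above: a job that reads a cataloged S3 source as a DynamicFrame, fails fast on schema drift, and writes Parquet. It assumes the AWS Glue job environment, and the database, table, column, and path names are hypothetical placeholders.

  # Hedged sketch of a Glue job: read a cataloged S3 source, guard against
  # the schema mismatches that silently break downstream queries, write
  # Parquet. Database, table, column, and path names are hypothetical.
  import sys

  from awsglue.context import GlueContext
  from awsglue.job import Job
  from awsglue.utils import getResolvedOptions
  from pyspark.context import SparkContext

  args = getResolvedOptions(sys.argv, ["JOB_NAME"])
  glue_context = GlueContext(SparkContext.getOrCreate())
  job = Job(glue_context)
  job.init(args["JOB_NAME"], args)

  # Read the source as a DynamicFrame so schema drift surfaces early
  source = glue_context.create_dynamic_frame.from_catalog(
      database="inventory_db", table_name="raw_inventory"
  )

  # Fail fast if expected columns are missing from the landed data
  expected = {"store_id", "sku", "quantity", "updated_at"}
  missing = expected - set(source.toDF().columns)
  if missing:
      raise ValueError(f"Source schema is missing columns: {missing}")

  glue_context.write_dynamic_frame.from_options(
      frame=source,
      connection_type="s3",
      connection_options={"path": "s3://example-curated/inventory/"},
      format="parquet",
  )
  job.commit()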

I used my understanding of legacy SAP BO dashboards and the underlying .unx/.unv files to analyze table relationships among fact and dimension tables. I created a Tableau dashboard from scratch using a custom SQL query as the data source, joining tables to bring in relevant attributes, and used table calculations with string, logical, and date/time functions to create calculated fields, sets for top-N/bottom-N functionality, and parameters for sheet swapping.

I held discussions with stakeholders to understand their exact requirements, which included converting the BRD into a technical document, building calculated fields, and creating insightful charts: waterfall charts for yearly growth in connections and network increments, dynamic insights based on filter selection, donut charts for regional percentage contributions, table calculations, sheet swapping, URL/filter actions for summary-to-detail navigation, and vertical/horizontal containers for dashboard structuring.

Big Data Platform Deployment for Telecommunications

I managed and administered multiple big data platforms, including Hadoop, Kafka, HBase, and Spark clusters. I containerized nodes using Docker and orchestrated deployments with Kubernetes, enhancing the scalability and reliability of Bell Canada's data processing infrastructure. A customized calendar table was created to support time-based KPIs in Power BI Desktop, driving KPI calculations and drill-downs from the summary table; FILTER, CALCULATE, and SUMX are among the DAX functions used regularly in this dashboard.
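
A minimal sketch of Kafka-to-HDFS ingestion with Spark Structured Streaming, the pattern underlying the cluster work described above; the broker addresses, topic, and paths are hypothetical placeholders.

  # Hedged sketch: ingest a Kafka topic into HDFS with Structured Streaming.
  # Requires the spark-sql-kafka connector package on the cluster; broker
  # addresses, topic, and paths are hypothetical placeholders.
  from pyspark.sql import SparkSession, functions as F

  spark = SparkSession.builder.appName("kafka-to-hdfs-example").getOrCreate()

  events = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
            .option("subscribe", "network-events")
            .option("startingOffsets", "latest")
            .load())

  # Kafka delivers key/value as binary; decode before persisting
  decoded = events.select(
      F.col("key").cast("string"),
      F.col("value").cast("string"),
      "timestamp",
  )

  query = (decoded.writeStream
           .format("parquet")
           .option("path", "hdfs:///raw/network_events")
           .option("checkpointLocation", "hdfs:///checkpoints/network_events")
           .start())
  query.awaitTermination()
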
Education

2010 - 2013

Bachelor's Degree in Computer Science

Rayalaseema University - India

Certifications

JUNE 2020 - PRESENT

AWS Certified Developer Associate

Udemy

JANUARY 2019 - PRESENT

Microsoft Certified: Power BI Data Analyst Associate

Microsoft

NOVEMBER 2017 - PRESENT

Microsoft Certified: Azure Data Engineer Associate

Microsoft

Libraries/APIs

PySpark

Tools

Microsoft Power BI, Apache Airflow, BigQuery, Tableau Desktop, Power Query, Tableau, SSAS, Synapse, Talend ETL, Jira, Jenkins, Table Calculations, AWS Glue, Tableau Desktop Pro, Power BI Desktop

Languages

SQL, Python, Scala, Snowflake

Paradigms

ETL, Database Design, Agile

Platforms

Azure Synapse, Azure, Azure Synapse Analytics, Google Cloud Platform (GCP), Amazon Web Services (AWS), Databricks, Azure Event Hubs, AWS Lambda, Apache Kafka, Docker, Microsoft Fabric, Oracle, Amazon EC2

Storage

Azure SQL, Azure Storage, NoSQL, Azure Cosmos DB, PostgreSQL, Data Pipelines, Microsoft SQL Server, Database Architecture, MySQL, Amazon S3 (AWS S3), HBase, MongoDB, HDFS, Apache Hive

Frameworks

Spark, Hadoop, ADF

Other

Azure Data Factory, Azure Databricks, Azure Data Lake, Big Data, CI/CD Pipelines, Microsoft Azure, Data Engineering, Data Warehouse Design, Data Modeling, Data Visualization, Fivetran, Data Build Tool (dbt), Data Analytics, DAX, BI Reporting, Data Warehousing, Query Optimization, Data Analysis, Data Migration, Data Governance, Data Management, MDM, SSIS Custom Components, User Experience (UX), User Interface (UI), SAP, Data Architecture, ETL Development, Azure Blob Storage, Apache Spark Clusters, AWS Certified Developer, SAP BusinessObjects (BO), Tableau Server, M Language, Large Language Models (LLMs), OpenAI, Data Science
