Samir Kapoor

Google BigQuery Developer in Toronto, ON, Canada

Member since April 16, 2021
Samir is a senior data engineer with two decades of experience. His most recent experience has been as a senior big data cloud engineer focusing on the Google Cloud Platform in the digital marketing space. Samir has very strong technical skills and thrives in a fast-paced environment. He's a versatile team member with deep technical knowledge of data pipelines, systems, big data environments, cloud platforms, and databases.


  • Scotiabank
    Google BigQuery, Google Cloud Functions, Google Cloud Storage...
  • Honda
    IBM Db2, DB2/400, AIX, SQL Server 2012, PHP, Microsoft SQL Server, Google...
  • IBM
    IBM Db2, AIX, Linux, Windows, Performance Tuning, Troubleshooting...



Toronto, ON, Canada



Preferred Environment

Windows, IntelliJ, Python 3, PySpark, Spark SQL, Google Cloud Platform (GCP), AWS, Hadoop

The most amazing...

...project I have worked on was building data pipelines that migrated terabytes of data from on-premises Hadoop to GCP.


  • Senior Data Engineer

    2015 - PRESENT
    • Built pipelines in both Python and Java from different channels, e.g., Facebook, LinkedIn, and Google platforms such as Campaign Manager, Google Analytics, Search Ads 360, and AdWords, into a centralized data platform for the digital marketing team.
    • Designed ETL processing for movement of data from the raw zone to various zones, e.g., pre-normalized, normalized, and de-normalized zones using Kylo and NiFi and underlying Spark programs.
    • Built ETL process with tools including Informatica, Datastage, and Alteryx to move data from the landing zone to other various zones, e.g., technical standardized zone, enterprise zone, and consumption zone.
    • Leveraged an event layer using Pub/Sub in GCP to syndicate marketing data from different channels when data was updated in Google Campaign Manager.
    • Built PySpark programs to move data in and out of HDFS from different sources and converted legacy code into Python/PySpark for data analysis on legacy data.
    • Built a Java program to push propensity models/scores from GCP into Google Analytics via the Measurement Protocol in order to create audiences/segments, which were further pushed to DV360, DoubleClick Search, and Bid Manager.
    • Designed and developed Python programs to move data from consumption zone to data stores such as Cassandra, Db2, and Druid used by Pega ESM and DSM.
    • Developed logical and physical database models using the ER/Studio and Erwin modeling tool and following defined standards and guidelines.
    • Monitored CPU, memory, paging space, and disk I/O and analyzed them using vmstat, iostat, topas, nmon, svmon, and other tools.
    • Served as a subject matter expert in the optimizer area. Helped resolve many optimizer and performance-related issues by analyzing query plans and recommending corrective actions.
    Technologies: Google BigQuery, Google Cloud Functions, Google Cloud Storage, Google Cloud Spanner, Pub/Sub, Spark, Hadoop, Hortonworks Data Platform (HDP), AIX, Linux, Kubernetes, Amazon S3 (AWS S3), MinIO, HDFS, Apache Hive, SQL, IntelliJ, Windows, PySpark, Spark SQL, Google Cloud Platform (GCP), Microsoft Power BI, TCP/IP, Apache Kafka, Fluentd, Logging, Bitbucket, Jira, Artifactory, Microsoft SQL Server, Tableau, JDBC
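    One bullet above mentions pushing propensity scores into Google Analytics via the Measurement Protocol. As an illustration only (the production version was a Java program; this is a minimal Python sketch, and the property ID, client ID, and event names are hypothetical placeholders), a hit payload can be assembled like this:

```python
from urllib.parse import urlencode

def build_ga_event_payload(tracking_id, client_id, category, action,
                           label=None, value=None):
    """Build a legacy Universal Analytics Measurement Protocol event payload.

    Hypothetical helper sketching the kind of hit described above; the
    real pipeline pushed propensity models/scores as events for
    audience/segment building.
    """
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # GA property ID (placeholder below)
        "cid": client_id,    # anonymous client ID
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
    }
    if label is not None:
        params["el"] = label  # event label
    if value is not None:
        params["ev"] = str(value)  # event value (integer)
    return urlencode(params)

# Example: encode a propensity score (scaled to an integer) as an event value.
payload = build_ga_event_payload("UA-XXXXX-Y", "555", "propensity",
                                 "score_push", value=87)
```

    The resulting query string would be POSTed to the Measurement Protocol collection endpoint, after which GA exposes the events for building the audiences/segments mentioned above.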
  • Senior Database Developer

    2014 - 2015
    • Proposed new solutions around existing architecture that included enhancements in current running production environments.
    • Assisted with Db2 on AIX upgrades, including project planning, implementation, validation, and working with user teams.
    • Enhanced existing Db2 monitoring to use in-house in-memory metrics displayed via Google Charts, coded with PHP, Ajax, and JavaScript to display graphs.
    • Installed, upgraded, configured, and maintained Db2 v9.7 databases in an AIX environment.
    • Configured and maintained a Db2 SQL replication environment across AIX systems.
    • Monitored existing propagation environment on a daily basis.
    • Helped implement database changes in a Db2 for z/OS environment using SPUFI and monitored them via SDSF.
    • Fixed query performance issues by analyzing access plans and performing corrective actions to improve performance. Monitored, troubleshot, and fixed issues related to overall performance, crashes, errors, and cores.
    Technologies: IBM Db2, DB2/400, AIX, SQL Server 2012, PHP, Microsoft SQL Server, Google, CDC, Database Replication, High Availability Disaster Recovery (HADR), Monitoring, JavaScript
  • IBM DB2 LUW Accelerated Value Specialist

    2011 - 2014
    • Delivered a proactive, cost-reducing, and productivity-enhancing advisory service to a specific client. Built a foundational understanding of the client's overall environment.
    • Helped DBAs resolve issues with DB2 LUW in different environments, including an eCommerce and B2B site along with a 25-30TB enterprise data warehouse (EDW).
    • Provided proactive guidance, documentation, services, and recommendations to the team to prevent issues from occurring and to deflect PMRs, whenever possible.
    • Migrated the commerce environments to v9.5 and the data warehouse environments to v9.7. Handled other performance-related and tuning tasks and monitored the site during critical Black Friday shopping.
    • Implemented high availability disaster recovery (HADR) in online transaction processing (OLTP) environments.
    • Implemented data recovery solutions using SRDF failover strategy in a B2B eCommerce site.
    • Created an incremental backup and restore strategy for data warehouse and data mart systems.
    • Provided ongoing database configuration monitoring and tuning with monitoring tools, script enhancements, and implementations for ongoing performance tuning and monitoring.
    Technologies: IBM Db2, AIX, Linux, Windows, Performance Tuning, Troubleshooting, High Availability Disaster Recovery (HADR), Database Replication, Backup & Recovery, Business Continuity & Disaster Recovery (BCDR)
  • IBM DB2 LUW Advanced Support Analyst

    2001 - 2011
    • Provided Level 2 advanced technical support to clients with DB2 LUW Database systems on both DPF and non-DPF environments for all releases up until version 10.1.
    • Advised and guided clients on technical decisions in the use of the Db2 product and identifying and effectively using available resources to resolve questions or problems related to the product.
    • Troubleshot and resolved issues related to installation, configuration, utilization, functionality, updates, compatibility, query performance (optimizer), overall performance across multiple platforms, databases, and network infrastructures.
    • Utilized technical and negotiation skills in collaboration with other support operations/organizations to prioritize and diagnose problems to resolution.
    • Performed problem determination and problem source identification for both defect and usage support for the DB2 product and built testing environments to create reproducible scenarios for reported issues.
    • Collaborated with management, team leads, and other support staff in client-focus initiatives to reduce customer complaints and improve customer satisfaction rates.
    • Served as a subject matter expert in the optimizer area. Helped clients resolve many optimizer and performance-related issues by analyzing query plans and recommending corrective actions.
    • Analyzed and resolved query performance issues for Db2, Oracle, and SQL Server access plans.
    Technologies: IBM Db2, AIX, Performance Tuning, Monitoring, Troubleshooting, Database Replication, High Availability Disaster Recovery (HADR), Business Continuity & Disaster Recovery (BCDR), Core, Memory Leaks, Memory Management, SQL Performance, Tuning Advisory, Backup & Recovery, Data Loading, Load Testing


  • Custom Database Performance Monitoring Tool

    Enhanced existing Db2 monitoring to use in-house in-memory metrics displayed via Google Charts, coded with an ODBC connection, PHP, and JSON to display graphs. Incorporated real-time and ad hoc monitoring, with historical data saved in a backend SQL Server database.
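    The Google Charts front end of such a tool consumes a JSON "DataTable" literal. A minimal sketch of the server side, with hypothetical metric names (the real version pulled Db2 snapshot metrics over ODBC and served the JSON to the PHP/Ajax pages):

```python
import json

def metrics_to_datatable(samples):
    """Convert (timestamp, bufferpool_hit_ratio) samples into the JSON
    DataTable literal that Google Charts consumes.

    Illustrative only: the column set is a hypothetical stand-in for the
    in-memory metrics the monitoring tool actually tracked.
    """
    return json.dumps({
        "cols": [
            {"id": "ts", "label": "Time", "type": "string"},
            {"id": "hit", "label": "Bufferpool hit ratio (%)", "type": "number"},
        ],
        # One Google Charts row per sample: {"c": [cell, cell, ...]}
        "rows": [{"c": [{"v": ts}, {"v": ratio}]} for ts, ratio in samples],
    })

table_json = metrics_to_datatable([("12:00", 97.5), ("12:05", 96.8)])
```

    The PHP layer can hand this string straight to `google.visualization.DataTable`, which is what makes the JSON-over-ODBC design convenient for ad hoc graphs.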

  • Marketing Data Syndication - Ads Data Hub - Google Cloud Platform

    Designed, developed, and delivered an industry-leading digital marketing intelligence and optimization engine centered on the amalgamation and activation of the organization's core first-, second-, and third-party data.

    This proprietary platform was built to empower and advance digital marketers within the organization, providing the team with real-time access to performance marketing and consumer demand insights. The implementation of the Ads Data Hub platform provides the organization's digital marketing program with vital business intelligence measures, differentiating its position in the ad ecosystem and leading to a more efficient return on marketing investment.

  • Anti-money Laundering - Data Pipelines

    Created and managed the project design document and end-to-end mapping document. Leveraged a Hadoop (Hortonworks) cluster to perform large-scale data extraction and ingestion into a data lake. Implemented the extraction using both DataStage and PySpark programs. Deployed the data into various zones, e.g., the raw, enterprise, and consumption zones. The data was then used by a SAS application for anti-money laundering use cases.
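    The zone promotion described above (raw to enterprise/consumption) can be sketched as a single cleansing pass. This is a pure-Python illustration with hypothetical field names; the actual jobs ran as DataStage and PySpark programs over the Hortonworks cluster:

```python
def promote_raw_to_enterprise(raw_rows):
    """Promote raw-zone records to the enterprise zone: drop malformed
    rows, standardize types, and keep rejects aside.

    Field names (account_id, amount) are hypothetical placeholders for
    the mapped fields in the end-to-end mapping document.
    """
    accepted, rejected = [], []
    for row in raw_rows:
        try:
            accepted.append({
                "account_id": row["account_id"].strip(),   # standardize key
                "amount": round(float(row["amount"]), 2),  # enforce numeric type
            })
        except (KeyError, ValueError):
            rejected.append(row)  # malformed record: keep out of the clean zone
    return accepted, rejected

raw = [
    {"account_id": " A-100 ", "amount": "250.456"},
    {"account_id": "A-101", "amount": "not-a-number"},  # routed to rejects
]
clean, rejects = promote_raw_to_enterprise(raw)
```

    The same shape applies at DataFrame scale in PySpark: transform validated records into the enterprise zone and route malformed records aside rather than dropping them silently.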


  • Languages

    SQL, Python 3, PHP, HTML, JavaScript, Java 8
  • Platforms

    AIX, Windows, Google Cloud Platform (GCP), Linux, Hortonworks Data Platform (HDP), Kubernetes, Apache Kafka
  • Storage

    Databases, IBM Db2, SQL Performance, Microsoft SQL Server, Google Cloud Storage, HDFS, Apache Hive, JSON, Database Replication, Google Cloud, Google Cloud Spanner, Amazon S3 (AWS S3), DB2/400, SQL Server 2012
  • Other

    Data Engineering, Google BigQuery, Monitoring, Performance Tuning, Troubleshooting, Backup & Recovery, Programming, Cloud, Google Cloud Functions, Pub/Sub, Google, CDC, High Availability Disaster Recovery (HADR), Business Continuity & Disaster Recovery (BCDR), Core, Memory Leaks, Memory Management, Tuning Advisory, Data Loading, Google Marketing Platform, Ad Campaigns, APIs, Big Data, AWS, MinIO, TCP/IP
  • Frameworks

    Hadoop, Spark, OAuth 2
  • Libraries/APIs

    PySpark, ODBC, Google Campaign Manager API, AdWords API, Facebook API, LinkedIn API, Fluent API, JDBC
  • Tools

    IntelliJ, Spark SQL, Logging, Bitbucket, Jira, Artifactory, Google Cloud Console, IBM InfoSphere (DataStage), Cisco Tidal Enterprise Scheduler, Microsoft Power BI, Fluentd, Tableau
  • Paradigms

    Load Testing


  • Bachelor's Degree in Electrical Engineering
    1996 - 2001
    Ryerson University - Toronto, Ontario


  • Certified Advanced Technical Analyst, DB2
