
Samir Kapoor

Verified Expert in Engineering

Cloud Developer

Location
Toronto, ON, Canada
Toptal Member Since
April 16, 2021

Samir is a senior data engineer with two decades of experience. His most recent experience has been as a senior big data cloud engineer focusing on the Google Cloud Platform in the digital marketing space. Samir has very strong technical skills and thrives in a fast-paced environment. He's a versatile team member with deep technical knowledge of data pipelines, systems, big data environments, cloud platforms, and databases.

Portfolio

Scotiabank
Google BigQuery, Google Cloud Functions, Google Cloud Storage...
Honda
IBM Db2, DB2/400, AIX, SQL Server 2012, PHP, Microsoft SQL Server, Google, CDC...
IBM
IBM Db2, AIX, Linux, Windows, Performance Tuning, Troubleshooting...

Experience

Availability

Part-time

Preferred Environment

Windows, IntelliJ IDEA, Python 3, PySpark, Spark SQL, Google Cloud Platform (GCP), Amazon Web Services (AWS), Hadoop

The most amazing...

...project I have worked on was building data pipelines that migrated terabytes of data from on-premises Hadoop to GCP.

Work Experience

Senior Data Engineer

2015 - PRESENT
Scotiabank
  • Built pipelines in both Python and Java from different channels, e.g., Facebook, LinkedIn, and Google platforms such as Campaign Manager, Google Analytics, Search Ads 360, and AdWords, into a centralized data platform for the digital marketing team.
  • Designed ETL processing for movement of data from the raw zone to various zones, e.g., pre-normalized, normalized, and de-normalized zones using Kylo and NiFi and underlying Spark programs.
  • Built ETL process with tools including Informatica, Datastage, and Alteryx to move data from the landing zone to other various zones, e.g., technical standardized zone, enterprise zone, and consumption zone.
  • Leveraged an event layer using Pub/Sub in GCP to syndicate marketing data from different channels whenever data was updated in Google Campaign Manager (a minimal publish sketch appears after this role's technology list).
  • Built PySpark programs to move data in and out of HDFS from different sources and converted legacy code into Python/PySpark for data analysis on legacy data.
  • Built a Java program to push propensity models/scores from GCP into Google Analytics via the Measurement Protocol in order to create audiences/segments, which were further pushed to DV360, DoubleClick Search, and Bid Manager (see the Measurement Protocol sketch after the technology list).
  • Designed and developed Python programs to move data from consumption zone to data stores such as Cassandra, Db2, and Druid used by Pega ESM and DSM.
  • Developed logical and physical database models using the ER/Studio and Erwin modeling tool and following defined standards and guidelines.
  • Monitored CPU, memory, paging space, and disk I/O and analyzed them using vmstat, iostat, topas, nmon, svmon, and other tools.
  • Served as a subject matter expert in the optimizer area. Helped resolve many optimizer and performance-related issues by analyzing query plans and providing corrective actions to take in order to resolve the performance problem.
Technologies: Google BigQuery, Google Cloud Functions, Google Cloud Storage, Google Cloud Spanner, Pub/Sub, Spark, Hadoop, Hortonworks Data Platform (HDP), AIX, Linux, Kubernetes, Amazon S3 (AWS S3), MinIO, HDFS, Apache Hive, SQL, IntelliJ IDEA, Windows, PySpark, Spark SQL, Google Cloud Platform (GCP), Microsoft Power BI, TCP/IP, Apache Kafka, Fluentd, Logging, Bitbucket, Jira, Artifactory, Microsoft SQL Server, Tableau, JDBC, Database Performance
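
To illustrate the event-layer item above, here is a minimal Pub/Sub publish sketch, assuming the google-cloud-pubsub client; the project ID, topic name, and message fields are hypothetical placeholders, not the production values.

    # Minimal sketch: publish a "campaign data updated" event to a Pub/Sub topic
    # so downstream syndication jobs can react. Requires the google-cloud-pubsub
    # client; the project ID, topic name, and message fields are hypothetical.
    import json
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("example-marketing-project", "campaign-data-updated")

    def notify_campaign_update(campaign_id: str, channel: str) -> None:
        """Publish one update event; subscribers trigger the syndication pipeline."""
        payload = {"campaign_id": campaign_id, "channel": channel}
        future = publisher.publish(
            topic_path,
            data=json.dumps(payload).encode("utf-8"),
            source=channel,  # message attribute usable for subscriber-side filtering
        )
        future.result(timeout=30)  # block until the publish is acknowledged

    if __name__ == "__main__":
        notify_campaign_update("cm-12345", "campaign_manager")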
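
The propensity-score push can be sketched the same way via the Universal Analytics Measurement Protocol; the original program was written in Java, and the tracking ID, client ID, and custom-dimension slot below are hypothetical placeholders.

    # Minimal sketch: send a propensity score into Google Analytics as an event
    # hit via the Universal Analytics Measurement Protocol. The tracking ID,
    # client ID, and custom-dimension slot are hypothetical placeholders.
    import requests

    MEASUREMENT_PROTOCOL_URL = "https://www.google-analytics.com/collect"

    def push_propensity_score(client_id: str, score: float) -> None:
        """Send one event hit carrying a model score as a custom dimension."""
        payload = {
            "v": "1",                # protocol version
            "tid": "UA-XXXXXXXX-1",  # hypothetical GA property (tracking) ID
            "cid": client_id,        # GA client ID identifying the user
            "t": "event",            # hit type
            "ec": "propensity",      # event category
            "ea": "score_update",    # event action
            "cd1": f"{score:.4f}",   # hypothetical custom dimension slot
        }
        response = requests.post(MEASUREMENT_PROTOCOL_URL, data=payload, timeout=10)
        response.raise_for_status()

    if __name__ == "__main__":
        push_propensity_score("555.1234567890", 0.87)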

Senior Database Developer

2014 - 2015
Honda
  • Proposed new solutions around existing architecture that included enhancements in current running production environments.
  • Assisted with Db2 on AIX upgrades, including project planning, implementation, validation, and working with user teams.
  • Enhanced existing Db2 monitoring to use in-house in-memory metrics displayed via Google Charts, coded with PHP, Ajax, and JavaScript.
  • Installed, upgraded, configured, and maintained Db2 v9.7 databases in an AIX environment.
  • Configured and maintained a Db2 SQL replication environment across AIX systems.
  • Monitored existing propagation environment on a daily basis.
  • Helped implement database changes in a Db2 for z/OS environment using SPUFI and monitored them via SDSF.
  • Fixed query performance issues by analyzing access plans and taking corrective actions to improve performance. Monitored, troubleshot, and fixed issues related to overall performance, crashes, errors, and cores.
Technologies: IBM Db2, DB2/400, AIX, SQL Server 2012, PHP, Microsoft SQL Server, Google, CDC, Database Replication, High Availability Disaster Recovery (HADR), Monitoring, JavaScript

IBM DB2 LUW Accelerated Value Specialist

2011 - 2014
IBM
  • Delivered a proactive, cost-reducing, and productivity-enhancing advisory service to a specific client, building a foundational understanding of the client's overall environment.
  • Helped DBAs resolve issues with DB2 LUW in different environments, including an eCommerce and B2B site along with a 25-30 TB enterprise data warehouse (EDW).
  • Provided proactive guidance, documentation, services, and recommendations to the team to prevent issues from occurring and to deflect PMRs, whenever possible.
  • Migrated the commerce environments to v9.5 and the data warehouse environments to v9.7. Handled other performance-related and tuning tasks, monitoring the site during critical Black Friday shopping.
  • Implemented high availability disaster recovery (HADR) in online transaction processing (OLTP) environments.
  • Implemented data recovery solutions using SRDF failover strategy in a B2B eCommerce site.
  • Created an incremental backup and restore strategy for data warehouse and data mart systems.
  • Provided ongoing database configuration monitoring and performance tuning through monitoring tools, script enhancements, and related implementations.
Technologies: IBM Db2, AIX, Linux, Windows, Performance Tuning, Troubleshooting, High Availability Disaster Recovery (HADR), Database Replication, Backup & Recovery, Business Continuity & Disaster Recovery (BCDR)

IBM DB2 LUW Advanced Support Analyst

2001 - 2011
IBM
  • Provided Level 2 advanced technical support to clients with DB2 LUW Database systems on both DPF and non-DPF environments for all releases up until version 10.1.
  • Advised and guided clients on technical decisions in the use of the Db2 product and on identifying and effectively using available resources to resolve questions or problems related to the product.
  • Troubleshot and resolved issues related to installation, configuration, utilization, functionality, updates, compatibility, query performance (optimizer), overall performance across multiple platforms, databases, and network infrastructures.
  • Utilized technical and negotiation skills in collaboration with other support operations/organizations to prioritize and diagnose problems to resolution.
  • Performed problem determination and problem source identification for both defect and usage support for the DB2 product, and built testing environments to create reproducible scenarios for reported issues.
  • Collaborated with management, team leads, and other support staff in client-focused initiatives to reduce customer complaints and improve customer satisfaction rates.
  • Served as a subject matter expert in the optimizer area. Helped clients resolve many optimizer/performance-related issues by analyzing query plans and providing corrective actions to take in order to resolve the performance problem.
  • Analyzed and resolved query performance issues for Db2, Oracle, and SQL Server access plans.
Technologies: IBM Db2, AIX, Performance Tuning, Monitoring, Troubleshooting, Database Replication, High Availability Disaster Recovery (HADR), Business Continuity & Disaster Recovery (BCDR), Core, Memory Leaks, Memory Management, SQL Performance, Tuning Advisory, Backup & Recovery, Data Loading, Load Testing

Custom Database Performance Monitoring Tool

Enhanced existing Db2 monitoring to use in-house in-memory metrics displayed via Google Charts, coded with an ODBC connection, PHP, and JSON to render the graphs. Real-time and ad hoc monitoring functionality was incorporated, with historical data saved in a backend SQL Server database.
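
As an illustration of the approach, the sketch below pulls in-memory Db2 metrics over ODBC and emits them as Google Charts DataTable JSON; the original tool performed the same round trip in PHP, and the DSN, credentials, and output file here are hypothetical.

    # Minimal sketch: pull in-memory Db2 metrics over ODBC and write them out as
    # Google Charts DataTable JSON. The DSN, credentials, and output file are
    # hypothetical; the original tool used PHP for the same round trip.
    import json
    import pyodbc

    conn = pyodbc.connect("DSN=DB2_PROD;UID=monitor;PWD=secret")  # hypothetical DSN
    cursor = conn.cursor()

    # MON_GET_BUFFERPOOL is one of Db2's in-memory monitoring table functions.
    cursor.execute(
        "SELECT bp_name, pool_data_l_reads, pool_data_p_reads "
        "FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t"
    )

    # Shape rows the way the Google Charts DataTable JSON loader expects.
    rows = [
        {"c": [{"v": name.strip()}, {"v": int(l_reads)}, {"v": int(p_reads)}]}
        for name, l_reads, p_reads in cursor.fetchall()
    ]
    chart_data = {
        "cols": [
            {"label": "Buffer pool", "type": "string"},
            {"label": "Logical reads", "type": "number"},
            {"label": "Physical reads", "type": "number"},
        ],
        "rows": rows,
    }

    with open("bufferpool_metrics.json", "w") as fh:
        json.dump(chart_data, fh)

    conn.close()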

Marketing Data Syndication - Ads Data Hub - Google Cloud Platform

Designed, developed, and delivered an industry-leading digital marketing intelligence and optimization engine centered on the amalgamation and activation of the organization's core first-, second-, and third-party data.

This proprietary platform build is intended to empower and advance digital marketers within the organization, providing the team with real-time access to performance marketing and consumer demand insights. The successful implementation of the Ads Data Hub platform will provide the organization's digital marketing program with vital business intelligence measures, subsequently differentiating our position in the ad ecosystem, leading to a more efficient return on marketing investment.

Anti-money Laundering - Data Pipelines

Created and managed the project design document and end-to-end mapping document. Leveraged a Hadoop (Hortonworks) cluster to perform large-scale data extraction and ingestion into a data lake. Implemented the extraction using both DataStage and PySpark programs and deployed the data into various zones, e.g., raw, enterprise, and consumption. The data was then consumed by a SAS application for anti-money laundering use cases.
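
A minimal PySpark sketch of one hop in that zone-to-zone flow follows; the HDFS paths, table layout, and column names are hypothetical placeholders, not the actual project assets.

    # Minimal sketch of one hop in the zone-to-zone flow: read transactions from
    # the raw zone on HDFS, standardize a few columns, and write them to the
    # enterprise zone for downstream (e.g., SAS) consumption. Paths and column
    # names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("aml-raw-to-enterprise").getOrCreate()

    raw_txn = spark.read.option("header", "true").csv("hdfs:///datalake/raw/transactions/")

    enterprise_txn = (
        raw_txn
        .withColumn("txn_amount", F.col("txn_amount").cast("decimal(18,2)"))
        .withColumn("txn_date", F.to_date("txn_date", "yyyy-MM-dd"))
        .filter(F.col("account_id").isNotNull())
    )

    enterprise_txn.write.mode("overwrite").partitionBy("txn_date").parquet(
        "hdfs:///datalake/enterprise/transactions/"
    )

    spark.stop()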

Languages

SQL, Python 3, PHP, HTML, JavaScript, Java 8

Platforms

AIX, Windows, Google Cloud Platform (GCP), Linux, Hortonworks Data Platform (HDP), Amazon Web Services (AWS), Kubernetes, Apache Kafka

Storage

Databases, IBM Db2, SQL Performance, Database Performance, Microsoft SQL Server, Google Cloud Storage, HDFS, Apache Hive, JSON, Database Replication, Google Cloud, PostgreSQL, Database Architecture, Google Cloud Spanner, Amazon S3 (AWS S3), DB2/400, SQL Server 2012

Other

Data Engineering, Google BigQuery, Monitoring, Performance Tuning, Troubleshooting, Backup & Recovery, Programming, Cloud, Google Cloud Functions, Pub/Sub, Google, CDC, High Availability Disaster Recovery (HADR), Business Continuity & Disaster Recovery (BCDR), Core, Memory Leaks, Memory Management, Tuning Advisory, Data Loading, Google Marketing Platform, Ad Campaigns, APIs, Big Data, MinIO, TCP/IP

Frameworks

Hadoop, Spark, OAuth 2

Libraries/APIs

PySpark, ODBC, Google Campaign Manager API, AdWords API, Facebook API, LinkedIn API, Fluent API, JDBC

Tools

IntelliJ IDEA, Spark SQL, Logging, Bitbucket, Jira, Artifactory, Google Cloud Console, IBM InfoSphere (DataStage), Cisco Tidal Enterprise Scheduler, Microsoft Power BI, Fluentd, Tableau

Paradigms

Load Testing

Education

1996 - 2001

Bachelor's Degree in Electrical Engineering

Ryerson University - Toronto, Ontario

Certifications

NOVEMBER 2008 - PRESENT

Certified Advanced Technical Analyst, DB2

IBM
