Samir Kapoor
Verified Expert in Engineering
Cloud Developer
Toronto, ON, Canada
Toptal member since April 16, 2021
Samir is a senior data engineer with two decades of experience. His most recent experience has been as a senior big data cloud engineer focusing on the Google Cloud Platform in the digital marketing space. Samir has very strong technical skills and thrives in a fast-paced environment. He's a versatile team member with deep technical knowledge of data pipelines, systems, big data environments, cloud platforms, and databases.
Experience
- SQL - 20 years
- Google Cloud Platform (GCP) - 7 years
- Hadoop - 6 years
- Spark - 5 years
- Python 3 - 5 years
- PySpark - 5 years
- Google BigQuery - 4 years
- Cloud - 4 years
Preferred Environment
Windows, IntelliJ IDEA, Python 3, PySpark, Spark SQL, Google Cloud Platform (GCP), Amazon Web Services (AWS), Hadoop
The most amazing...
...project I have worked on was building data pipelines to migrate terabytes of data from on-prem Hadoop to GCP.
Work Experience
Senior Data Engineer
Scotiabank
- Built pipelines in both Python and Java to ingest data from different channels, e.g., Facebook, LinkedIn, and Google platforms such as Campaign Manager, Google Analytics, Search 360, and AdWords, into a centralized data platform for the digital marketing team.
- Designed ETL processing for movement of data from the raw zone to various zones, e.g., pre-normalized, normalized, and de-normalized zones using Kylo and NiFi and underlying Spark programs.
- Built ETL processes with tools including Informatica, DataStage, and Alteryx to move data from the landing zone to various other zones, e.g., technical standardized zone, enterprise zone, and consumption zone.
- Leveraged the event layer using Pub/Sub in GCP to syndicate marketing data from different channels whenever data was updated in Google Campaign Manager.
- Built PySpark programs to move data in and out of HDFS from different sources, and converted legacy code into Python/PySpark for data analysis on legacy data.
- Built a Java program to push propensity models/scores from GCP into Google Analytics platforms via the Measurement Protocol in order to create audiences/segments, which were further pushed to DV360, DoubleClick Search, and Bid Manager.
- Designed and developed Python programs to move data from the consumption zone to data stores such as Cassandra, Db2, and Druid used by Pega ESM and DSM.
- Developed logical and physical database models using the ER/Studio and Erwin modeling tools, following defined standards and guidelines.
- Monitored CPU, memory, paging space, and disk I/O and analyzed them using vmstat, iostat, topas, nmon, svmon, and other tools.
- Served as a subject matter expert in the optimizer area. Helped resolve many optimizer and performance-related issues by analyzing query plans and providing corrective actions to take in order to resolve the performance problem.
Senior Database Developer
Honda
- Proposed new solutions around existing architecture that included enhancements in current running production environments.
- Assisted with Db2 on AIX upgrades, including project planning, implementation, validation, and working with user teams.
- Enhanced existing Db2 monitoring to use in-house in-memory metrics displayed via Google Charts, coded with PHP, Ajax, and JavaScript to display graphs.
- Installed, upgraded, configured, and maintained Db2 v9.7 databases in an AIX environment.
- Configured and maintained a Db2 SQL replication environment across AIX systems.
- Monitored the existing propagation environment on a daily basis.
- Helped implement database changes in a Db2 z/OS environment using SPUFI and monitored them via SDSF.
- Fixed query performance issues by analyzing access plans and performing corrective actions to improve performance. Monitored, troubleshot, and fixed issues related to overall performance, crashes, errors, and cores.
IBM DB2 LUW Accelerated Value Specialist
IBM
- Delivered a proactive, cost-reducing, and productivity-enhancing advisory service to a specific client. Built a foundational understanding of the client's overall environment.
- Helped DBAs resolve issues with DB2 LUW in different environments. Environments included an eCommerce and B2B site along with a 25 to 30TB enterprise data warehouse (EDW) environment.
- Provided proactive guidance, documentation, services, and recommendations to the team to prevent issues from occurring and to deflect PMRs, whenever possible.
- Migrated the commerce environments to v9.5 and the data warehouse environments to v9.7. Handled other performance-related and tuning tasks, monitoring the site for critical Black Friday shopping.
- Implemented high availability disaster recovery (HADR) in online transaction processing (OLTP) environments.
- Implemented data recovery solutions using an SRDF failover strategy in a B2B eCommerce site.
- Created an incremental backup and restore strategy for data warehouse and data mart systems.
- Provided ongoing database configuration monitoring and tuning with monitoring tools, script enhancements, and implementations for ongoing performance tuning and monitoring.
IBM DB2 LUW Advanced Support Analyst
IBM
- Provided Level 2 advanced technical support to clients with DB2 LUW database systems in both DPF and non-DPF environments for all releases up until version 10.1.
- Advised and guided clients on technical decisions in the use of the Db2 product and on identifying and effectively using available resources to resolve questions or problems related to the product.
- Troubleshot and resolved issues related to installation, configuration, utilization, functionality, updates, compatibility, query performance (optimizer), and overall performance across multiple platforms, databases, and network infrastructures.
- Utilized technical and negotiation skills in collaboration with other support operations/organizations to prioritize and diagnose problems to resolution.
- Performed problem determination and problem source identification for both defect and usage support for the DB2 product, and built test environments to create reproducible scenarios for any reported issues.
- Collaborated with management, team leads, and other support staff in client-focus initiatives to reduce customer complaints and improve customer satisfaction rates.
- Served as a subject matter expert in the optimizer area. Helped clients resolve many optimizer/performance-related issues by analyzing query plans and providing corrective actions to take in order to resolve the performance problem.
- Analyzed and resolved query performance issues for Db2, Oracle, and SQL Server access plans.
Experience
Custom Database Performance Monitoring Tool
Marketing Data Syndication - Ads Data Hub - Google Cloud Platform
This proprietary platform build is intended to empower and advance digital marketers within the organization, providing the team with real-time access to performance marketing and consumer demand insights. The successful implementation of the Ads Data Hub platform will provide the organization's digital marketing program with vital business intelligence measures, subsequently differentiating our position in the ad ecosystem, leading to a more efficient return on marketing investment.
Anti-money Laundering - Data Pipelines
Education
Bachelor's Degree in Electrical Engineering
Ryerson University - Toronto, Ontario
Certifications
Certified Advanced Technical Analyst, DB2
IBM
Skills
Libraries/APIs
PySpark, ODBC, Google Campaign Manager API, Google Ads API, Facebook API, LinkedIn API, Fluent API, JDBC
Tools
IntelliJ IDEA, Spark SQL, Logging, Bitbucket, Jira, Artifactory, Google Cloud Console, IBM InfoSphere (DataStage), Cisco Tidal Enterprise Scheduler, Git, Apache Airflow, Microsoft Power BI, Fluentd, Tableau
Languages
SQL, Python 3, PHP, Python, Cypher, HTML, JavaScript, Java 8, Go
Platforms
AIX, Windows, Google Cloud Platform (GCP), Linux, Hortonworks Data Platform (HDP), Databricks, Amazon Web Services (AWS), Kubernetes, Apache Kafka
Storage
Databases, IBM Db2, SQL Performance, Database Performance, Data Pipelines, Data Validation, Microsoft SQL Server, Google Cloud Storage, HDFS, Apache Hive, JSON, Database Replication, Google Cloud, PostgreSQL, Database Architecture, Google Cloud Spanner, Amazon S3 (AWS S3), DB2/400, SQL Server 2012
Frameworks
Hadoop, Spark, OAuth 2, Delta Live Tables (DLT), Flask
Paradigms
Load Testing, ETL
Other
Data Engineering, Google BigQuery, Monitoring, Performance Tuning, Troubleshooting, Backup & Recovery, Data Migration, Programming, Cloud, Google Cloud Functions, Pub/Sub, Google, CDC, High Availability Disaster Recovery (HADR), Business Continuity & Disaster Recovery (BCDR), Core, Memory Leaks, Memory Management, Tuning Advisory, Data Loading, Google Marketing Platform, Ad Campaigns, APIs, Big Data, Distributed Systems, Data Architecture, Google Cloud Key Management Services (KMS), Large-scale Data Migration, Security, MinIO, TCP/IP