Vikram Goyal
Verified Expert in Engineering
Data Engineer and Developer
Vaughan, ON, Canada
Toptal member since October 5, 2020
Vikram specializes in leveraging cloud technologies, especially GCP and AWS, to solve business problems. He is an expert in implementing solutions such as data lakes, enterprise data marts, and application data masking. As an accomplished and resourceful software professional with over 19 years of experience, Vikram believes it's crucial to understand and analyze all aspects of a problem before choosing a technology or approach to solve it.
Experience
- Excel VBA - 10 years
- Big Data - 7 years
- SQL Server 2014 - 6 years
- SQL - 6 years
- Microsoft Parallel Data Warehouse (PDW) - 5 years
- Apache Hive - 5 years
- PySpark - 1 year
- Spark SQL - 1 year
Preferred Environment
Excel VBA, Python, Apache Hive, Visual Studio, SQL Server 2014, PySpark, Google Cloud Platform (GCP), Google BigQuery, Google Cloud Functions, Databricks
The most amazing...
...thing about my experience is the variety of data solutions I've implemented: data lakes, enterprise data marts, and application data masking.
Work Experience
Senior Cloud Data Engineer
Economical Insurance
- Designed and implemented a framework to ingest files into Google BigQuery using the BigQuery Python API and Airflow (a sketch follows this list).
- Loaded historical data from on-premises Hive into GCP BigQuery using Scala-Spark and Databricks, and migrated SAS data from on-premises sources into BigQuery using PySpark and Databricks.
- Migrated diverse sources to GCP BigQuery, ensuring data consistency and accuracy.
- Architected and implemented REST APIs on GCP using Cloud Run, Apigee, and Python to give various teams access to the underlying BigQuery data.
- Implemented FinOps reports in Tableau to track cloud spend broken down by parameters such as cloud service, team, and environment.
- Analyzed and optimized BigQuery queries to cut storage and retrieval costs, re-partitioning tables and rewriting queries to use more efficient joins and WHERE clauses.
- Used Azure Graph APIs to create reusable functions that report AD groups, AD group-to-user mappings, etc.
- Created a customer data deletion/obfuscation solution to meet business guidelines using Java, Data Catalog, and BigQuery.
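A minimal sketch of the kind of load step such a file ingestion framework might wrap behind Airflow, using the google-cloud-bigquery client; the bucket, project, dataset, and table names are hypothetical.

```python
from google.cloud import bigquery

def load_csv_to_bigquery(uri: str, table_id: str) -> None:
    """Load a delimited file from Cloud Storage into a BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema; a real framework would usually pass one explicitly
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()          # block until the job finishes; raises on failure
    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}")

# Hypothetical bucket and table names, for illustration only.
load_csv_to_bigquery(
    "gs://example-bucket/incoming/policies.csv",
    "example-project.staging.policies",
)
```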
Senior Cloud Data Engineer
BMO Bank of Montreal
- Created the solution design blueprint to migrate 145 applications from an on-premises Cloudera cluster to an AWS data lake on S3, and to load downstream data marts using Scala-Spark on Hive/Redshift.
- Built a data ingestion framework in Scala-Spark to load data from S3 into Redshift.
- Developed multiple solutions on Athena, such as data encryption/decryption and access control using Lake Formation.
- Migrated an on-premises legacy Cloudera system, with jobs in Pentaho and Hive/Oozie, to AWS using Redshift, Scala-Spark, and Airflow.
- Implemented a solution to convert single- and multi-segment EBCDIC files from mainframes to ASCII using Scala-Spark.
- Optimized SQL code implementing SCD Type-1 and Type-2 loads into Redshift (see the sketch after this list).
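A minimal sketch of the SCD Type-2 pattern behind loads like the one above, expressed as SQL driven from Python; the table and column names are hypothetical, psycopg2 is an assumed driver (Redshift speaks the PostgreSQL wire protocol), and the production version was tuned for Redshift.

```python
import psycopg2  # assumed driver; any DB-API client for Redshift works similarly

# Hypothetical tables: staging.customers (incoming batch) and mart.dim_customer
# (target dimension with effective_from, effective_to, and is_current columns).
CLOSE_CHANGED_ROWS = """
UPDATE mart.dim_customer d
SET effective_to = CURRENT_DATE, is_current = FALSE
FROM staging.customers s
WHERE d.customer_id = s.customer_id
  AND d.is_current
  AND (d.name <> s.name OR d.address <> s.address);
"""

INSERT_NEW_VERSIONS = """
INSERT INTO mart.dim_customer
    (customer_id, name, address, effective_from, effective_to, is_current)
SELECT s.customer_id, s.name, s.address, CURRENT_DATE, DATE '9999-12-31', TRUE
FROM staging.customers s
LEFT JOIN mart.dim_customer d
  ON d.customer_id = s.customer_id AND d.is_current
WHERE d.customer_id IS NULL;  -- new keys, plus keys whose current row was just closed
"""

def run_scd2_load(conn) -> None:
    """Apply both steps in one transaction: commit on success, roll back on error."""
    with conn:
        with conn.cursor() as cur:
            cur.execute(CLOSE_CHANGED_ROWS)
            cur.execute(INSERT_NEW_VERSIONS)

run_scd2_load(psycopg2.connect("dbname=warehouse"))  # hypothetical connection string
```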
Data Engineer
Manulife
- Implemented a PySpark framework to ingest delimited, fixed-width, and Excel files into Apache Hive tables, simplifying the ingestion process and saving more than 50% of the effort (a sketch follows this list).
- Facilitated the calculation of assets under management across various dimensions, applying complex transformations through data curation scripts written in HQL, Oozie, and shell.
- Built VBA macro templates to generate metadata files for data ingestion and curation into SCD1 and SCD2 tables, reducing code errors and development time by around 30%.
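A minimal sketch of the metadata-driven file-to-Hive ingestion pattern described above; the file spec values and table names are hypothetical stand-ins for what the framework read from generated metadata files.

```python
from pyspark.sql import SparkSession

# Hypothetical metadata record; the real framework read these from metadata files.
FILE_SPEC = {
    "path": "hdfs:///landing/policies/2020-10-01/*.csv",
    "format": "csv",
    "delimiter": "|",
    "target_table": "staging.policies",
}

spark = (
    SparkSession.builder
    .appName("file-ingestion")
    .enableHiveSupport()   # lets Spark write to Hive-managed tables
    .getOrCreate()
)

df = (
    spark.read.format(FILE_SPEC["format"])
    .option("header", "true")
    .option("delimiter", FILE_SPEC["delimiter"])
    .load(FILE_SPEC["path"])
)

# Append the batch into the target Hive table.
df.write.mode("append").saveAsTable(FILE_SPEC["target_table"])
```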
Technology Architect
Infosys
- Created a data ingestion framework to load varied sources, including multi-structured VSAM, XML, JSON, zip, and fixed-width files as well as Microsoft SQL Server and Oracle databases, into an HDFS data lake using Apache Hive, Apache Sqoop, SSIS, and Python.
- Led a team of four professionals to create two complex data marts using T-SQL on Microsoft PDW, implementing load strategies such as SCD1, SCD2, and fact upsert.
- Wrote common data warehouse load strategies to help reduce the development time by nearly 30%.
- Created reusable PySpark components implementing data load strategies such as SCD1, SCD2, and fact upsert, saving 30% of development effort (see the sketch after this list).
- Implemented a solution to ingest a complex XML feed, demanding in both structure and data volume, into the data lake using Apache Hive, saving the client $300,000.
- Created two frameworks: one in Windows PowerShell to send data extracts from views built on mart tables to external systems, and another in Microsoft SQL Server to automatically generate and update statistics on all tables in a given database.
- Automated the end-to-end data masking process, from sourcing the data to saving the masked output, with a framework built on shell scripting and Oracle, cutting the time to create masked copies by nearly 50%.
- Created Excel macro tools to compare source and masked data copies, ensuring the integrity and completeness of the masked data and saving around 70% of the validation effort.
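A minimal sketch of a reusable SCD Type-1 (overwrite-in-place) load in PySpark, the simplest of the load strategies named above; the table names are hypothetical and the incoming batch is assumed to share the dimension's schema.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd1-load").enableHiveSupport().getOrCreate()

# Hypothetical tables: the staged batch and the target dimension.
incoming = spark.table("staging.customers")
current = spark.table("mart.dim_customer")

# SCD Type-1: incoming attributes overwrite existing rows for matching keys;
# current rows absent from the batch are carried forward unchanged.
merged = (
    current.join(incoming, on="customer_id", how="left_anti")  # rows not in the batch
    .unionByName(incoming)                                     # plus new/updated rows
)

# Rewrite the dimension. Writing to a new table and swapping it in avoids
# reading and overwriting the same Hive table within one job.
merged.write.mode("overwrite").saveAsTable("mart.dim_customer_new")
```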
Experience
Data Curation Framework
Hadoop Data Ingestion Framework
Customer Data Deletion/Obfuscation Application
Education
Bachelor of Engineering in Electronics and Electrical Communication
Punjab Engineering College - Chandigarh, India
Certifications
AWS Certified Cloud Practitioner
Amazon Web Services
Architecting Microsoft Azure Solutions
Microsoft
Administering Microsoft SQL Server 2012/2014 Databases
Microsoft
Implementing a Data Warehouse with Microsoft SQL Server 2012/2014
Microsoft
Querying Microsoft SQL Server 2012/2014
Microsoft
Skills
Libraries/APIs
PySpark
Tools
Microsoft Excel, Excel 2013, Oozie, Spark SQL, Apache Sqoop, Amazon Athena, Amazon Redshift Spectrum, Tableau, Google Cloud Composer, Apache Airflow
Languages
Excel VBA, T-SQL (Transact-SQL), SQL, Python, Scala, Java
Paradigms
ETL
Storage
SQL Server 2014, Apache Hive, Microsoft Parallel Data Warehouse (PDW), Microsoft SQL Server, Data Lakes, SQL Server 2012, Databases, MySQL, Google Cloud SQL, Google Cloud, SQL Server Integration Services (SSIS)
Platforms
Google Cloud Platform (GCP), Databricks, AWS Lambda, AWS IoT, Pentaho, Amazon Web Services (AWS)
Frameworks
Windows PowerShell
Other
Slowly Changing Dimensions (SCD), Data Engineering, Data Warehousing, Excel Macros, Data Warehouse Design, Google BigQuery, Data Migration, Big Data, Google Cloud Functions, GSM, Shell Scripting, Amazon Redshift, Data Masking