Senior Data Engineer | 2019 - 2021 | Curvo Labs
Technologies: Apache Airflow, Amazon Web Services (AWS), Python 3, Web Scraping, Scala, Spark, Spark SQL, Pandas, Amazon Virtual Private Cloud (VPC), TypeScript, GraphQL, Elasticsearch, NestJS, React, Beautiful Soup, Selenium WebDriver, Jest
- Implemented an Amazon Redshift data warehouse from scratch to serve as the reporting database.
- Designed ETL jobs in AWS Data Pipeline to move data from the production database to Redshift.
- Built data pipelines in Python for data transformation and web scraping with BeautifulSoup and Selenium WebDriver (see the scraping sketch after this list).
- Implemented an Airflow instance from scratch to orchestrate data pipelines and later migrated it to AWS MWAA when the service became available (see the DAG sketch after this list).
- Implemented Amazon QuickSight as the primary reporting tool and designed reports that were later embedded into a web application.
- Built data pipelines in Spark and Scala for distributed data processing and transformation, and deployed them on AWS Glue (see the Glue job sketch after this list).
- Implemented a portion of a web application that embedded Amazon QuickSight reports; the stack used Node.js, React, TypeScript, GraphQL, Ant Design, and Jest for testing. Features included user authentication, data access, and dashboard embedding (see the embedding sketch after this list).
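A minimal sketch of the scraping pattern referenced above; the URL and CSS selectors are hypothetical placeholders, not the actual targets:

```python
# A sketch only: the URL and CSS selectors below are hypothetical
# placeholders, not the actual scraping targets.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import pandas as pd

options = Options()
options.add_argument("--headless")  # no visible browser window

driver = webdriver.Chrome(options=options)
try:
    # Selenium renders JavaScript-heavy pages that a plain HTTP client cannot.
    driver.get("https://example.com/products")  # hypothetical URL
    html = driver.page_source
finally:
    driver.quit()

# BeautifulSoup parses the rendered HTML.
soup = BeautifulSoup(html, "html.parser")
rows = [
    {
        "name": item.select_one(".name").get_text(strip=True),
        "price": item.select_one(".price").get_text(strip=True),
    }
    for item in soup.select(".product")  # hypothetical selector
]
df = pd.DataFrame(rows)  # hand off to pandas for the transformation step
```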
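A minimal sketch of the kind of Airflow DAG used for orchestration; the DAG name and the extract/load callables are hypothetical:

```python
# A sketch only: the DAG name and callables are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # hypothetical: pull from the production database


def load():
    ...  # hypothetical: load transformed data into Redshift


with DAG(
    dag_id="daily_reporting_pipeline",  # hypothetical name
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # extract runs before load
```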
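The Glue jobs themselves were written in Scala; this PySpark sketch shows the same shape of job, with hypothetical paths and column names:

```python
# A PySpark sketch of the Scala Glue jobs; paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders-transform").getOrCreate()

# Read raw extracts from S3, aggregate, and write the result back partitioned.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
daily = (
    orders.withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date", "product_id")
    .agg(F.sum("amount").alias("total_amount"))
)
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_orders/"
)
```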
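The embedding service was built in Node.js/TypeScript; this Python sketch shows the equivalent QuickSight embed-URL call via boto3, with placeholder IDs:

```python
# A sketch only: account and dashboard IDs are placeholders.
import boto3

quicksight = boto3.client("quicksight", region_name="us-east-1")

response = quicksight.get_dashboard_embed_url(
    AwsAccountId="123456789012",         # placeholder
    DashboardId="example-dashboard-id",  # placeholder
    IdentityType="IAM",
    SessionLifetimeInMinutes=60,
)
embed_url = response["EmbedUrl"]  # returned to the React client for an iframe
```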
Senior Data Engineer | 2017 - 2020 | Colorescience
Technologies: Salesforce, MySQL, PostgreSQL, Python, AWS, Data Pipelines
- Designed, developed, and maintained a reporting data warehouse built using PostgreSQL (AWS RDS).
- Built data pipelines to move data from a production system and third-party systems to a centralized data warehouse.
- Connected to third-party APIs to import data incrementally, e.g., Salesforce, Sailthru, and CrowdTwist (see the incremental-pull sketch after this list).
- Managed resources on AWS.
- Created reports and dashboards in Looker to provide insights on the data.
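A sketch of the incremental-pull pattern described above, shown against the Salesforce REST query endpoint; the instance URL, API version, token handling, and watermark storage are placeholders:

```python
# A sketch only: the instance URL, API version, token handling, and
# watermark storage are placeholders.
import requests

BASE = "https://example.my.salesforce.com"  # placeholder instance
TOKEN = "..."  # obtained elsewhere via OAuth


def fetch_accounts_since(last_sync: str) -> list[dict]:
    """Pull only Account records modified after the stored watermark."""
    soql = (
        "SELECT Id, Name, LastModifiedDate FROM Account "
        f"WHERE LastModifiedDate > {last_sync}"  # e.g. 2020-01-01T00:00:00Z
    )
    url = f"{BASE}/services/data/v52.0/query"
    params = {"q": soql}
    records: list[dict] = []
    while url:
        resp = requests.get(
            url, params=params, headers={"Authorization": f"Bearer {TOKEN}"}
        )
        resp.raise_for_status()
        payload = resp.json()
        records.extend(payload["records"])
        # Salesforce paginates; follow nextRecordsUrl until done is true.
        url = BASE + payload["nextRecordsUrl"] if not payload["done"] else None
        params = None  # the next-page URL already encodes the query
    return records
```

Each run stores the newest LastModifiedDate it saw and passes it in as the next watermark, so only changed records move.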
Senior Data Engineer | 2019 - 2019 | Curvo Labs
Technologies: Web Scraping, AWS Glue, Docker, AWS S3, Redshift, Apache Airflow, Spark ML, Scala, Apache Spark, Python, AWS, Data Pipelines
- Built a new data pipeline framework orchestrated in Airflow, with Airflow itself set up in Docker.
- Wrote new data pipelines in Python and scheduled them in Airflow.
- Performed a variety of on-demand tasks using Apache Spark and Scala and deployed them on AWS Glue.
- Established Redshift as the centralized data warehouse and moved data into Redshift from S3, production systems, and third-party applications (see the COPY sketch after this list).
- Set up Mode to create enterprise reports from the data moved to Redshift.
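A sketch of the S3-to-Redshift load path described above; connection details, schema, bucket, and IAM role are placeholders:

```python
# A sketch only: connection details, schema, bucket, and IAM role are
# placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="warehouse",
    user="etl_user",
    password="...",
)
with conn, conn.cursor() as cur:
    # COPY is Redshift's parallel bulk-load path, far faster than row INSERTs.
    cur.execute(
        """
        COPY analytics.orders
        FROM 's3://example-bucket/extracts/orders/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS PARQUET;
        """
    )
```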
Senior Business Intelligence Engineer | 2016 - 2018 | Altus Group Limited
Technologies: Reporting, Pentaho, Informatica, Microsoft SQL Server, PostgreSQL, Data Pipelines
- Built a reporting data warehouse using Pentaho, PostgreSQL, and Informatica.
- Designed a database schema in PostgreSQL to represent the reporting use case.
- Created ETL tasks in Informatica to move data from the production systems into PostgreSQL.
- Built reports and dashboards using Pentaho Report Designer and deployed them to the BI server.
Data Engineer | 2014 - 2016 | Wave Accounting, Inc.
Technologies: Pentaho, Ansible, Sqoop, Apache Hive, MySQL, Hadoop, PostgreSQL, Sisense, Microsoft SQL Server, Python, Data Pipelines
- Designed, developed, and maintained big data and business intelligence solutions at Wave.
- Designed and scheduled complex ETL workflows and jobs using Pentaho Data Integration (Kettle) to load data into the data systems.
- Wrote custom Python scripts to access third-party APIs and download data into the data systems.
- Developed complex SQL queries, including joins, subqueries, and common table expressions, to address ad hoc business analytics and other requirements (see the query sketch after this list).
- Coordinated with the product and executive teams to gather and understand business requirements.
- Built an end-to-end relational data warehouse, including infrastructure, schema design, optimization, and administration.
- Designed and developed a Hadoop cluster using Hortonworks HDP 2.0; tasks included installing and configuring the Hadoop ecosystem and designing the HDFS layout.
- Designed and scheduled Sqoop jobs to load data into HDFS from the production systems.
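A sketch of the kind of CTE-plus-join query described above, with hypothetical table and column names, held in a Python string so it can run through any warehouse cursor:

```python
# A sketch only: table and column names are hypothetical.
query = """
WITH monthly_revenue AS (
    SELECT customer_id,
           DATE_TRUNC('month', invoiced_at) AS month,
           SUM(amount) AS revenue
    FROM invoices
    GROUP BY customer_id, DATE_TRUNC('month', invoiced_at)
)
SELECT c.name,
       m.month,
       m.revenue,
       m.revenue / NULLIF(SUM(m.revenue) OVER (PARTITION BY m.month), 0)
           AS share_of_month
FROM monthly_revenue m
JOIN customers c ON c.id = m.customer_id
ORDER BY m.month, m.revenue DESC;
"""
```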
Business Intelligence Developer | 2011 - 2014 | Eyereturn Marketing, Inc.
Technologies: MySQL, Sqoop, Apache Pig, Hadoop, Apache Hive, Pentaho, SSAS, SQL Server Integration Services (SSIS), Microsoft SQL Server
- Designed real-time reporting solutions using SQL Server (SSIS, SSAS, and SSRS) and the Pentaho BI stack (MySQL, Mondrian, and Pentaho).
- Created custom automated/scheduled reports using Eclipse BIRT and Pentaho Report Designer.
- Built custom ETL tasks to transform data for custom reports using Kettle (Pentaho Data Integration).
- Designed and optimized database schemas to make reporting faster and more efficient.
- Created, maintained, and scheduled custom data processors to pull and manipulate data from HDFS using Pig, Sqoop, and Oozie (Cloudera Hadoop).
Database Analyst | 2010 - 2011 | George Brown College
Technologies: Raiser's Edge, Microsoft SQL Server
- Administered the organization's database using Blackbaud's Raiser's Edge.
- Updated and maintained the alumni database using Microsoft SQL Server.
- Conducted data validation and verification to ensure the accuracy and quality of the data.
- Wrote complex queries to generate reports and provide information for divisional and marketing purposes.
- Provided support to the project managers.
Software Engineer | 2007 - 2009 | Tata Consultancy Services
Technologies: Oracle, SQL, HTML, Java
- Provided post-implementation support and training for an enterprise-level banking application (TCS B@ncs) to 25,000+ corporate end users.
- Handled different modules of the banking operations such as routine banking, loans and mortgages, capital markets, and foreign exchange.
- Analyzed client business needs and translated them into functional/operational requirements.
- Communicated with a variety of stakeholders, including subject matter experts, business units, development teams, and support teams, to establish a technical vision.