
Sagar Sharma
Verified Expert in Engineering
Data Engineer and DB Developer
Toronto, ON, Canada
Toptal member since June 18, 2020
Sagar is a seasoned data professional with more than ten years of experience with relational databases and three years with big data, specializing in designing and scaling data systems and processes. He is a hardworking individual with a constant desire to learn new things and to make a positive impact on the organization. Sagar has excellent communication skills and is a motivated team player who is equally able to work independently.
Portfolio
Experience
- SQL - 10 years
- Business Intelligence (BI) - 8 years
- ETL Development - 8 years
- Python - 7 years
- Data Pipelines - 6 years
- Data Build Tool (dbt) - 5 years
- Looker - 5 years
- Snowflake - 4 years
Availability
Preferred Environment
IntelliJ IDEA, Sublime Text, Linux, macOS
The most amazing...
...project was building a data lake with Hadoop, which included installing the Hadoop ecosystem from scratch and building data pipelines to move data into the lake.
Work Experience
Data Engineer
PepsiCo
- Built a Snowflake data warehouse to replace the existing SQL Server-SSAS setup.
- Built data pipelines to move existing data from SQL Server to Snowflake.
- Created data models in dbt to transform incoming data and load it into the Snowflake data warehouse, replacing the existing SSIS packages.
- Built data pipelines with Python libraries such as Pandas and PySpark to pull data from third-party providers and load it into Snowflake, and deployed the DAGs to Apache Airflow (a minimal sketch of this pattern follows this list).
- Developed a data quality framework in dbt to monitor incoming data for outages and anomalies based on business rules. Reported the alerts using Monte Carlo.
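As a rough illustration of the pipeline pattern in the third bullet above, the sketch below shows a minimal Airflow DAG that pulls a CSV feed with Pandas and appends it to a Snowflake table. The feed URL, table, warehouse, and credentials are hypothetical placeholders, not the actual PepsiCo configuration.

```python
# Illustrative Airflow DAG: pull a third-party CSV feed with pandas and append
# it to an existing Snowflake table. All names and credentials are placeholders.
from datetime import datetime

import pandas as pd
import snowflake.connector
from airflow import DAG
from airflow.operators.python import PythonOperator
from snowflake.connector.pandas_tools import write_pandas


def load_feed_to_snowflake():
    # Read the third-party feed into a DataFrame (placeholder URL).
    df = pd.read_csv("https://example.com/daily_feed.csv")

    # Basic cleanup so column names match the target table.
    df.columns = [c.strip().upper() for c in df.columns]

    # Connect to Snowflake and append the rows (placeholder credentials;
    # assumes the target table already exists).
    conn = snowflake.connector.connect(
        account="my_account",
        user="etl_user",
        password="***",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="RAW",
    )
    try:
        write_pandas(conn, df, table_name="DAILY_FEED")
    finally:
        conn.close()


with DAG(
    dag_id="third_party_feed_to_snowflake",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_feed",
        python_callable=load_feed_to_snowflake,
    )
```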
Senior Data Engineer
Curvo Labs
- Implemented an Amazon Redshift data warehouse from scratch to serve as a reporting database.
- Designed ETL jobs in AWS Data Pipeline to move data from the production database to Redshift.
- Built Python data pipelines for data transformations and for web scraping with Beautiful Soup and Selenium WebDriver (see the scraping sketch after this list).
- Set up an Apache Airflow instance from scratch to orchestrate data pipelines and later migrated it to Amazon MWAA when that service became available.
- Implemented Amazon QuickSight as the primary reporting tool and designed reports that were later embedded into a web application.
- Built data pipelines using Spark and Scala for distributed data processing and transformation and deployed them in AWS Glue.
- Implemented a portion of a web application that embedded Amazon QuickSight reports. The tech stack used Node.js, React, TypeScript, GraphQL, Ant Design, and Jest for testing. Features included user authentication, data access, and dashboard embedding.
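The scraping work mentioned above might look roughly like the sketch below: fetch a page with requests, parse a table with Beautiful Soup, and hand the result to Pandas for downstream transformation. The URL, CSS selectors, and column names are hypothetical placeholders, not the actual Curvo Labs sources.

```python
# Illustrative web-scraping step: fetch a page, parse a product table with
# Beautiful Soup, and return a DataFrame. URL and selectors are placeholders.
import pandas as pd
import requests
from bs4 import BeautifulSoup


def scrape_catalog(url: str = "https://example.com/catalog") -> pd.DataFrame:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for tr in soup.select("table.catalog tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 3:  # skip header rows and malformed rows
            rows.append(cells)

    return pd.DataFrame(rows, columns=["sku", "description", "list_price"])


if __name__ == "__main__":
    print(scrape_catalog().head())
```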
Enterprise Data Analyst
Quant Collective LLC
- Coordinated with stakeholders to gather business requirements and converted them to technical specifications.
- Designed the data warehouse according to the business requirements.
- Developed data pipelines using Airflow and dbt to build a BigQuery data warehouse.
Senior Data Engineer
Colorescience
- Designed, developed, and maintained a reporting data warehouse built using PostgreSQL (AWS RDS).
- Built data pipelines to move data from production and third-party systems to a centralized data warehouse.
- Connected to third-party APIs, such as Salesforce, Sailthru, and CrowdTwist, to import data incrementally (see the sketch after this list).
- Managed resources on AWS, including Elastic Compute Cloud (EC2), Virtual Private Cloud (VPC) networking, and Relational Database Service (RDS).
- Implemented and managed a Looker instance, including connecting data sources, building LookML (both YAML and JSON), creating views, explores, and models, setting up access privileges and persistent derived tables (PDTs), and scheduling stakeholder reports.
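A minimal sketch of the incremental-import pattern from the third bullet above, assuming a hypothetical REST endpoint that accepts a modified_since parameter and a Postgres staging table that already exists; the endpoint, token, and table names are illustrative, not the vendors' real APIs.

```python
# Illustrative incremental API import: fetch only records modified since the
# last successful run and append them to a Postgres staging table.
# The endpoint, auth token, connection string, and table names are placeholders.
from datetime import datetime, timezone

import pandas as pd
import requests
from sqlalchemy import create_engine, text

ENGINE = create_engine("postgresql+psycopg2://etl_user:***@rds-host:5432/warehouse")
API_URL = "https://api.example.com/v1/contacts"


def last_watermark() -> str:
    # Read the high-water mark left by the previous run (assumes the table exists).
    with ENGINE.connect() as conn:
        latest = conn.execute(
            text("SELECT max(modified_at) FROM staging.contacts")
        ).scalar()
    return (latest or datetime(2000, 1, 1, tzinfo=timezone.utc)).isoformat()


def import_incremental() -> int:
    resp = requests.get(
        API_URL,
        params={"modified_since": last_watermark()},
        headers={"Authorization": "Bearer ***"},
        timeout=60,
    )
    resp.raise_for_status()

    df = pd.DataFrame(resp.json()["records"])
    if not df.empty:
        df.to_sql("contacts", ENGINE, schema="staging", if_exists="append", index=False)
    return len(df)
```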
Senior Data Engineer
Curvo Labs
- Built a new data pipeline framework orchestrated in Apache Airflow, with Airflow itself deployed via Docker.
- Wrote new data pipelines in Python and scheduled them in Airflow.
- Performed various on-demand tasks using Apache Spark and Scala and deployed them in AWS Glue.
- Established Amazon Redshift as a centralized data warehouse and moved the data to Redshift from Amazon S3, production systems, and third-party applications.
- Set up Mode to create enterprise reports from the data moved to Redshift.
Senior Business Intelligence Engineer
Altus Group Limited
- Built a reporting data warehouse using Pentaho, PostgreSQL, and Informatica.
- Designed a database schema in PostgreSQL to represent the reporting use case.
- Created ETL tasks in Informatica to move data from the production systems into PostgreSQL.
- Built reports and dashboards using Pentaho Report Designer and deployed them to the Pentaho BI Server.
Data Engineer
Wave Accounting, Inc.
- Developed, designed, and maintained big data and business intelligence solutions at Wave.
- Designed and scheduled complex ETL workflows and jobs using Pentaho Data Integration (Kettle) to load data into the data systems.
- Wrote custom Python scripts to access third-party APIs and download data into the data systems.
- Developed complex SQL queries, including joins, subqueries, and common table expressions, to address ad hoc business analytics and other requirements.
- Coordinated with the product and executive teams to gather and understand business requirements.
- Built an end-to-end relational data warehouse—including infrastructure, schema design, optimization, and administration.
- Developed and designed a Hadoop cluster using Hortonworks HDP 2.0. Tasks included installing and configuring a Hadoop ecosystem and designing the Hadoop Distributed File System (HDFS).
- Designed and scheduled Sqoop jobs to load data into the HDFS from the production systems.
Business Intelligence Developer
Eyereturn Marketing, Inc.
- Designed real-time reporting solutions using the SQL Server stack (SSIS, SSAS, and SSRS) as well as open source business intelligence (BI) tools, including MySQL, Mondrian, and Pentaho.
- Created custom automated and scheduled reports using Eclipse BIRT and Pentaho Report Designer.
- Built custom ETL tasks to transform data for custom reports using Kettle (Pentaho Data Integration).
- Designed and optimized database schemas to make reporting faster and more efficient.
- Created, maintained, and scheduled custom data processors to pull and manipulate data from HDFS using Pig, Sqoop, and Oozie (Cloudera Hadoop).
Database Analyst
George Brown College
- Handled database administration for the organization using Blackbaud’s Raiser’s Edge NXT.
- Updated and maintained the alumni database using Microsoft SQL Server.
- Conducted data validation and verification to ensure the accuracy and quality of the data.
- Wrote complex queries for reports and provided information for divisional and marketing purposes.
Software Engineer
Tata Consultancy Services
- Provided post-implementation support and training for an enterprise-level banking application (TCS BaNCS) to 25,000+ corporate end-users.
- Handled different modules of banking operations, such as everyday banking, loans and mortgages, capital markets, and foreign exchange.
- Analyzed client business needs and translated them into functional and operational requirements.
- Communicated successfully with various stakeholders, including subject matter experts, business units, development teams, and support teams, to establish a technical vision.
Experience
Data Lake Using Hadoop (Hortonworks HDP 2.0)
My Tasks:
• Installed and configured Hadoop ecosystem components on the Rackspace cloud big data platform.
• Automated the above process using Ansible.
• Designed and scheduled Sqoop jobs to import data from MySQL and PostgreSQL production tables into HDFS.
• Set up Hive tables from HDFS files to enable SQL-like querying on HDFS data.
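As a stand-in for the HiveQL DDL actually run on HDP 2.0, the PySpark sketch below illustrates the last step: defining an external Hive table over files that Sqoop landed in HDFS and querying it with SQL. The database, paths, and column names are hypothetical.

```python
# Illustrative sketch: define an external Hive table over files already landed
# in HDFS by Sqoop, then query it with SQL. Paths and names are placeholders,
# and PySpark stands in here for HiveQL run directly on the cluster.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive_over_hdfs_example")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS lake")

# External table pointing at Sqoop's comma-delimited output directory.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS lake.orders (
        order_id BIGINT,
        customer_id BIGINT,
        order_total DECIMAL(12, 2),
        created_at STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION 'hdfs:///data/raw/orders'
""")

# SQL-like querying over the HDFS data.
spark.sql("""
    SELECT customer_id, SUM(order_total) AS lifetime_value
    FROM lake.orders
    GROUP BY customer_id
    ORDER BY lifetime_value DESC
    LIMIT 10
""").show()
```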
Sisense Rebuild Project
My Tasks:
• Redesigned the data model in Sisense ElastiCube, using a snowflake schema to establish database relationships.
• Created a build schedule for the data model above.
• Created dashboards based on the client's requirements.
• Set up email schedules for the dashboards to relevant stakeholders.
Redshift | Snowflake Migration
Education
Certificate in SAS Certified Base Programmer
SAS Institute Canada - Toronto, Canada
Postgraduate Certificate in Strategic Relationship Marketing
George Brown College - Toronto, Canada
Bachelor's Degree in Mechanical Engineering
YMCA University of Science and Technology - Faridabad, India
Skills
Libraries/APIs
REST APIs, NumPy, Pandas, DreamFactory, React, Passport.js, Node.js, PySpark, Beautiful Soup, Spark ML, Selenium WebDriver
Tools
Looker, Pentaho Data Integration (Kettle), Informatica ETL, Pentaho Mondrian OLAP Engine, Sisense, Apache Sqoop, AWS Glue, Apache Airflow, Spark SQL, Amazon Elastic MapReduce (EMR), Sublime Text, IntelliJ IDEA, Ansible, SSAS, Amazon Virtual Private Cloud (VPC), Amazon QuickSight
Languages
SQL, Python, Snowflake, CSS, HTML, Scala, T-SQL (Transact-SQL), Python 3, JavaScript, Java, SAS, TypeScript, GraphQL
Paradigms
ETL Implementation & Design, ETL, RESTful Development, MapReduce, Business Intelligence (BI)
Storage
SQL Server 2008, Databases, SQL Server 2012, SQL Server 2014, SQL Server Integration Services (SSIS), HDFS, SQL Server Analysis Services (SSAS), Apache Hive, PostgreSQL, MySQL, Redshift, Data Pipelines, Microsoft SQL Server, Amazon S3 (AWS S3), MongoDB, Elasticsearch
Frameworks
Express.js, Bootstrap 3+, Materialize, Foundation CSS, Flask, Hadoop, Spark, Apache Spark, NestJS, Jest
Platforms
Amazon EC2, Windows, macOS, Linux, Apache Pig, Apache Kafka, Amazon Web Services (AWS), Airbyte, AWS Lambda, Pentaho, Oracle, Salesforce, Docker
Other
ETL Tools, ETL Development, Pentaho Reports, Data Build Tool (dbt), Data Warehouse Design, Semantic UI, Data Analysis, Pentaho Dashboard, Informatica, Data Engineering, Data Warehousing, Mechanical Engineering, Informatica Cloud, Reporting, Web Scraping, Marketing Strategy, Monte Carlo, Data Analytics, Google BigQuery