Priyanshu Hasija

Verified Expert in Engineering

ETL and Big Data Developer

Location

Gurugram, India

Toptal Member Since

June 18, 2020

Priyanshu is an AWS-certified solutions architect with 10 years of experience in delivering strategic and data-oriented solutions. With expertise in a wide array of data technologies, including SQL, NoSQL, cloud databases, and data warehousing, he has developed and executed data strategies that have improved the efficiency, accuracy, and reliability of critical business processes. His proficiency in data modeling, ETL development, and data visualization has proved fruitful for his clients.

Scala, AWS Lambda, Data Engineering, Big Data, Spark, Data Pipelines, Hadoop, Git, Node.js, Python, Bitbucket, Amazon S3 (AWS S3), HDFS, Redshift, PostgreSQL, MapReduce

Portfolio

Accent Technologies Inc.

Python, PySpark, Apache Spark, Data Analysis, SQL, Apache Cassandra...

PVH

Spark, Amazon Web Services (AWS), Apache Airflow, GitLab, Data Modeling...

KLM Royal Dutch Airlines

Spark, Amazon Web Services (AWS), Python, Scala, Data Pipelines

Experience

Data Pipelines - 4 years, Data Engineering - 4 years, Spark - 4 years, Big Data - 4 years, ETL Implementation & Design - 3 years, Python - 2 years, Scala - 2 years, Redshift - 2 years

Availability

Part-time

Preferred Environment

Linux, Unix

The most amazing...

...thing I have developed was a real-time tracking pipeline for a logistics company that monitors the end-to-end movement of shipments.

Work Experience

Data Engineer

2023 - 2023

Accent Technologies Inc.

Developed an architecture to ingest streaming data into AWS S3.
Created ETL pipelines to transform AWS S3 data and load it into Elasticsearch and Cassandra.
Optimized Spark jobs, which brought down the processing time of data from 5 hours to 30 minutes.

Technologies: Python, PySpark, Apache Spark, Data Analysis, SQL, Apache Cassandra, Apache Kafka, Elasticsearch

Data Architect

2022 - 2023

PVH

Built the exploratory model to understand the potential of available data within PVH, CRM, and e-Commerce and enabled recommendations for customer segmentation, customer lifetime value, churn prediction, and product design.
Identified patterns in the data to discover new potential business use cases.
Discovered previously unrecognized patterns in the data to improve the existing business use cases.
Defined and enforced data standards and policies within the organization. This included ensuring data quality, security, privacy, and compliance with regulatory requirements.
Created conceptual, logical, and physical data models to represent the data needs of the PVH data analytics team. These models served as the basis for the development of data systems and applications.

Technologies: Spark, Amazon Web Services (AWS), Apache Airflow, GitLab, Data Modeling, Big Data, Solutioning

Senior Data Engineer

2021 - 2022

KLM Royal Dutch Airlines

Recommended infrastructure changes to improve storage capacity or performance, which eventually reduced the infrastructure cost.
Automated code deployment by creating CI/CD pipelines.
Maintained the integrity of data by designing backup and recovery procedures.

Technologies: Spark, Amazon Web Services (AWS), Python, Scala, Data Pipelines

Senior Data Engineer

2020 - 2021

Bang the Table

Architected the entire solution to extract data from MySQL, transform it in ETL pipelines, and prepare it for data warehousing.
Created Spark ETL jobs and set up the entire framework to trigger these ETL jobs on AWS.
Designed and set up orchestration strategies using Apache Airflow to transform data in both near-real time and batch fashion.

Technologies: ETL, Spark, Apache Airflow, Amazon Web Services (AWS)

Expert Spark Developer

2020 - 2020

PatternEx, Inc. (via Toptal)

Developed a rule engine in Spark with Scala and successfully deployed it to the production cluster.
Worked on Scala documents and prepared unit test cases.
Developed Scala utilities.

Technologies: Scala, PySpark, Spark

Big Data Developer

2015 - 2017

InfoObjects, Inc.

Created efficient Spark jobs to extract the required information from raw OMOP parquet files.
Deployed Spark jobs on Amazon EMR using data pipelines.
Developed Lambda functions for triggering the required data pipeline.
Estimated the time for tasks and prepared a well-defined plan to meet those estimates.
Handled product and client interactions end to end.

Technologies: Amazon Web Services (AWS), Python, Unix Shell Scripting, Java, AWS Data Pipeline Service, MySQL, Shell Scripting, Amazon Athena, Amazon S3 (AWS S3), Amazon Elastic MapReduce (EMR), Scala, Apache Spark

Programmer Analyst

2014 - 2015

Cognizant

Provided the team with a vision of the project objectives.
Motivated and inspired team members.
Reported the status of team activities against the program plan or schedule.
Interacted with product customers and helped them to resolve their issues through detailed analysis.
Developed MapReduce jobs as per the project requirements.
Created efficient Spark jobs that fetched real-time sensor data and assigned alarms to the specified engineers as per the business logic.

Technologies: Apache ZooKeeper, Apache Kafka, Apache Hive, Spark, MapReduce, HDFS, Hadoop

Experience

Roambee IoT

The Roambee Corporation is an IoT supply chain and enterprise asset visibility company. It offers real-time visibility of assets and goods outside the four walls of a global enterprise with patented hardware and software technology combined with an array of sensor data, proprietary analytics, predictive reporting, and open APIs.

Roambee bees (devices) continuously send heartbeats that include useful data such as coordinates, temperature, battery life, and pictures. We gather this information on AWS S3, and then real-time tracking of the goods is shown on the UI. The front end was built on Node.js and the back end with Spark real-time streaming.

Nuveen Asset Insights

The project is based on an ETL model where data is collected from various sources, namely MDM and Salesforce, in an S3 data lake and fed to the PySpark pipelines for data quality checks and processing. Data flows through various layers, from raw to curated to conformed. The final data is moved to Redshift for analytics and reporting using Tableau.

AWS | ETL | Analytics

This project deals with ETL pipelines hosted on AWS. Incoming data from various sources, such as SFDC, MuleSoft, and real-time streams, is ingested into a data lake on S3. As data arrives on S3, a Lambda function attached to the S3 buckets triggers the data pipeline jobs written in PySpark and hosted on AWS EMR/AWS Glue. Data then moves from the raw layer to the transformed layer and then to the conformed layer, which is loaded to Redshift for analytics and reporting.
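
The S3-to-Lambda trigger described above might look roughly like the following. This is a minimal sketch, not the project's actual code: the `submit_job` function is a hypothetical placeholder for whatever call starts the EMR step or Glue job, and the bucket and key names in the usage note are invented for illustration. Only the shape of the S3 event notification (`Records`, `s3.bucket.name`, `s3.object.key`) follows the documented AWS format.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # S3 URL-encodes object keys in event notifications.
            objects.append((bucket, unquote_plus(key)))
    return objects


def submit_job(bucket, key):
    """Hypothetical placeholder: a real version would start the EMR step or Glue job."""
    return {"input_path": f"s3://{bucket}/{key}", "status": "SUBMITTED"}


def lambda_handler(event, context, submit=submit_job):
    """Lambda entry point: trigger one pipeline job per newly arrived object."""
    return [submit(bucket, key) for bucket, key in extract_s3_objects(event)]
```

Calling `lambda_handler` with an event that contains one record for a hypothetical bucket `raw-bucket` and key `sfdc/accounts.json` would pass `s3://raw-bucket/sfdc/accounts.json` to the job submission step. Keeping the submission behind an injectable `submit` argument lets the event parsing be tested without AWS credentials.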

Python Interface | SQLAlchemy

This project deals with processing experimental images and extracting the required information. I designed the Python interface, which creates a connection with PostgreSQL using SQLAlchemy. The interface takes raw experimental images as input and, based on client requirements, extracts the required information, which is then inserted into the database. The interface can also process, download, and delete images from the database as required.

Skills

Languages

Scala, Python, Java, SQL

Tools

Amazon Simple Queue Service (SQS), Git, Bitbucket, Apache ZooKeeper, Amazon CloudWatch, Apache Solr, Terraform, AWS CloudFormation, Amazon Elastic MapReduce (EMR), Amazon Athena, AWS Glue, Apache Airflow, GitLab

Platforms

AWS Lambda, Unix, Linux, Apache Kafka, Amazon Web Services (AWS), AWS IoT

Other

EMR, Data Engineering, Big Data, Internet of Things (IoT), AWS Certified Solution Architect, Solution Architecture, Data Architecture, Shell Scripting, Unix Shell Scripting, Data Modeling, Solutioning, Data Analysis, Apache Cassandra

Frameworks

Spark, Hadoop, Apache Spark

Libraries/APIs

Node.js, PySpark

Paradigms

ETL Implementation & Design, MapReduce, ETL

Storage

HBase, Amazon S3 (AWS S3), Data Pipelines, HDFS, Apache Hive, Redshift, PostgreSQL, MySQL, AWS Data Pipeline Service, Elasticsearch

Education

2009 - 2013

Bachelor of Technology Degree in Computer Science

Kurukshetra University - Kurukshetra, India

Certifications

JUNE 2020 - JUNE 2023

AWS Solutions Architect - Professional

Amazon Web Services

JULY 2019 - JULY 2022

AWS Solutions Architect - Associate

Amazon Web Services

AUGUST 2014 - PRESENT

Oracle Certified Java Programmer 6

Oracle
