Priyanshu Hasija, Developer in Gurugram, India

Priyanshu Hasija

Verified Expert in Engineering

ETL and Big Data Developer

Gurugram, India
Toptal Member Since
June 18, 2020

Priyanshu is an AWS-certified solutions architect with 10 years of experience delivering strategic, data-oriented solutions. With expertise in a wide array of data technologies, including SQL, NoSQL, cloud databases, and data warehousing, he has developed and executed data strategies that have improved the efficiency, accuracy, and reliability of critical business processes. His proficiency in data modeling, ETL development, and data visualization has proved fruitful for his clients.






Preferred Environment

Linux, Unix

The most amazing...

...thing I have developed was a real-time tracking pipeline for a logistics company that monitors the end-to-end movement of shipments.

Work Experience

Data Engineer

2023 - 2023
Accent Technologies Inc.
  • Developed architecture to ingest streaming data into AWS S3.
  • Created ETL pipelines to transform AWS S3 data and load it into Elasticsearch and Cassandra.
  • Optimized Spark jobs, reducing data processing time from 5 hours to 30 minutes.
Technologies: Python, PySpark, Apache Spark, Data Analysis, SQL, Apache Cassandra, Apache Kafka, Elasticsearch

Data Architect

2022 - 2023
  • Built an exploratory model to understand the potential of the available data within PVH's CRM and e-commerce systems and enabled recommendations for customer segmentation, customer lifetime value, churn prediction, and product design.
  • Identified patterns in the data to discover new potential business use cases.
  • Discovered previously unrecognized patterns in the data to improve existing business use cases.
  • Defined and enforced data standards and policies within the organization. This included ensuring data quality, security, privacy, and compliance with regulatory requirements.
  • Created conceptual, logical, and physical data models to represent the data needs of the PVH data analytics team. These models served as the basis for the development of data systems and applications.
Technologies: Spark, Amazon Web Services (AWS), Apache Airflow, GitLab, Data Modeling, Big Data, Solutioning

Senior Data Engineer

2021 - 2022
KLM Royal Dutch Airlines
  • Recommended infrastructure changes to improve storage capacity and performance, which reduced infrastructure costs.
  • Automated code deployment by creating CI/CD pipelines.
  • Maintained the integrity of data by designing backup and recovery procedures.
Technologies: Spark, Amazon Web Services (AWS), Python, Scala, Data Pipelines

Senior Data Engineer

2020 - 2021
Bang the Table
  • Architected the end-to-end solution: extracted data from MySQL, transformed it in ETL pipelines, and prepared it for data warehousing.
  • Created Spark ETL jobs and set up the entire framework to trigger these ETL jobs on AWS.
  • Designed and set up orchestration strategies using Apache Airflow to transform data in both near-real-time and batch fashion.
Technologies: ETL, Spark, Apache Airflow, Amazon Web Services (AWS)

Expert Spark Developer

2020 - 2020
PatternEx, Inc. (via Toptal)
  • Developed a rule engine in Spark with Scala and deployed it to the production cluster.
  • Wrote Scala documentation and prepared unit test cases.
  • Developed Scala utilities.
Technologies: Scala, PySpark, Spark

Big Data Developer

2015 - 2017
InfoObjects, Inc.
  • Created efficient Spark jobs to extract the required information from raw OMOP Parquet files.
  • Deployed Spark jobs on Amazon EMR using data pipelines.
  • Developed Lambda functions for triggering the required data pipeline.
  • Estimated task timelines and prepared a well-defined plan to meet those estimates.
  • Handled product and client interactions end to end.
Technologies: Amazon Web Services (AWS), Python, Unix Shell Scripting, Java, AWS Data Pipeline Service, MySQL, Shell Scripting, Amazon Athena, Amazon S3 (AWS S3), Amazon Elastic MapReduce (EMR), Scala, Apache Spark

Programmer Analyst

2014 - 2015
  • Provided the team with a vision of the project objectives.
  • Motivated and inspired team members.
  • Reported the status of team activities against the program plan or schedule.
  • Interacted with product customers and helped them to resolve their issues through detailed analysis.
  • Developed MapReduce jobs as per the project requirements.
  • Created efficient Spark jobs for fetching real-time sensor data and assigned the alarms to specified engineers as per the business logic.
Technologies: Apache ZooKeeper, Apache Kafka, Apache Hive, Spark, MapReduce, HDFS, Hadoop

Roambee IoT

The Roambee Corporation is an IoT supply chain and enterprise asset visibility company. It offers real-time visibility of assets and goods outside the four walls of a global enterprise with patented hardware and software technology combined with an array of sensor data, proprietary analytics, predictive reporting, and open APIs.

Roambee bees (devices) continuously send heartbeats that include many useful components, such as coordinates, temperature, battery life, and pictures. We gather this information in AWS S3, and real-time tracking of the goods is then shown on the UI. The front end was built with Node.js and the back end with Spark real-time streaming.

Nuveen Asset Insights

The project is based on an ETL model where data is collected from various sources, namely MDM and Salesforce, into an S3 data lake and fed to PySpark pipelines for data quality checks and processing. Data flows through various layers, from raw to curated to conformed. The final data is moved to Redshift for analytics and reporting using Tableau.
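As a rough illustration of the kind of quality gate applied between the raw and curated layers: the actual pipeline runs these rules as PySpark column expressions, and the field names, accepted source systems, and rules below are hypothetical stand-ins, not the real schema.

```python
# Hypothetical sketch of a raw-to-curated quality gate. The production
# pipeline expresses equivalent rules in PySpark, not Python loops.

REQUIRED_FIELDS = ("account_id", "source_system", "amount")  # assumed schema


def passes_quality_checks(row: dict) -> bool:
    """Return True if a raw record is fit for the curated layer."""
    # Rule 1: all required fields must be present and non-empty.
    if any(not row.get(field) for field in REQUIRED_FIELDS):
        return False
    # Rule 2: amounts must parse as numbers.
    try:
        float(row["amount"])
    except (TypeError, ValueError):
        return False
    # Rule 3: only known source systems are accepted.
    return row["source_system"] in {"MDM", "Salesforce"}


def curate(rows):
    """Split raw rows into curated records and rejects for quarantine."""
    curated, rejects = [], []
    for row in rows:
        (curated if passes_quality_checks(row) else rejects).append(row)
    return curated, rejects
```

Rejected rows would typically be quarantined to a separate S3 prefix for inspection rather than silently dropped.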

AWS | ETL | Analytics

This project deals with ETL pipelines hosted on AWS. Incoming data arrives from various sources, such as SFDC and MuleSoft, and real-time streams are ingested into a data lake on S3. As data arrives on S3, a Lambda function attached to the S3 buckets triggers data pipeline jobs written in PySpark and hosted on AWS EMR/AWS Glue. Data then moves from the raw to the transformed and then to the conformed layer, which is loaded into Redshift for analytics and reporting.
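The S3-to-Lambda trigger step above can be sketched roughly as follows. This is a minimal illustration, not the production code: the Glue job name `raw_to_curated` and the `--source_path` argument are hypothetical, and the injectable `glue_client` parameter exists only so the sketch can be exercised without AWS credentials.

```python
def extract_new_objects(event):
    """Pull (bucket, key) pairs from an S3 object-created event notification."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
        if r.get("eventName", "").startswith("ObjectCreated")
    ]


def handler(event, context, glue_client=None):
    """Lambda entry point: start the (hypothetical) raw-to-curated Glue job
    once for each newly arrived object."""
    if glue_client is None:
        import boto3  # deferred so the module imports without boto3 installed
        glue_client = boto3.client("glue")
    runs = []
    for bucket, key in extract_new_objects(event):
        response = glue_client.start_job_run(
            JobName="raw_to_curated",  # hypothetical Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(response["JobRunId"])
    return {"started": runs}
```

In practice the bucket notification configuration (or EventBridge rule) decides which prefixes invoke the function, so the handler only ever sees relevant objects.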

Python Interface | SQLAlchemy

The project deals with processing experimental images and extracting the required information. I designed the Python interface, which creates a connection with PostgreSQL using SQLAlchemy. The interface takes raw experimental images as input and, based on client requirements, extracts the required information from the images, which is then inserted into the database. The interface can also process, download, and delete images from the database as required.
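The shape of such an interface might look like the sketch below. It is illustrative only: the table name, columns, and method names are hypothetical, and an in-memory SQLite URL stands in for the real PostgreSQL connection string.

```python
# Minimal sketch of an image-store interface over SQLAlchemy Core.
# Table and column names are hypothetical; the real project targets PostgreSQL.
from typing import Optional

from sqlalchemy import (Column, Integer, LargeBinary, MetaData, String,
                        Table, create_engine, delete, insert, select)

metadata = MetaData()
images = Table(
    "experiment_images", metadata,
    Column("id", Integer, primary_key=True),
    Column("label", String, nullable=False),
    Column("payload", LargeBinary, nullable=False),  # extracted image bytes
)


class ImageStore:
    """Thin database interface (pass a PostgreSQL URL in production)."""

    def __init__(self, url: str):
        self.engine = create_engine(url)
        metadata.create_all(self.engine)

    def upload(self, label: str, payload: bytes) -> None:
        with self.engine.begin() as conn:
            conn.execute(insert(images).values(label=label, payload=payload))

    def download(self, label: str) -> Optional[bytes]:
        with self.engine.connect() as conn:
            row = conn.execute(
                select(images.c.payload).where(images.c.label == label)
            ).first()
            return row[0] if row else None

    def remove(self, label: str) -> None:
        with self.engine.begin() as conn:
            conn.execute(delete(images).where(images.c.label == label))
```

Keeping the engine behind a small class like this lets the image-processing code stay ignorant of the underlying database, which is what makes swapping SQLite for PostgreSQL a one-line change.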


Languages

Scala, Python, Java, SQL


Tools

Amazon Simple Queue Service (SQS), Git, Bitbucket, Apache ZooKeeper, Amazon CloudWatch, Apache Solr, Terraform, AWS CloudFormation, Amazon Elastic MapReduce (EMR), Amazon Athena, AWS Glue, Apache Airflow, GitLab


Platforms

AWS Lambda, Unix, Linux, Apache Kafka, Amazon Web Services (AWS), AWS IoT


Other

EMR, Data Engineering, Big Data, Internet of Things (IoT), AWS Certified Solutions Architect, Solution Architecture, Data Architecture, Shell Scripting, Unix Shell Scripting, Data Modeling, Solutioning, Data Analysis, Apache Cassandra


Frameworks

Spark, Hadoop, Apache Spark


Libraries/APIs

Node.js, PySpark


Paradigms

ETL Implementation & Design, MapReduce, ETL


Storage

HBase, Amazon S3 (AWS S3), Data Pipelines, HDFS, Apache Hive, Redshift, PostgreSQL, MySQL, AWS Data Pipeline Service, Elasticsearch

Education

2009 - 2013

Bachelor of Technology Degree in Computer Science

Kurukshetra University - Kurukshetra, India

Certifications

JUNE 2020 - JUNE 2023

AWS Certified Solutions Architect – Professional

Amazon Web Services

JULY 2019 - JULY 2022

AWS Certified Solutions Architect – Associate

Amazon Web Services


Oracle Certified Java Programmer 6