
Priyanshu Hasija
Verified Expert in Engineering
ETL and Big Data Developer
Priyanshu is an AWS-certified solutions architect with 10 years of experience delivering strategic, data-oriented solutions. With expertise in a wide array of data technologies, including SQL, NoSQL, cloud databases, and data warehousing, he has developed and executed data strategies that have improved the efficiency, accuracy, and reliability of critical business processes. His proficiency in data modeling, ETL development, and data visualization has proved fruitful for his clients.
Portfolio
Experience
Availability
Preferred Environment
Linux, Unix
The most amazing...
...thing I've developed is a real-time tracking pipeline for a logistics company that monitors the end-to-end movement of shipments.
Work Experience
Data Engineer
Accent Technologies Inc.
- Developed architecture to ingest streaming data into AWS S3.
- Created ETL pipelines to transform AWS S3 data and load it into Elasticsearch and Cassandra.
- Optimized Spark jobs, which brought down the processing time of data from 5 hours to 30 minutes.
Data Architect
PVH
- Built an exploratory model to assess the potential of available data across PVH's CRM and e-commerce systems, enabling recommendations for customer segmentation, customer lifetime value, churn prediction, and product design.
- Identified patterns in the data to discover potential new business use cases.
- Uncovered previously unrecognized patterns in the data to improve existing business use cases.
- Defined and enforced data standards and policies within the organization. This included ensuring data quality, security, privacy, and compliance with regulatory requirements.
- Created conceptual, logical, and physical data models to represent the data needs of the PVH data analytics team. These models served as the basis for the development of data systems and applications.
Senior Data Engineer
KLM Royal Dutch Airlines
- Recommended infrastructure changes to improve storage capacity and performance, ultimately reducing infrastructure costs.
- Automated code deployment by creating CI/CD pipelines.
- Maintained the integrity of data by designing backup and recovery procedures.
Senior Data Engineer
Bang the Table
- Architected the end-to-end solution to extract data from MySQL, transform it in ETL pipelines, and prepare it for data warehousing.
- Created Spark ETL jobs and set up the entire framework to trigger these ETL jobs on AWS.
- Designed and set up orchestration strategies using Apache Airflow to transform data in both near-real time and batch fashion.
Expert Spark Developer
PatternEx, Inc. (via Toptal)
- Developed a rule engine in Spark with Scala and deployed it to the production cluster.
- Wrote Scala documentation and prepared unit test cases.
- Developed Scala utilities.
Big Data Developer
InfoObjects, Inc.
- Created efficient Spark jobs to extract the required information from raw OMOP parquet files.
- Deployed Spark jobs on Amazon EMR using data pipelines.
- Developed Lambda functions for triggering the required data pipeline.
- Estimated task durations and prepared a well-defined plan to meet those estimates.
- Handled product and client interactions end to end.
Programmer Analyst
Cognizant
- Provided the team with a vision of the project objectives.
- Motivated and inspired team members.
- Reported the status of team activities against the program plan or schedule.
- Interacted with product customers and helped them resolve their issues through detailed analysis.
- Developed MapReduce jobs as per the project requirements.
- Created efficient Spark jobs for fetching real-time sensor data and assigned the alarms to specified engineers as per the business logic.
Experience
Roambee IoT
Roambee bees (devices) continuously send heartbeats that include useful telemetry such as coordinates, temperature, battery life, and pictures. We gather this data in AWS S3, and real-time tracking of the goods is shown in the UI. The front end was built on Node.js and the back end with Spark real-time streaming.
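As an illustration of the kind of telemetry described above, a heartbeat record and a simple alerting filter might be modeled like this. This is a minimal Python sketch: the field names, thresholds, and `flag_alerts` helper are assumptions for illustration, not Roambee's actual schema or logic.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical heartbeat payload; real devices send similar telemetry
# (coordinates, temperature, battery life), but this exact schema is assumed.
@dataclass
class Heartbeat:
    device_id: str
    lat: float
    lon: float
    temperature_c: float
    battery_pct: float

def flag_alerts(beats: List[Heartbeat],
                max_temp_c: float = 30.0,
                min_battery_pct: float = 20.0) -> List[str]:
    """Return the IDs of devices whose heartbeat breaches a threshold."""
    return [
        b.device_id
        for b in beats
        if b.temperature_c > max_temp_c or b.battery_pct < min_battery_pct
    ]

beats = [
    Heartbeat("bee-01", 52.37, 4.90, 22.5, 85.0),  # healthy
    Heartbeat("bee-02", 52.31, 4.76, 34.1, 60.0),  # too hot
    Heartbeat("bee-03", 52.30, 4.66, 21.0, 12.0),  # low battery
]
print(flag_alerts(beats))  # ['bee-02', 'bee-03']
```

In the production pipeline, a check like this would run inside the Spark streaming job over micro-batches of heartbeats landing in S3, rather than over an in-memory list.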
Nuveen Asset Insights
AWS | ETL | Analytics
Python Interface | SQLAlchemy
Skills
Languages
Scala, Python, Java, SQL
Tools
Amazon Simple Queue Service (SQS), Git, Bitbucket, Apache ZooKeeper, Amazon CloudWatch, Apache Solr, Terraform, AWS CloudFormation, Amazon Elastic MapReduce (EMR), Amazon Athena, AWS Glue, Apache Airflow, GitLab
Platforms
AWS Lambda, Unix, Linux, Apache Kafka, Amazon Web Services (AWS), AWS IoT
Other
EMR, Data Engineering, Big Data, Internet of Things (IoT), AWS Certified Solutions Architect, Solution Architecture, Data Architecture, Shell Scripting, Unix Shell Scripting, Data Modeling, Solutioning, Data Analysis, Apache Cassandra
Frameworks
Apache Spark, Hadoop
Libraries/APIs
Node.js, PySpark
Paradigms
ETL Implementation & Design, MapReduce, ETL
Storage
HBase, Amazon S3 (AWS S3), Data Pipelines, HDFS, Apache Hive, Redshift, PostgreSQL, MySQL, AWS Data Pipeline Service, Elasticsearch
Education
Bachelor of Technology Degree in Computer Science
Kurukshetra University - Kurukshetra, India
Certifications
AWS Solutions Architect—Professional
Amazon Web Services
AWS Solutions Architect—Associate
Amazon Web Services
Oracle Certified Java Programmer 6
Oracle