Priyanshu Bahuguna
Verified Expert in Engineering
Data Engineer and Developer
Noida, Uttar Pradesh, India
Toptal member since April 3, 2024
Priyanshu has over 18 years of experience in software development. He has managed projects in multi-cloud environments, leading the delivery of data pipelines on Google Cloud Platform (GCP) and AWS while collaborating with global stakeholders. His accomplishments include 70% savings in effort after implementing a data pipeline on AWS and 60% faster execution of an end-to-end data pipeline implemented on GCP for data curation and ML modeling.
Experience
- SQL - 10 years
- Amazon Web Services (AWS) - 5 years
- Snowflake - 5 years
- ETL Implementation & Design - 5 years
- Python 3 - 5 years
- Google Cloud Platform (GCP) - 5 years
- Data Quality Governance - 3 years
- Data Build Tool (dbt) - 3 years
Preferred Environment
Jupyter Notebook
The most amazing...
...project I've worked on involved migrating an on-prem system to the GCP data platform, cutting overall runtime by 60% while scaling data volumes to 1.5x.
Work Experience
Lead Data Engineer
Cisco
- Designed and implemented a medallion data architecture for curating telemetry data and deploying machine learning (ML) models on top of it.
- Tracked defects for the US region in Jira and became well-versed in Agile methodologies.
- Reduced runtime of the end-to-end (E2E) data pipeline by 80% and achieved 70% savings in efforts for manual data investigation.
- Demonstrated expertise in designing and implementing high-performance, reusable, and scalable data models.
- Leveraged cutting-edge technologies and frameworks, leading teams to tackle the most challenging problems head-on.
- Implemented an end-to-end data pipeline using cloud-native technologies on AWS, automating telemetry data ingestion and enrichment, ML modeling, and summarization, which reduced the time required to prepare vulnerability reports by 70% (see the sketch below).
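Below is a minimal sketch of the kind of event-driven ingestion step such a pipeline might use, assuming an S3 landing bucket wired to a Lambda trigger; the bucket names and the enrich() placeholder are hypothetical illustrations, not the actual implementation.

```python
# Minimal sketch of an event-driven ingestion step for telemetry files.
# Bucket names, key prefixes, and the enrich() logic are hypothetical.
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")
CURATED_BUCKET = "telemetry-curated"  # hypothetical destination bucket


def enrich(record: dict) -> dict:
    """Placeholder enrichment: tag each record with its source."""
    record["source"] = "syslog"
    return record


def handler(event, context):
    """Triggered by S3 ObjectCreated events on the landing bucket."""
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(rec["s3"]["object"]["key"])

        # Read the raw newline-delimited JSON telemetry file.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        rows = [json.loads(line) for line in body.splitlines() if line.strip()]

        # Enrich and write to the curated zone for downstream ML and summarization.
        enriched = "\n".join(json.dumps(enrich(r)) for r in rows)
        s3.put_object(Bucket=CURATED_BUCKET, Key=f"enriched/{key}", Body=enriched.encode())
```

In a setup like this, every new file landing in S3 is curated automatically with no servers to manage, which is where the effort savings for manual data investigation come from.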
Senior Technical Lead
HCL Technologies
- Spearheaded the development and productization of machine learning (ML) model pipelines using Kubeflow on Google Kubernetes Engine (GKE), streamlining model deployment and management (see the first sketch after this list).
- Tracked and fixed bugs in Jira and utilized Agile methodologies.
- Leveraged operators within Kubeflow for distributed training and hyperparameter tuning, enhancing model performance and accuracy. Deployed ML models as scalable services within the Kubeflow environment.
- Managed a team of engineers dedicated to enhancing and supporting the eCommerce checkout and catalog modules. Collaborated closely with L1 and L2 teams to prioritize and resolve production incidents within agreed service-level agreements (SLAs).
- Designed and implemented an Airflow-based framework for managing directed acyclic graph (DAG) dependencies, enabling efficient manual DAG execution and seamless workflow orchestration (see the second sketch after this list).
- Oversaw the operational aspects of a prominent mobility player's production eCommerce platform.
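For illustration, here is a minimal sketch of productizing an ML step with the Kubeflow Pipelines (KFP v2) SDK, compiled into a spec that a Kubeflow deployment on GKE can run; the component names and bodies are hypothetical placeholders, not the actual pipelines.

```python
# Minimal Kubeflow Pipelines (KFP v2 SDK) sketch of a train step chained
# after a preprocessing step. Component bodies are hypothetical placeholders.
from kfp import compiler, dsl


@dsl.component(base_image="python:3.10")
def preprocess(raw_path: str) -> str:
    # Placeholder: curate raw data into training features.
    return raw_path + "/features"


@dsl.component(base_image="python:3.10")
def train(features_path: str) -> str:
    # Placeholder: fit a model and return its artifact location.
    return features_path + "/model"


@dsl.pipeline(name="training-pipeline")
def pipeline(raw_path: str = "gs://bucket/raw"):
    features = preprocess(raw_path=raw_path)
    train(features_path=features.output)


if __name__ == "__main__":
    # Compile to a spec that can be submitted to Kubeflow on GKE.
    compiler.Compiler().compile(pipeline, "pipeline.yaml")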
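And a minimal sketch of the cross-DAG dependency idea in Airflow (2.4+), where a controller DAG triggers downstream DAGs in order; the DAG IDs are hypothetical.

```python
# Minimal Airflow (2.4+) sketch of cross-DAG dependency management: a
# controller DAG triggers downstream DAGs in sequence. DAG IDs are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="controller",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered manually; the controller fans out to child DAGs
    catchup=False,
) as dag:
    run_ingest = TriggerDagRunOperator(
        task_id="run_ingest",
        trigger_dag_id="ingest_daily",     # hypothetical upstream DAG
        wait_for_completion=True,          # block until the child DAG finishes
    )
    run_transform = TriggerDagRunOperator(
        task_id="run_transform",
        trigger_dag_id="transform_daily",  # hypothetical downstream DAG
        wait_for_completion=True,
    )
    run_ingest >> run_transform  # enforce ordering between the child DAGs
```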
Technical Lead
Infosys
- Led the development and oversight of core banking solutions for a premier financial institution in the United States.
- Spearheaded the development of innovative features to augment existing banking solutions, enhancing functionality and user experience.
- Orchestrated the seamless migration of applications from legacy technology frameworks to contemporary Java frameworks, ensuring improved performance, scalability, and maintainability.
- Oversaw the production support of critical banking services, ensuring key financial systems' continuous operation and stability.
Experience
Syslog Data Pipeline
I architected the end-to-end (E2E) data pipeline using serverless offerings to curate the data and, after reviewing the use cases, chose a medallion architecture for the data warehouse design. The goal was to provide transformed data to ML models and to let other consumers access the data for their own use cases.
In the medallion architecture, the data is segregated into bronze, silver, and gold layers, with each subsequent layer providing more enriched data.
Stakeholders can then choose which degree of enrichment they want to consume. Data was ingested through files pushed to landing zones, from where Dataflow jobs consumed it (see the sketch below). I converted Snowflake queries into data build tool (dbt) workflows for easier handling and detection of data quality mismatches.
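As a minimal sketch, the landing-zone ingestion into the bronze layer could look like the following Apache Beam pipeline, which Dataflow executes when launched with the Dataflow runner; the bucket paths and the parse step are hypothetical.

```python
# Minimal Apache Beam sketch of landing-zone ingestion into the bronze layer.
# Bucket paths and the parse logic are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def to_bronze(line: str) -> dict:
    """Wrap a raw syslog line as a bronze-layer record (as-landed, untyped)."""
    return {"raw": line}


def run():
    # Pass --runner=DataflowRunner plus project/region options to run on Dataflow.
    options = PipelineOptions()
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadLanding" >> beam.io.ReadFromText("gs://landing-zone/syslog/*")
            | "ToBronze" >> beam.Map(to_bronze)
            | "Serialize" >> beam.Map(json.dumps)
            | "WriteBronze" >> beam.io.WriteToText("gs://warehouse/bronze/syslog")
        )


if __name__ == "__main__":
    run()
```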
I also created a custom audit logging framework to capture all write operations in the database, enabling efficient alerts on any unwanted writes (a sketch of the idea follows).
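A minimal sketch of that audit logging idea, assuming a hypothetical allowlist of tables and a logging-based alert hook in place of a real paging integration:

```python
# Minimal sketch of audit logging for database writes: every write is recorded,
# and writes to unexpected tables raise an alert. The allowlist is hypothetical.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("audit")

ALLOWED_TABLES = {"bronze_syslog", "silver_syslog", "gold_metrics"}  # hypothetical


def audited_write(write_fn):
    """Decorator that logs each write and flags writes to unexpected tables."""
    def wrapper(table: str, rows: list, *args, **kwargs):
        log.info("write: table=%s rows=%d", table, len(rows))
        if table not in ALLOWED_TABLES:
            # The real framework would page or alert here, not just log.
            log.warning("ALERT: unexpected write to %s", table)
        return write_fn(table, rows, *args, **kwargs)
    return wrapper


@audited_write
def insert_rows(table: str, rows: list):
    # Placeholder for the actual database insert.
    return len(rows)


if __name__ == "__main__":
    insert_rows("silver_syslog", [{"id": 1}])  # logged as a normal write
    insert_rows("tmp_scratch", [{"id": 2}])    # triggers the alert path
```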
Education
Bachelor's Degree in Information Technology
Jaypee Institute of Information Technology - Noida, India
Skills
Libraries/APIs
PySpark
Tools
Cloud Dataflow, Google Kubernetes Engine (GKE), Terraform
Languages
Java, Python 3, Snowflake, SQL
Paradigms
ETL Implementation & Design
Platforms
Google Cloud Platform (GCP), Amazon Web Services (AWS), Jupyter Notebook, Kubeflow
Storage
Databases
Other
Data Build Tool (dbt), Data Quality Governance, Stakeholder Engagement, CI/CD Pipelines, Cloud Migration, Oracle ATG Commerce, Algorithms, Machine Learning Operations (MLOps), APIs