
Ja-Yuan Pendley
Verified Expert in Engineering
Data Engineer and Developer
New York, NY, United States
Toptal member since April 17, 2026
Ja-Yuan is an accomplished data and cloud engineer with 11+ years of experience designing, modernizing, and scaling enterprise-grade data platforms across AWS, Azure, and GCP. He excels at developing high-performance pipelines using Kafka, Hive, Scala, PySpark, Spark, Python, Databricks, and Airflow. Adept at aligning architecture with business strategy, Ja-Yuan elevates data reliability and governance and leads cross-functional teams to deliver secure, compliant, and scalable data ecosystems.
Experience
- Amazon Athena - 8 years
- Delta Lake - 8 years
- Hadoop - 8 years
- Google Cloud Platform (GCP) - 8 years
- Azure Databricks - 8 years
- ETL - 7 years
- Python - 7 years
- Snowflake - 6 years
Preferred Environment
AWS IoT
The most amazing solution I've delivered is an end-to-end clinical-trial data platform that enables advanced analytics, regulatory reporting, and real-time data availability.
Work Experience
Lead Azure Big Data Engineer
Pfizer
- Developed Azure Databricks pipelines for clinical-trial data integration and advanced analytics.
- Implemented a Delta Lake architecture with Bronze, Silver, and Gold layers to ensure regulatory traceability and auditability.
- Deployed Azure DevOps YAML pipelines for automated CI/CD, notebook versioning, and environment promotion.
Senior AWS Data Engineer
Credit Suisse Group
- Built AWS Glue ETL frameworks for market-risk and compliance data processing.
- Improved pipeline performance, observability, and compliance alignment through optimized ETL orchestration and governed access.
- Delivered an end-to-end market-risk and compliance data platform on AWS using Glue, S3, and Redshift to support regulatory reporting, cross-domain analytics, and automated data-quality controls.
Experience
Clinical-trial Data Platform
Skills
Libraries/APIs
PySpark
Tools
Terraform, Amazon CloudWatch, AWS Step Functions, Amazon Athena, Amazon Redshift Spectrum, AWS Glue, Confluence, Azure Key Vault, Microsoft Power BI
Frameworks
Hadoop, Apache Spark
Languages
Python, Snowflake, YAML
Paradigms
ETL, Azure DevOps
Platforms
Google Cloud Platform (GCP), AWS IoT, AWS Lambda, Apache Kafka, Azure Functions, Azure Synapse
Storage
PostgreSQL, MongoDB, Apache Hive, Amazon S3 (AWS S3)
Other
Azure Databricks, EMR, AWS Lake Formation, Amazon Redshift, Microsoft Purview, Delta Lake, Azure Data Factory (ADF)