
Pragyan Subedi
Verified Expert in Engineering
Data Science and Machine Learning Developer
Kathmandu, Central Development Region, Nepal
Toptal member since June 3, 2020
Pragyan is an applied machine learning engineer with a track record of working at numerous Silicon Valley startups across diverse industries. He has worked with datasets of over 1 billion data points, managed a 10-person data science team, built end-to-end machine learning pipelines from the ground up, and taught data science to over 100,000 students globally. Pragyan enjoys building large language models and is passionate about building production-ready deep learning models.
Experience
- Python - 6 years
- Data Science - 5 years
- Statistics - 5 years
- Data Visualization - 5 years
- Machine Learning - 5 years
- SQL - 4 years
- Natural Language Processing (NLP) - 3 years
- Deep Learning - 3 years
Preferred Environment
Git, Jira, Slack, Jupyter Notebook, Amazon EC2
The most amazing...
...deep learning model I've built was a voice-to-image synthesizer that mathematically constructs an image based on the provided audio description.
Work Experience
Machine Learning Engineer
Ten Lives / Terraferma Foods Inc. (YC S22)
- Contributed to a substantial increase in the company's novel DNA sequence yield, significantly reducing years of experimentation time through the strategic application of deep learning.
- Orchestrated the development of a robust data ETL pipeline, enabling the processing of over 100 million DNA sequences through distributed parallel computing on supercomputers leveraging the compute power of 190+ CPU cores and over 1 TB of RAM.
- Architected and developed SOTA deep learning models for diverse applications, including DNA sequence expression prediction, classification, and novel DNA sequence generation.
- Designed and implemented an end-to-end machine learning pipeline that spanned from handling extensive big-data processing to seamlessly delivering model predictions through custom-built APIs.
- Engineered and developed a full-stack DNA analytics dashboard for viewing data samples and summaries, generating model predictions, analyzing DNA sequences, and more.
- Implemented both white box and black box methods to enhance model interpretability.
Principal Data Scientist
Kharpann Enterprises Pvt
- Led the conception and development of The Click Reader, a platform that teaches data science to more than 100,000 students globally from 50+ countries. Sold the product to a US-based enterprise in December 2021.
- Secured the top 1% position (rank 30 out of 3,308) in the SIIM-ISIC Melanoma Classification Kaggle competition by building a computer vision algorithm with 94.4% accuracy in classifying malignant vs. benign images of skin cancer.
- Led the scraping, data pre-processing, and data visualization of garbage accumulation on Mt. Everest, along with a historical mapping of climbers according to their nationalities on a world map.
- Led the development of an asset allocation platform called myaaml.com that allocated investment portfolios into bonds, stocks, and cash based on market conditions.
- Curated 300,000+ words of content covering linear algebra, calculus, probability, statistics, numerical computation, and information theory for Fuse AI, a multinational AI education platform.
Data Visualization Expert and Business Analyst (QlikSense)
UOB Asset Management Ltd
- Acted as a data visualization expert, planning, creating, and updating 11 dashboard views to help the firm's business development team improve relationships with clients investing over a billion dollars in assets.
- Acted as a business analyst/product owner and extracted requirements from the firm's business development team (4+ people) to understand their various data visualization needs.
- Consolidated, critically analyzed, and extracted information from over eight different financial datasets to successfully bring the data visualization project to fruition.
Airflow ETL Engineer
Idelic
- Migrated the ETL codebase for over ten trucking customers from legacy Celery jobs to Airflow DAGs with 30+ DAG migrations completed during the engagement.
- Wrote a dynamic DAG-generating Python library for standardizing extraction, transformation, and data load for multiple customer integrations.
- Contributed to ETL projects involving the use of REST API, SOAP API, and SFTP for data extraction, IXF models for transformation, and asynchronous and synchronous data loading.
- Collaborated with a team of 10+ ETL engineers across different time zones, following Agile principles.
Full-stack Data Scientist
A Property Tech Startup
- Built an end-to-end machine learning solution for predicting property prices, computing uncertainty of predictions, and predicting the days to rent for such properties.
- Implemented a continuous and automated machine learning procedure based on newly collected data for daily model retraining and redeploying.
- Architected the procedure for model backtesting as a function of time and implemented model drift monitoring.
- Developed client-facing APIs for serving model predictions and implemented mechanisms for assessing the quality of each prediction according to multiple business rules.
- Performed data cleaning and exploratory data analysis on real-world property datasets containing hundreds of thousands of data points and over 50 variables.
Associate Data Scientist
F1Soft International Pvt
- Wrote the in-house statistical guidebooks implementing a series of statistical tests for exploratory data analysis for the entire data science team.
- Built deep learning time-series models and imbalanced class classifiers for predicting loan defaults, credit card approvals, and more.
- Performed time-series forecasting for analyzing frequency, volume, and value of financial transactions in banks.
- Architected a financial analytical platform for banks from the ground up.
Python Developer
Hyperloop Nepal Pvt. Ltd. (Tootle)
- Developed the data analytical dashboards and platforms for the in-house marketing and operation teams of the ridesharing company.
- Fixed the pricing structure and introduced dynamic pricing, which increased the net revenue of each ride by a significant amount.
- Forecasted the number of daily rides on the ridesharing platform and achieved 98% accuracy on predictions.
- Predicted the churn of drivers and riders using predictive analytics and modeling.
- Segregated land areas geographically based on the number of drivers and riders in each area using unsupervised learning.
Data Analyst
HamroGSM
- Oversaw the collection and analysis of information related to the newly launched GSM handsets and their specifications.
- Developed an analytical platform to monitor the product's online reach, including web traffic, session length, and bounce rate.
- Analyzed the product's engagement to help content curators develop better content around the unboxing and review videos of GSM handsets.
Experience
Segment Anything Model (SAM) Implementation Breakdown Notebooks
https://github.com/PragyanSubedi/Segment-Anything-Model-Breakdown
You can learn more about the model here: segment-anything.com.
The Click Reader
https://www.theclickreader.com/
As the product lead for this project, I led the curation and review of over 10 data science courses on the platform. The courses are as follows:
• Python for Data Science
• NumPy for Data Science
• Pandas for Data Science
• Matplotlib for Data Science
• Data Analysis with Python
• Supervised Machine Learning with Python
• Deep Learning Theoretical Course
• Convolutional Neural Network Theoretical Course
• Linear Algebra Mini Course
• Time-series Forecasting with TensorFlow 2.0 Full Course
Lending Club Loan Dataset EDA
https://www.kaggle.com/pragyanbo/a-hitchhiker-s-guide-to-lending-club-loan-data
The project was very interesting as there were a lot of features to explore. I completed the entire project in a month. The analysis is publicly available on Kaggle and has received a silver medal on the platform.
Time-series Analysis and Modeling of US GDP Data
https://www.kaggle.com/pragyanbo/time-series-analysis-with-python
Education
Bachelor of Engineering Degree in Computer Engineering
Kathmandu Engineering College - Kathmandu, Nepal
Skills
Libraries/APIs
TensorFlow, Pandas, OpenCV, NumPy, PyTorch, REST APIs
Tools
Apache Airflow, Qlik Sense, ChatGPT, PyCharm, Slack, Jira, Git, Amazon SageMaker
Languages
Python, SQL, Snowflake, JavaScript
Paradigms
Business Intelligence (BI), ETL
Storage
MySQL, PostgreSQL, Neo4j, Amazon S3 (AWS S3)
Platforms
Jupyter Notebook, Amazon EC2, QlikView
Other
Data Science, Machine Learning, Deep Learning, Statistics, Mathematics, Data Visualization, BI Reports, Predictive Analytics, Predictive Modeling, Time Series Analysis, Artificial Intelligence (AI), Data Analysis, Data Analytics, Data Reporting, Convolutional Neural Networks (CNNs), Big Data, Analytics, Marketing Analytics, Time Series, APIs, Computer Vision, Data Scraping, Web Scraping, Business Planning, MLflow, SOAP, Natural Language Processing (NLP), Forecasting, Generative Pre-trained Transformers (GPT)