Verified Expert in Engineering
Artificial Intelligence Engineer and Developer
Salman is a data scientist with half a decade of experience designing and implementing data and machine learning pipelines. He has won three international grand challenges sponsored by Amazon Web Services (AWS) and published multiple research papers in top journals and conferences.
PyCharm, PyTorch, TensorFlow, Jupyter Notebook, OpenCV, Computer Vision Algorithms, Pandas, AI Programming, Large Language Models (LLMs), Cloud
The most amazing...
...project I've led won 1st place in an Amazon contest and was #1 at the International Conference on Medical Image Computing and Computer Assisted Intervention.
LLM Fine-tuning Expert
- Developed pipelines to fine-tune open-source LLMs on custom data.
- Built pipelines to fine-tune Stable Diffusion on custom data.
- Developed a LangChain-based agent to optimize the workflow.
LLM Prompt Engineer
NIC MAP Vision LLC
- Engineered LLM prompts to design the legal document Q/A in the chatbot.
- Engineered LLM prompts to design the Q/A on document queries for the platform.
- Engineered LLM prompts to design the summarization of legal documents.
Machine Learning Developer
Atmospheric Data Solutions
- Designed data pipelines to manage vast amounts of data for an atmospheric-related project.
- Designed machine learning algorithms to improve wind speed predictions.
- Converted existing codebase from R to Python and optimized machine learning pipelines.
- Developed deep learning pipelines for BMI detection on complex data for a personal healthcare assistant.
- Developed deep learning algorithms that efficiently handle small datasets, enhancing their robustness by leveraging the distribution derived from limited data.
- Optimized existing models and reduced model size from 250 MB to 50 MB.
Machine Learning Developer | Models Build and Models Fine Tune
- Designed and implemented deep learning large language model (LLM) pipelines on huge data sets.
- Optimized the existing training pipeline from both time and computation perspectives.
- Implemented custom attention heads for multiple LLMs.
Senior Data Scientist
- Implemented a machine learning pipeline for vessel delay prediction at Khalifa Port in the UAE, reducing the prediction error from more than 24 hours to two hours. This resulted in better use of resources, including data mining and ML at Khalifa Port.
- Executed the machine learning pipeline for job category detection through text mining.
- Implemented the pipeline to detect Arabic content originality through text mining.
- Implemented auto fault prediction in chips during manufacturing.
Graduate Research Assistant
Texas A&M University
- Researched T-cell and receptor sequence contact prediction on human protein sequences using deep learning (NLP).
- Investigated cancer region detection in whole slide images (WSI) in collaboration with the University of Chicago.
- Addressed the challenge that each WSI takes gigabytes of storage, which makes it impossible to apply deep learning methods such as image classification and segmentation directly.
- Implemented a deep learning pipeline for event and accident detection on self-driving car synthetic data.
- Executed an Arabic OCR detection pipeline based on EasyOCR adjustments.
- Worked on a handwriting recognition tool for Arabic schools.
National University of Computer and Emerging Sciences
- Researched breast cancer detection using whole slide images, computerized medical imaging, and graphics.
- Worked on a low-cost pathology project that received a $13.68 million grant for breast cancer detection.
- Worked on Amal, which was both a project and an awareness campaign. I led the effort to start a movement for low-cost pathology (breast cancer detection) in Pakistan using artificial intelligence.
National University of Computer and Emerging Sciences
- Developed a deep learning pipeline to detect breast cancer based on low-cost pathology by extracting whole slide images from a scanned microscopic mobile video.
- Designed OpTorch, a Python library and package that optimizes the PyTorch training pipeline for whole slide images (WSI). Published the OpTorch research paper at a well-reputed conference.
- Built a deep learning pipeline to detect brain tumors based on CAT scan images.
PMNet | A Probability Map-based Scaled Network for Breast Cancer Diagnosis
https://pubmed.ncbi.nlm.nih.gov/33578222/
On the BACH dataset, our approach yielded an F1-score of 88.9 (±1.7)% at the patch level, outperforming the benchmark F1-score of 81.2 (±1.3)%, and achieved an average Dice coefficient of 69.8% on 10 whole slide images, compared to the benchmark average Dice coefficient of 61.5%.
Similarly, on the Dryad test dataset comprising 173 whole slide images, we achieved an average Dice coefficient of 82.7%, compared to the previous state of the art of 76%, without fine-tuning on this dataset. We further proposed a method to generate patch-level annotations for the image-level TCGA breast cancer database that will be useful for future deep learning methods.
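For readers unfamiliar with the metric, the Dice coefficient reported above measures the overlap between a predicted mask and a ground-truth mask. The following is a minimal sketch in plain Python, assuming flat binary masks; the function name and the toy masks are illustrative and are not taken from the PMNet code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Pixels marked positive in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total positives in each mask, added together.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 2 overlapping pixels, 3 positives per mask.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(pred_mask, true_mask)
```

In this toy case the score works out to 2 * 2 / (3 + 3) = 2/3. A per-dataset figure such as the ones quoted above would typically be the average of this value over all slides.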
Bias Adjustable Activation Network for Imbalanced Data | Diabetic Foot Ulcer Challenge 2021
Detecting diabetic foot ulcers is fundamental for healthcare specialists to prevent amputations. In this work, we performed multiple experiments to benchmark results on the grand challenge. To adjust the bias of the convolutional neural networks, we also proposed a custom-designed activation layer based on softmax to handle the probability skew of the classes.
We placed second on the validation set with a macro F1 score of 0.593 and third on the test set with a macro F1 score of 0.596 in the Diabetic Foot Ulcer Detection 2021 Grand Challenge.
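The macro F1 score used for ranking averages the per-class F1 scores with equal weight, so a rare class counts as much as a common one. This is why it suits imbalanced data. Below is a minimal sketch in plain Python; the function name, class labels, and predictions are hypothetical and only illustrate the calculation.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced example: the single "infection" case is missed.
y_true = ["none", "none", "none", "none", "infection", "ischaemia"]
y_pred = ["none", "none", "none", "none", "none", "ischaemia"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
```

Here accuracy is 5/6 (about 0.83), while macro F1 is only 17/27 (about 0.63) because the missed minority class contributes an F1 of zero.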
PRNet | A Progressive Resolution-based Network for Radiograph-based Disease Classification
https://ieeexplore.ieee.org/document/9708553
Considering AI can play a significant role in accurately detecting such diseases, EE-RDS conducted a multi-class classification challenge by providing chest X-rays of pneumonia, COVID-19, and regular patients. We proposed PRNet, a novel deep learning pipeline, and achieved 96.3% accuracy, taking second place on the test set leaderboard.
OpTorch | Optimized Deep Learning Architectures for Resource Limited Environments
https://arxiv.org/abs/2105.00619
In this paper, we proposed optimized deep learning pipelines in multiple aspects of training, including time and memory. OpTorch is a machine learning library designed to overcome weaknesses in existing implementations of neural network training. It provides features to train complex neural networks with limited computational resources.
OpTorch achieved the same accuracy as existing libraries on CIFAR-10 and CIFAR-100 datasets while reducing memory usage to approximately 50%. We also explored the effect of weights on total memory usage in deep learning pipelines.
In our experiments, parallel encoding-decoding combined with sequential checkpoints resulted in much-improved memory and time usage while keeping accuracy similar to existing pipelines.
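As a rough illustration of the checkpointing idea, and assuming "sequential checkpoints" refers to the general technique of activation checkpointing, the sketch below stores only every few intermediate results of a layer stack and recomputes the others on demand. It is a toy in plain Python with made-up layers and hypothetical function names, not OpTorch's API or implementation.

```python
def forward_with_checkpoints(layers, x, every):
    """Run x through layers, storing only every `every`-th activation."""
    checkpoints = {0: x}  # key i holds the input to layers[i]
    for i, layer in enumerate(layers):
        x = layer(x)
        if (i + 1) % every == 0:
            checkpoints[i + 1] = x
    return x, checkpoints


def activation_at(layers, checkpoints, index):
    """Recompute the input to layers[index] from the nearest earlier checkpoint."""
    start = max(k for k in checkpoints if k <= index)
    x = checkpoints[start]
    for layer in layers[start:index]:
        x = layer(x)
    return x


# Eight toy "layers" (hypothetical): each doubles its input and adds its index.
layers = [lambda v, k=k: v * 2 + k for k in range(8)]
output, checkpoints = forward_with_checkpoints(layers, 1, every=4)

# Only 3 of the 9 activations are stored; the rest are recomputed when needed,
# for example during a backward pass.
recomputed = activation_at(layers, checkpoints, 6)
```

The trade-off is extra computation in exchange for lower peak memory: fewer stored activations means more layers to re-run when an intermediate value is needed.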
Python, C++, R
PyTorch, TensorFlow, OpenCV, Pandas, Keras, Spark ML, NumPy, XGBoost
Amazon SageMaker, PyCharm, Azure Machine Learning, BigQuery
Data Science, Best Practices, Business Intelligence (BI)
Jupyter Notebook, Azure, Mobile
Data Pipelines, PostgreSQL, MySQL
Machine Learning, Computer Vision, Natural Language Processing (NLP), Deep Learning, Image Processing, JSTransformers, Custom Models, Artificial Intelligence (AI), Cloud, Neural Networks, Artificial Neural Networks (ANN), Generative Adversarial Networks (GANs), Code Review, Source Code Review, Task Analysis, Technical Hiring, Interviewing, Facial Recognition, Computer Vision Algorithms, Language Models, Text Generation, Fine-tuning, Data Inference, Classification Algorithms, Classification, Text Classification, AI Programming, AI Design, Large Language Models (LLMs), GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-3 API, Data Scientist, Algorithms, Data Analysis, Dashboards, Reports, Research, Time Series, Statistical Analysis, Data Analytics, ChatGPT, OpenAI, Object Detection, Performance Optimization, Stable Diffusion, APIs, Chatbots, Data Visualization, Financial Forecasting, Image Generation, Leadership, Reinforcement Learning, Legal Documentation, Benchmarking, Open Neural Network Exchange (ONNX), API Integration, Integration, Wearables, Biometrics, Data Reporting, Point Clouds, Signal Processing, Health, Models, JupyterLab, Weather, Random Forests, Google Data Studio, Web Development
PhD in Computer Science
Texas A&M University - College Station, TX, USA
Bachelor's Degree in Computer Science
National University of Computer and Emerging Sciences - Islamabad, Pakistan
Winner of Object Detection for Dash CAM Images AI-challenge
Motive (formerly KeepTruckin)
Winner of Chest-XRAY COVID-19 Grand Challenge
Amazon Web Services
Winner of Diabetic Foot Ulcer Detection Grand Challenge
Certificate of Achievement
The Manchester Metropolitan University