Hasan Ali
Verified Expert in Engineering
AI Developer
Sydney, New South Wales, Australia
Toptal member since May 17, 2021
With over six years of experience spanning big data, data analytics, machine learning, and MLOps, Hasan has worked with a diverse range of companies, from Fortune 500 corporations to nimble startups. He believes in the philosophy of 3H—honesty, humility, and helpfulness. He loves sharing knowledge via boot camps and speaker events. He is currently building LLMs and GenAI products.
Portfolio
Experience
- Python - 7 years
- Pandas - 6 years
- Machine Learning - 5 years
- Natural Language Processing (NLP) - 5 years
- Deep Learning - 5 years
- Computer Vision - 5 years
- Optical Character Recognition (OCR) - 3 years
- Generative Pre-trained Transformer 3 (GPT-3) - 1 year
Availability
Preferred Environment
Visual Studio Code (VS Code), Linux, Git
The most amazing...
...product I've built is an LLM that gives personalized tips and insights to save electricity. Users can also ask any questions related to their power usage.
Work Experience
Lead Data Scientist
Delta Energia
- Built an MVP to disaggregate electricity into its contributing appliances in households. The product uses image generation (generative AI) at its core.
- Built a LangChain-based AI assistant to give insights and tips on saving energy. Users could also ask questions about their power consumption and appliance behavior.
- Created a multimodal network combining cloud images and tabular sensor data to predict cloud coverage. The model mitigated grid issues and reduced problems caused by unforeseen weather changes by 30%.
- Created analytics dashboards, user segmentation models, and other data analytics tools to help the business team extract deeper insights into users' consumption behavior.
- Built SageMaker pipelines for production deployment, making inference 150% faster than before.
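The multimodal cloud-coverage model above combined a CNN branch for sky images with tabular sensor features. A minimal late-fusion sketch, where the feature sizes, weights, and function name are illustrative and not the production architecture:

```python
import numpy as np

def predict_cloud_cover(image_feats, sensor_feats, w, b):
    # Late fusion: concatenate pooled CNN image features with tabular
    # sensor features, then apply a linear regression head.
    x = np.concatenate([image_feats, sensor_feats])
    return float(w @ x + b)

rng = np.random.default_rng(0)
img = rng.normal(size=128)   # stand-in for a pooled CNN embedding of a sky image
tab = rng.normal(size=8)     # e.g., temperature, humidity, pressure readings
w, b = rng.normal(size=136), 0.0
estimate = predict_cloud_cover(img, tab, w, b)  # cloud-cover score
```

In practice, the fusion head would be trained jointly with the CNN branch; the linear layer here just shows where the two modalities meet.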
ML Engineer via Toptal
Doorstead
- Developed a Cox-PH model to predict the number of days until a house is rented out. This new model improved the core business metrics by 20%.
- Developed a Streamlit tool using FAISS and the k-nearest neighbors algorithm to find houses similar to a target property. This helped the business team make better pricing decisions in the rental home business.
- Used computer vision to identify areas, schools, roads, and other structures in satellite imagery and map them area-wise using GIS tools.
- Contributed to building pricing models and recommendation engines for properties, based on user preferences and the market's supply-and-demand gap.
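The FAISS-based similarity tool above can be sketched as a brute-force k-nearest-neighbors search. The feature rows below are hypothetical; FAISS's `IndexFlatL2` computes the same L2 ranking, just at scale:

```python
import numpy as np

def knn_similar_properties(features, target, k=3):
    """Return indices of the k properties closest to target (L2 distance).

    Equivalent to querying a FAISS IndexFlatL2; NumPy brute force is
    fine for a few thousand listings.
    """
    dists = np.linalg.norm(features - target, axis=1)
    return np.argsort(dists)[:k]

# Hypothetical feature rows: [bedrooms, bathrooms, sqft / 1000]
props = np.array([[3., 2., 1.4], [2., 1., 0.9], [3., 2., 1.5], [5., 4., 3.2]])
target = np.array([3., 2., 1.42])
print(knn_similar_properties(props, target, k=2))  # two most similar listings: [0 2]
```

Real features would be normalized first so that square footage doesn't dominate the distance.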
Data Scientist via Toptal
ML Software Company
- Worked on invoice processing. The product used object detection and multimodal networks for table detection and extraction from images and PDFs. With suitable post-processing, the output was produced as a spreadsheet, transforming images/PDFs into Excel.
- Contributed to cheque processing. Built a named entity recognition model and a seq2seq model to extract printed information and handwritten text, respectively, from cheques and save them digitally.
- Worked on blurring personally identifiable information (PII) on government documents like driver's licenses, identification cards, etc., using text detection models and NLP.
- Used Azure Machine Learning to transform bank forms into digital forms saved in databases. Custom labeling was performed to extract specific info from the form.
Data Scientist
Various Freelance Clients
- Implemented a segmentation model to detect metal loss in oil pipelines. Previously a manual process, it was automated with a hundredfold reduction in time and just one human-in-the-loop (HITL).
- Built an image classification model to accurately identify multiple defect types in oil pipelines despite class imbalance.
- Created batch processing pipelines for over 100 gigabytes of sensor data using Apache Spark, enabling ETL on the large volume of sensor data streaming in from the client.
- Worked on demographic classification and fake-user identification on a CSR scholarship platform. Built an ensemble of five algorithms, each hyperparameter-tuned individually.
- Contributed to inventory forecasting. Worked on time series forecasting for clients like Cadbury using probabilistic and Bi-LSTM models. After the pandemic, this model proved especially valuable for their inventory forecasting.
- Worked on yield estimation, using drone data to extract farmers' land from given coordinates and then identify the different crops using computer vision. Also estimated the yield for sugarcane crops based on that data.
Experience
Drone Survey Analysis Using Deep Learning
https://drive.google.com/file/d/1nx77nCsOD0cpK9pQpQHYAqcrpM8H3K-D/view?usp=sharing
TASK
• Extract land images for every farmer.
• Identify crops on those images.
• Estimate the output of each crop for each farmer.
SOLUTION
QGIS was used to process the ECW file. I built a Python script that used coordinates from the ECW file to extract each farmer's land and save it in PNG format.
After that, an image classifier with a sliding window was used to detect the crops in each image. Later in the project, we used image segmentation to identify the total area occupied by each crop. Deep learning and image processing were then used to estimate each farmer's crop yield from that area.
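The sliding-window step can be sketched as below: each patch would be fed to the trained crop classifier. The window and stride sizes here are illustrative, not the project's actual values:

```python
import numpy as np

def sliding_windows(image, win=64, stride=32):
    # Yield (row, col, patch) crops from the image; each patch goes to
    # the crop-type classifier.
    h, w = image.shape[:2]
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            yield r, c, image[r:r + win, c:c + win]

field = np.zeros((128, 128))          # stand-in for a farmer's PNG tile
patches = list(sliding_windows(field))
print(len(patches))  # 9 patches (a 3 x 3 grid of windows)
```

A stride smaller than the window gives overlapping patches, which reduces the chance of a crop boundary falling between windows.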
Image Redaction Using Object Detection and OCR
https://drive.google.com/file/d/1umXP0MR6niCCca9u1p1YJ8fiaNuOfvkC/view?usp=sharing
The application could be used to:
1. Blur critical information on government-issued documents.
2. Blur the patient's name and other personal details on X-rays and other medical reports so the diagnosis is independent of the patient.
Solution: The product used custom object detection to detect text, faces, and document boundaries. Using those boundaries, the main area was extracted and sent to OCR. The final step was to blur any regions of the image containing critical PII.
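The blur step can be sketched with a flat mean over each detected box. The box coordinates below are hypothetical detector output, and production code would typically use a Gaussian blur (e.g., OpenCV's `GaussianBlur`) rather than a flat mean:

```python
import numpy as np

def redact_regions(image, boxes):
    # Flatten each detected PII region to its mean value, destroying
    # the text while keeping the rest of the document intact.
    out = image.copy()
    for x, y, w, h in boxes:
        region = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = region.mean()
    return out

doc = np.arange(64, dtype=float).reshape(8, 8)   # toy "document"
redacted = redact_regions(doc, [(2, 2, 3, 3)])   # hypothetical detection box
```

The key design point is that redaction happens on pixel regions, so it works the same for printed text, handwriting, and faces as long as the detector finds the box.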
Anomaly and Metal Loss Detection in Oil Pipelines
Solution: Oil pipelines exhibit multiple anomaly types, such as welds, sleeves, and Ts, with very high class imbalance. First, we created and engineered a training dataset with a proportionate distribution of anomalies and normal samples. Then we trained a CNN alongside a classical ML algorithm, with post-processing on top. Together, these measures ensured we didn't miss any anomalies; overall, it was a double-surety architecture.
For metal loss detection (highly imbalanced data), we again created and engineered a training dataset and used an image segmentation model (a modified U-Net) with a custom loss function.
Impact: We automated the anomaly detection process and saved 1,000+ person-hours and $100,000+ every week.
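One common form such a custom loss takes for imbalanced segmentation is weighted binary cross-entropy, where rare positive (metal-loss) pixels are up-weighted. The weight value below is an illustrative assumption, not the project's actual loss:

```python
import numpy as np

def weighted_bce(y_true, y_pred, pos_weight=20.0, eps=1e-7):
    # Binary cross-entropy that up-weights the rare positive class.
    # With roughly 5% positive pixels, pos_weight ~ 1 / 0.05 = 20
    # balances the two classes.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    loss = -(pos_weight * y_true * np.log(y_pred)
             + (1 - y_true) * np.log(1 - y_pred))
    return loss.mean()

# Missing a metal-loss pixel now costs far more than a false alarm.
mask = np.array([0.0, 0.0, 0.0, 1.0])
print(weighted_bce(mask, np.array([0.1, 0.1, 0.1, 0.1])))
```

Dice or focal loss are the other usual choices here; all three push the model away from the trivial "predict all background" solution.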
Inventory Forecasting for Mondelez (Cadbury)
LLM Assistant for An Edtech Company
Education
Bachelor's Degree in Data Science
UPTU - New Delhi, India
Certifications
Prompt Engineering
DeepLearning.ai
Spark Fundamentals II
CognitiveClass.ai
Spark Fundamentals I
CognitiveClass.ai
Skills
Libraries/APIs
TensorFlow, NumPy, Pandas, PySpark, PyTorch, OpenCV
Tools
Azure Machine Learning, Amazon SageMaker, GIS, Jupyter, Git, Apache Airflow
Languages
SQL, Python 3
Platforms
Linux, Docker, Apache Kafka, Kubernetes
Storage
PostgreSQL, MongoDB, Elasticsearch
Frameworks
Hadoop, LlamaIndex
Other
Computer Vision, Machine Learning, Deep Learning, Image Processing, Natural Language Processing (NLP), Data Science, Optical Character Recognition (OCR), Artificial Intelligence (AI), Generative Pre-trained Transformer 3 (GPT-3), MLflow, LangChain, Multivariate Statistical Modeling, Generative Pre-trained Transformers (GPT), FAISS, Statistical Modeling, Time Series, Big Data, Large Language Models (LLMs), OpenAI GPT-3 API, Prompt Engineering