Hasan Ali, Developer in Sydney, New South Wales, Australia

Hasan Ali

Verified Expert in Engineering

Bio

With over six years of experience spanning big data, data analytics, machine learning, and MLOps, Hasan has worked with a diverse range of companies, from Fortune 500 corporations to nimble startups. He believes in the philosophy of 3H—honesty, humility, and helpfulness. He loves sharing knowledge via boot camps and speaker events. He is currently building LLMs and GenAI products.

Portfolio

Delta Energia
Amazon SageMaker, TensorFlow, Deep Learning, Apache Kafka, Kubernetes, Docker...
Doorstead
Computer Vision, Deep Learning, Data Science, Pandas, NumPy, FAISS...
ML Software Company
Data Science, Natural Language Processing (NLP), TensorFlow, PyTorch...

Experience

  • Python - 7 years
  • Pandas - 6 years
  • Machine Learning - 5 years
  • Natural Language Processing (NLP) - 5 years
  • Deep Learning - 5 years
  • Computer Vision - 5 years
  • Optical Character Recognition (OCR) - 3 years
  • Generative Pre-trained Transformer 3 (GPT-3) - 1 year

Availability

Full-time

Preferred Environment

Visual Studio Code (VS Code), Linux, Git

The most amazing...

...product I've built is an LLM that gives personalized tips and insights to save electricity. Users can also ask any questions related to their power usage.

Work Experience

Lead Data Scientist

2022 - 2023
Delta Energia
  • Built an MVP to disaggregate household electricity consumption into the appliances that contribute to it. The product uses image generation (generative AI) at its core.
  • Built a LangChain-based AI assistant to give insights and tips on saving energy. Users could also ask questions about their power consumption and appliance behavior.
  • Created a multimodal network that combines images (cloud images) and tabular data (sensor data) to predict cloud coverage. The model mitigated grid issues and reduced problems due to unforeseen changes in weather by 30%.
  • Created analytics dashboards, user segmentation models, and other data analytics tools to help the business team extract deeper insights into users' consumption behavior.
  • Built SageMaker pipelines for deployment in production, which made the inference time 150% faster than before.
Technologies: Amazon SageMaker, TensorFlow, Deep Learning, Apache Kafka, Kubernetes, Docker, Data Science, Python, MLflow, Elasticsearch, PyTorch, Apache Airflow, SQL, Generative Pre-trained Transformers (GPT), PostgreSQL, Artificial Intelligence (AI), LangChain, Generative Pre-trained Transformer 3 (GPT-3)

ML Engineer via Toptal

2021 - 2022
Doorstead
  • Developed a Cox proportional hazards (Cox-PH) model to predict the number of days until a house is rented out. This new model improved the core business metrics by 20%.
  • Developed a Streamlit tool using FAISS and the k-nearest neighbors algorithm to find houses similar to a target property. This helped the business team make better pricing decisions in the rental home business.
  • Used computer vision to identify areas, schools, roads, and other constructions from satellite imagery and map them area-wise using GIS tools.
  • Contributed to building pricing models and recommendation models for properties based on market-driven user preferences and the gap between demand and supply in the market.
Technologies: Computer Vision, Deep Learning, Data Science, Pandas, NumPy, FAISS, Amazon SageMaker, GIS, TensorFlow, Python, Natural Language Processing (NLP), Statistical Modeling, Docker, Linux, Git, Apache Airflow, SQL, Artificial Intelligence (AI)
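
As a rough illustration of the similar-property search described above, the sketch below ranks listings by Euclidean distance to a target property. It is a minimal sketch, not the tool itself: it uses brute-force search in plain Python instead of a FAISS index, and the `find_similar` helper and the feature layout (bedrooms, bathrooms, area) are hypothetical.

```python
import math

def find_similar(target, listings, k=3):
    """Return the ids of the k listings closest to the target feature vector."""
    scored = []
    for listing_id, features in listings.items():
        # Euclidean distance between two feature vectors of equal length.
        distance = math.dist(target, features)
        scored.append((distance, listing_id))
    scored.sort()
    return [listing_id for _, listing_id in scored[:k]]

# Hypothetical features: (bedrooms, bathrooms, area in hundreds of square feet).
listings = {
    "A": (3, 2, 15.0),
    "B": (2, 1, 9.0),
    "C": (3, 2, 16.0),
    "D": (5, 4, 40.0),
}
nearest = find_similar((3, 2, 15.5), listings, k=2)
```

In practice, the features would likely need scaling so that no single column dominates the distance, and an approximate index such as FAISS becomes worthwhile once the number of listings grows large.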

Data Scientist via Toptal

2020 - 2021
ML Software Company
  • Worked on invoice processing. The product used object detection and multi-model networks to detect and extract tables from images and PDFs. With suitable postprocessing, the output was produced as a spreadsheet, so images and PDFs were converted into Excel files.
  • Contributed to cheque processing. Built a named entity recognition model and a seq2seq model to extract printed information and handwritten text, respectively, from cheques and save them digitally.
  • Worked on blurring personally identifiable information (PII) on government documents, such as driver's licenses and identification cards, using text detection models and NLP.
  • Used Azure Machine Learning to transform bank forms into digital forms saved in databases. Custom labeling was performed to extract specific info from the form.
Technologies: Data Science, Natural Language Processing (NLP), TensorFlow, PyTorch, Machine Learning, Deep Learning, Azure Machine Learning, Python, Computer Vision, OpenCV, Optical Character Recognition (OCR), Time Series, SQL, Image Processing

Data Scientist

2017 - 2020
Various Freelance Clients
  • Implemented a segmentation model to detect metal loss in oil pipelines. Previously manual, the process was automated with a hundredfold reduction in time and just one human in the loop (HITL).
  • Built an image classification model to accurately identify multiple defects in oil pipelines despite class imbalance.
  • Created batch processing pipelines with Apache Spark for over 100 gigabytes of sensor data, enabling ETL on the large volume of sensor data the client received.
  • Worked on demographic classification and fake-user identification on a CSR scholarship platform. Built an ensemble of five algorithms for this task, each tuned individually.
  • Contributed to inventory forecasting. Worked on time series forecasting for clients like Cadbury using probabilistic and Bi-LSTM models. After the pandemic, this model proved helpful for their inventory forecasting.
  • Worked on yield estimation, using drone data to extract farmers' land from given coordinates and then identify the crops using computer vision. Where the crop was sugarcane, also estimated the yield from the data.
Technologies: Computer Vision, Deep Learning, TensorFlow, Pandas, NumPy, MongoDB, PySpark, Python, Big Data, SQL, Image Processing, Natural Language Processing (NLP), PostgreSQL

Drone Survey Analysis Using Deep Learning

https://drive.google.com/file/d/1nx77nCsOD0cpK9pQpQHYAqcrpM8H3K-D/view?usp=sharing
We were provided with coordinates of farmer lands and drone survey data (an ECW file) for an area.

TASK
• Extract land images for every farmer.
• Identify crops on those images.
• Estimate the output of each crop for each farmer.

SOLUTION
QGIS was used to process the ECW file. I built a Python script that uses the provided coordinates to extract each farmer's land from the ECW file and save it in PNG format.

After that, an image classifier with a sliding window was used to detect the crops in the image. Later in the project, we used image segmentation to identify the total area occupied by the crop. Deep learning and image processing were used to estimate the crop yield for a particular farmer from that area.
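
The sliding-window step can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a plain 2D list of intensities, and `mean_threshold` is a stand-in for the trained crop classifier, which the project description does not detail. The function names are hypothetical.

```python
def sliding_windows(height, width, window, stride):
    """Yield (top, left) corners of square windows that fit inside the image."""
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            yield top, left

def crop_fraction(image, window, stride, classify):
    """Fraction of windows that the classifier labels as crop."""
    height, width = len(image), len(image[0])
    labels = []
    for top, left in sliding_windows(height, width, window, stride):
        # Cut the window out of the image and classify it on its own.
        patch = [row[left:left + window] for row in image[top:top + window]]
        labels.append(classify(patch))
    return sum(labels) / len(labels)

def mean_threshold(patch):
    """Stand-in classifier: call a patch 'crop' if its mean intensity exceeds 0.5."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values) > 0.5

# Toy 4x4 image: left half bright (crop), right half dark (bare soil).
image = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
fraction = crop_fraction(image, window=2, stride=2, classify=mean_threshold)
```

Window-level labels only give a coarse area estimate, which is one reason a per-pixel segmentation model is a natural next step for measuring the area a crop occupies.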

Image Redaction Using Object Detection and OCR

https://drive.google.com/file/d/1umXP0MR6niCCca9u1p1YJ8fiaNuOfvkC/view?usp=sharing
Objective: Build a product that takes any given document (PDF or image) and blurs the personally identifiable information (PII) on it.

The application could be used to:
1. Blur critical information on government-issued documents.
2. Blur the patient's name and other personal details on X-rays and other medical reports to make the diagnosis independent of the patient.

Solution: The product used custom object detection to detect text, faces, and document boundaries. Using boundaries, the main area was extracted and sent to OCR. The next step was to blur the areas in the image that show any critical PII.
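
The final redaction step can be sketched as below. This is an assumption-laden illustration, not the product's code: the boxes are hard-coded where the real pipeline would take them from the detector and OCR, the image is a plain 2D list of grayscale values, and each region is flattened to its mean value as a simple stand-in for a proper blur filter. The `redact` helper is hypothetical.

```python
def redact(image, boxes):
    """Return a copy of a grayscale image with each box flattened to its mean value."""
    result = [row[:] for row in image]
    for top, left, bottom, right in boxes:
        # Boxes are (top, left, bottom, right) with exclusive bottom/right edges.
        region = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
        mean = sum(region) // len(region)
        for r in range(top, bottom):
            for c in range(left, right):
                result[r][c] = mean
    return result

image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
]
# Hypothetical PII box covering rows 0-1 and columns 1-2.
redacted = redact(image, [(0, 1, 2, 3)])
```

Working on a copy keeps the original image available, which helps when a reviewer needs to compare the redacted output against the source document.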

Anomaly and Metal Loss Detection in Oil Pipelines

Task: Detect all anomalies and metal loss inside oil pipelines.

Solution: Oil pipelines contain multiple anomaly types, such as weld, sleeve, and T, with very high class imbalance. First, we engineered a training dataset with a proportionate distribution of anomalies and normal samples. Then, we trained a fully connected CNN together with an ML algorithm and added post-processing. Combining these measures acted as a double check so that we didn't miss any anomalies.

For metal loss detection (highly imbalanced data): Again, we engineered a training dataset and used an image segmentation model (a modified U-Net) with a custom loss function.
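
The custom loss function is not specified here. As one common choice for highly imbalanced segmentation, the sketch below shows a soft Dice loss in plain Python. It is an assumption for illustration only, not necessarily the loss used in this project, and the `dice_loss` name and `smooth` parameter are hypothetical.

```python
def dice_loss(predicted, target, smooth=1.0):
    """Soft Dice loss over flattened per-pixel probabilities and 0/1 labels."""
    intersection = sum(p * t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    # The smoothing term avoids division by zero when both masks are empty.
    return 1.0 - (2.0 * intersection + smooth) / (total + smooth)

# Eight pixels, only one of which is metal loss.
target = [0, 0, 0, 0, 0, 0, 0, 1]
all_background = [0.0] * 8        # predicts "no metal loss" everywhere
perfect = [0.0] * 7 + [1.0]       # matches the target exactly

loss_background = dice_loss(all_background, target)
loss_perfect = dice_loss(perfect, target)
```

Predicting background everywhere is correct on seven of eight pixels, yet it still incurs a substantial Dice loss, which is why overlap-based losses are often preferred over per-pixel accuracy when the positive class is rare.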

Impact: We automated the anomaly detection process and saved 1,000+ person-hours and $100,000+ every week.

Inventory Forecasting for Mondelez (Cadbury)

I built inventory forecasting for a retail client with 500+ SKUs using an ensemble of time-series models (such as ARIMA), probability-based models, and a transformer for time-series forecasting. I designed the model for explainability, so each prediction came with a confidence interval and an explanation of the reasons behind it.
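
One simple way to combine several forecasts and attach a rough interval is sketched below. It is a minimal sketch under stated assumptions: the member forecasts are hard-coded hypothetical numbers, the ensemble is a plain average, and the band is derived from the spread between models, which is a heuristic rather than a calibrated confidence interval. The `ensemble_forecast` helper is hypothetical.

```python
import statistics

def ensemble_forecast(member_forecasts, z=1.96):
    """Combine per-model forecasts into (lower, mean, upper) for each time step."""
    combined = []
    for step_values in zip(*member_forecasts):
        mean = statistics.fmean(step_values)
        # Sample standard deviation across models measures their disagreement.
        spread = statistics.stdev(step_values)
        combined.append((mean - z * spread, mean, mean + z * spread))
    return combined

# Hypothetical weekly unit forecasts for one SKU from three models.
forecasts = [
    [100.0, 110.0],
    [104.0, 112.0],
    [96.0, 114.0],
]
bands = ensemble_forecast(forecasts)
```

A wide band signals that the models disagree for that week, which can itself serve as a plain-language explanation to planners of why a forecast deserves less trust.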

LLM Assistant for an Edtech Company

The LLM-based support assistant gives students quick and accurate answers to their queries using LlamaIndex and OpenAI's GPT-3.5-turbo large language model (LLM). It uses natural language processing to understand and respond to a wide range of questions about academic subjects, assignments, exams, and more.

Education

2013 - 2017

Bachelor's Degree in Data Science

UPTU - New Delhi, India

Certifications

FEBRUARY 2023 - PRESENT

Prompt Engineering

DeepLearning.ai

JULY 2019 - PRESENT

Spark Fundamentals II

CognitiveClass.ai

JUNE 2019 - PRESENT

Spark Fundamentals I

CognitiveClass.ai

Libraries/APIs

TensorFlow, NumPy, Pandas, PySpark, PyTorch, OpenCV

Tools

Azure Machine Learning, Amazon SageMaker, GIS, Jupyter, Git, Apache Airflow

Languages

SQL, Python, Python 3

Platforms

Linux, Docker, Apache Kafka, Kubernetes

Storage

PostgreSQL, MongoDB, Elasticsearch

Frameworks

Hadoop, LlamaIndex

Other

Computer Vision, Machine Learning, Deep Learning, Image Processing, Natural Language Processing (NLP), Data Science, Optical Character Recognition (OCR), Artificial Intelligence (AI), Generative Pre-trained Transformer 3 (GPT-3), MLflow, LangChain, Multivariate Statistical Modeling, Generative Pre-trained Transformers (GPT), FAISS, Statistical Modeling, Time Series, Big Data, Large Language Models (LLMs), OpenAI GPT-3 API, Prompt Engineering
