Masum Billal, Developer in Dhaka, Dhaka Division, Bangladesh

Masum Billal

Verified Expert in Engineering

Bio

Masum is a versatile and performance-oriented engineer with 7+ years of experience building scalable applications and AI-powered solutions across cloud platforms like AWS and GCP. He is an expert in Python frameworks, databases, automation, asynchronous programming, and architectures in microservices, cloud, and serverless. Masum drives company growth and client success by mentoring teams in adopting best practices for higher code quality and faster development.

Portfolio

Freelance Client
Python, Object-oriented Programming (OOP), SQL, NoSQL, Docker, Kubernetes...
Advanced Mobility Analytics Group
Python, APIs, Amazon Web Services (AWS), AWS CLI, PyTorch, NVIDIA CUDA...
Woven
Python, Pandas, GeoPandas, QA Automation, Pytest, Coverage.py, Code Coverage...

Experience

  • Microservices - 7 years
  • Django - 7 years
  • Machine Learning - 7 years
  • Python - 7 years
  • Data Engineering - 5 years
  • Amazon Web Services (AWS) - 5 years
  • Google Cloud Platform (GCP) - 4 years
  • FastAPI - 4 years

Availability

Part-time

Preferred Environment

Linux, Python, System Architecture, Cloud Computing, Solution Architecture, Back-end Development, Data Engineering, Machine Learning, Amazon Web Services (AWS), Google Cloud Platform (GCP)

The most amazing...

...thing I've done is architect a universal LLM interface and increase API performance by over 400%.

Work Experience

Python Engineer (via Toptal)

2024 - 2024
Freelance Client
  • Leveraged asynchronous FastAPI and SQLAlchemy, improving LLM API performance by over 400% and reducing latency.
  • Designed a universal API framework for LLM integration, cutting vendor onboarding time by over 80%.
  • Collaborated on 8+ cross-functional AI projects, ensuring scalable architecture and seamless deployment.
  • Mentored team members on best practices and improved CI/CD pipelines, leading to fewer bugs and faster development.
Technologies: Python, Object-oriented Programming (OOP), SQL, NoSQL, Docker, Kubernetes, Testing, Retrieval-augmented Generation (RAG), Django, Flask, FastAPI, Data Science, Machine Learning, Artificial Intelligence (AI), Apache Airflow, BigQuery, Google BigQuery, Google Cloud, Google Cloud Platform (GCP), Alembic, SQLAlchemy, Asyncio, Async/Await, Python Asyncio, RESTful Development, Back-end, Software Engineering, ChatGPT, OpenAI, Uvicorn, Gunicorn, Large Language Models (LLMs), PostgreSQL, Continuous Integration (CI), Continuous Delivery (CD), CI/CD Pipelines, GitHub Actions, Pytest, Tox, Coverage.py, Prometheus, Grafana, Middleware, OpenAI API, OpenAI SDK, OpenAI GPT-4 API, OpenAI GPT-3 API, Open-source LLMs, Ollama, Llama, Dify, Information Retrieval, Speech to Text AI, Text to Speech (TTS), Speech to Text, Image Generation, Moderation, AI Agents, Servers, API Development, Apache Kafka, Git, Software Architecture, Agile Sprints, Microservices, Back-end Development, Scalable Application, Scalable Architecture, Back-end APIs, Back-end Performance, Back-end Architecture

Senior Python Developer | Computer Vision Engineer

2023 - 2023
Advanced Mobility Analytics Group
  • Built real-time data pipelines for traffic analytics using YOLO, PyTorch, ONNX, and TensorRT, lowering resource consumption by 20%.
  • Developed high-performance near-real-time data pipelines for traffic monitoring using YOLO v7 and OpenCV.
  • Refactored legacy codebase into a well-maintained modern repository with high code coverage and best practices.
Technologies: Python, APIs, Amazon Web Services (AWS), AWS CLI, PyTorch, NVIDIA CUDA, NVIDIA TensorRT, Poetry, Databases, Asyncio, Python Asyncio, Containerization, Testing, Asynchronous Programming, Unit Testing, Code Refactoring, Docker, Tensorrt, Open Neural Network Exchange (ONNX), ONNX Runtime, Tox, Algorithms, Real-time Data, Real-time Systems, Real-time Computing, Real-time Vision Systems, Git, Deep Learning, Amazon S3 (AWS S3), Software Architecture, Microservices, Back-end Development, Scalable Application, Back-end, Back-end APIs, Back-end Performance, Back-end Architecture, Startups

Expert Python Developer (via Toptal)

2022 - 2023
Woven
  • Developed and optimized automated testing applications for mapping platforms, improving CI/CD performance and reducing testing costs.
  • Improved data processing and validation performance and CI execution time by around 90% using optimization techniques such as vectorization.
  • Defined best practices and standards to improve code quality and maintainability.
Technologies: Python, Pandas, GeoPandas, QA Automation, Pytest, Coverage.py, Code Coverage, GIS, Spatial Analysis, Test-driven Development (TDD), Testing, Code Review, Code Refactoring, Unit Testing, Docker, Git

Senior Data Scientist | ML Engineer

2021 - 2023
iXora Solution
  • Engineered event-driven, real-time, and scheduled data pipelines in GCP, Django, Flask, MongoDB, and AWS.
  • Optimized ETL memory usage by over 95%, enabling large-scale data handling.
  • Developed AI-powered solutions using Django, NLP, and machine learning.
  • Saved a client thousands of dollars monthly by automating data analysis, cleaning, and reporting tools.
  • Developed an in-house facial recognition-based attendance system with incredibly low latency and 100% accuracy.
Technologies: Django, Django REST Framework, Google Cloud Platform (GCP), Flask, Machine Learning, Data Science, Deep Learning, Data Engineering, Python, Object-oriented Programming (OOP), NoSQL, MySQL, Back-end, Amazon Web Services (AWS), SQL, Continuous Integration (CI), JSON, Business Intelligence (BI), Data Analytics, REST APIs, APIs, CI/CD Pipelines, PySpark, Spark, ETL, Python 3, Document Parsing, Document Processing, Pytest, Code Coverage, Coverage.py, ETL Tools, Data Pipelines, Test-driven Development (TDD), Containerization, Testing, Redis, Unit Testing, Code Refactoring, Supervised Machine Learning, Microsoft SQL Server, Git, Google BigQuery, Amazon S3 (AWS S3), Software Architecture, Microservices, Generative Artificial Intelligence (GenAI), Back-end APIs, Back-end Development, Back-end Performance, Back-end Architecture

Senior Data Scientist | ML Engineer

2019 - 2020
SHOHOZ
  • Implemented user segmentation with clustering, reducing marketing spend.
  • Built fraud detection data pipelines, reducing fraudulent activities by over 50%.
  • Architected data solutions for the government-backed Corona Tracer BD.
Technologies: Azure, Data Engineering, Machine Learning, Recommendation Systems, Amazon S3 (AWS S3), Python, Object-oriented Programming (OOP), NoSQL, MySQL, Amazon Web Services (AWS), SQL, Continuous Integration (CI), JSON, Apache Kafka, Business Intelligence (BI), Data Analytics, ETL, PySpark, REST APIs, APIs, Python 3, Pytest, Code Coverage, Coverage.py, ETL Tools, REST, SaaS, Architecture, Software Design, API Integration, RESTful Microservices, Workflow, Visualization, Integration, Data Pipelines, Data Visualization, Plotly, Shapely, Clustering, Clustering Algorithms, Testing, Unit Testing, Code Refactoring, Git, Deep Learning, Seaborn, Software Architecture, Startups

Machine Learning Engineer

2018 - 2019
Auleek
  • Developed an application to detect architectural components of a floor plan using deep learning.
  • Created an application to determine whether a component can be placed in a floor plan.
  • Automated training and prediction of floor plan images.
Technologies: Machine Learning, Deep Learning, Python, Object-oriented Programming (OOP), PySpark, Spark, REST APIs, APIs, Python 3, Business Services, Git, Seaborn, Generative Artificial Intelligence (GenAI)

RND Software Engineer

2017 - 2018
REVE Systems
  • Engaged in full-stack development for a government e-vet platform.
  • Improved the packet analysis tool for high-throughput networks.
  • Developed the application with technologies like Java, Hibernate, JPA ORM, and Thymeleaf.
Technologies: Java, C++, Back-end, SQL, Socket Programming, Multithreading, REST APIs, APIs, Git, Back-end APIs, Back-end Development, Back-end Performance, Back-end Architecture, Scalability, Scalable Application, Scalable Architecture, Full-stack, Full-stack Development, JavaScript, Templating

Data Scientist | ML Engineer

2016 - 2016
Thread Equation PTE Ltd.
  • Developed an application powered by Django and Django REST framework.
  • Created an NLP-powered application to identify attacks on applications.
  • Integrated the machine learning application into the Django application.
Technologies: Python, REST APIs, APIs, Django, Django REST Framework, Machine Learning, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Pandas, Scikit-learn, Amazon Web Services (AWS), Microservices, REST, SaaS, Software Design, Architecture, Visualization, Integration

Experience

New York Vehicles Crash Data Interactive Visualization

A dynamic data pipeline and an interactive application that showcases vehicle crash incidents utilizing MTA Open Data. The process begins with a pipeline that extracts data from a CSV file and loads it into a DuckDB table. This data is then harnessed by a Streamlit application, which presents engaging and interactive dashboards for users to explore.

Xin-ORM

https://github.com/proafxin/xin
A Pydantic-powered universal ORM wrapper for databases that enables users to do the following:
• Execute queries on a database.
• Read a database table as a data frame.
• Write a data frame to a database table (still under development).
• Flatten and normalize a data frame with a nested structure.
• Serialize a data frame as a list of Pydantic models.
• Deserialize a list of Pydantic models as a data frame.
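To illustrate the "flatten and normalize" step, here is a minimal stdlib-only sketch of flattening nested records into single-level rows. It is not xin's actual API: the helper name `flatten_record`, the `.` key separator, and the sample rows are all assumptions made for this example.

```python
def flatten_record(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Hypothetical helper for illustration; not part of the xin package.
    """
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts and merge their flattened keys.
            flat.update(flatten_record(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up sample rows with a nested structure.
rows = [
    {"id": 1, "user": {"name": "a", "address": {"city": "Dhaka"}}},
    {"id": 2, "user": {"name": "b", "address": {"city": "Khulna"}}},
]
flat_rows = [flatten_record(row) for row in rows]
```

Each flattened row has only scalar values, so it can map directly onto the columns of a data frame or a database table.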

OCR Scorer

https://github.com/proafxin/tdm-tracker
An application to keep track of scores using optical character recognition (OCR) on screenshots of scoreboards. PUBG does not retain scores for TDM mode, so I created this high-performance API, leveraging Django Ninja, deep learning, OCR technologies, and asynchronous programming.

Corona Tracer

A joint venture of the Bangladesh government and SHOHOZ to fight COVID-19.

I was the data lead, building data pipelines, analytics, and visualizations. Due to the nature of the pandemic, one challenge was delivering the project within a very tight timeline. Ultimately, we created automatic real-time data pipelines and provided government officials with analytics, reports, and visualizations.

ShiftSmart

https://shiftsmart.com/
Built a data pipeline to handle real-time and stored data. The architecture involved microservices that ingest data from MongoDB and parse and write them into Google BigQuery. I enabled data transformation between NoSQL and SQL data, schema validation, and security handling.

Additionally, I used a Flask-based back-end application to facilitate intercommunication between the microservices in the pipeline.

DeepCortex

DeepCortex allows users to define machine learning processes in pipelines and execute them.

I worked as a data science team lead. We designed and developed the data science-related back end of DeepCortex.

Profiling Users Through Clustering

Profiled users automatically based on some criteria and generated scores.

Clustering was one step of that pipeline. K-means was used to cluster users, with k-means++ seeding used to initialize the centers and improve clustering quality.
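The seeding step can be sketched in plain Python as follows. This is a minimal illustration of standard k-means++ (D²) sampling, not the project's code: the function names and the sample points are assumptions, and it assumes the input holds at least k distinct points.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centers with k-means++ (D^2) sampling.

    Illustrative helper; assumes `points` contains at least k distinct points.
    """
    rng = rng or random.Random()
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight every point by its squared distance to the nearest center
        # chosen so far, so far-away points are more likely to be picked.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


# Made-up 2D feature vectors standing in for user profiles.
users = [(1.0, 1.0), (1.2, 0.9), (8.0, 8.1), (7.9, 8.3)]
seeds = kmeans_pp_seeds(users, k=2, rng=random.Random(0))
```

The returned seeds would then serve as the starting centers for Lloyd's algorithm. In scikit-learn, `KMeans(init="k-means++")` applies the same seeding idea.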

Facial Recognition API

Built the back end of an application to automatically register employees' attendance using the Django REST Framework (DRF) and Python OpenCV.

One of the challenges was to persist the data involved in this application.

Food Recommendation & Trend Analysis

Developed a recommendation system for a food delivery application using supervised machine learning algorithms.

Before feature engineering and tuning, matrix factorization was used as the primary model. I also utilized data mining and machine learning techniques for trend analysis.

Fraud Detection

An automated intelligent fraud detection mechanism that I developed. I also created data pipelines to automatically generate analytics and reports, as well as a notification system to notify the respective officials. Business actions were taken based on these fraud reports.

FIFA Simulation in APIs

https://github.com/proafxin/football_manager
Developed RESTful APIs using Django REST Framework (DRF). This is a pet project of mine, essentially a virtual simulation of a real-world football manager role, which gives the client a sense of the quality of my code.

In testing, the code has 100% coverage. Continuous integration is used to make sure no bad code is being pushed or merged.

I used Tox and GitHub Actions to ensure the project can be successfully deployed on various platforms and environments.

Dashboard for Cyberattacks

A Django-powered dashboard with REST APIs. I developed the dashboard using Django REST framework. The dashboard contained analytics, visualization, and other relevant information on the attempted cyberattacks on the application.

Pandas Extras

https://github.com/proafxin/pd-extras
A project that adds some functions on top of Pandas, for example, writing a Pandas data frame to a database directly. The built-in df.to_sql falls short for this purpose: with if_exists="replace", it overwrites the current table, and it also requires manually creating an SQLAlchemy engine for the connection.

Data frame-to-database is meant to remove all the extra steps from this writing process. Currently, the goal is to support SQL and NoSQL databases, including data warehouses such as Google BigQuery or Apache Cassandra. For SQL databases, SQLAlchemy is used internally to generalize all SQL database connections.

ETL Job Scheduling with Apache Airflow

https://github.com/proafxin/airflow
A project where I used Apache Airflow to schedule an ETL job to read data from the MySQL database, run aggregations, and write analytical data to a database.

Internally, SQLAlchemy and PyMySQL are used to connect to the database and communicate with it for reading and writing data. Pandera validates a Pandas data frame created from data retrieved in the SQL query.

Bug Tracker

https://github.com/proafxin/bug-tracker
A FastAPI-based back-end app that exposes a set of asynchronous back-end RESTful APIs using Python, FastAPI, SQLAlchemy, and MySQL for tracking bugs and stories.

The app is currently under development to reach 100% test coverage, with the minimum functionalities of creating and editing stories or bugs. The application can also be used as a Docker container. Poetry and Tox are used internally to maintain dependencies.

Seeding Methods in K-means Clustering

https://github.com/proafxin/seeding-kmeans
Research that I did on k-means clustering, specifically seeding methods for initializing the centers for Lloyd's algorithm. The research aims to improve the classical seeding method and discusses potential issues with the k-means++ paper.

Reffer: An Open-source Bibliography Management Solution

A community-based open-source bibliography solution that enables users to store, search, and add all their research references. This innovative solution is highly performant, scalable, and lightning fast.

Certifications

APRIL 2025 - PRESENT

AWS Cloud Technology Consultant

Amazon Web Services

APRIL 2025 - PRESENT

AWS Cloud Solutions Architect

Amazon Web Services

APRIL 2025 - PRESENT

AWS Fundamentals Specialization

Amazon Web Services

APRIL 2025 - PRESENT

IBM Data Science Professional Certificate

IBM | via Coursera

Skills

Libraries/APIs

Pandas, SciPy, NumPy, TensorFlow, PyTorch, OpenCV, Asyncio, Python Asyncio, API Development, Matplotlib, Scikit-learn, REST APIs, PySpark, SQLAlchemy, PyMongo, Pyodbc, PyMySQL, Shapely, OpenAI API, Pydantic, Folium, Back-end APIs

Tools

Pytest, Git, Seaborn, Jira, Coverage.py, Apache Airflow, GIS, Plotly, Jupyter, AWS CLI, Open Neural Network Exchange (ONNX), BigQuery, ChatGPT, Uvicorn, Grafana, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Amazon CloudWatch, AWS SDK, Amazon Virtual Private Cloud (VPC)

Languages

C++, Python, Java, SQL, Python 3, JavaScript

Frameworks

Django, Django REST Framework, Alembic, Django Ninja, Flask, Spark, Streamlit

Paradigms

Scrum, Object-oriented Programming (OOP), REST, Microservices, Testing, Asynchronous Programming, Code Refactoring, Continuous Integration (CI), Business Intelligence (BI), Socket Programming, ETL, RESTful Development, Test-driven Development (TDD), Unit Testing, Continuous Delivery (CD), Real-time Systems, Data-driven Methodology, Distributed Computing, Automation, Scalable Application, Back-end Architecture, Templating

Platforms

Jupyter Notebook, Google Cloud Platform (GCP), Amazon Web Services (AWS), AWS Lambda, Azure, Apache Kafka, Docker, NVIDIA CUDA, Kubernetes, Ollama, Amazon EC2

Storage

MongoDB, Amazon S3 (AWS S3), Databases, Redis, Amazon DynamoDB, NoSQL, MySQL, JSON, PostgreSQL, Microsoft SQL Server, Data Validation, Data Pipelines, Google Cloud, ScyllaDB, SQLite

Other

Clustering, Code Coverage, Machine Learning, Data Science, Deep Learning, Data Engineering, Recommendation Systems, Google BigQuery, Back-end, QA Automation, Architecture, Software Design, SaaS, API Integration, FastAPI, Containerization, Code Review, Supervised Machine Learning, Async/Await, Generative Artificial Intelligence (GenAI), Web Scraping, Software Architecture, Amazon RDS, Recurrent Neural Networks (RNNs), Data Visualization, Data Analytics, Data Mining, Agile Sprints, Multithreading, APIs, CI/CD Pipelines, Natural Language Processing (NLP), Document Parsing, Document Processing, ETL Tools, Data Warehousing, Pandera, DataFrames, Generative Pre-trained Transformers (GPT), GeoPandas, RESTful Microservices, Workflow, RESTful Services, Spatial Analysis, Visualization, Integration, Artificial Intelligence (AI), RESTful Web Services, Poetry, Tox, Business Services, Clustering Algorithms, K-means Clustering, NVIDIA TensorRT, Tensorrt, ONNX Runtime, Optical Character Recognition (OCR), EasyOCR, Tesseract, Image Processing, Convolutional Neural Networks (CNNs), Neural Networks, Retrieval-augmented Generation (RAG), Software Engineering, OpenAI, Gunicorn, Large Language Models (LLMs), GitHub Actions, Prometheus, Middleware, OpenAI SDK, OpenAI GPT-4 API, OpenAI GPT-3 API, Open-source LLMs, Llama, Dify, Information Retrieval, Atlas, Asyncpg, Polars, UV, Speech to Text AI, Text to Speech (TTS), Speech to Text, Image Generation, Moderation, AI Agents, Servers, Dagster, DuckDB, Bokeh, Algorithms, Real-time Data, Real-time Computing, Real-time Vision Systems, Data Analysis, CycleGAN, Generative Adversarial Networks (GANs), Analytical Dashboards, Cloud Computing, Big Data, Collaboration, System Architecture, Security Engineering, AWS Big Data, Solution Architecture, AWS Cloud Security, Cloud Infrastructure, Back-end Development, Scalable Architecture, Back-end Performance, Scalable Web Services, Full-stack, Startups, Scalability, Full-stack Development, Data Cleaning, Front-end
