Masum Billal, Developer in Dhaka, Dhaka Division, Bangladesh

Masum Billal

Verified Expert in Engineering

Software Developer

Dhaka, Dhaka Division, Bangladesh
Toptal Member Since
July 14, 2022

Masum is a data scientist, engineer, and back-end developer with around six years of experience. Coming from a solid mathematical background, he is very strong in problem-solving, back-end development, machine learning, and data science and engineering. Masum is a researcher by hobby and the author of two books and several papers in mathematics and data science published in international journals.






Preferred Environment

Visual Studio Code (VS Code), Linux, Windows, Python

The most amazing...

...thing I've built is the library "pd-extras," which makes it easier for programmers to work with data.

Work Experience

Senior Python Developer and Computer Vision Engineer

2023 - 2023
Advanced Mobility Analytics Group
  • Developed new SDKs and APIs for traffic monitoring data, deployed in AWS and surfaced in user-facing dashboards.
  • Optimized and refactored existing libraries, more than doubling performance in several components.
  • Put best-practice rules in place, refactored existing systems, and introduced the team to new tools for improved code maintenance.
Technologies: Python, APIs, Amazon Web Services (AWS), AWS CLI, PyTorch, NVIDIA CUDA, NVIDIA TensorRT, Poetry, Databases, Asyncio, Python Asyncio, Containerization, Testing, Asynchronous Programming, Unit Testing, Code Refactoring, Docker

Expert Python Developer (via Toptal)

2022 - 2023
  • Developed test automation tools and libraries for the GIS-based automated mapping platform team to validate geodata following the NDS.Live standard.
  • Improved coding standards, code quality, and documentation across the whole team (e.g., increased overall code coverage from 80% to around 95%) and wrote an internal wrapper library that reduced development time and effort.
  • Optimized existing code, achieving running-time improvements of several hundred percent in some cases.
Technologies: Python, Pandas, GeoPandas, QA Automation, Pytest, Code Coverage, GIS, Spatial Analysis, Test-driven Development (TDD), Testing, Code Review, Code Refactoring, Unit Testing, Docker

Senior Data Scientist and ML Engineer

2021 - 2023
iXora Solution
  • Developed a data pipeline (ETL) to handle real-time and stored data. In this data pipeline, data would be ingested from a back-end application with a NoSQL database.
  • Transformed and validated the data and loaded it into a SQL database and Google BigQuery.
  • Built the back end of an application that allows users to create machine learning and data analysis pipelines and execute them.
  • Led a team of four members and delivered under tight time constraints.
  • Created a web application for attendance checking using facial recognition.
  • Developed a back-end RESTful API server and deployed it in GCP to retrieve credentials required for the ETL pipeline.
Technologies: Django, Django REST Framework, Google Cloud Platform (GCP), Flask, Machine Learning, Data Science, Deep Learning, Data Engineering, Python, Object-oriented Programming (OOP), NoSQL, MySQL, Back-end, Amazon Web Services (AWS), SQL, Continuous Integration (CI), JSON, Business Intelligence (BI), Data Analytics, REST APIs, APIs, CI/CD Pipelines, PySpark, Spark, ETL, Python 3, Document Parsing, Document Processing, Pytest, Code Coverage, ETL Tools, Data Pipelines, Test-driven Development (TDD), Containerization, Testing, Redis, Unit Testing, Code Refactoring

Senior Data Scientist and ML Engineer

2019 - 2020
  • Built a data analytics and visualization solution for an application used by government officials to monitor nationwide data in Bangladesh during COVID-19.
  • Created a data pipeline (ETL) for transforming and validating NoSQL data and loading and dumping them into the SQL database.
  • Developed data pipelines (ETL) and data analysis and visualization pipelines for insight. Used cron jobs or event triggers on message queue systems to generate reports, analytics, and visualizations, or to train machine learning models.
  • Led a team of four to oversee investigations of fraudulent activities in ride-sharing and food delivery applications.
  • Used data mining and machine learning models for trend analysis, user profiling, and recommendations.
Technologies: Azure, Data Engineering, Machine Learning, Recommendation Systems, Amazon S3 (AWS S3), Python, Object-oriented Programming (OOP), NoSQL, MySQL, Amazon Web Services (AWS), SQL, Continuous Integration (CI), JSON, Apache Kafka, Business Intelligence (BI), Data Analytics, ETL, PySpark, REST APIs, APIs, Python 3, Pytest, Code Coverage, ETL Tools, REST, SaaS, Architecture, Software Design, API Integration, Microservices, RESTful Microservices, Workflow, Visualization, Integration, Data Pipelines, Data Visualization, Plotly, Shapely, Clustering, Clustering Algorithms, Testing, Unit Testing, Code Refactoring

Machine Learning Engineer

2018 - 2019
  • Developed an application to detect architectural components of a floor plan using deep learning.
  • Created an application to determine whether a component can be placed in a floor plan.
  • Automated training and prediction of floor plan images.
Technologies: Machine Learning, Deep Learning, Python, Object-oriented Programming (OOP), PySpark, Spark, REST APIs, APIs, Python 3, Business Services

R&D Software Engineer

2017 - 2018
REVE Systems
  • Developed an e-veterinary web application for a ministry in Bangladesh.
  • Worked as a full-stack developer with the Java Spring Boot framework, Thymeleaf, vanilla JavaScript, and SQL.
  • Used Thymeleaf for the front end and Hibernate JPA for persistent storage and object-relational mapping (ORM).
  • Developed an application to intercept, analyze, and store networking information using socket programming and multithreading in C and C++.
Technologies: Java, Spring Boot, JPA, C++, Back-end, SQL, Socket Programming, Multithreading, REST APIs, APIs

Data Scientist and ML Engineer

2016 - 2016
Thread Equation PTE Ltd.
  • Developed an application powered by Django and Django REST framework.
  • Created an NLP-powered application to identify attacks on applications.
  • Integrated the machine learning application into the Django application.
Technologies: Python, REST APIs, APIs, Django, Django REST Framework, Machine Learning, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Pandas, Scikit-learn, Amazon Web Services (AWS), Microservices, REST, SaaS, Software Design, Architecture, Visualization, Integration

Projects

Corona Tracer

Corona Tracer is a joint venture of the Bangladesh government and Shohoz to fight COVID-19.

I worked as the data lead, building data pipelines, analytics, and visualizations. One of the challenges was to deliver the project within a very tight timeline due to the nature of the pandemic. In the end, we created automatic real-time data pipelines and provided analytics, reports, and visualizations to government officials.

Built a data pipeline to handle real-time and stored data. The architecture involved microservices that ingest data from MongoDB and parse and write them into Google BigQuery, data transformation between NoSQL and SQL data, schema validation, and security handling.

Used a Flask-based back-end application to facilitate inter-communication between the microservices in the pipeline.
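The transform step between MongoDB documents and SQL/BigQuery rows can be sketched as a recursive flattening of nested documents into column-friendly keys. This is a simplified illustration; the field names below are hypothetical, not the project's actual schema:

```python
def flatten(doc, parent="", sep="_"):
    """Flatten a nested (Mongo-style) document into a flat dict whose
    keys can map directly to SQL or BigQuery column names."""
    row = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested documents, prefixing child keys.
            row.update(flatten(value, name, sep))
        else:
            row[name] = value
    return row

# A document shaped like a tracing record (hypothetical fields).
record = {"_id": "abc", "user": {"name": "x", "geo": {"lat": 23.8, "lon": 90.4}}}
row = flatten(record)
# row → {"_id": "abc", "user_name": "x", "user_geo_lat": 23.8, "user_geo_lon": 90.4}
```

Schema validation would then run against the flattened rows before they are loaded into BigQuery.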


DeepCortex

DeepCortex allows users to define machine learning processes in pipelines and execute them.

I worked as a data science team lead. We designed and developed the data science-related back end of DeepCortex.

Profiling Users Using Clustering
Profiled users automatically based on some criteria and generated scores.

Clustering was one of the steps of that pipeline. k-means was used to cluster users, and k-means++ was used as a seeding initialization technique to improve the clustering quality.
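A minimal NumPy sketch of the k-means++ seeding step mentioned above (the data here is purely illustrative):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """k-means++ seeding: the first center is chosen uniformly at random;
    each subsequent center is drawn with probability proportional to the
    squared distance to its nearest already-chosen center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        C = np.array(centers)
        # Squared distance from every point to its nearest chosen center.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

# Three well-separated blobs; seeding tends to pick one point per blob.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(c, 0.1, size=(20, 2)) for c in (0.0, 5.0, 10.0)])
centers = kmeans_pp_init(X, k=3)
```

The chosen centers would then initialize Lloyd's algorithm, which typically converges faster and to a better clustering than with uniform random seeding.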

Facial Recognition API

Built the back end of an application that automatically registers employees' attendance, using the Django REST framework (DRF) and Python OpenCV.

One of the challenges was to persist the data involved in this application.

Fraud Detection

I developed an automated, intelligent fraud detection mechanism, along with data pipelines that automatically generate analytics and reports and a notification system that alerts the respective officials. Business actions were taken based on these fraud reports.

Food Recommendation and Trend Analysis

I developed a recommendation system for a food delivery application using supervised machine learning algorithms.

Matrix factorization was the model of choice before feature engineering and tuning. I also utilized data mining and machine learning techniques for trend analysis.
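As an illustration of that baseline, plain SGD matrix factorization (Funk-style) can be sketched as follows; the ratings matrix and hyperparameters here are toy values, not the production configuration:

```python
import numpy as np

def factorize(R, k=2, epochs=1000, lr=0.02, reg=0.02, seed=0):
    """Funk-style SGD matrix factorization: learn user factors P and item
    factors Q so that P @ Q.T approximates the observed (nonzero) ratings.
    Zeros in R are treated as missing."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items) if R[u, i] > 0]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - P[u] @ Q[i]
            pu = P[u].copy()  # use the pre-update user factors for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Toy user-item ratings; 0 means "not rated".
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 1.0, 5.0]])
P, Q = factorize(R)
pred = P @ Q.T  # predicted ratings, including the missing cells
```

The missing cells of `pred` are the model's recommendation scores for unrated items.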

Virtual Football Manager
Developed RESTful APIs using the Django REST framework. This is a pet project of mine, essentially a virtual simulation of a real-world football manager role, which gives the client a sense of the quality of my code.

The test suite provides 100% code coverage.

Continuous integration is used to ensure that no broken code is pushed or merged.

Tox and GitHub Actions are used to ensure the project can be successfully deployed on various platforms and environments.
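A minimal tox.ini in the spirit of this setup (the environment list and coverage threshold are illustrative, not the project's actual configuration):

```ini
[tox]
envlist = py39, py310, py311

[testenv]
deps =
    pytest
    pytest-cov
commands =
    pytest --cov --cov-fail-under=100 {posargs}
```

GitHub Actions would then run `tox` across a matrix of operating systems and Python versions to verify portability.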

Dashboard for Cyberattacks

A Django-powered dashboard with REST APIs, built using the Django REST framework. The dashboard contained analytics, visualizations, and other relevant information on attempted cyberattacks against the application.

Pandas Extras
A collection of helper functions on top of pandas, for example, writing a pandas data frame directly to a database. The built-in df.to_sql is insufficient for this purpose: it requires manually creating an SQLAlchemy engine for the connection and handling an existing table explicitly. The data frame-to-database helper takes these extra steps out of the writing process. The current goal is to support SQL and NoSQL databases, including data warehouses such as Google BigQuery and Apache Cassandra. For SQL databases, SQLAlchemy is used internally to generalize all SQL database connections.
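The idea can be sketched with a minimal wrapper; `df_to_db` below is a hypothetical stand-in for the library's helper, shown against SQLite so the example stays self-contained:

```python
import sqlite3
import pandas as pd

def df_to_db(df, table, conn, if_exists="append"):
    """Hypothetical helper in the spirit of pd-extras: hides connection
    details and defaults to appending rather than replacing the table."""
    df.to_sql(table, conn, if_exists=if_exists, index=False)

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df_to_db(df, "users", conn)
df_to_db(df, "users", conn)  # second call appends instead of overwriting
count = pd.read_sql("SELECT COUNT(*) AS n FROM users", conn)["n"][0]
```

The real library generalizes this across database backends by constructing the appropriate SQLAlchemy engine internally.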

ETL Job Scheduling with Apache Airflow
In this project, I used Apache Airflow to schedule an ETL job to read data from the MySQL database, run aggregations, and write analytical data to a database.

Internally, SQLAlchemy and PyMySQL are used to connect to the database and communicate with it for reading and writing data.

Pandera is used to validate the pandas data frame created from the data retrieved by the SQL query.
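The read-validate-aggregate step can be sketched as below. In the project, Pandera's `DataFrameSchema` plays the validation role; it is emulated here with plain pandas checks (and an in-memory frame instead of MySQL) to keep the example self-contained, and the column names are hypothetical:

```python
import pandas as pd

def validate(df):
    """Pandera-style checks written as plain assertions: required columns
    exist, keys are non-null, and amounts are non-negative."""
    assert {"user_id", "amount"} <= set(df.columns), "missing columns"
    assert df["user_id"].notna().all(), "null user_id"
    assert (df["amount"] >= 0).all(), "negative amount"
    return df

# Stand-in for rows read from MySQL via SQLAlchemy/PyMySQL.
raw = pd.DataFrame({"user_id": [1, 1, 2], "amount": [9.5, 0.5, 3.0]})
agg = validate(raw).groupby("user_id", as_index=False)["amount"].sum()
```

In the Airflow DAG, each of these stages (extract, validate, aggregate, load) would map to its own task.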

Bug Tracker
A FastAPI-based back-end app that exposes a set of asynchronous RESTful APIs, built with Python, FastAPI, SQLAlchemy, and MySQL, for tracking bugs and stories. It is currently under development, aiming for 100% test coverage with the minimum functionality of creating and editing stories and bugs. The application can also be run as a Docker container. Poetry and tox are used internally for managing dependencies.
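The asynchronous create/edit core that such endpoints wrap can be sketched without FastAPI itself; the class and field names below are hypothetical, not the app's actual API:

```python
import asyncio

class BugStore:
    """In-memory async store sketching the create/edit core a FastAPI
    bug tracker's endpoints would await."""
    def __init__(self):
        self._items = {}
        self._next_id = 1
        self._lock = asyncio.Lock()

    async def create(self, title, kind="bug"):
        async with self._lock:  # serialize ID allocation across requests
            item = {"id": self._next_id, "title": title, "kind": kind}
            self._items[self._next_id] = item
            self._next_id += 1
            return item

    async def edit(self, item_id, **fields):
        async with self._lock:
            self._items[item_id].update(fields)
            return self._items[item_id]

async def demo():
    store = BugStore()
    bug = await store.create("crash on login")
    return await store.edit(bug["id"], title="crash on login page")

item = asyncio.run(demo())
```

In the real app, SQLAlchemy models backed by MySQL would replace the in-memory dict, and FastAPI path operations would await these methods.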

Seeding Methods in K-means Clustering
Research that I did on k-means clustering, specifically seeding methods for initializing the centers for Lloyd's algorithm. The research aims to improve the classical seeding method and discusses potential issues with the k-means++ paper.


Libraries/APIs

Pandas, SciPy, NumPy, OpenCV, Asyncio, Python Asyncio, TensorFlow, PyTorch, Matplotlib, Scikit-learn, REST APIs, PySpark, SQLAlchemy, PyMongo, Pyodbc, Shapely


Tools

Pytest, Git, Jira, Apache Airflow, GIS, Plotly, Jupyter, AWS CLI


Frameworks

Django, Django REST Framework, Flask, Spring Boot, Thymeleaf, Hibernate, JPA, Spark


Languages

C++, Python, Java, SQL, Python 3


Paradigms

Data Science, Scrum, Object-oriented Programming (OOP), REST, Microservices, Testing, Continuous Integration (CI), Business Intelligence (BI), Socket Programming, ETL, RESTful Development, Test-driven Development (TDD), Asynchronous Programming, Unit Testing, Code Refactoring


Platforms

Google Cloud Platform (GCP), Azure, Jupyter Notebook, Amazon Web Services (AWS), Apache Kafka, Docker, NVIDIA CUDA


Storage

Redis, MongoDB, Amazon S3 (AWS S3), NoSQL, MySQL, JSON, PostgreSQL, Microsoft SQL Server, Data Validation, Databases, Data Pipelines


Other

Clustering, Code Coverage, Machine Learning, Deep Learning, Data Engineering, Recommendation Systems, Back-end, QA Automation, Architecture, Software Design, SaaS, API Integration, Containerization, Google BigQuery, Recurrent Neural Networks (RNNs), Data Visualization, Data Analytics, Data Mining, Agile Sprints, Multithreading, APIs, CI/CD Pipelines, Natural Language Processing (NLP), Document Parsing, Document Processing, ETL Tools, Data Warehousing, Pandera, PyMySQL, DataFrames, Generative Pre-trained Transformers (GPT), GeoPandas, RESTful Microservices, Workflow, RESTful Services, Spatial Analysis, Visualization, Integration, Artificial Intelligence (AI), RESTful Web Services, FastAPI, Poetry, Tox, Business Services, Clustering Algorithms, K-means Clustering, NVIDIA TensorRT, Code Review
