Masooma Ali, Developer in Mississauga, Canada
Masooma Ali

Verified Expert in Engineering

Mathematical Modeling Developer

Location
Mississauga, Canada
Toptal Member Since
August 18, 2022

Masooma has over nine years of experience applying data science to problems in domains like astronomy and theoretical physics. She has also worked on automated soft skills coaching and parenting coaching. She has built ML models using both structured and unstructured data and currently enjoys applying ML to natural language processing (NLP) and audio tasks. Masooma also has extensive experience developing software collaboratively and has delivered projects in both industry and academia.

Portfolio

Sesh
PyTorch, TensorFlow, Scikit-learn, Pandas, SQL, Kaldi, Librosa, Python...
GuruOps
Python 3, Amazon Web Services (AWS), Transformers, Scikit-learn, Node.js, Docker
Stepscan
Python 3, Data Visualization

Availability

Part-time

Preferred Environment

Unix, Visual Studio Code (VS Code), Jupyter Notebook, Python 3

The most amazing...

...thing I've built is an active speaker recognition system which uses audio and video data to place a bounding box around the current speaker in a video.

Work Experience

Senior Data Scientist

2021 - PRESENT
Sesh
  • Translated product feature ideas into solvable ML problems for seven critical features in the company app.
  • Researched, prototyped, and deployed models for sentiment recognition, speech clarity and intelligibility, speaker recognition, jargon identification, NLP semantic similarity tasks, active speaker recognition, and content recommendation.
  • Built low-latency microservices on AWS infrastructure that supported online prediction with latency under 100 ms.
  • Constructed internal tools to surface model predictions for human testing and human-in-the-loop training. Customized the tools for a variety of training tasks.
Technologies: PyTorch, TensorFlow, Scikit-learn, Pandas, SQL, Kaldi, Librosa, Python, JavaScript, Go

Data Scientist

2020 - 2020
GuruOps
  • Built a model to convert a blog post into a Twitter thread.
  • Deployed the model on Amazon SageMaker as microservices.
  • Set up the data models and relational database to manage user requests.
  • Managed a junior engineer who developed the back end and front end of the app in React and Node.js.
Technologies: Python 3, Amazon Web Services (AWS), Transformers, Scikit-learn, Node.js, Docker

Applied Scientist

2018 - 2018
Stepscan
  • Developed a novel graph-based algorithm to determine the tracks and posture of people walking on tiles equipped with pressure sensors.
  • Delivered a fully functional and tunable prototype for lab testing.
  • Provided technical documentation for tuning and extending the model.
Technologies: Python 3, Data Visualization

Brand Alignment Analyzer

Designed and deployed an algorithm to determine the semantic and lexical alignment between an investor pitch and a company's marketing material. The algorithm combined traditional ML techniques with a deep learning embedding model to generate insights on the amount of time in the pitch that was aligned with the marketing material (i.e., on-topic time). It also highlighted, on a displayed transcript, the sentences that aligned with the marketing material.
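
The on-topic time idea can be illustrated with a minimal sketch. It is not the deployed algorithm: it covers only a lexical comparison (Jaccard token overlap) and leaves out the embedding model entirely. The function names, the (start, end, sentence) segment format, and the 0.2 threshold are all assumptions made for illustration.

```python
import re

def tokens(text):
    """Lowercased word set used for a simple lexical comparison."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def on_topic_time(segments, marketing_text, threshold=0.2):
    """Return (on_topic_seconds, aligned_sentences).

    segments: list of (start_sec, end_sec, sentence) tuples from a
    pitch transcript (hypothetical format). A sentence counts as
    on-topic when its lexical overlap with the marketing text reaches
    the threshold (an arbitrary illustrative value).
    """
    reference = tokens(marketing_text)
    aligned = [s for s in segments if jaccard(tokens(s[2]), reference) >= threshold]
    seconds = sum(end - start for start, end, _ in aligned)
    return seconds, [sentence for _, _, sentence in aligned]
```

The returned sentences are the ones a transcript view could highlight. A semantic variant would replace `jaccard` with a similarity score between sentence embeddings.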

Active Speaker Recognition

Developed a multi-modal deep learning model to identify the active speaker in a video. The input to the model was prepared by stacking ten video frames per second and providing the tracks for detected objects. The model I created combined these images with audio to predict the bounding box around the current speaker if the speaker was detected on screen.
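
One possible reading of the input preparation is sketched below: the video is sampled down to ten frames per second, and the sampled frames are grouped into one stack per second. This is an assumption about the description, not the model's actual preprocessing, and the function names and parameters are hypothetical.

```python
def sample_frame_indices(source_fps, duration_sec, target_fps=10):
    """Indices of source frames to keep so that about target_fps
    frames remain for each second of video."""
    total_frames = int(source_fps * duration_sec)
    step = source_fps / target_fps
    count = int(duration_sec * target_fps)
    # Clamp to the last valid frame in case of rounding at the end.
    return [min(int(i * step), total_frames - 1) for i in range(count)]

def stack_per_second(indices, target_fps=10):
    """Group the sampled indices into one stack per second of video."""
    return [indices[i:i + target_fps] for i in range(0, len(indices), target_fps)]
```

For a two-second clip at 30 fps, `stack_per_second(sample_frame_indices(30, 2))` gives two stacks of ten frame indices each. In a full pipeline, each stack would be paired with the matching audio window and the object tracks.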

Topic Modeling on Forum Posts

Sesh was looking to adapt its technology to the parenting domain and needed to identify a core set of pressing problems for parents. I scraped over 500,000 posts from parenting forums in North America and analyzed them with NLP techniques to create a list of topics parents discussed, then defined sub-categories to identify specific parenting issues. To model the topics, I used word embeddings, dimensionality reduction (UMAP), and clustering, combined with lexical chaining for sub-categorization. This project had a substantial data visualization component, as I had to present my findings to the product team in an easily consumable format. This work directly impacted the product roadmap for Sesh's offering for parents.

Languages

Python 3, Python 2, Python, SQL, Perl, Fortran, C++, JavaScript, Go

Paradigms

Data Science, High-performance Computing

Platforms

Jupyter Notebook, Unix, Amazon Web Services (AWS), Docker, Visual Studio Code (VS Code)

Other

Markov Chain Monte Carlo (MCMC) Algorithms, Librosa, Data Analysis, Mathematical Modeling, Numerical Simulations, Machine Learning, Numerical Methods, Transformers, Data Visualization, Software Development, Data Modeling, Natural Language Processing (NLP), Data Mining, Web Scraping, Generative Pre-trained Transformers (GPT)

Libraries/APIs

PyTorch, TensorFlow, Scikit-learn, Pandas, Node.js, Natural Language Toolkit (NLTK), SpaCy, OpenCV, FFmpeg

Tools

MATLAB, Kaldi, Whisper

Education

2014 - 2019

PhD Degree in Physics

University of New Brunswick - Fredericton, Canada

2009 - 2011

Master's Degree in Astrophysics

University of Bonn - Bonn, Germany

2006 - 2009

Bachelor's Degree in Physics

University of Delhi - New Delhi, India
