Halim Abbas, Data Scientist and Machine Learning Developer in San Jose, United States

Member since October 24, 2019
Halim is a high-tech innovator who has spearheaded world-class data science projects at game-changing tech companies like eBay and Teradata. Formally educated in machine learning, his professional expertise spans information retrieval, natural language processing, and big data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web/mobile services, airline, and biopharma.
Halim is now available for hire


  • Cognoa
    Analytics, Data Science, Deep Learning, Machine Learning, Leadership...
  • Mathisit, Inc.
    Machine Learning, Image Recognition, Convolutional Neural Networks...
  • Teradata
    Scikit-learn, R, Python, Leadership, Team Leadership, Remote Team Leadership...




Preferred Environment

Git, Jupyter, Python

The most amazing project I've worked on is an AI-driven pediatric behavioral health screener.


  • Chief AI Officer

    2016 - PRESENT
    Cognoa
    • Recruited, hired, onboarded, and oversaw a data science team.
    • Applied machine learning (ML) and deep learning (DL) to build diagnostic classifiers for pediatric behavioral health conditions.
    • Developed proof points for the efficacy of the product by running properly blinded, sufficiently powered clinical validation studies.
    • Provided timely insights by building and maintaining user analytics pipelines and visualization.
    Technologies: Analytics, Data Science, Deep Learning, Machine Learning, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Healthcare Services, OpenCV, PyTorch, OpenFrameworks, TensorFlow, Pandas, Computer Vision, Object Detection, Object Tracking, Image Processing
  • CTO

    2020 - 2021
    Mathisit, Inc.
    • Advised a team of developers and data scientists on the technical roadmap and algorithm development strategy for a software holding company.
    • Recruited, ramped up, and oversaw a technical team of developers and data scientists.
    • Advised the company's executive leadership on the overall tech strategy and roadmap.
    Technologies: Machine Learning, Image Recognition, Convolutional Neural Networks, Classification Algorithms, Artificial Intelligence (AI), Remote Team Leadership, Computer Vision, Advisory, Technology Consulting, TensorFlow, Pandas, Object Detection, Image Processing, Hugging Face, OCR, Text Recognition
  • Principal Data Scientist

    2014 - 2016
    Teradata
    • Managed Think Big's data science consultation practice in the West Coast region.
    • Worked on big data science problems across multiple industries like eCommerce, fintech, biopharma, and medical imaging.
    • Applied ML techniques to various use cases like recommendation engines, customer profiling, churn modeling, predictive analytics, user segmentation, process optimization, next best action detection, and search relevance ranking.
    • Helped to close multiple sales and build repeatable consulting relationships with large enterprise customers.
    Technologies: Scikit-learn, R, Python, Leadership, Team Leadership, Remote Team Leadership, Advisory, Technical Consulting, Computer Vision, Object Detection, Object Tracking, Image Processing
  • Senior Research Scientist

    2009 - 2012
    eBay
    • Led an applied research team. Built eBay's first machine-learned search relevance ranking engine from the ground up.
    • Managed multiple research tracks, grew a team of top-talent researchers, and oversaw IP processes.
    • Worked on machine learning, data mining, auction modeling, user modeling and classification, and click log analysis.
    Technologies: Java, Hadoop, Computer Vision, Image Processing, OCR, Text Recognition
  • Machine Learning Research Scientist

    2009 - 2009
    • Developed an adaptive multimedia search relevance ranking system using machine learning (ML).
    • Experimented with ML ensemble decision trees using TreeNet.
    • Mentored new hires and ramped them up on the experimental framework.
    • Ran A/B testing experiments to produce evidence in support of improvement hypotheses.
    Technologies: Java
  • Research Lead

    2006 - 2008
    Code Green Networks
    • Developed an NLP system to classify documents reliably on live network feeds.
    • Contributed to the production R&D cycle by writing production code and fixing bugs in Java and C.
    • Supervised offline experimentation to develop more efficient algorithms underlying the product features.
    Technologies: Java, JavaScript
  • Research Staff

    2005 - 2006
    Columbia University — CCLS Lab
    • Developed a statistical-rule-based hybrid ML system for the automatic translation of natural language news headlines.
    • Worked on Arabic/English automated translation systems.
    • Applied validation tests and reported incremental improvements using the BLEU score.
    Technologies: Natural Language Processing (NLP), Java, OCR, Text Recognition


  • ML Approach for the Early Detection of Autism by Combining Questionnaires and Home Video Screening

    Existing screening tools for early detection of autism are expensive, cumbersome, time-intensive, and sometimes fall short in predictive value. In this work, we sought to apply machine learning (ML) to gold-standard clinical data obtained across thousands of children at risk for autism spectrum disorder to create a low-cost, quick, and easy-to-apply autism screening tool.

  • Real-time Document Classification Engine

    An NLP-based, trainable, configurable document classification engine that classifies documents in real time as they are transferred out of a network, so that certain types of documents can be blocked. Part of a data loss prevention (DLP) feature set.

  • eCommerce Search Result Ranking Engine

    An ML-based search result ranking solution for a major eCommerce engine serving hundreds of millions of users daily in multiple languages and regions. The engine applies real-time ML to learn and adapt to changing inventory and changing queries, using recent click logs as training data feeds. The scale required a distributed system on Hadoop, with data management on a Teradata instance.

  • AI ML Bootcamp

    I created and delivered a full-day boot camp to introduce business partners and venture capitalists to machine learning and AI. I presented them with foundational concepts, mathematical backgrounds, technical details, operational considerations, and business implications.

  • AI-Powered Healthcare Mobile App

    I advised and led the development of the AI/ML algorithms behind an end-user mobile app that measures and assesses the risk of health conditions from photos, videos, questionnaires, and audio inputs.

  • Sports Card Marketplace and Social Network

    I advised and led a tech team to develop a fully automated online marketplace and social network for sports card trading. The technology included advanced AI/ML and computer vision models to identify, grade, and appraise users' sports cards automatically.


  • Languages

    Python, Java, SQL, PHP, JavaScript, Objective-C, HTML, R, C++, Ruby
  • Paradigms

    Data Science, MapReduce, Functional Programming, Agile Software Development, Microsoft Query
  • Other

    Machine Learning, Artificial Intelligence (AI), Deep Learning, Natural Language Processing (NLP), Big Data, Computer Vision, Computer Science, Supervised Learning, Predictive Modeling, Predictive Analytics, Data Analysis, Data Visualization, Advisory, Technology Consulting, AI Design, Image Processing, Document Processing, OOP Designs, Architecture, SVMs, Convolutional Neural Networks, Recurrent Neural Networks (RNN), Natural Language Understanding (NLU), Natural Language Queries, Unsupervised Learning, Active Learning, Learning Transfer, Object Identification, LSTM Networks, Clustering, Cluster Analysis, Artificial Neural Networks (ANN), Neural Networks, Statistical Methods, Statistical Analysis, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Algorithms, Healthcare Services, Object Detection, Object Tracking, Analytics, Analytical Dashboards, Dashboards, Dashboard Design, Data Analytics, Complex Data Analysis, Data Reporting, eCommerce, Pattern Recognition, BERT, Networks, Naive Bayes, Distributed Systems, Information Retrieval, Website Ranking, Decision Trees, Custom BERT, OCR, Bayesian Statistics, Statistical Modeling, Sales Forecasting, Deep Neural Networks, Image Recognition, Classification Algorithms, Hugging Face, Training, Education, Online Course Design, Signal Processing, Health, Models, Text Recognition
  • Frameworks

    Hadoop, OpenFrameworks
  • Libraries/APIs

    Scikit-learn, Matplotlib, TensorFlow, Keras, LSTM, NLTK, OpenCV, PyTorch, Pandas
  • Tools

    Tableau, Amazon Elastic MapReduce (EMR), Jupyter, Git
  • Platforms

    iOS, Linux, macOS, Databricks, Amazon EC2, Amazon Web Services (AWS), Mobile
  • Storage

    MySQL, NoSQL, MongoDB, Amazon S3 (AWS S3), Teradata Databases


  • Master's Degree in Machine Learning
    2004 - 2006
    Columbia University - New York City, NY, USA
  • Bachelor's Degree in Computer Engineering
    1998 - 2001
    Carleton University - Ottawa, Canada