Halim Abbas, Developer in San Jose, CA, United States

Halim Abbas

Verified Expert in Engineering

Data Scientist and Machine Learning Developer

Location
San Jose, CA, United States
Toptal Member Since
October 24, 2019

Halim is a high-tech innovator who's spearheaded world-class data science projects at game-changing tech companies like eBay and Teradata. Formally educated in machine learning, his professional expertise spans information retrieval, natural language processing, and big data. Halim has a proven track record of applying state-of-the-art data science techniques across industry verticals such as eCommerce, web and mobile services, airline, and biopharma.

Portfolio

Cognoa
Analytics, Data Science, Machine Learning, Leadership, Team Leadership...
Martian Learning Inc.
Artificial Intelligence (AI), Machine Learning...
GrantSmiths, LLC
Artificial Intelligence (AI), Research, Leadership, Communication, Advisory...

Experience

Availability

Full-time

Preferred Environment

Git, Jupyter, Python, GitHub

The most amazing project I've worked on is an AI-driven pediatric behavioral health screener.

Work Experience

Chief AI Officer

2016 - PRESENT
Cognoa
  • Recruited, hired, onboarded, and oversaw a data science team.
  • Applied machine learning (ML) and deep learning (DL) to build diagnostic classifiers for pediatric behavioral health conditions.
  • Developed proof points for the efficacy of the product by running properly blinded, sufficiently powered clinical validation studies.
  • Provided timely insights by building and maintaining user analytics pipelines and visualization.
Technologies: Analytics, Data Science, Machine Learning, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Healthcare Services, OpenCV, PyTorch, OpenFrameworks, TensorFlow, Pandas, Computer Vision, Object Detection, Object Tracking, Image Processing, Python, Scikit-learn, Tableau, Amazon Web Services (AWS), Keras, Computer Science, Supervised Learning, Databricks, Training, Data Analysis, Data Management, Data Governance, Amazon S3 (AWS S3), OCR, AWS Lambda, Amazon SageMaker, Amazon DynamoDB, Amazon Textract, Amazon Comprehend, MySQL, APIs, Data Engineering, Microservices Architecture, Databases, MVP Design, SQL, Data Visualization, Data Modeling, GPU Computing, Healthcare, Oncology & Cancer Treatment, Research, CTO, Language Models, Workshop Facilitation, Fine-tuning, Text Classification, User Interface (UI), Programming, OpenAI GPT-4 API, Chatbot Conversation Design, Statistical Analysis, Data Reporting, Regression Modeling, Forecasting, Artificial Intelligence (AI), Big Data, Deep Learning, Advisory, Technology Consulting, AI Design, Sentiment Analysis, Python 3, TensorFlow Deep Learning Library (TFLearn), Neural Networks, R&D, AI Programming, Database Security, Technical Leadership, Software Architecture, Algorithms, Data Mining, Reporting, Dashboards, Bayesian Statistics, Generalized Linear Model (GLM), Minimum Viable Product (MVP), GitHub

Machine Learning Research Engineer

2023 - 2024
Martian Learning Inc.
  • Proposed a framework for quantifying and ranking the performance of large language model (LLM) routers.
  • Developed and implemented a benchmark for evaluating RAG systems.
  • Contributed to scientific publications around LLM router benchmarking.
Technologies: Artificial Intelligence (AI), Machine Learning, Natural Language Processing (NLP), Research, PyTorch, Python, Generative Artificial Intelligence (GenAI), Language Models, Large Language Models (LLMs), Open-source LLMs, Dashboards, Generative AI, Image Generation, Bayesian Statistics, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), Embeddings from Language Models (ELMo), LangChain, GitHub

Senior AI/ML Advisor

2023 - 2023
GrantSmiths, LLC
  • Designed a complete, AI-powered software solution to meet the client's business needs, including high-level architecture, technical design specifications, cloud solution choice, and all other third-party products and services.
  • Provided the client with a complete technical product roadmap, including design methodology, phases, milestones, product feature availability timeline, resources, team buildup, and cost and development time estimates.
  • Iterated with the client to produce a complete software design document detailing all aspects of the technical project, making adjustments to accommodate the client's business needs and industry-specific realities.
Technologies: Artificial Intelligence (AI), Research, Leadership, Communication, Advisory, AI Design, Architecture, Machine Learning, Deep Learning, Data Science, Python, Computer Science, Generative Pre-trained Transformers (GPT), Team Leadership, Analytics, Algorithms, Data Mining, Reporting, Google Cloud Platform (GCP), Large Language Models (LLMs), Open-source LLMs, Dashboards, Generative AI, Image Generation, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), LangChain, GitHub

Data Scientist

2023 - 2023
Iron Light, Inc
  • Designed and implemented a predictive modeling ML algorithm based on structured voter record data.
  • Analyzed and profiled data from survey participants and advised the client on data completeness, usability, and representation.
  • Advised the client on best practices related to data science and machine learning R&D efforts.
Technologies: Data Science, Machine Learning, Amazon SageMaker, Natural Language Processing (NLP), Python, R, SQL, Amazon Athena, AI Design, Algorithms, AI Programming, Technical Leadership, Deep Learning, Generative Pre-trained Transformers (GPT), Leadership, Team Leadership, Advisory, Analytics, Data Mining, Google Cloud Platform (GCP), Large Language Models (LLMs), Open-source LLMs, Dashboards, Generative AI, Image Generation, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), LangChain, GitHub

AI Developer

2023 - 2023
Sigma Squared Corporation
  • Worked on a generative AI tool to address actual cause-and-effect questions.
  • Guided the client through the possible options and functionality of a conversational AI tool that would predict and prescribe future actions.
  • Managed a team of data scientists and directed the overall company tech roadmaps.
Technologies: Artificial Intelligence (AI), OpenAI GPT-4 API, OpenAI GPT-3 API, Predictive Modeling, Natural Language Processing (NLP), OpenAI Gym, Predictive Analytics, Prescriptive Modeling, Prescriptive Analytics, Python 3, Machine Learning, Deep Learning, Data Science, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Advisory, Technology Consulting, AI Design, GPT, Generative Pre-trained Transformers (GPT), TensorFlow Deep Learning Library (TFLearn), OpenAI, gRPC, Neural Networks, Language Models, R&D, Generative Artificial Intelligence (GenAI), AI Programming, Database Security, Technical Leadership, Software Architecture, Analytics, Algorithms, Data Mining, Reporting, Dashboards, Generative AI, Image Generation, Minimum Viable Product (MVP), GitHub

NLP/Data Scientist

2023 - 2023
Airball, Inc.
  • Created a model to classify the different types of email.
  • Sanitized the information and created a pipeline to store it.
  • Implemented the work collaboratively, with sync-up meetings as needed.
Technologies: Python, Natural Language Processing (NLP), Data Science, Machine Learning, GPT, Generative Pre-trained Transformers (GPT), PostgreSQL, Data Pipelines, Cloud Platforms, Pandas, Artificial Intelligence (AI), Deep Learning, Computer Vision, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Advisory, Technology Consulting, AI Design, OpenAI GPT-3 API, Python 3, TensorFlow Deep Learning Library (TFLearn), OpenAI, gRPC, Neural Networks, Language Models, R&D, Generative Artificial Intelligence (GenAI), AI Programming, Technical Leadership, Analytics, Algorithms, Dashboards, Generative AI, Bayesian Statistics, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), GitHub

Senior AI Expert

2023 - 2023
Hasna Inc
  • Advised the CEO and executives on AI-powered applications in nutrigenomics.
  • Developed a product roadmap with key stakeholders based on research on state-of-the-art AI technology.
  • Communicated the proposed product vision and technical experimentation path to company leadership.
Technologies: Artificial Intelligence (AI), Consulting, Deep Learning, Data Science, Python, Scikit-learn, Computer Science, Supervised Learning, Data Analysis, Data Management, MySQL, APIs, Data Engineering, Microservices Architecture, Databases, SQL, Data Visualization, Data Modeling, Recommendation Systems, Healthcare Services, Healthcare, Research, Language Models, OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), OpenAI GPT-4 API, ChatGPT, Text Classification, User Interface (UI), Programming, Chatbots, Chatbot Conversation Design, Statistical Analysis, Regression Modeling, Forecasting, Machine Learning, Computer Vision, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Advisory, Technology Consulting, AI Design, GPT, Generative Pre-trained Transformers (GPT), Python 3, TensorFlow Deep Learning Library (TFLearn), OpenAI, gRPC, Neural Networks, R&D, Data Scraping, Generative Artificial Intelligence (GenAI), AI Programming, Technical Leadership, Analytics, Algorithms, Dashboards, Generative AI, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), GitHub

AI Expert

2022 - 2023
RunKicker Pte Ltd
  • Led AI research into noncommunicable disease (NCD) risk assessment using computer vision and photoplethysmography (PPG) signal processing.
  • Built a BMI assessment AI algorithm by applying CNN-based computer vision to patient selfies.
  • Built a blood pressure assessment AI algorithm by applying computer vision and time series analysis to video of a finger placed on a smartphone camera to capture PPG dynamics.
Technologies: Artificial Intelligence (AI), Image Processing, Python, Signal Processing, Health, Computer Vision, C++, Models, PyTorch, TensorFlow, Mobile, Healthcare Services, Healthcare, Research, Language Models, Generative Pre-trained Transformer 3 (GPT-3), Workshop Facilitation, Fine-tuning, ChatGPT, Text Classification, User Interface (UI), Programming, Chatbots, OpenAI GPT-4 API, Chatbot Conversation Design, Statistical Analysis, Large Language Models (LLMs), Regression Modeling, Forecasting, Quantitative Analysis, Machine Learning, Deep Learning, Data Science, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Technology Consulting, AI Design, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Python 3, TensorFlow Deep Learning Library (TFLearn), Neural Networks, R&D, Data Scraping, Generative Artificial Intelligence (GenAI), AI Programming, Technical Leadership, Analytics, Algorithms, Reporting, Dashboards, Generative AI, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), GitHub

CTO

2020 - 2021
Mathisit, Inc.
  • Advised a team of developers and data scientists on the technical roadmap and algorithm development strategy for a software holding company.
  • Recruited, ramped up, and oversaw a technical team of developers and data scientists.
  • Advised the company's executive leadership on the overall tech strategy and roadmap.
Technologies: Machine Learning, Image Recognition, Convolutional Neural Networks (CNN), Classification Algorithms, Artificial Intelligence (AI), Remote Team Leadership, Computer Vision, Advisory, Technology Consulting, TensorFlow, Pandas, Object Detection, Image Processing, Hugging Face, OCR, Text Recognition, Deep Learning, Data Science, Python, Scikit-learn, Tableau, Amazon Web Services (AWS), Keras, Computer Science, Supervised Learning, Data Analysis, Data Management, Amazon S3 (AWS S3), AWS Lambda, Amazon SageMaker, Amazon DynamoDB, Amazon Textract, MySQL, APIs, Data Engineering, Microservices Architecture, Databases, MVP Design, SQL, Data Visualization, Data Modeling, Recommendation Systems, GPU Computing, Research, CTO, Language Models, Workshop Facilitation, Fine-tuning, Pricing Models, Unsupervised Learning, Text Classification, User Interface (UI), Programming, Statistical Analysis, Regression Modeling, Forecasting, Quantitative Analysis, Big Data, Leadership, Team Leadership, Cross-functional Team Leadership, AI Design, Generative Pre-trained Transformers (GPT), Python 3, TensorFlow Deep Learning Library (TFLearn), Neural Networks, R&D, Data Scraping, Generative Artificial Intelligence (GenAI), Database Security, Technical Leadership, Software Architecture, Analytics, Algorithms, Reporting, Dashboards, Bayesian Statistics, Logistic Regression, Minimum Viable Product (MVP), GitHub

Principal Data Scientist

2014 - 2016
Teradata
  • Managed Think Big's data science consultation practice in the West Coast region.
  • Worked on big data science problems across multiple industries like eCommerce, fintech, biopharma, and medical imaging.
  • Applied ML techniques to various use cases like recommendation engines, customer profiling, churn modeling, predictive analytics, user segmentation, process optimization, next best action detection, and search relevance ranking.
  • Helped to close multiple sales and build repeatable consulting relationships with large enterprise customers.
Technologies: Scikit-learn, R, Python, Leadership, Team Leadership, Remote Team Leadership, Advisory, Computer Vision, Object Detection, Object Tracking, Image Processing, Data Science, Tableau, Amazon Web Services (AWS), Computer Science, Supervised Learning, Data Analysis, Data Management, Amazon S3 (AWS S3), OCR, Amazon SageMaker, Amazon DynamoDB, Amazon Textract, MySQL, APIs, Customer Segmentation, Data Engineering, Microservices Architecture, Databases, Finance, MVP Design, SQL, Data Visualization, Data Modeling, Recommendation Systems, eCommerce, Research, Language Models, Pricing Models, Data-driven Marketing, Unsupervised Learning, Text Classification, Integration, Programming, Statistical Analysis, Regression Modeling, Forecasting, Quantitative Analysis, Machine Learning, Artificial Intelligence (AI), Big Data, Deep Learning, Technology Consulting, AI Design, Sentiment Analysis, Python 3, R&D, Data Scraping, AI Programming, Technical Leadership, Analytics, Algorithms, Data Mining, Reporting, Dashboards, Bayesian Statistics, Generalized Linear Model (GLM), Logistic Regression, GitHub

Senior Research Scientist

2009 - 2012
eBay
  • Led an applied research team. Built eBay's first machine-learned search relevance ranking engine from the ground up.
  • Managed multiple research tracks, grew a team of top-talent researchers, oversaw IP processes, and more.
  • Worked on machine learning, data mining, auction modeling, user modeling and classification, click log analysis, and more.
Technologies: Java, Hadoop, Computer Vision, Image Processing, OCR, Text Recognition, Data Science, Python, Amazon Web Services (AWS), Computer Science, Supervised Learning, Data Analysis, Data Management, Amazon S3 (AWS S3), MySQL, APIs, Data Engineering, Databases, MVP Design, SQL, R, Data Visualization, Data Modeling, Recommendation Systems, eCommerce, Large-scale Projects, Research, Language Models, Pricing Models, Unsupervised Learning, Text Classification, Integration, Programming, Statistical Analysis, Regression Modeling, Forecasting, Quantitative Analysis, Machine Learning, Artificial Intelligence (AI), Big Data, Team Leadership, AI Design, Sentiment Analysis, Python 3, R&D, Data Scraping, AI Programming, Technical Leadership, Analytics, Algorithms, Data Mining, Dashboards, Bayesian Statistics, Generalized Linear Model (GLM), Logistic Regression

Machine Learning Research Scientist

2009 - 2009
SearchMe
  • Developed an adaptive multimedia search relevance ranking system using machine learning (ML).
  • Experimented with ML ensemble decision trees using TreeNet.
  • Mentored new hires and ramped them up on the experimental framework.
  • Ran A/B testing experiments to produce evidence in support of improvement hypotheses.
Technologies: Java, Data Science, Python, Computer Science, Supervised Learning, Amazon Web Services (AWS), Data Analysis, Data Management, OCR, Databases, SQL, R, Data Visualization, Data Modeling, Recommendation Systems, Research, Text Classification, Integration, Programming, Statistical Analysis, Regression Modeling, Machine Learning, Artificial Intelligence (AI), Big Data, Sentiment Analysis, R&D, Data Scraping, AI Programming, Database Security, Analytics, Algorithms, Data Mining, Dashboards, Bayesian Statistics, Generalized Linear Model (GLM), Logistic Regression

Research Lead

2006 - 2008
Code Green Networks
  • Developed an NLP system to classify documents reliably on live network feeds.
  • Contributed to the production R&D cycle by writing production code and fixing bugs in Java and C.
  • Supervised offline experimentation to develop more efficient algorithms underlying the product features.
Technologies: Java, JavaScript, Data Science, Python, Computer Science, Supervised Learning, Amazon Web Services (AWS), Data Analysis, Data Management, Databases, SQL, Data Visualization, Data Modeling, Research, Text Classification, Integration, Programming, Statistical Analysis, Regression Modeling, Machine Learning, Artificial Intelligence (AI), Big Data, R&D, Data Scraping, AI Programming, Database Security, Analytics, Algorithms, Data Mining, Dashboards, Generalized Linear Model (GLM), Logistic Regression

Research Staff

2005 - 2006
Columbia University — CCLS Lab
  • Developed a statistical-rule-based hybrid ML system for the automatic translation of natural language news headlines.
  • Worked on Arabic/English automated translation systems.
  • Applied validation tests and reported incremental improvements using the BLEU score.
Technologies: Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Java, OCR, Text Recognition, Data Science, Python, Computer Science, Supervised Learning, Databases, SQL, Data Visualization, Data Modeling, Research, Text Classification, Integration, Programming, Statistical Analysis, Regression Modeling, Machine Learning, Artificial Intelligence (AI), Sentiment Analysis, R&D, Data Scraping, AI Programming, Analytics, Algorithms, Data Mining, Dashboards, Generalized Linear Model (GLM), Logistic Regression

ML Approach for the Early Detection of Autism by Combining Questionnaires and Home Video Screening

https://academic.oup.com/jamia/article/25/8/1000/4993666
Existing screening tools for early detection of autism are expensive, cumbersome, time-intensive, and sometimes fall short in predictive value. In this work, we sought to apply machine learning (ML) to gold standard clinical data obtained across thousands of children at risk for autism spectrum disorder to create a low-cost, quick, and easy-to-apply autism screening tool.

Real-time Document Classification Engine

An NLP-based, trainable, configurable document classification engine that classifies documents in real time as they are transferred out of a network, so that certain types of documents can be blocked. It is part of a data loss prevention (DLP) feature set.

eCommerce Search Result Ranking Engine

An ML-based search result ranking solution for a major eCommerce engine serving hundreds of millions of users daily in multiple languages and geographies. The engine applies real-time ML to learn and adapt to changing inventory and changing queries, using recent click logs as training data feeds. The scale required a distributed system on Hadoop, with data management on a Teradata instance.

AI ML Bootcamp

I created and delivered a full-day boot camp to introduce business partners and venture capitalists to machine learning and AI. I presented them with foundational concepts, mathematical backgrounds, technical details, operational considerations, and business implications.

AI Powered Healthcare Mobile App

I advised on and led the development of AI/ML algorithms powering an end-user mobile app that measures and assesses the risk of health conditions from photos, videos, questionnaires, and audio inputs.

Sports Card Marketplace and Social Network

I advised and led a tech team to develop a fully automated online marketplace and social network around sports card trading. The technology included advanced AI/ML and computer vision models to identify, grade, and appraise user sports cards automatically.

Education

2004 - 2006

Master's Degree in Machine Learning

Columbia University - New York City, NY, USA

1998 - 2001

Bachelor's Degree in Computer Engineering

Carleton University - Ottawa, Canada

Libraries/APIs

Scikit-learn, Matplotlib, TensorFlow, Keras, LSTM, Natural Language Toolkit (NLTK), OpenCV, PyTorch, Pandas, TensorFlow Deep Learning Library (TFLearn)

Tools

ChatGPT, Tableau, Amazon Elastic MapReduce (EMR), Amazon SageMaker, Amazon Textract, GitHub, Jupyter, Git, OpenAI Gym, Amazon Athena

Languages

Python, Java, SQL, PHP, JavaScript, Objective-C, HTML, R, Python 3, C++, Ruby

Platforms

Databricks, iOS, Linux, MacOS, Amazon EC2, Amazon Web Services (AWS), AWS Lambda, Mobile, Google Cloud Platform (GCP)

Paradigms

Data Science, MapReduce, Functional Programming, Agile Software Development, Microsoft Query, Microservices Architecture

Industry Expertise

Healthcare

Frameworks

Hadoop, gRPC, OpenFrameworks

Storage

MySQL, NoSQL, MongoDB, Amazon S3 (AWS S3), Amazon DynamoDB, Databases, Database Security, Teradata Databases, PostgreSQL, Data Pipelines

Other

Analytics, Dashboards, eCommerce, Machine Learning, Artificial Intelligence (AI), Deep Learning, Natural Language Processing (NLP), Big Data, Computer Vision, Computer Science, Supervised Learning, Predictive Modeling, Predictive Analytics, Neural Networks, Data Analysis, Algorithms, Healthcare Services, Advisory, Technology Consulting, AI Design, Image Processing, Training, GPT, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Data Management, APIs, Data Engineering, MVP Design, Recommendation Systems, Research, CTO, Workshop Facilitation, Text Classification, Programming, Large Language Models (LLMs), Regression Modeling, Forecasting, R&D, Data Scraping, Generative Artificial Intelligence (GenAI), AI Programming, Technical Leadership, Open-source LLMs, Generative AI, Minimum Viable Product (MVP), Retrieval-augmented Generation (RAG), Data Analytics, Document Processing, OCR, OOP Designs, Architecture, SVMs, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Natural Language Understanding (NLU), Natural Language Queries, Unsupervised Learning, Active Learning, Learning Transfer, Object Identification, LSTM Networks, Clustering, Cluster Analysis, Artificial Neural Networks (ANN), Statistical Methods, Bayesian Statistics, Data Visualization, Statistical Analysis, Leadership, Team Leadership, Remote Team Leadership, Cross-functional Team Leadership, Object Detection, Object Tracking, Chatbots, Data Governance, Customer Segmentation, Data Modeling, Large-scale Projects, Oncology & Cancer Treatment, Language Models, OpenAI GPT-4 API, Fine-tuning, Causal Inference, Pricing Models, Data-driven Marketing, Integration, Chatbot Conversation Design, Quantitative Analysis, Sentiment Analysis, OpenAI, Generative Adversarial Networks (GANs), Software Architecture, Data Mining, Reporting, Generalized Linear Model (GLM), Logistic Regression, Analytical Dashboards, Dashboard Design, Complex Data Analysis, Data Reporting, Pattern Recognition, BERT, 
Networks, Naive Bayes, Distributed Systems, Information Retrieval, Website Ranking, Decision Trees, Custom BERT, Statistical Modeling, Sales Forecasting, Deep Neural Networks, Image Recognition, Classification Algorithms, Hugging Face, Education, Online Course Design, Signal Processing, Health, Models, Text Recognition, Consulting, Google Colaboratory (Colab), Amazon Comprehend, Finance, GPU Computing, Generative Pre-trained Transformer 3 (GPT-3), User Interface (UI), Cloud Platforms, Prescriptive Modeling, Prescriptive Analytics, Communication, Image Generation, Chief AI Officer, Embeddings from Language Models (ELMo), LangChain
