George is available for hire

George McIntire

Verified Expert in Engineering

Data Scientist and Developer

Berkeley, CA, United States

Toptal member since January 31, 2024

Expertise

Machine Learning, NLP, Data Analysis, Data Visualization, Artificial Intelligence, Data Science, Web Scraping, Data Mining, Data Scraping, Dashboard, Prompt Engineering, OpenAI, Streamlit Development, ChatGPT, LLM

Bio

George is a results-driven data scientist who brings a diverse skill set and rich experience to the table. He specializes in translating data into insight, working with a versatile toolkit that includes Python, SQL, and machine learning. George excels at distilling complex findings into clear, actionable insights that make data accessible and impactful for stakeholders.

Portfolio

The Ehlers-Danlos Society

Statistics, Data Science, Delphi Technique, Research, Surveys...

Self-employed

Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS)...

W.W.S. INC.

Data Science, Python, Pandas, Data Scientist, Palantir, Time Series...

Experience

Python - 9 years
Machine Learning - 8 years
Natural Language Processing (NLP) - 8 years
Artificial Intelligence (AI) - 8 years
Large Language Models (LLMs) - 4 years
LangChain - 3 years
Pinecone - 3 years
Retrieval-augmented Generation (RAG) - 3 years

Preferred Environment

Jupyter Notebook, Python 3, Google BigQuery, GitHub, Amazon SageMaker, Amazon Web Services (AWS), SQL

The most amazing solution I've created is a text classification and quote extraction pipeline for a nonprofit that analyzes how the media quotes men compared with women.

Work Experience

Clinical Research & Consensus Lead

2024 - PRESENT

The Ehlers-Danlos Society

Developed a standardized clinical framework for diagnostic accuracy by facilitating expert consensus; translated qualitative medical insights into quantifiable metrics for global implementation.
Handled survey development, data analysis, and formal reporting, producing actionable insights to improve diagnostic accuracy and patient outcomes.
Authored comprehensive case study materials and executive summaries to support better patient outcomes and adoption of the clinical model.

Technologies: Statistics, Data Science, Delphi Technique, Research, Surveys, Statistical Analysis

Healthcare Analytics & NLP Consultant

2022 - PRESENT

Self-employed

Worked as the lead LLM developer for an app that uses a ChatGPT-powered LLM to generate custom infographics. Created a vector database for retrieval-augmented generation (RAG) and used Amazon DynamoDB to store user conversation history.
Consulted on a project that explores and tests data science methodologies for use in the legal profession. Fine-tuned and adapted ChatGPT to extract and analyze relevant information from case opinions to aid lawyers.
Synthesized healthcare worker sentiment and operational feedback using NLP, identifying friction points in care delivery to inform process optimization and scaling.
Created an MVP of an AI recipe recommendation app, using role-and-goal and few-shot prompting with ChatGPT (GPT-4). Also built a RAG database in Qdrant populated with recipes and ingredient nutrient data.

Technologies: Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS), Web Scraping, Social Network Analysis, Natural Language Processing (NLP), Amazon DynamoDB, Machine Learning, Python, Predictive Modeling, Artificial Intelligence (AI), Data Science, Deep Learning, Neural Networks, TensorFlow, Clustering, Data Cleansing, ChatGPT, OpenAI, OpenAI GPT-4 API, Pandas, Scikit-learn, Plotly, PyTorch, Data Mining, Data Scraping, NumPy, Data Analytics, Streamlit, Generative Pre-trained Transformers (GPT), LangChain, SQL, SpaCy, Supervised Learning, Legal, Labeling, Transformers, OpenAI GPT-3 API, Prompt Engineering, Retrieval-augmented Generation (RAG), Hugging Face, MLflow, OpenAI API, ChatGPT Prompts, Data Scientist, Text Analytics, Feature Engineering, Generative Artificial Intelligence (GenAI), Pinecone

Data Scientist

2025 - 2026

W.W.S. INC.

Designed and implemented production-grade data pipelines on Palantir Foundry to ingest, transform, and analyze large-scale commodity and market data used in trading and analytics workflows.
Built incremental, scalable transforms to process high-volume time-series and tabular datasets, optimizing for performance, reproducibility, and downstream analytics consumption.
Developed advanced analytical workflows in Python and SQL, supporting exploratory analysis, feature engineering, and model-ready datasets for forecasting and decision support.
Delivered interactive dashboards and curated datasets for internal users, enabling self-serve analysis and faster iteration on market insights.

Technologies: Data Science, Python, Pandas, Data Scientist, Palantir, Time Series, Time Series Analysis

LLM & Data Science Consultant

2024 - 2025

BriefCatch

Developed an LLM output evaluation pipeline for assessing the quality of legal citations extracted and corrected by BriefCatch's proprietary LLM tools.
Implemented the project with Microsoft Azure and LangChain.
Automated the generation of a report summarizing the evaluation pipeline's results.

Technologies: Azure, ChatGPT, Large Language Models (LLMs), Word Embedding, Feature Engineering, Correlational Analysis

Data Scientist | ML Engineer

2021

Twitter

Conducted exploratory data analysis in BigQuery and applied named entity recognition to millions of tweets that users had reported for perceived terms of service violations.
Used NetworkX and Neo4j to detect networks of users who coordinated their reporting to maliciously get other users banned.
Built an interactive dashboard of the results in Looker Studio, which Twitter's health data science team used to decide how to allocate resources to combat malicious behavior.

Technologies: Python 3, BigQuery, Natural Language Processing (NLP), Social Network Analysis, Data Analysis, Neo4j, Pandas, Scikit-learn, Plotly, Python, Data Mining, NumPy, PostgreSQL, Google BigQuery, Dashboards, Data Analytics, SQL, SpaCy, Supervised Learning, Data Scientist, Statistical Analysis, Feature Engineering, Statistical Modeling, Correlational Analysis, Dashboard Design, Dashboard Development, Looker, Pinecone, Google Colaboratory (Colab)
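The coordinated-reporting detection described in the Twitter role can be illustrated with a minimal sketch. It assumes reports arrive as hypothetical (reporter, target) pairs and uses only the standard library in place of NetworkX or Neo4j. The function name `coordinated_groups` and the `min_shared` and `min_group` thresholds are illustrative assumptions, not the method actually used.

```python
from collections import defaultdict
from itertools import combinations


def coordinated_groups(reports, min_shared=2, min_group=3):
    """Group reporters who repeatedly reported the same targets.

    reports: iterable of (reporter, target) pairs (hypothetical input format).
    Two reporters are linked when they reported at least `min_shared` of the
    same targets; connected components with at least `min_group` members are
    returned as candidate coordinated groups.
    """
    # Map each reporter to the set of accounts they reported.
    targets_by_reporter = defaultdict(set)
    for reporter, target in reports:
        targets_by_reporter[reporter].add(target)

    # Build an undirected co-reporting graph as an adjacency list.
    graph = defaultdict(set)
    for a, b in combinations(sorted(targets_by_reporter), 2):
        if len(targets_by_reporter[a] & targets_by_reporter[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)

    # Find connected components with an iterative depth-first search.
    seen, groups = set(), []
    for start in list(graph):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        if len(component) >= min_group:
            groups.append(component)
    return groups
```

For example, if reporters "a", "b", and "c" each reported the same two accounts, `coordinated_groups` returns them as one group, and an unrelated reporter is left out. A graph library would add community detection and scale beyond this toy version. The connected-components idea stays the same.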

Data Visualization Analyst

2018 - 2020

Callisto Media

Conducted data research projects using exploratory data analysis, statistical analysis, and machine learning.
Partnered with the marketing team to build an interactive dashboard with Plotly's Dash that let them easily visualize key KPIs across campaigns.
Designed a word-similarity mechanism that scores how similar two Amazon key phrases are, helping the company assess the Amazon book market and decide which types of books to publish.
Used Word2Vec to automate matching Amazon search terms to their appropriate categories in the Callisto taxonomy, a vital project given the company's reliance on Amazon search data.

Technologies: Jupyter Notebook, Plotly, Pandas, Scikit-learn, Python, Data Mining, NumPy, Dash, Dashboards, Data Analytics, Data Visualization, Regression Modeling, SpaCy, Supervised Learning, Data Scientist, Cluster Analysis, Text Classification, Text Analytics, Topic Modeling, Word Embedding, Statistical Analysis, Feature Engineering, Statistical Modeling, Correlational Analysis, Dashboard Design, Dashboard Development, Tableau, Google Colaboratory (Colab), Statistics
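A phrase-similarity score and taxonomy matching like those described in the Callisto role can be sketched as follows. The sketch assumes each word already has an embedding vector, for example from a trained Word2Vec model. A phrase is represented by the average of its word vectors, and two phrases are compared with cosine similarity. The toy vectors, category names, and function names are hypothetical placeholders, not Callisto's data, taxonomy, or code.

```python
import math

# Hypothetical toy embeddings; in practice these would come from a trained
# Word2Vec model.
EMBEDDINGS = {
    "keto": [0.9, 0.1, 0.0],
    "diet": [0.8, 0.2, 0.1],
    "cookbook": [0.7, 0.3, 0.0],
    "recipes": [0.6, 0.4, 0.1],
    "baby": [0.0, 0.1, 0.9],
    "names": [0.1, 0.0, 0.8],
}


def phrase_vector(phrase):
    """Average the vectors of the phrase's known words (None if none are known)."""
    vectors = [EMBEDDINGS[w] for w in phrase.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(dims) / len(vectors) for dims in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def phrase_similarity(p1, p2):
    """Score two key phrases; phrases with no known words score 0.0."""
    v1, v2 = phrase_vector(p1), phrase_vector(p2)
    if v1 is None or v2 is None:
        return 0.0
    return cosine(v1, v2)


def best_category(search_term, categories):
    """Return the category label most similar to the search term."""
    return max(categories, key=lambda c: phrase_similarity(search_term, c))
```

With these toy vectors, "keto recipes" scores much closer to "diet cookbook" than to "baby names". `best_category("keto diet", ["cookbook recipes", "baby names"])` picks the cookbook category. Averaging is the simplest way to combine word vectors. Weighted averages that down-weight common words are a common refinement.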

Projects

DataJockey

https://github.com/GeorgeMcIntire/DataJockey

DataJockey is a passion project that combines my two professions: data science and DJing. This project applies my data science expertise to analyzing my song collection. With machine learning's increasing ability to process, synthesize, and even generate music, I became inspired to dive in and see if big data algorithms could help me better understand my musical oeuvre and optimize my routine DJ activities.

Protect Nil LLM

I was the principal large language model (LLM) developer for an application that uses a ChatGPT-powered LLM to create personalized infographics. I gave the model memory by implementing a retrieval-augmented generation (RAG) approach: I built a vector database and used Amazon DynamoDB to store user conversation history efficiently. LangChain was my primary toolkit for LLM development, and I significantly improved model performance through prompt engineering.

I also led development of a comprehensive full-stack pipeline: a user-friendly front end built as a Streamlit app, storage of user information in Amazon RDS, and application deployment on Amazon EC2. This work demonstrates my ability to build innovative, efficient LLM solutions and deploy full-stack applications.
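The retrieval step of a RAG setup like the one described above can be sketched in a few lines. This version assumes documents and queries have already been embedded. A plain in-memory list stands in for the vector database, and a list of (role, text) turns stands in for the conversation history stored in DynamoDB. `VectorStore` and `build_prompt` are hypothetical names. The sketch makes no LLM call; it only assembles the prompt that would be sent to the model.

```python
import math
from dataclasses import dataclass, field


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


@dataclass
class VectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    items: list = field(default_factory=list)  # (text, vector) pairs

    def add(self, text, vector):
        self.items.append((text, vector))

    def search(self, query_vector, k=2):
        """Return the texts of the k stored items most similar to the query."""
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, context, history):
    """Combine retrieved context and prior (role, text) turns into one prompt."""
    lines = ["Use the context below to answer the user."]
    lines += [f"Context: {passage}" for passage in context]
    lines += [f"{role}: {text}" for role, text in history]
    lines.append(f"user: {question}")
    return "\n".join(lines)
```

In a real deployment, the query vector would come from the same embedding model used for the documents. The assembled prompt would then be passed to the chat model, and the new turn would be written back to the history store.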

Gender Representation and Opinion Detection in the Media

https://www.ischool.berkeley.edu/projects/2022/gender-representation-and-opinion-detection-media

We built a front-end dashboard to help the partner organization visualize gender representation in news articles across its various issue areas. This builds on previous work such as Informed Opinion's Gender Gap Tracker and the Global Media Monitoring Project's Who Makes the News Report. As part of this project, we productionized the models, built a data pipeline, performed usability testing, and documented and handed off our work to the organization.

My role was to train a subjectivity text classification model and to mine patterns in the quotes extracted from the article dataset.

Education

2020 - 2022

Master's Degree in Information Systems

UC Berkeley School of Information - Berkeley, CA, USA

2007 - 2011

Bachelor's Degree in Economics

Occidental College - Los Angeles, CA, USA

Skills

Libraries/APIs

Pandas, Scikit-learn, NumPy, PyTorch, TensorFlow, SpaCy, OpenAI API

Tools

ChatGPT, Plotly, GitHub, Amazon SageMaker, BigQuery, Looker, Tableau

Languages

SQL, Python, Python 3, R

Frameworks

Streamlit

Platforms

Azure, Jupyter Notebook, Amazon Web Services (AWS)

Storage

Databases, PostgreSQL, Amazon DynamoDB, Amazon S3 (AWS S3), Neo4j

Other

Machine Learning, Natural Language Processing (NLP), Data Analysis, Web Scraping, Data Visualization, Artificial Intelligence (AI), Data Science, Prompt Engineering, Clustering, Data Cleansing, OpenAI, Data Mining, Data Scraping, Dashboards, Data Analytics, Supervised Learning, ChatGPT Prompts, English, Text Classification, Data Scientist, Cluster Analysis, Unsupervised Learning, Text Analytics, Topic Modeling, Word Embedding, Statistical Analysis, Feature Engineering, Correlational Analysis, Dashboard Design, Dashboard Development, Google Colaboratory (Colab), Google BigQuery, Writing & Editing, Social Network Analysis, Causal Inference, Surveying, Large Language Models (LLMs), Predictive Modeling, Deep Learning, Neural Networks, Generative Pre-trained Transformers (GPT), OpenAI GPT-4 API, Chatbots, Dash, LangChain, Pinecone, Regression Modeling, Text Recognition, Labeling, Blogging, Technical Writing, Content Writing, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), Transformers, OpenAI GPT-3 API, Retrieval-augmented Generation (RAG), Churn Analysis, Hugging Face, MLflow, Statistics, Statistical Modeling, Generative Artificial Intelligence (GenAI), Analytics, Recommendation Systems, Time Series, Time Series Analysis, Critical Thinking, Research, Information Systems, Economics, Amazon RDS, Legal, Delphi Technique, Surveys, Palantir, Data Engineering
