George McIntire, Developer in Berkeley, CA, United States

George McIntire

Verified Expert in Engineering

Data Scientist and Developer

Location
Berkeley, CA, United States
Toptal Member Since
January 31, 2024

George is a results-driven data scientist with a diverse skill set and rich experience. His expertise lies in data translation, backed by a versatile toolkit that includes Python, SQL, and machine learning. George excels at distilling complex findings into actionable, comprehensible insights, making data accessible and impactful for stakeholders.

Portfolio

Self-employed
Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS)...
Twitter
Python 3, BigQuery, Natural Language Processing (NLP), Social Network Analysis...
Callisto Media
Jupyter Notebook, Plotly, Pandas, Scikit-learn, Python, Data Mining, NumPy...

Experience

Availability

Part-time

Preferred Environment

Jupyter Notebook, Python 3, Google BigQuery, GitHub, Amazon SageMaker, Amazon Web Services (AWS), SQL

The most amazing...

...solution I've created is a text classification and quote extraction pipeline for a nonprofit focused on analyzing how the media quotes men as opposed to women.

Work Experience

Data Science Consultant

2022 - PRESENT
Self-employed
  • Worked as the lead LLM developer for an app that uses a ChatGPT-powered LLM to generate custom infographics. I created a vector database for retrieval-augmented generation (RAG) and used Amazon DynamoDB to store user conversation history.
  • Consulted for a project that explores and tests data science methodologies for use in the legal profession. I fine-tuned and adapted ChatGPT to extract and analyze relevant information from case opinions to aid lawyers.
  • Leveraged unsupervised learning and sentence embeddings to analyze the survey results of healthcare workers.
Technologies: Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS), Web Scraping, Social Network Analysis, Natural Language Processing (NLP), Amazon DynamoDB, Machine Learning, Python, Predictive Modeling, Artificial Intelligence (AI), Data Science, Deep Learning, Neural Networks, TensorFlow, Clustering, Data Cleansing, ChatGPT, OpenAI, OpenAI GPT-4 API, Pandas, Scikit-learn, Plotly, PyTorch, Data Mining, Data Scraping, NumPy, Data Analytics, Streamlit, Generative Pre-trained Transformers (GPT), LangChain, SQL, SpaCy, Supervised Learning, Legal, Labeling, Transformers, OpenAI GPT-3 API

Data Science Intern

2021 - 2021
Twitter
  • Conducted exploratory data analysis using BigQuery and named entity recognition on millions of tweets reported by users for perceived terms of service violations.
  • Used NetworkX and Neo4j to detect networks of users who coordinated their reports to maliciously target other users for banning.
  • Built an interactive dashboard of the results in Looker Studio, which Twitter's health data science team used to decide how to allocate resources for combating malicious behavior.
Technologies: Python 3, BigQuery, Natural Language Processing (NLP), Social Network Analysis, Data Analysis, Neo4j, Pandas, Scikit-learn, Plotly, Python, Data Mining, NumPy, PostgreSQL, Google BigQuery, Dashboards, Data Analytics, SQL, SpaCy, Supervised Learning
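
As a rough illustration of the coordinated-reporting detection described above, the idea can be sketched as a graph problem: link reporters whose reported accounts overlap, then read off the connected groups. This is a minimal standard-library sketch, not the NetworkX and Neo4j implementation used at Twitter. The function name find_coordinated_groups, the (reporter, target) input shape, and the min_shared_targets threshold are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations


def find_coordinated_groups(reports, min_shared_targets=2):
    """Group reporters who reported at least `min_shared_targets` of the same accounts.

    `reports` is an iterable of (reporter, target) pairs (a hypothetical input shape).
    Returns a list of sets of reporters, one per connected group of two or more.
    """
    targets_by_reporter = defaultdict(set)
    for reporter, target in reports:
        targets_by_reporter[reporter].add(target)

    # Link two reporters when their sets of reported targets overlap enough.
    neighbors = defaultdict(set)
    for a, b in combinations(sorted(targets_by_reporter), 2):
        if len(targets_by_reporter[a] & targets_by_reporter[b]) >= min_shared_targets:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Collect connected components with an iterative depth-first search.
    groups, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(component)
    return groups
```

The pairwise comparison is quadratic in the number of reporters, so a sketch like this only suits small samples. At the scale of millions of reports, a graph library or graph database would be the more realistic choice.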

Data Visualization Analyst

2018 - 2020
Callisto Media
  • Conducted data research projects using exploratory data analysis, statistical analysis, and machine learning.
  • Partnered with the marketing team to build an interactive dashboard using Plotly's Dash tool, allowing them to visualize important KPIs for various campaigns easily.
  • Designed a word-similarity mechanism that scores how similar two Amazon key phrases are, helping the company evaluate the Amazon book market and decide which types of books to publish.
  • Used Word2Vec to automate matching Amazon search terms to their categories in the Callisto taxonomy, a vital project for the company given its reliance on Amazon search data.
Technologies: Jupyter Notebook, Plotly, Pandas, Scikit-learn, Python, Data Mining, NumPy, Dash, Dashboards, Data Analytics, Data Visualization, Regression Modeling, SpaCy, Supervised Learning
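
To give a sense of what a phrase-similarity score looks like, here is a minimal sketch that compares two key phrases by the cosine similarity of their word counts. It is a simplified stand-in and not the mechanism built at Callisto: it uses plain bag-of-words vectors instead of Word2Vec embeddings, so it only rewards exact word overlap. The function name phrase_similarity and the sample phrases are hypothetical.

```python
import math
from collections import Counter


def phrase_similarity(phrase_a: str, phrase_b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two phrases (0.0 to 1.0)."""
    counts_a = Counter(phrase_a.lower().split())
    counts_b = Counter(phrase_b.lower().split())

    # Dot product over the words the two phrases have in common.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))

    # An empty phrase has no direction, so treat it as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(phrase_similarity("keto cookbook", "keto diet"))
    print(phrase_similarity("keto cookbook", "gardening guide"))
```

Swapping the word-count vectors for averaged word embeddings would let related but non-identical words (for example, "cookbook" and "recipes") contribute to the score, which is the kind of behavior an embedding model such as Word2Vec provides.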

DataJockey

https://github.com/GeorgeMcIntire/DataJockey
DataJockey is a passion project that combines my two professions: data science and DJing. This project applies my data science expertise to analyzing my song collection. With machine learning's increasing ability to process, synthesize, and even generate music, I became inspired to dive in and see if big data algorithms could help me better understand my musical oeuvre and optimize my routine DJ activities.

Protect Nil LLM

I was the principal large language model (LLM) developer for an application that uses a ChatGPT-powered LLM to craft personalized infographics. I gave the ChatGPT model memory capabilities through a retrieval-augmented generation (RAG) approach, creating a vector database and using DynamoDB to store user conversation history. I orchestrated the development of a full-stack pipeline comprising a Streamlit front end, user information storage in Amazon RDS, and application deployment on Amazon EC2. My primary toolkit for LLM development was LangChain, and I improved model performance through prompt engineering. This work demonstrates my proficiency in LLM development and full-stack application deployment.

Education

2020 - 2022

Master's Degree in Information Systems

UC Berkeley School of Information - Berkeley, CA, USA

2007 - 2011

Bachelor's Degree in Economics

Occidental College - Los Angeles, CA, USA

Libraries/APIs

Pandas, Scikit-learn, NumPy, PyTorch, TensorFlow, SpaCy

Tools

Plotly, GitHub, Amazon SageMaker, ChatGPT, BigQuery

Languages

SQL, Python, Python 3, R

Paradigms

Data Science

Platforms

Jupyter Notebook, Amazon Web Services (AWS)

Frameworks

Streamlit

Storage

Databases, PostgreSQL, Amazon DynamoDB, Amazon S3 (AWS S3), Neo4j

Other

Machine Learning, Natural Language Processing (NLP), Data Analysis, Web Scraping, Data Visualization, Clustering, Data Cleansing, Data Mining, Data Scraping, Dashboards, Data Analytics, Supervised Learning, Google BigQuery, Writing & Editing, Social Network Analysis, Causal Inference, Surveying, Large Language Models (LLMs), Predictive Modeling, Artificial Intelligence (AI), Deep Learning, Neural Networks, Generative Pre-trained Transformers (GPT), OpenAI, OpenAI GPT-4 API, Chatbots, Dash, LangChain, Pinecone, Regression Modeling, Text Recognition, Labeling, Blogging, Technical Writing, Content Writing, Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNN), Transformers, OpenAI GPT-3 API, Critical Thinking, Research, Information Systems, Economics, Prompt Engineering, Amazon RDS, Legal, ChatGPT Prompts
