George is available for hire

George McIntire

Verified Expert in Engineering

Data Scientist and Developer

Location

Berkeley, CA, United States

Toptal Member Since

January 31, 2024

George is a results-driven data scientist who brings a diverse skill set and rich experience to the table. His expertise lies in data translation and a versatile toolkit, including Python, SQL, and machine learning. George excels at distilling complex findings into actionable and comprehensible insights, making data accessible and impactful for stakeholders.

Machine Learning, Natural Language Processing (NLP), Data Analysis, Data Visualization, Data Cleansing, Data Analytics, Clustering, Web Scraping, Data Mining, Python, Pandas, Scikit-learn, NumPy, Deep Learning, Large Language Models (LLMs), Streamlit, Plotly

Portfolio

Self-employed

Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS)...

Twitter

Python 3, BigQuery, Natural Language Processing (NLP), Social Network Analysis...

Callisto Media

Jupyter Notebook, Plotly, Pandas, Scikit-learn, Python, Data Mining, NumPy...

Experience

Data Analysis - 8 years
Data Visualization - 8 years
Natural Language Processing (NLP) - 8 years
Machine Learning - 8 years
Python 3 - 6 years
Large Language Models (LLMs) - 4 years
Amazon Web Services (AWS) - 4 years
Social Network Analysis - 3 years

Availability

Full-time

Preferred Environment

Jupyter Notebook, Python 3, Google BigQuery, GitHub, Amazon SageMaker, Amazon Web Services (AWS), SQL

The most amazing...

...solution I've created is a text classification and quote extraction pipeline for a nonprofit focused on analyzing how the media quotes men as opposed to women.

Work Experience

Data Science Consultant

2022 - PRESENT

Self-employed

Worked as the lead LLM developer for an app that uses a ChatGPT-powered LLM to generate custom infographics. I created a vector database for retrieval-augmented generation (RAG), and used Amazon DynamoDB to store user conversation history.
Consulted for a project that explores and tests data science methodologies for use in the legal profession. I fine-tuned and adapted ChatGPT to extract and analyze relevant information from case opinions to aid lawyers.
Leveraged unsupervised learning and sentence embeddings to analyze the survey results of healthcare workers.

Technologies: Large Language Models (LLMs), Data Analysis, Amazon Web Services (AWS), Web Scraping, Social Network Analysis, Natural Language Processing (NLP), Amazon DynamoDB, Machine Learning, Python, Predictive Modeling, Artificial Intelligence (AI), Data Science, Deep Learning, Neural Networks, TensorFlow, Clustering, Data Cleansing, ChatGPT, OpenAI, OpenAI GPT-4 API, Pandas, Scikit-learn, Plotly, PyTorch, Data Mining, Data Scraping, NumPy, Data Analytics, Streamlit, GPT, LangChain

Data Science Intern

2021 - 2021

Twitter

Conducted exploratory data analysis using BigQuery and named entity recognition on millions of tweets reported by users for perceived terms of service violations.
Used NetworkX and Neo4j to detect networks of users who coordinated their reports to maliciously target other users for banning.
Built an interactive dashboard of the results in Looker Studio, which Twitter's health data science team used to decide how to allocate resources against malicious behavior.
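
For illustration only, and not the code used at Twitter: the coordinated-reporting detection described above can be sketched as a graph problem. The sketch below assumes a hypothetical input of (reporter, target) pairs, links two reporters when they reported at least `min_shared_targets` of the same accounts, and returns the connected groups. It uses only the Python standard library in place of NetworkX and Neo4j so that it is self-contained; the function name, schema, and threshold are all assumptions.

```python
from collections import defaultdict
from itertools import combinations


def coordinated_groups(reports, min_shared_targets=2):
    """Group reporters who reported at least `min_shared_targets` of the same accounts.

    `reports` is an iterable of (reporter, target) pairs (hypothetical schema).
    """
    targets_by_reporter = defaultdict(set)
    for reporter, target in reports:
        targets_by_reporter[reporter].add(target)

    # Link two reporters when their sets of reported accounts overlap enough.
    neighbors = defaultdict(set)
    for a, b in combinations(sorted(targets_by_reporter), 2):
        if len(targets_by_reporter[a] & targets_by_reporter[b]) >= min_shared_targets:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Connected components of the link graph, found by depth-first search.
    groups, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups
```

Reporters with no sufficiently overlapping partner are left out of the result, so each returned group is a candidate cluster for manual review rather than a verdict.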

Technologies: Python 3, BigQuery, Natural Language Processing (NLP), Social Network Analysis, Data Analysis, Neo4j, Pandas, Scikit-learn, Plotly, Python, Data Mining, NumPy, PostgreSQL, Google BigQuery, Dashboards, Data Analytics

Data Visualization Analyst

2018 - 2020

Callisto Media

Conducted data research projects using exploratory data analysis, statistical analysis, and machine learning.
Partnered with the marketing team to build an interactive dashboard using Plotly's Dash tool, allowing them to visualize important KPIs for various campaigns easily.
Designed a word-similarity mechanism that outputs a score that evaluates how similar two Amazon key phrases are to one another, helping the company evaluate the Amazon book market to decide which types of books to publish.
Used Word2Vec to automate a process that matches Amazon search terms with their appropriate categories designated by the Callisto taxonomy—a vital project to the company, given its reliance on Amazon search data.
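
As a rough sketch of what a phrase-similarity score can look like (not the mechanism actually built at Callisto Media, whose details are not given here), the example below computes cosine similarity between bag-of-words count vectors for two key phrases. The function name and the choice of word counts instead of learned embeddings such as Word2Vec are assumptions made to keep the example self-contained.

```python
import math
from collections import Counter


def phrase_similarity(phrase_a: str, phrase_b: str) -> float:
    """Cosine similarity between bag-of-words count vectors, in [0, 1]."""
    counts_a = Counter(phrase_a.lower().split())
    counts_b = Counter(phrase_b.lower().split())
    # Dot product over the words the two phrases share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty phrase has no meaningful similarity.
    return dot / (norm_a * norm_b)
```

Word counts only capture exact word overlap; swapping in averaged word embeddings would let related but non-identical terms score above zero.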

Technologies: Jupyter Notebook, Plotly, Pandas, Scikit-learn, Python, Data Mining, NumPy, Dash, Dashboards, Data Analytics, Data Visualization

Experience

DataJockey

https://github.com/GeorgeMcIntire/DataJockey

DataJockey is a passion project that combines my two professions: data science and DJing. This project applies my data science expertise to analyzing my song collection. With machine learning's increasing ability to process, synthesize, and even generate music, I became inspired to dive in and see if big data algorithms could help me better understand my musical oeuvre and optimize my routine DJ activities.

Protect Nil LLM

I was the principal large language model (LLM) developer for an application that uses a ChatGPT-powered model to create personalized infographics. I gave the ChatGPT model memory by implementing retrieval-augmented generation (RAG): I created a vector database and used DynamoDB to store user conversation history. I also led development of the full-stack pipeline, which comprised a Streamlit front end, user information stored in Amazon RDS, and deployment on Amazon EC2. LangChain was my primary toolkit for LLM development, and I improved model performance through prompt engineering. This project reflects my experience with both LLM development and full-stack application deployment.

Skills

Languages

Python, Python 3, SQL

Libraries/APIs

Pandas, Scikit-learn, NumPy, PyTorch, TensorFlow

Tools

Plotly, GitHub, Amazon SageMaker, ChatGPT, BigQuery

Paradigms

Data Science

Other

Machine Learning, Natural Language Processing (NLP), Data Analysis, Web Scraping, Data Visualization, Clustering, Data Cleansing, Data Mining, Dashboards, Data Analytics, Google BigQuery, Writing & Editing, Social Network Analysis, Causal Inference, Surveying, Large Language Models (LLMs), Predictive Modeling, Artificial Intelligence (AI), Deep Learning, Neural Networks, Generative Pre-trained Transformers (GPT), OpenAI, OpenAI GPT-4 API, Chatbots, Data Scraping, Dash, GPT, LangChain, Pinecone, Critical Thinking, Research, Information Systems, Economics, Prompt Engineering, Amazon RDS

Frameworks

Streamlit

Storage

Databases, PostgreSQL, Amazon DynamoDB, Amazon S3 (AWS S3), Neo4j

Platforms

Jupyter Notebook, Amazon Web Services (AWS)

Education

2020 - 2022

Master's Degree in Information Systems

UC Berkeley School of Information - Berkeley, CA, USA

2007 - 2011

Bachelor's Degree in Economics

Occidental College - Los Angeles, CA, USA
