
AI Verification Guide

Understanding AI Content and Process

Understanding a few basics about how Generative Artificial Intelligence works is crucial to identifying student writing created with it.

AI Datasets

Generative AIs are trained on datasets. There are datasets for the Large Language Model (LLM) AIs such as ChatGPT and datasets for the image AIs such as DALL-E.

Most of the AIs of a given type that a student might use:

  • Share the same dataset - that's right, they have the same content but use it differently.
  • Have content that largely comes from a crawl of free internet material and Wikipedia.
  • Are limited in currency to the date the crawl was performed.
  • Don't necessarily have reliable content.
  • May have an intellectual property problem.
  • Cannot get past the firewalls protecting private data or paid subscription databases (YET).

- Update from April 2024 - 
The game is changing: some LLMs are now purchasing content from paid datasets. Most search engines now include some sort of AI (the quality varies), and soon there will be an AI presence in library resources as well.


A pie chart of ChatGPT's training dataset sources.


A Generative AI system first breaks text into small pieces of information called tokens; this splitting is called tokenization. It then builds a response by predicting, one token at a time, which token is statistically most likely to come next.

Just because a token seems to match does not mean that it correctly matches.


The ball rolled down the __________? What would you say?


Obviously it should be "hill," right? But what if "hill" wasn't right?
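The fill-in-the-blank exercise above can be sketched in code. This is a minimal toy model, not how a real LLM works: the tiny corpus, the `predict_next` function, and splitting on spaces are all invented for illustration (real systems use subword tokenizers and neural networks, not simple frequency counts). Still, it shows the core idea that the "answer" is just the statistically most common continuation, which may or may not be correct for a particular sentence.

```python
from collections import Counter, defaultdict

# A tiny, made-up training corpus.
corpus = [
    "the ball rolled down the hill",
    "the ball rolled down the stairs",
    "the ball rolled down the hill quickly",
    "the cart rolled down the hill",
]

# Count which token follows each pair of tokens (a toy trigram model).
trigrams = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()  # crude "tokenization" by whitespace
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(a, b)][c] += 1

def predict_next(a, b):
    """Return the token most often seen after the pair (a, b), or None."""
    counts = trigrams[(a, b)]
    return counts.most_common(1)[0][0] if counts else None

# "The ball rolled down the ___?" -> the most frequent continuation.
print(predict_next("down", "the"))  # -> "hill" (3 of 4 times, not always)
```

Note that the model answers "hill" only because "hill" is the most frequent continuation in its data; if the sentence being completed actually ended in "stairs," the prediction would simply be wrong, with no way for the model to know.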

Mari Kermit-Canfield
Creative Learning Librarian and Coordinator of Research Services