Bias
Generative AI tools reflect the biases present in their training data, which may originate from several sources: the people who input or contribute data and content (personal bias), the origin of the data itself (machine bias), and the exclusion of underrepresented or marginalized communities (selection bias). Users can also inadvertently reinforce their existing beliefs by rephrasing prompts until they receive the answer they most want (confirmation bias). Because generative AI tools amplify and reinforce these biases, it is crucial to remain critical of their outputs and responses.
Otherwise, a growing sense of comfort, acceptance, and trust in what these tools provide can turn into dependence on the machine for answers, insights, and decisions (automation bias). Verify information against academic resources and review each company's data collection policies and procedures.
This New York Times article demonstrates how the material AI is trained on can lead to bias: See How Easily A.I. Chatbots Can Be Taught to Spew Disinformation
Limited Knowledge
Generative AI relies on pretrained models to reduce the computing power needed to create text, images, and other outputs. While pretraining makes tools faster and easier to use, it also means that some tools do not have access to real-time information. For example, the free version of ChatGPT was trained on text written before September 2021 and has no context for events after that date. Even when a generative AI tool is connected to a search engine, it can only link to sources available on the internet. As any historian will tell you, many sources are not available digitally, and the internet is an expansive but incomplete representation of human knowledge and creative works. Searching the library's catalog offers more robust access to scholarly works.
Likewise, image- and audio-based AI reflects the limits of its training sets. An image recognition tool trained on pictures from the early 2000s onward might misidentify objects in photographs from the 1890s.
Hallucinations
AI hallucinations occur when generative AI tools produce incorrect, misleading, or nonexistent content. Remember that large language models (LLMs) are trained on massive amounts of data to find patterns, and they use those patterns to predict words and generate new content. Because fabricated content is presented as though it were factual, AI hallucinations can be difficult to identify. A common AI hallucination in higher education happens when users prompt text tools like ChatGPT or Gemini (previously Google Bard) to cite references or peer-reviewed sources. These tools draw on patterns in existing writing about the topic and generate new titles, authors, and content that do not actually exist.
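To picture what "predicting words from patterns" means, here is a deliberately tiny Python sketch. It is an illustration only, not how any commercial chatbot is built: a toy model counts which words follow which in a few example sentences and then strings together a fluent-sounding but potentially fictitious "citation."

from collections import Counter, defaultdict
import random

# Toy training text (an assumption made up for this example).
corpus = (
    "the study was published in the journal of applied research "
    "the study was written by smith and jones "
    "the article was published in the journal of modern studies"
).split()

# Count which word tends to follow each word (a toy "bigram" model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def generate(start, length=8):
    """Pick each next word in proportion to how often it followed the last one."""
    words = [start]
    for _ in range(length):
        options = following.get(words[-1])
        if not options:
            break
        choices, weights = zip(*options.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate("the"))
# Possible output: "the study was published in the journal of modern studies"
# A fluent, plausible-looking reference stitched together from patterns,
# which may describe a source that does not exist.

Real LLMs are vastly larger and more sophisticated, but the underlying issue is the same: the model optimizes for plausible continuations, not verified facts.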
Image- and sound-based AI are also susceptible to hallucination. Instead of stringing together words that do not belong together, the generator places pixels in ways that may not reflect the object it is trying to depict. This is why image generation tools sometimes give hands extra fingers: the model has learned that fingers follow a particular pattern, but it does not understand the anatomy of a hand. Similarly, sound-based AI may add audible noise because it first generates a spectrogram (a visual representation of sound) and then tries to translate that image back into a smooth waveform.
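For readers who want to hear why that round trip is lossy, the short Python sketch below uses the open-source librosa library (an illustration, not any particular AI product) to turn a clean tone into a magnitude spectrogram and back. Because the magnitude spectrogram discards phase information, the rebuilt waveform only approximates the original, which is the same kind of gap that lets audio generators introduce artifacts.

import numpy as np
import librosa

sr = 22050  # sample rate in Hz
t = np.linspace(0, 1.0, sr, endpoint=False)
tone = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of a clean 440 Hz tone

# Forward step: turn the waveform into a magnitude spectrogram (an image-like array).
magnitude = np.abs(librosa.stft(tone))

# Reverse step: estimate a waveform from the magnitude alone (Griffin-Lim algorithm).
reconstructed = librosa.griffinlim(magnitude)

# The reconstruction differs from the original because the phase had to be guessed.
error = np.mean((tone[: len(reconstructed)] - reconstructed) ** 2)
print(f"mean squared difference from the original: {error:.6f}")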