UC Irvine Study Finds Mismatch between Human Perception and Reliability of AI-Assisted Language Tools

Adding uncertainty phrasing to large language model responses can help users better gauge accuracy

As AI tools like ChatGPT become more mainstream in day-to-day tasks and decision-making, the ability to trust their responses and spot errors in them is critical. A new study by cognitive and computer scientists at the University of California, Irvine finds that people generally overestimate the accuracy of large language model (LLM) outputs. But with some tweaks, says lead author Mark Steyvers, cognitive sciences professor and department chair, these tools can be trained to provide explanations that enable users to gauge uncertainty and better distinguish fact from fiction.

“There’s a disconnect between what LLMs know and what people think they know,” says Steyvers. “We call this the calibration gap. At the same time, there’s also a discrimination gap – how well humans and models can distinguish between correct and incorrect answers. Our study looks at how we can narrow these gaps.”
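To make the two gaps concrete, here is a minimal, purely illustrative Python sketch of how one might quantify them from toy data. The variable names and the exact formulas below are assumptions for illustration only, not the study's methodology: the "calibration gap" is sketched as the difference between how far human-perceived confidence and the model's own confidence each sit from actual accuracy, and the "discrimination gap" as the difference in how well each confidence score separates correct from incorrect answers.

```python
# Illustrative sketch only: toy measures loosely inspired by the article's
# description of the calibration gap and discrimination gap. The data,
# variable names, and formulas here are assumptions, not the study's method.

import numpy as np

# Hypothetical data: whether each LLM answer was correct, the model's own
# confidence, and the confidence a human assigned after reading the answer.
correct = np.array([1, 1, 0, 1, 0, 0, 1, 1])            # 1 = correct answer
model_conf = np.array([0.90, 0.80, 0.40, 0.85, 0.30, 0.50, 0.70, 0.95])
human_conf = np.array([0.95, 0.90, 0.80, 0.90, 0.75, 0.85, 0.90, 0.95])

# Assumed calibration-gap sketch: how much further human-perceived confidence
# drifts from actual accuracy than the model's own confidence does.
accuracy = correct.mean()
calibration_gap = abs(human_conf.mean() - accuracy) - abs(model_conf.mean() - accuracy)

# Assumed discrimination sketch: mean confidence on correct answers minus
# mean confidence on incorrect ones; higher means better separation.
def discrimination(conf: np.ndarray, correct: np.ndarray) -> float:
    return conf[correct == 1].mean() - conf[correct == 0].mean()

discrimination_gap = discrimination(model_conf, correct) - discrimination(human_conf, correct)

print(f"calibration gap:    {calibration_gap:.3f}")
print(f"discrimination gap: {discrimination_gap:.3f}")
```

In this toy example, the human confidence is uniformly high regardless of whether the answer is right, so both gaps come out positive, mirroring the pattern the researchers describe.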

The findings, published online in Nature Machine Intelligence, are among the first to explore how LLMs communicate uncertainty. The research team included cognitive sciences graduate students Heliodoro Tejeda, Xinyue Hu and Lukas Mayer; Aakriti Kumar, ’24 Ph.D.; and Sheer Karny, junior specialist. They were joined by computer science graduate student Catarina Belem and Padhraic Smyth, Distinguished Professor of computer science and director of the Data Science Initiative.

Padhraic Smyth (left) and Mark Steyvers stand outside of Donald Bren Hall. (Steve Zylius/UCI)

Read more about this collaboration between researchers in the School of Social Sciences and the School of Information and Computer Sciences (ICS).