UC Irvine Study Finds Mismatch between Human Perception and Reliability of AI-Assisted Language Tools
Adding uncertainty phrasing to large language model responses can help users better gauge accuracy
As AI tools like ChatGPT become more mainstream in day-to-day tasks and decision-making, the ability to trust their responses, and to recognize when those responses are wrong, is critical. A new study by cognitive and computer scientists at the University of California, Irvine finds that people generally overestimate the accuracy of large language model (LLM) outputs. But with some tweaks, says lead author Mark Steyvers, cognitive sciences professor and department chair, these tools can be trained to provide explanations that enable users to gauge uncertainty and better distinguish fact from fiction.
“There’s a disconnect between what LLMs know and what people think they know,” says Steyvers. “We call this the calibration gap. At the same time, there’s also a discrimination gap – how well humans and models can distinguish between correct and incorrect answers. Our study looks at how we can narrow these gaps.”
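To make the two gaps concrete, here is a minimal illustrative sketch in Python, not taken from the study itself; the toy numbers and variable names are hypothetical. It treats the calibration gap as the mismatch between the confidence the model itself reports and the confidence people place in its answers, and compares how well each confidence signal separates correct from incorrect answers (the discrimination gap).

```python
import numpy as np

# Hypothetical toy data for 10 questions: whether the LLM's answer was
# actually correct, the confidence the model reports, and the confidence
# a human reader assigns after seeing the model's response.
correct          = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
model_confidence = np.array([0.90, 0.80, 0.40, 0.70, 0.30, 0.50, 0.85, 0.75, 0.45, 0.80])
human_confidence = np.array([0.95, 0.90, 0.80, 0.90, 0.70, 0.85, 0.90, 0.90, 0.80, 0.95])

# Calibration gap: people's confidence in the model versus the model's own
# confidence (a positive gap means humans are more confident than the model).
calibration_gap = human_confidence.mean() - model_confidence.mean()

# Discrimination: how well a confidence signal separates correct from
# incorrect answers, summarized here as the difference in mean confidence
# between the two groups.
def discrimination(confidence, correct):
    return confidence[correct == 1].mean() - confidence[correct == 0].mean()

discrimination_gap = discrimination(model_confidence, correct) - discrimination(human_confidence, correct)

print(f"calibration gap:    {calibration_gap:.2f}")
print(f"model discrimination: {discrimination(model_confidence, correct):.2f}")
print(f"human discrimination: {discrimination(human_confidence, correct):.2f}")
print(f"discrimination gap: {discrimination_gap:.2f}")
```

In this toy example the human confidence is uniformly high, so it barely separates right from wrong answers, while the model's confidence tracks correctness more closely; adding uncertainty phrasing to the model's responses is one way to shrink both gaps.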
The findings, published online in Nature Machine Intelligence, are among the first to explore how LLMs communicate uncertainty. The research team included cognitive sciences graduate students Heliodoro Tejeda, Xinyue Hu and Lukas Mayer; Aakriti Kumar, ’24 Ph.D.; and Sheer Karny, junior specialist. They were joined by computer science graduate student Catarina Belem and Padhraic Smyth, Distinguished Professor of computer science and director of the Data Science Initiative.

Read more about this collaboration between researchers in the School of Social Sciences and the School of Information and Computer Sciences (ICS).