Grounding Gaps in Language Model Generations
Abstract
Effective conversation requires common ground: a shared understanding between the participants. Common ground, however, does not emerge spontaneously in conversation. Speakers and listeners work together to both identify and construct a shared basis while avoiding misunderstanding. To accomplish grounding, humans rely on a range of dialogue acts, like clarification (What do you mean?) and acknowledgment (I understand.). In domains like teaching and emotional support, carefully constructed grounding prevents misunderstanding. However, it is unclear whether large language models (LLMs) leverage these dialogue acts when constructing common ground. To this end, we curate a set of grounding acts and propose corresponding metrics that quantify attempted grounding. We study whether LLMs use these grounding acts by simulating turn-taking on several dialogue datasets and comparing the results to humans. We find that current LLMs are presumptive grounders, biased towards assuming common ground without using grounding acts. To understand the roots of this behavior, we examine the role of instruction tuning and reinforcement learning from human feedback (RLHF), finding that RLHF leads to less grounding. Altogether, our work highlights the need for more research investigating grounding in human-AI interaction.
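As a rough illustration of the kind of metric described above, the sketch below computes the fraction of turns that contain a grounding act, comparing model-generated turns against the original human turns. This is not the paper's implementation: the act label set, the rule-based classify_act helper, and the example turns are hypothetical placeholders; in practice, grounding-act detection would rely on the curated act definitions and a stronger classifier.

# Minimal sketch (assumed, not the paper's method): estimate how often a set of
# dialogue turns contains a grounding act, so model and human turns can be compared.
from collections import Counter

# Assumed label set; the toy rules below only cover two of these acts.
GROUNDING_ACTS = {"clarification", "acknowledgment", "follow-up"}

def classify_act(turn: str) -> str:
    """Hypothetical classifier mapping a dialogue turn to a dialogue-act label.
    In practice this could be a prompted LLM or a trained classifier."""
    lowered = turn.lower()
    if lowered.endswith("?") and ("mean" in lowered or "clarify" in lowered):
        return "clarification"
    if any(cue in lowered for cue in ("i see", "i understand", "got it")):
        return "acknowledgment"
    return "other"

def grounding_rate(turns: list[str]) -> float:
    """Fraction of turns labeled with any grounding act."""
    counts = Counter(classify_act(t) for t in turns)
    grounded = sum(v for act, v in counts.items() if act in GROUNDING_ACTS)
    return grounded / max(len(turns), 1)

# Usage: compare human turns with a model's simulated turns for the same slots.
human_turns = ["I see what you mean.", "Could you clarify what you mean by that?"]
model_turns = ["Here is the answer to your question."]
print(grounding_rate(human_turns), grounding_rate(model_turns))  # 1.0 vs. 0.0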
Materials
BibTeX
@misc{shaikh2024grounding,
  title={Grounding Gaps in Language Model Generations},
  author={Omar Shaikh and Kristina Gligorić and Ashna Khetan and Matthias Gerstgrasser and Diyi Yang and Dan Jurafsky},
  year={2024},
  eprint={2311.09144},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}