
Large Language Models (LLMs) have become integral to a wide range of applications, raising concerns about their tendency to generate hallucinated content and to exhibit biases inherited from training data. While prior research has examined hallucination behavior across different AI models, less attention has been given to how these limitations align with human perceptions of bias and trust. This paper presents a comparative review of existing research on hallucinations in contemporary LLMs, synthesizing findings across multiple studies to identify common trends, evaluation approaches, and reported limitations. In parallel, a human perception study examines how users interpret and judge bias, reliability, and trustworthiness in AI-generated outputs. Participants provide subjective assessments of perceived bias and confidence in model responses, enabling comparison with conclusions drawn in the prior technical literature. The findings reveal a clear divergence between empirically reported hallucination behavior and user perception: models identified as having lower hallucination tendencies are not consistently perceived as less biased or more trustworthy. Instead, fluent and confident responses often lead to higher perceived reliability, regardless of documented limitations, which points to a disconnect between technical evaluation and human judgment. The study argues for integrating human-centered perspectives into LLM evaluation and underscores the need for transparency, clearer communication of limitations, and trust-aware deployment.
