Uncertainty estimation is a promising approach to detecting hallucinations in large language models (LLMs). Recent approaches commonly depend on model internal states to estimate uncertainty. However, they rely on strict assumptions about how hidden states should evolve across layers, and they lose information by focusing solely on the last or mean token. To address these issues, we present Sequential Internal Variance Representation (SIVR), a supervised hallucination detection framework that leverages token-wise, layer-wise features derived from hidden states. Rather than relying on such specific assumptions, SIVR adopts the more basic assumption that uncertainty manifests in the degree of dispersion, or variance, of internal representations across layers, which makes the method model- and task-agnostic. It additionally aggregates the full sequence of per-token variance features, learning temporal patterns indicative of factual errors and thereby preventing information loss. Experimental results demonstrate that SIVR consistently outperforms strong baselines. Most importantly, SIVR generalises better and does not rely on large training sets, highlighting its potential for practical deployment.
@inproceedings{srey-etal-2026-learning,title={Learning Uncertainty from Sequential Internal Dispersion in Large Language Models},author={Srey, Ponhvoan and Wu, Xiaobao and Nguyen, Cong-Duy and Luu, Anh Tuan},booktitle={Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics},month=jul,year={2026},url={https://arxiv.org/abs/2604.15741},publisher={Association for Computational Linguistics},}
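As a rough illustration of the per-token, cross-layer dispersion idea described in the SIVR abstract, the sketch below computes one variance score per token from a stack of hidden states. It is not the authors' implementation. The function name `per_token_layer_dispersion`, the nested-list layout `hidden[layer][token]`, and the choice to average per-dimension variances into a single scalar per token are all hypothetical choices made for illustration.

```python
from statistics import pvariance
from typing import List

Vector = List[float]


def per_token_layer_dispersion(hidden: List[List[Vector]]) -> List[float]:
    """Hypothetical feature extractor: hidden[l][t] is token t's hidden state at layer l.

    For each token, take the population variance of every hidden dimension
    across layers, then average over dimensions (an assumed reduction).
    """
    if not hidden:
        raise ValueError("need at least one layer")
    num_layers = len(hidden)
    num_tokens = len(hidden[0])
    scores: List[float] = []
    for t in range(num_tokens):
        dim = len(hidden[0][t])
        if dim == 0:
            raise ValueError("hidden states must be non-empty vectors")
        # Variance of each dimension across the layer axis for this token.
        per_dim = [
            pvariance([hidden[l][t][d] for l in range(num_layers)])
            for d in range(dim)
        ]
        scores.append(sum(per_dim) / dim)
    return scores


if __name__ == "__main__":
    # Two layers, two tokens, 2-dimensional hidden states.
    hidden = [
        [[0.0, 0.0], [1.0, 1.0]],  # layer 0
        [[2.0, 0.0], [1.0, 1.0]],  # layer 1
    ]
    print(per_token_layer_dispersion(hidden))  # prints [0.5, 0.0]
```

The abstract describes learning temporal patterns over the full sequence of such per-token features; that trainable sequence model is deliberately left out of this sketch.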
2025
EMNLP
Unsupervised Hallucination Detection by Inspecting Reasoning Processes
Ponhvoan Srey, Xiaobao Wu, and Anh Tuan Luu
In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, Nov 2025
Unsupervised hallucination detection aims to identify hallucinated content generated by large language models (LLMs) without relying on labeled data. While unsupervised methods have gained popularity by eliminating labor-intensive human annotation, they frequently rely on proxy signals unrelated to factual correctness. This misalignment biases detection probes toward superficial or non-truth-related aspects, limiting generalizability across datasets and scenarios. To overcome these limitations, we propose IRIS, an unsupervised hallucination detection framework that leverages internal representations intrinsic to factual correctness. IRIS prompts the LLM to carefully verify the truthfulness of a given statement and obtains its contextualized embedding as informative features for training. Meanwhile, the uncertainty of each response is treated as a soft pseudolabel for truthfulness. Experimental results demonstrate that IRIS consistently outperforms existing unsupervised methods. Our approach is fully unsupervised, computationally inexpensive, and works well even with little training data, making it suitable for real-time detection.
@inproceedings{srey-etal-2025-unsupervised,title={Unsupervised Hallucination Detection by Inspecting Reasoning Processes},author={Srey, Ponhvoan and Wu, Xiaobao and Luu, Anh Tuan},editor={Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet},booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},month=nov,year={2025},address={Suzhou, China},publisher={Association for Computational Linguistics},url={https://aclanthology.org/2025.emnlp-main.1124/},pages={22117--22129},isbn={979-8-89176-332-6},}
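The IRIS abstract trains on contextualized embeddings while treating each response's uncertainty as a soft pseudolabel. The sketch below shows one generic way to train a probe on soft labels: a logistic-regression probe fitted by batch gradient descent. It is not the paper's method. The probe type, the mapping `target = 1 - uncertainty` (read as the probability of being truthful), the function names, and the toy 1-D features standing in for real embeddings are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Split on sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_soft_label_probe(
    features: Sequence[Sequence[float]],
    uncertainties: Sequence[float],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Hypothetical probe: logistic regression on soft targets.

    Assumes each uncertainty lies in [0, 1] and maps it to the soft
    target 1 - uncertainty, interpreted as P(truthful).
    """
    dim = len(features[0])
    n = len(features)
    targets = [1.0 - u for u in uncertainties]
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, targets):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Gradient of binary cross-entropy w.r.t. the logit; valid for soft y too.
            err = p - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_truthful(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probe's estimated probability that the statement behind embedding x is truthful."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy 1-D "embeddings" in place of the LLM's contextualized embeddings.
    feats = [[1.0], [-1.0]]
    unc = [0.1, 0.9]  # low uncertainty for the first response, high for the second
    w, b = train_soft_label_probe(feats, unc)
    print(predict_truthful(w, b, [1.0]), predict_truthful(w, b, [-1.0]))
```

Because binary cross-entropy accepts any target in [0, 1], the soft pseudolabels need no thresholding: the probe's output is simply pulled toward each uncertainty-derived target.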
MLJ
Uncover and unlearn nuisances: agnostic fully test-time adaptation
Ponhvoan Srey*, Yaxin Shi*, Hangwei Qian, and 2 more authors