Abstract
The rapidly increasing popularity of LLM-powered chatbots has led to them being used by the general public for an increasing number of different tasks. One of these tasks is searching for information instead of using a search engine. Previous work has shown that complex search tasks can be problematic for traditional search engines to solve, but little is known about the capability of LLMs on the same task. We compared four LLMs on their capability to answer a specific type of complex search task: known-item requests from casual leisure domains. We constructed a test collection by gathering known-item requests for books, games and movies from online forums, along with answers verified by the original requester. We prompted four LLMs multiple times with the same prompt and analyzed the results with respect to accuracy and the degree to which answers were fabricated by the LLM. Our results show that LLMs are not particularly effective in fulfilling these complex casual leisure needs, but there are big differences between LLMs and across domains.
Original language | English |
---|---|
Title | CHIIR 2025: Proceedings of the 2025 Conference on Human Information Interaction and Retrieval |
Status | Accepted/In press - 2025 |