LLMs Cannot (Yet) Match the Specificity and Simplicity of Online Finance Communities in Long Form Question Answering

Kris Fillip Kahl, Tolga Buz, Russa Biswas, Gerard de Melo

Publication: Contribution to book/anthology/report/conference proceeding › Conference article in proceedings › Research › Peer-reviewed


Abstract

Retail investing is on the rise, and a growing number of users are relying on online finance communities to educate themselves. However, as Large Language Models (LLMs) are increasingly viewed as powerful question-answering (QA) tools, users have shifted away from interacting in communities towards discourse with AI-driven conversational interfaces. Such AI tools are currently constrained by the availability of labelled data providing domain-specific financial knowledge. Therefore, in this work, we curate a QA preference dataset called SOCIALFINANCEQA for fine-tuning and aligning LLMs, extracted from more than 7.4 million submissions and 82 million comments from 2008 to 2022 in Reddit's 15 largest finance communities. Additionally, we propose the novel framework SOCIALQA-EVAL as a generally applicable method to evaluate generated QA responses. We evaluate various LLMs fine-tuned on this dataset, using traditional metrics, LLM-based evaluation, and human annotation. Our results demonstrate the value of high-quality Reddit data, with even state-of-the-art LLMs improving on producing simpler and more specific responses.

Original language: English
Title: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024
Editors: Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Number of pages: 26
Publisher: Association for Computational Linguistics
Publication date: 2024
Pages: 2028-2053
ISBN (electronic): 9798891761681
Status: Published - 2024
Event: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, USA
Duration: 12 Nov 2024 to 16 Nov 2024

Conference

Conference: 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
Country/Territory: USA
City: Hybrid, Miami
Period: 12/11/2024 to 16/11/2024
Sponsors: Apple, Bloomberg, Citadel Securities, Google DeepMind, Meta, et al.

Bibliographical note

Publisher Copyright:
© 2024 Association for Computational Linguistics.
