Abstract
Retail investing is on the rise, and a growing number of users are relying on online finance communities to educate themselves. However, as Large Language Models (LLMs) are increasingly viewed as powerful question-answering (QA) tools, users have shifted away from interacting in communities towards discourse with AI-driven conversational interfaces. Such AI tools are currently constrained by the availability of labelled data providing domain-specific financial knowledge. Therefore, in this work, we curate a QA preference dataset called SOCIALFINANCEQA for fine-tuning and aligning LLMs, extracted from more than 7.4 million submissions and 82 million comments from 2008 to 2022 in Reddit's 15 largest finance communities. Additionally, we propose the novel framework SOCIALQA-EVAL as a generally applicable method to evaluate generated QA responses. We evaluate various LLMs fine-tuned on this dataset, using traditional metrics, LLM-based evaluation, and human annotation. Our results demonstrate the value of high-quality Reddit data, with even state-of-the-art LLMs improving at producing simpler and more specific responses.
Original language | English |
---|---|
Title | EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Findings of EMNLP 2024 |
Editors | Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen |
Number of pages | 26 |
Publisher | Association for Computational Linguistics |
Publication date | 2024 |
Pages | 2028-2053 |
ISBN (Electronic) | 9798891761681 |
Status | Published - 2024 |
Event | 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 - Hybrid, Miami, USA. Duration: 12 Nov 2024 → 16 Nov 2024 |
Conference
Conference | 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024 |
---|---|
Country/Territory | USA |
City | Hybrid, Miami |
Period | 12 Nov 2024 → 16 Nov 2024 |
Sponsor | Apple, Bloomberg, Citadel Securities, et al., Google DeepMind, Meta |
Bibliographical note
Publisher Copyright: © 2024 Association for Computational Linguistics.