Bias-aware lexicon-based sentiment analysis

Mohsin Iqbal; Asim Karim; Faisal Kamiran

Publication: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

15 Citations (Scopus)

Abstract

Sentiment analysis of textual content is widely used for automatic summarization of opinions and sentiments expressed by people. With the growing popularity of social media and user-generated content, efficient and effective sentiment analysis is critical to businesses and governments. Lexicon-based methods provide efficiency through their manually developed affective word lists and valence values. However, the predictions of such methods can be biased towards positive or negative polarity, thus distorting the analysis. In this paper, we propose Bias-Aware Thresholding (BAT), an approach that can be combined with any lexicon-based method to make it bias-aware. BAT is motivated by cost-sensitive learning, where the prediction threshold is changed to reduce prediction error bias. We formally define bias in polarity predictions and present a measure for quantifying it. We evaluate BAT in combination with AFINN and SentiStrength, two popular lexicon-based methods, on seven real-world datasets. The results show that bias decreases smoothly as the absolute value of the threshold increases, and that accuracy increases as well in most cases. We demonstrate that the threshold can be learned reliably from a very small number of labeled examples, whereas supervised classifiers trained on such small datasets perform worse on both bias and accuracy.
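The thresholding idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's formal bias measure or its threshold-learning procedure, so everything below is an assumption made for illustration: the bias measure (error rate on negative documents minus error rate on positive documents), the grid search over candidate thresholds, and the function names `predict`, `polarity_bias`, and `learn_threshold` are all hypothetical. The scores stand in for the output of any lexicon-based scorer.

```python
from typing import List, Sequence


def predict(scores: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Label a document +1 if its lexicon score exceeds the threshold, else -1."""
    return [1 if s > threshold else -1 for s in scores]


def polarity_bias(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Illustrative bias measure (an assumption, NOT the paper's definition):
    error rate on negative documents minus error rate on positive documents.
    Values above zero mean the errors lean positive; below zero, negative."""
    neg = [p for y, p in zip(labels, preds) if y == -1]
    pos = [p for y, p in zip(labels, preds) if y == 1]
    fp_rate = sum(p == 1 for p in neg) / len(neg) if neg else 0.0
    fn_rate = sum(p == -1 for p in pos) / len(pos) if pos else 0.0
    return fp_rate - fn_rate


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of documents whose predicted polarity matches the label."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def learn_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    candidates: Sequence[float],
) -> float:
    """Hypothetical learner: pick the candidate threshold with the smallest
    absolute bias on a small labeled sample, breaking ties by accuracy."""
    def key(t: float):
        preds = predict(scores, t)
        return (abs(polarity_bias(labels, preds)), -accuracy(labels, preds))
    return min(candidates, key=key)


# Toy labeled sample (made-up numbers): the scorer leans positive,
# so several negative documents still receive scores above zero.
scores = [3.0, 1.0, 2.0, 0.5, 1.5, -1.0, 4.0, 0.8]
labels = [1, -1, 1, -1, -1, -1, 1, -1]
candidates = [x * 0.5 for x in range(-4, 9)]  # -2.0, -1.5, ..., 4.0

threshold = learn_threshold(scores, labels, candidates)
print(threshold, polarity_bias(labels, predict(scores, threshold)))
```

On this toy sample the default threshold of zero labels four of the five negative documents as positive, while the learned threshold removes that skew. The sample is constructed to show the mechanism only and says nothing about the results reported in the paper.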

Original language	English
Title	Proceedings of the 30th Annual ACM Symposium on Applied Computing
Number of pages	6
Publication date	Apr 2015
Pages	845–850
Status	Published - Apr 2015
Published externally	Yes

Citation formats

@inproceedings{122379133a104e14b3a90f0d64911522,
  title = "Bias-aware lexicon-based sentiment analysis",
  author = "Mohsin Iqbal and Asim Karim and Faisal Kamiran",
  year = "2015",
  month = apr,
  language = "English",
  pages = "845--850",
  booktitle = "Proceedings of the 30th Annual ACM Symposium on Applied Computing",
}

TY - GEN
T1 - Bias-aware lexicon-based sentiment analysis
AU - Iqbal, Mohsin
AU - Karim, Asim
AU - Kamiran, Faisal
PY - 2015/4
Y1 - 2015/4
M3 - Article in proceeding
SP - 845
EP - 850
BT - Proceedings of the 30th Annual ACM Symposium on Applied Computing
ER -
