Balancing Prediction Errors for Robust Sentiment Classification

Mohsin Iqbal; Asim Karim; Faisal Kamiran

doi:10.1145/3328795

Balancing Prediction Errors for Robust Sentiment Classification

Mohsin Iqbal, Asim Karim, Faisal Kamiran

Research output: Contribution to journal › Journal article › Research › peer-review

14 Citations (Scopus)

Abstract

Sentiment classification is a popular text mining task in which textual content (e.g., a message) is assigned a polarity label (typically positive or negative) reflecting the sentiment expressed in it. Sentiment classification is used widely in applications like customer feedback analysis where robustness and correctness of results are critical. In this article, we highlight that prediction accuracy alone is not sufficient for assessing the performance of a sentiment classifier; it is also important that the classifier is not biased toward positive or negative polarity, thus distorting the distribution of positive and negative messages in the predictions. We propose a measure, called Polarity Bias Rate, for quantifying this bias in a sentiment classifier. Second, we present two methods for removing this bias in the predictions of unsupervised and supervised sentiment classifiers. Our first method, called Bias-Aware Thresholding (BAT), shifts the decision boundary to control the bias in the predictions. Motivated from cost-sensitive learning, BAT is easily applicable to both lexicon-based unsupervised and supervised classifiers. Our second method, called Balanced Logistic Regression (BLR) introduces a bias-remover constraint into the standard logistic regression model. BLR is an automatic bias-free supervised sentiment classifier.

We evaluate our methods extensively on seven real-world datasets. The experiments involve two lexicon-based and two supervised sentiment classifiers and include evaluation on multiple train-test data sizes. The results show that bias is controlled effectively in predictions. Furthermore, prediction accuracy is also increased in many cases, thus enhancing the robustness of sentiment classification.

Original language	English
Article number	33
Journal	ACM Transactions on Knowledge Discovery from Data
Volume	13
Issue number	3
Number of pages	21
ISSN	1556-4681
DOIs	https://doi.org/10.1145/3328795
Publication status	Published - Jul 2019
Externally published	Yes

Access to Document

10.1145/3328795

AUB Link

Search for the material in Aalborg University Library's search engine

Cite this

@article{3dd27da03fad4e80b4e0373f85b2c971,

title = "Balancing Prediction Errors for Robust Sentiment Classification",

abstract = "Sentiment classification is a popular text mining task in which textual content (e.g., a message) is assigned a polarity label (typically positive or negative) reflecting the sentiment expressed in it. Sentiment classification is used widely in applications like customer feedback analysis where robustness and correctness of results are critical. In this article, we highlight that prediction accuracy alone is not sufficient for assessing the performance of a sentiment classifier; it is also important that the classifier is not biased toward positive or negative polarity, thus distorting the distribution of positive and negative messages in the predictions. We propose a measure, called Polarity Bias Rate, for quantifying this bias in a sentiment classifier. Second, we present two methods for removing this bias in the predictions of unsupervised and supervised sentiment classifiers. Our first method, called Bias-Aware Thresholding (BAT), shifts the decision boundary to control the bias in the predictions. Motivated from cost-sensitive learning, BAT is easily applicable to both lexicon-based unsupervised and supervised classifiers. Our second method, called Balanced Logistic Regression (BLR) introduces a bias-remover constraint into the standard logistic regression model. BLR is an automatic bias-free supervised sentiment classifier.We evaluate our methods extensively on seven real-world datasets. The experiments involve two lexicon-based and two supervised sentiment classifiers and include evaluation on multiple train-test data sizes. The results show that bias is controlled effectively in predictions. Furthermore, prediction accuracy is also increased in many cases, thus enhancing the robustness of sentiment classification.",

author = "Mohsin Iqbal and Asim Karim and Faisal Kamiran",

year = "2019",

month = jul,

doi = "10.1145/3328795",

language = "English",

volume = "13",

journal = "ACM Transactions on Knowledge Discovery from Data",

issn = "1556-4681",

publisher = "Association for Computing Machinery",

number = "3",

}

TY - JOUR

T1 - Balancing Prediction Errors for Robust Sentiment Classification

AU - Iqbal, Mohsin

AU - Karim, Asim

AU - Kamiran, Faisal

PY - 2019/7

Y1 - 2019/7

N2 - Sentiment classification is a popular text mining task in which textual content (e.g., a message) is assigned a polarity label (typically positive or negative) reflecting the sentiment expressed in it. Sentiment classification is used widely in applications like customer feedback analysis where robustness and correctness of results are critical. In this article, we highlight that prediction accuracy alone is not sufficient for assessing the performance of a sentiment classifier; it is also important that the classifier is not biased toward positive or negative polarity, thus distorting the distribution of positive and negative messages in the predictions. We propose a measure, called Polarity Bias Rate, for quantifying this bias in a sentiment classifier. Second, we present two methods for removing this bias in the predictions of unsupervised and supervised sentiment classifiers. Our first method, called Bias-Aware Thresholding (BAT), shifts the decision boundary to control the bias in the predictions. Motivated from cost-sensitive learning, BAT is easily applicable to both lexicon-based unsupervised and supervised classifiers. Our second method, called Balanced Logistic Regression (BLR) introduces a bias-remover constraint into the standard logistic regression model. BLR is an automatic bias-free supervised sentiment classifier.We evaluate our methods extensively on seven real-world datasets. The experiments involve two lexicon-based and two supervised sentiment classifiers and include evaluation on multiple train-test data sizes. The results show that bias is controlled effectively in predictions. Furthermore, prediction accuracy is also increased in many cases, thus enhancing the robustness of sentiment classification.

AB - Sentiment classification is a popular text mining task in which textual content (e.g., a message) is assigned a polarity label (typically positive or negative) reflecting the sentiment expressed in it. Sentiment classification is used widely in applications like customer feedback analysis where robustness and correctness of results are critical. In this article, we highlight that prediction accuracy alone is not sufficient for assessing the performance of a sentiment classifier; it is also important that the classifier is not biased toward positive or negative polarity, thus distorting the distribution of positive and negative messages in the predictions. We propose a measure, called Polarity Bias Rate, for quantifying this bias in a sentiment classifier. Second, we present two methods for removing this bias in the predictions of unsupervised and supervised sentiment classifiers. Our first method, called Bias-Aware Thresholding (BAT), shifts the decision boundary to control the bias in the predictions. Motivated from cost-sensitive learning, BAT is easily applicable to both lexicon-based unsupervised and supervised classifiers. Our second method, called Balanced Logistic Regression (BLR) introduces a bias-remover constraint into the standard logistic regression model. BLR is an automatic bias-free supervised sentiment classifier.We evaluate our methods extensively on seven real-world datasets. The experiments involve two lexicon-based and two supervised sentiment classifiers and include evaluation on multiple train-test data sizes. The results show that bias is controlled effectively in predictions. Furthermore, prediction accuracy is also increased in many cases, thus enhancing the robustness of sentiment classification.

U2 - 10.1145/3328795

DO - 10.1145/3328795

M3 - Journal article

SN - 1556-4681

VL - 13

JO - ACM Transactions on Knowledge Discovery from Data

JF - ACM Transactions on Knowledge Discovery from Data

IS - 3

M1 - 33

ER -

Balancing Prediction Errors for Robust Sentiment Classification

Abstract

Access to Document

AUB Link

Fingerprint

Cite this