Standard

Learning a decision maker's utility function from (possibly) inconsistent behavior. / Nielsen, Thomas Dyhre; Jensen, Finn Verner.

In: Artificial Intelligence, Vol. 160, No. 1-2, 2004, pp. 53-78.

Publication: Research - peer-reviewed, Journal article

Author

Nielsen, Thomas Dyhre; Jensen, Finn Verner / Learning a decision maker's utility function from (possibly) inconsistent behavior.

In: Artificial Intelligence, Vol. 160, No. 1-2, 2004, pp. 53-78.

Publication: Research - peer-reviewed, Journal article

Bibtex

@article{c908d360c7d211db86ee000ea68e967b,
title = "Learning a decision maker's utility function from (possibly) inconsistent behavior",
publisher = "Elsevier BV",
author = "Nielsen, {Thomas Dyhre} and Jensen, {Finn Verner}",
year = "2004",
volume = "160",
number = "1-2",
pages = "53--78",
journal = "Artificial Intelligence",
issn = "0004-3702",

}

RIS

TY - JOUR

T1 - Learning a decision maker's utility function from (possibly) inconsistent behavior

A1 - Nielsen,Thomas Dyhre

A1 - Jensen,Finn Verner

AU - Nielsen,Thomas Dyhre

AU - Jensen,Finn Verner

PB - Elsevier BV

PY - 2004

Y1 - 2004

N2 - When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have only received limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take outset in the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behavioral consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption, and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the DM's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function for even very small sets of observations.

AB - When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have only received limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take outset in the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behavioral consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption, and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the DM's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function for even very small sets of observations.

U2 - 10.1016/j.artint.2004.08.003

DO - 10.1016/j.artint.2004.08.003

JO - Artificial Intelligence

JF - Artificial Intelligence

SN - 0004-3702

IS - 1-2

VL - 160

SP - 53

EP - 78

ER -