Learning a decision maker's utility function from (possibly) inconsistent behavior
Publication: Research - peer-reviewed › Journal article
Standard
Learning a decision maker's utility function from (possibly) inconsistent behavior. / Nielsen, Thomas Dyhre; Jensen, Finn Verner.
In: Artificial Intelligence, Vol. 160, No. 1-2, 2004, pp. 53-78.
RIS
TY - JOUR
T1 - Learning a decision maker's utility function from (possibly) inconsistent behavior
A1 - Nielsen, Thomas Dyhre
A1 - Jensen, Finn Verner
AU - Nielsen, Thomas Dyhre
AU - Jensen, Finn Verner
PB - Elsevier BV
PY - 2004
Y1 - 2004
N2 - When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to start from the decision maker's observed behavioral patterns and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function for even very small sets of observations.
U2 - 10.1016/j.artint.2004.08.003
DO - 10.1016/j.artint.2004.08.003
JO - Artificial Intelligence
JF - Artificial Intelligence
SN - 0004-3702
IS - 1-2
VL - 160
SP - 53
EP - 78
ER -