Learning a decision maker's utility function from (possibly) inconsistent behavior

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

24 citations (Scopus)

Abstract

When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain, and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take the decision maker's observed behavioral patterns as a starting point, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning, whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms and show that they converge toward the true utility function even for very small sets of observations.
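As a rough illustration of the random-deviation idea described in the abstract, consider a generic random-utility sketch (this is not the authors' actual algorithms, and the linear-utility form, softmax noise model, and all names below are illustrative assumptions): if inconsistent choices are modeled as softmax noise around an underlying true utility, the utility weights can be recovered by maximizing the likelihood of the observed choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (not from the paper) --------------------------------
# Each decision presents n_alt alternatives, each described by a feature
# vector. The decision maker's (unknown) utility is assumed linear:
# u(x) = w_true @ x. Inconsistent behavior is modeled as softmax noise.
n_obs, n_alt, n_feat = 200, 4, 3
w_true = np.array([1.5, -0.7, 0.3])          # underlying "true" utility weights

X = rng.normal(size=(n_obs, n_alt, n_feat))  # features of each alternative
utils = X @ w_true                           # true utility of each alternative
probs = np.exp(utils) / np.exp(utils).sum(axis=1, keepdims=True)
choices = np.array([rng.choice(n_alt, p=p) for p in probs])  # noisy choices

def grad_neg_log_lik(w):
    """Gradient of the negative log-likelihood of the observed choices."""
    u = X @ w
    u -= u.max(axis=1, keepdims=True)        # numerical stability
    p = np.exp(u) / np.exp(u).sum(axis=1, keepdims=True)
    chosen = X[np.arange(n_obs), choices]    # features of chosen alternatives
    expected = (p[..., None] * X).sum(axis=1)
    return -(chosen - expected).sum(axis=0)

# "Batch" fit: gradient descent on the full observation set at once.
w = np.zeros(n_feat)
for _ in range(2000):
    w -= 0.05 * grad_neg_log_lik(w) / n_obs

print("true   w:", w_true)
print("fitted w:", np.round(w, 2))
```

An online variant that takes one stochastic gradient step per incoming observation would loosely mirror the paper's second, adaptation-oriented algorithm, since recent observations would then dominate the estimate when preferences drift over time.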
Publication date: December 2004

Original language: English
Journal: Artificial Intelligence
Volume: 160
Issue number: 1-2
Pages (from-to): 53-78
Number of pages: 25
ISSN: 0004-3702
DOI: 10.1016/j.artint.2004.08.003
Status: Published - 2004


Cite this

@article{c908d360c7d211db86ee000ea68e967b,
title = "Learning a decision maker's utility function from (possibly) inconsistent behavior",
abstract = "When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain, and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have received only limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to take the decision maker's observed behavioral patterns as a starting point, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning, whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms and show that they converge toward the true utility function even for very small sets of observations.",
author = "Nielsen, {Thomas Dyhre} and Jensen, {Finn Verner}",
year = "2004",
doi = "10.1016/j.artint.2004.08.003",
language = "English",
volume = "160",
pages = "53--78",
journal = "Artificial Intelligence",
issn = "0004-3702",
publisher = "Elsevier",
number = "1-2",
}

Learning a decision maker's utility function from (possibly) inconsistent behavior. / Nielsen, Thomas Dyhre; Jensen, Finn Verner.

In: Artificial Intelligence, Vol. 160, No. 1-2, 2004, pp. 53-78.
