### Abstract

When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have only received limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to start from the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption, and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function for even very small sets of observations.
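The consistency assumption can be made concrete with a toy example. The sketch below is not the paper's algorithm; it only illustrates, under assumed names and made-up numbers, what it means for a utility function to "account for" observed behavior. Each observation is a choice among actions, each action is modeled as a probability distribution over outcomes (a stand-in for the domain model), and a utility function explains an observation if the chosen action maximizes expected utility. The helper names `expected_utility` and `violations`, the outcomes, and the probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple

Lottery = Dict[str, float]                      # outcome -> probability
Observation = Tuple[Dict[str, Lottery], str]    # (available actions, chosen action)


def expected_utility(lottery: Lottery, utility: Dict[str, float]) -> float:
    """Expected utility of an action's outcome distribution."""
    return sum(p * utility[outcome] for outcome, p in lottery.items())


def violations(observations: List[Observation],
               utility: Dict[str, float],
               tol: float = 1e-9) -> List[Observation]:
    """Observations whose chosen action is not expected-utility optimal."""
    unexplained = []
    for options, chosen in observations:
        best = max(expected_utility(l, utility) for l in options.values())
        if expected_utility(options[chosen], utility) < best - tol:
            unexplained.append((options, chosen))
    return unexplained


# Hypothetical data: the same choice situation is observed twice,
# and the decision maker picks a different action each time.
options = {
    "safe": {"mid": 1.0},
    "risky": {"low": 0.5, "high": 0.5},
}
observations = [(options, "safe"), (options, "risky")]

# A non-trivial candidate utility can explain only one of the two choices.
candidate = {"low": 0.0, "mid": 0.6, "high": 1.0}
# A constant (trivial) utility makes every action optimal, so it explains both.
trivial = {"low": 0.5, "mid": 0.5, "high": 0.5}

print(len(violations(observations, candidate)))  # number of unexplained choices
print(len(violations(observations, trivial)))
```

In this toy setting, only a utility function that leaves the decision maker indifferent between the two actions (such as the constant one) is consistent with both observations, which is one way to read the abstract's remark that consistency-based methods may only identify a trivial utility function. Treating the conflicting choice as a random deviation, as the paper proposes, avoids that collapse.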

Publication date: DEC

Original language | English |
---|---|
Journal | Artificial Intelligence |
Volume | 160 |
Issue number | 1-2 |
Pages (from-to) | 53-78 |
Number of pages | 25 |
ISSN | 0004-3702 |
DOI | https://doi.org/10.1016/j.artint.2004.08.003 |
Status | Published - 2004 |

### Cite this

*Artificial Intelligence*, vol. 160, no. 1-2, pp. 53-78. https://doi.org/10.1016/j.artint.2004.08.003

**Learning a decision maker's utility function from (possibly) inconsistent behavior.** / Nielsen, Thomas Dyhre; Jensen, Finn Verner.

Publication: Contribution to journal › Journal article › Research › peer-review

TY - JOUR

T1 - Learning a decision maker's utility function from (possibly) inconsistent behavior

AU - Nielsen, Thomas Dyhre

AU - Jensen, Finn Verner

PY - 2004

Y1 - 2004

AB - When modeling a decision problem using the influence diagram framework, the quantitative part rests on two principal components: probabilities for representing the decision maker's uncertainty about the domain and utilities for representing preferences. Over the last decade, several methods have been developed for learning the probabilities from a database. However, methods for learning the utilities have only received limited attention in the computer science community. A promising approach for learning a decision maker's utility function is to start from the decision maker's observed behavioral patterns, and then find a utility function which (together with a domain model) can explain this behavior. That is, it is assumed that the decision maker's preferences are reflected in the behavior. Standard learning algorithms also assume that the decision maker is behaviorally consistent, i.e., given a model of the decision problem, there exists a utility function which can account for all the observed behavior. Unfortunately, this assumption is rarely valid in real-world decision problems, and in these situations existing learning methods may only identify a trivial utility function. In this paper we relax this consistency assumption, and propose two algorithms for learning a decision maker's utility function from possibly inconsistent behavior; inconsistent behavior is interpreted as random deviations from an underlying (true) utility function. The main difference between the two algorithms is that the first facilitates a form of batch learning whereas the second focuses on adaptation and is particularly well-suited for scenarios where the decision maker's preferences change over time. Empirical results demonstrate the tractability of the algorithms, and they also show that the algorithms converge toward the true utility function for even very small sets of observations.

U2 - 10.1016/j.artint.2004.08.003

DO - 10.1016/j.artint.2004.08.003

M3 - Journal article

VL - 160

SP - 53

EP - 78

JO - Artificial Intelligence

JF - Artificial Intelligence

SN - 0004-3702

IS - 1-2

ER -