### Description

Most data sets encountered in machine learning and data mining are incomplete: not all data records contain values for every data attribute. Most common statistical and machine learning techniques will only provide useful results on incomplete data when the mechanism that causes certain attribute values to be unobserved is ignorable, i.e., it need not be represented explicitly in the statistical model for the data. The standard way of obtaining ignorability is via the missing at random assumption. We have conducted an in-depth study of the connection between the missing at random assumption (and similar assumptions), ignorability, and explicit procedural models for the mechanisms that cause missing data. One result we obtained shows that the standard argument used to infer ignorability from the missing at random assumption is incomplete, and that suitable additional assumptions on the data-generating process may be needed to establish ignorability. We have also characterized natural types of random mechanisms that lead to missing at random data. These results provide natural criteria by which one can evaluate whether the missing at random assumption is appropriate for a given data set.
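
As a rough illustration of why the missingness mechanism matters, the following toy simulation contrasts a mechanism that depends only on an always-observed attribute (missing at random in the usual textbook sense) with one that depends on the hidden value itself. It is a minimal sketch and not the project's method: the attributes `x` and `y`, all probabilities, and the function names `simulate`, `estimate_p_y`, `mar_mechanism`, and `not_mar_mechanism` are assumptions made up for this example. The estimator deliberately ignores the mechanism and fits P(y=1 | x) from the records where `y` is observed.

```python
import random


def simulate(n, miss_prob, seed=0):
    """Draw n records (x, y) and hide y with probability miss_prob(x, y)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # Hypothetical model: P(y=1 | x=1) = 0.8, P(y=1 | x=0) = 0.2,
        # so the true marginal P(y=1) is 0.5.
        y = 1 if rng.random() < (0.8 if x == 1 else 0.2) else 0
        observed_y = None if rng.random() < miss_prob(x, y) else y
        records.append((x, observed_y))
    return records


def estimate_p_y(records):
    """Estimate P(y=1) while ignoring the missingness mechanism.

    P(x) is estimated from all records, P(y=1 | x) only from the
    records in which y was observed.
    """
    total = len(records)
    estimate = 0.0
    for x_val in (0, 1):
        group = [y for x, y in records if x == x_val]
        seen = [y for y in group if y is not None]
        if not seen:
            continue  # no observed y for this x; skip the group
        estimate += (len(group) / total) * (sum(seen) / len(seen))
    return estimate


def mar_mechanism(x, y):
    # Missingness depends only on the always-observed attribute x.
    return 0.7 if x == 1 else 0.1


def not_mar_mechanism(x, y):
    # Missingness depends on the value of y that is being hidden.
    return 0.7 if y == 1 else 0.1


mar_estimate = estimate_p_y(simulate(200_000, mar_mechanism))
not_mar_estimate = estimate_p_y(simulate(200_000, not_mar_mechanism))
print("true P(y=1):       0.5")
print("estimate, MAR:    ", round(mar_estimate, 3))
print("estimate, not MAR:", round(not_mar_estimate, 3))
```

In this toy setting the estimate under the first mechanism should land close to the true value 0.5, while under the second it is noticeably too low, because records with `y = 1` are hidden more often. The example only shows the usual intuition behind ignorability; it does not capture the additional assumptions on the data-generating process that the project identifies.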

| Status | Active |
|---|---|
| Effective start/end date | 19/05/2010 → … |

### Funding

- <no name>

### Fingerprint

Ignorability

Incomplete Data

Missing at Random

Machine Learning

Statistical Learning

Missing Data

Statistical Model

Mining

Data Mining

Attribute

Evaluate