Incomplete Data and Ignorability

Description

Most data sets encountered in machine learning and data mining are incomplete:
not all data records contain the values of each of the data attributes.
Most common statistical and machine learning techniques will only provide useful
results on incomplete data when the mechanism that causes certain attribute
values to be unobserved is ignorable, i.e., it need not be represented explicitly
in the statistical model for the data. The standard way of obtaining ignorability
is via the missing at random assumption. We have conducted an in-depth study
of the connections between the missing at random assumption (and similar assumptions), ignorability,
and explicit procedural models for mechanisms that cause missing
data. One result we obtained shows that the standard argument used
to infer ignorability from the missing at random assumption is incomplete, and
suitable additional assumptions on the data generating process may need to be
made in order to establish ignorability. We have also characterized natural types
of random mechanisms that will lead to missing at random data. These results
provide some natural criteria by which one can evaluate whether the missing at
random assumption is appropriate for a given data set.
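
The role of the missingness mechanism can be illustrated with a small simulation. The following is a minimal sketch, not part of the project's results: the two-attribute model, the function name `simulate`, and the missingness probabilities are all assumptions chosen for illustration. Attribute x is always observed, attribute y may be missing, and the mean of y is estimated by a method that ignores the mechanism (a least-squares fit on the complete records, averaged over all x). When missingness depends only on the observed x, a simple missing at random setting, the estimate is typically close to the true mean; when it depends on the unobserved y itself, the estimate is typically biased. The sketch shows only this common intuition and does not address the additional assumptions on the data generating process mentioned above.

```python
import random


def simulate(mechanism, n=20000, seed=0):
    """Return (mean of y over all records, estimate that ignores the mechanism).

    Hypothetical toy model: x ~ N(0, 1), y = 2x + N(0, 1) noise.
    mechanism == "mar":  y goes missing with a probability that depends only on x.
    mechanism == "mnar": y goes missing with a probability that depends on y itself.
    """
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        driver = x if mechanism == "mar" else y
        p_missing = 0.8 if driver > 0 else 0.1
        xs.append(x)
        ys.append(y)
        observed.append(rng.random() >= p_missing)

    # Least-squares fit of y on x using only records where y was observed.
    cx = [x for x, o in zip(xs, observed) if o]
    cy = [y for y, o in zip(ys, observed) if o]
    mx = sum(cx) / len(cx)
    my = sum(cy) / len(cy)
    slope = sum((x - mx) * (y - my) for x, y in zip(cx, cy)) / sum(
        (x - mx) ** 2 for x in cx
    )
    intercept = my - slope * mx

    # Estimate the mean of y by averaging fitted values over every x,
    # which is possible because x is never missing.
    estimate = sum(intercept + slope * x for x in xs) / n
    true_mean = sum(ys) / n
    return true_mean, estimate


if __name__ == "__main__":
    for mechanism in ("mar", "mnar"):
        true_mean, estimate = simulate(mechanism)
        print(mechanism, "error of the estimate:", estimate - true_mean)
```

Comparing the two printed errors gives a rough sense of how much an analysis that ignores the mechanism can be affected when missingness depends on the unobserved values.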
Status: Active
Effective start/end date: 19/05/2010 → …

Funding

  • <no name>

Fingerprint

Ignorability
Incomplete Data
Missing at Random
Machine Learning
Statistical Learning
Missing Data
Statistical Model
Mining
Data Mining
Attribute
Evaluate