Incomplete Data and Ignorability



Most data sets encountered in machine learning and data mining are incomplete:
not all data records contain the values of each of the data attributes.
Most common statistical and machine learning techniques will only provide useful
results on incomplete data when the mechanism that causes certain attribute
values to be unobserved is ignorable, i.e., it need not be represented explicitly
in the statistical model for the data. The standard way of obtaining ignorability
is via the missing at random assumption. We have conducted an in-depth study
of the connection between the missing at random (and similar) assumptions, ignorability,
and explicit procedural models for mechanisms that cause missing
data. One result we obtained shows that the standard argument used
to infer ignorability from the missing at random assumption is incomplete, and
suitable additional assumptions on the data generating process may need to be
made in order to establish ignorability. We have also characterized natural types
of random mechanisms that will lead to missing at random data. These results
provide some natural criteria by which one can evaluate whether the missing at
random assumption is appropriate for a given data set.
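
As a rough illustration of what ignorability means in practice, the following minimal sketch simulates a toy data set with a fully observed binary attribute x and a partially observed attribute y. It is not taken from the project; the function name `simulate`, the two mechanisms, and all probabilities are assumptions chosen for illustration. When missingness of y depends only on the observed x (a missing at random mechanism), an estimate of the mean of y that never models the mechanism stays close to the truth in this simple setting. When missingness depends on the unobserved y itself, the same estimate is biased.

```python
import random


def simulate(mechanism, n=40000, seed=0):
    """Return (true_mean_of_y, estimate_ignoring_mechanism) for a toy data set.

    Hypothetical setup: x is binary and always observed; y ~ N(2x, 1) may be
    missing. The estimate averages the observed y within each x group and
    weights the groups by the observed frequency of x, without ever modeling
    why values of y are missing.
    """
    rng = random.Random(seed)
    x_counts = {0: 0, 1: 0}
    y_sum_obs = {0: 0.0, 1: 0.0}
    y_n_obs = {0: 0, 1: 0}
    y_total = 0.0

    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = rng.gauss(2.0 * x, 1.0)
        y_total += y
        x_counts[x] += 1

        if mechanism == "mar":
            # Missingness depends only on the observed attribute x.
            p_missing = 0.8 if x == 1 else 0.2
        elif mechanism == "mnar":
            # Missingness depends on the (possibly unobserved) value y.
            p_missing = 0.8 if y > 1.0 else 0.2
        else:
            raise ValueError("mechanism must be 'mar' or 'mnar'")

        if rng.random() >= p_missing:  # y is observed
            y_sum_obs[x] += y
            y_n_obs[x] += 1

    estimate = sum(
        (x_counts[x] / n) * (y_sum_obs[x] / y_n_obs[x]) for x in (0, 1)
    )
    return y_total / n, estimate


if __name__ == "__main__":
    for mechanism in ("mar", "mnar"):
        true_mean, estimate = simulate(mechanism)
        print(mechanism, round(true_mean, 3), round(estimate, 3))
```

The sketch covers only this one simple estimator and two hand-picked mechanisms; it says nothing about the additional assumptions on the data generating process mentioned above.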
Effective start/end date: 19/05/2010 → …

