This tar.gz file contains a replica of the author's GitHub repository, which includes the data used to run the experiments in the associated papers. Due to copyright restrictions, two of the three citation graph networks are not included. Please contact the author via email:
[email protected] for further information.

The dataset holds:

1. Data and code files used to construct a multi-multi-instance (MMI) semi-synthetic dataset from the MNIST database of handwritten digits, available here: http://yann.lecun.com/exdb/mnist/. MNIST provides a training set of 60,000 examples and a test set of 10,000 examples. Digits are organized in bags-of-bags of arbitrary cardinality.
2. Data, image and code files used to construct a semi-synthetic dataset from MNIST by placing digits at random positions on a background image of black pixels.
3. Example real citation network datasets where data can be naturally decomposed into bags-of-bags (MMI data) or bags (MI data).

mmi.tar.gz can be uncompressed using standard compression utilities. Code and data files are provided as Python source (.py), compiled Python bytecode (.pyc), NumPy arrays (.npy), Linux shell scripts (.sh) and .json files. The .py, .sh and .json files can be opened with any text editor. README and other metadata files are provided in Markdown (.md), .meta and .txt formats at various levels of the folder hierarchy. Image files are provided in the .idx3-ubyte file type, a simple format for vectors and multidimensional matrices of various numerical types.

mmi

Installation

mmi uses the following dependencies:

- numpy
- TensorFlow (see installation instructions: https://www.tensorflow.org/install/)

To install mmi, run the install command:

    python setup.py install

You can also install mmi from PyPI:

    pip install mmi

Run the demo

Enter the example_mnist folder and run:

    python train_mmi_mnist.py

Background

In the associated paper, we study an extension of the multi-instance learning problem where examples are organized as nested bags of instances (e.g., a document could be represented as a bag of sentences, which in turn are bags of words). This framework can be useful in various scenarios, such as graph classification, image classification and translation-invariant pooling in convolutional neural networks.
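To make the nested-bag structure concrete, the following is a minimal sketch of how one bag-of-bags could be represented in NumPy. It is an illustration only, not the layout used by the mmi package: the function names, the feature size and the random vectors standing in for digit images are all assumptions.

```python
import numpy as np

def make_top_bag(rng, n_features=4, max_sub_bags=3, max_instances=5):
    """Build one hypothetical top-bag: a list of sub-bags, each a 2-D array
    of shape (n_instances, n_features) with its own random cardinality."""
    n_sub_bags = int(rng.integers(1, max_sub_bags + 1))
    return [
        rng.normal(size=(int(rng.integers(1, max_instances + 1)), n_features))
        for _ in range(n_sub_bags)
    ]

def flatten_top_bag(top_bag):
    """Stack all instances into one array and record, for every instance,
    the index of the sub-bag it belongs to."""
    instances = np.concatenate(top_bag, axis=0)
    sub_bag_ids = np.concatenate(
        [np.full(len(sub_bag), i) for i, sub_bag in enumerate(top_bag)]
    )
    return instances, sub_bag_ids

# Random vectors stand in for real instances (assumption for illustration).
rng = np.random.default_rng(0)
top_bag = make_top_bag(rng)
instances, sub_bag_ids = flatten_top_bag(top_bag)
```

The flat array plus an index vector is one common way to feed bags of varying size to array-based code, since sub-bags with different cardinalities cannot be stacked into a single rectangular tensor without padding.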
To learn from multi-multi-instance data, we introduce a special neural network layer, called a bag-layer, whose units aggregate sets of inputs of arbitrary size. We prove that the associated class of functions contains all Boolean functions over sets of sets of instances. We present empirical results on semi-synthetic data showing that such a class of functions can actually be learned from data. We also present experiments on citation graph datasets, where our model obtains competitive results.
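The idea of a layer that aggregates sets of arbitrary size can be sketched in plain NumPy. The sketch below assumes an affine map followed by ReLU and an element-wise maximum over the members of each bag. This choice, along with every name in the code, is an assumption made for illustration and is not taken from the mmi package.

```python
import numpy as np

def bag_layer(inputs, bag_ids, n_bags, weights, bias):
    """Apply ReLU(x @ W + b) to every input row, then take the element-wise
    maximum over the rows assigned to each bag. Empty bags map to zeros."""
    hidden = np.maximum(inputs @ weights + bias, 0.0)
    out = np.zeros((n_bags, hidden.shape[1]))
    for b in range(n_bags):
        mask = bag_ids == b
        if mask.any():
            out[b] = hidden[mask].max(axis=0)
    return out

# Three instances: the first two form sub-bag 0, the third forms sub-bag 1.
instances = np.array([[1.0, -1.0], [2.0, 0.0], [0.0, 3.0]])
sub_bag_ids = np.array([0, 0, 1])
# Both sub-bags belong to the single top-bag 0.
top_bag_ids = np.array([0, 0])

# Identity weights and zero bias keep the arithmetic easy to follow.
W = np.eye(2)
b = np.zeros(2)

# Stacking two bag-layers handles the two levels of nesting.
sub_bag_repr = bag_layer(instances, sub_bag_ids, 2, W, b)
top_bag_repr = bag_layer(sub_bag_repr, top_bag_ids, 1, W, b)
```

Because the maximum does not depend on the order of its arguments, reordering the instances inside a bag leaves the output unchanged, which is the property a set-valued input requires.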