Contextual-guided Bag-of-visual-words model for Multi-class object categorization

Mehdi Mirza-Mohammadi; Sergio Escalera; Petia Radeva

doi:10.1007/978-3-642-03767-2_91

Mehdi Mirza-Mohammadi^*, Sergio Escalera, Petia Radeva

^*Corresponding author for this work

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

9 Citations (Scopus)

Abstract

The bag-of-words (BOW) model is inspired by the text classification problem, where a document is represented by an unordered set of the words it contains. Analogously, in the object categorization problem, an image is represented by an unordered set of discrete visual words (BOVW). In these models, relations among visual words are taken into account after dictionary construction. However, spatially close object regions can have distant descriptions in the feature space and are then grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.
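The merging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the merging criterion, so the sketch assumes that each feature word is summarized by the mean normalized position of its regions (a stand-in for the contextual-space) and that two words are fused when those mean positions lie within a hypothetical `radius`. The function names, the toy 2-D "descriptors" (real SIFT descriptors are 128-dimensional), and the threshold are all illustrative assumptions; region detection and the Error Correcting Output Codes stage are not covered.

```python
from collections import Counter
from math import dist


def assign_words(descriptors, centroids):
    """Index of the nearest feature-space centroid for each descriptor."""
    return [min(range(len(centroids)), key=lambda k: dist(d, centroids[k]))
            for d in descriptors]


def contextual_centers(words, positions, n_words):
    """Mean (x, y) position of the regions assigned to each word (None if unused)."""
    sums = [[0.0, 0.0, 0] for _ in range(n_words)]
    for w, (x, y) in zip(words, positions):
        sums[w][0] += x
        sums[w][1] += y
        sums[w][2] += 1
    return [(sx / n, sy / n) if n else None for sx, sy, n in sums]


def merge_words(centers, radius):
    """Union-find merge of words whose contextual centers lie within `radius`.

    Returns a list mapping each original word index to its merged word index.
    The distance threshold is an assumed criterion, not taken from the paper.
    """
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, ci in enumerate(centers):
        for j in range(i + 1, len(centers)):
            cj = centers[j]
            if ci is not None and cj is not None and dist(ci, cj) <= radius:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(centers))]


def histogram(words, mapping):
    """Bag-of-visual-words histogram over the merged dictionary."""
    return Counter(mapping[w] for w in words)


if __name__ == "__main__":
    # Toy data: three feature-space centroids, four regions.
    centroids = [(0, 0), (10, 10), (5, 5)]
    descriptors = [(0, 1), (10, 9), (0, 0), (5, 6)]
    positions = [(0.1, 0.1), (0.12, 0.1), (0.1, 0.12), (0.9, 0.9)]

    words = assign_words(descriptors, centroids)
    centers = contextual_centers(words, positions, len(centroids))
    mapping = merge_words(centers, radius=0.05)
    print(words, mapping, dict(histogram(words, mapping)))
```

In this toy case, words 0 and 1 are far apart in feature space but their regions sit at nearly the same object location, so they are fused into one word, while word 2 stays separate.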

Original language	English
Title of host publication	Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings
Number of pages	9
Publication date	2009
Pages	748-756
ISBN (Print)	3642037666, 9783642037665
DOIs	https://doi.org/10.1007/978-3-642-03767-2_91
Publication status	Published - 2009
Externally published	Yes
Event	13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 - Münster, Germany. Duration: 2 Sept 2009 → 4 Sept 2009

Conference

Conference	13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009
Country/Territory	Germany
City	Münster
Period	02/09/2009 → 04/09/2009
Sponsor	University of Münster, International Association for Pattern Recognition, Olympus Soft Imaging Solutions GmbH, Philips

Series	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	5702 LNCS
ISSN	0302-9743

Cite this

Mirza-Mohammadi, Mehdi ; Escalera, Sergio ; Radeva, Petia. / Contextual-guided Bag-of-visual-words model for Multi-class object categorization. Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings. 2009. pp. 748-756 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5702 LNCS).

@inproceedings{9cc2978eae5546c58d2cb1afd6c7ec9d,

title = "Contextual-guided Bag-of-visual-words model for Multi-class object categorization",

abstract = "Bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of contained words. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are performed after dictionary construction. However, close object regions can have far descriptions in the feature space, being grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.",

author = "Mehdi Mirza-Mohammadi and Sergio Escalera and Petia Radeva",

year = "2009",

doi = "10.1007/978-3-642-03767-2_91",

language = "English",

isbn = "3642037666",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Physica-Verlag",

pages = "748--756",

booktitle = "Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings",

note = "13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009 ; Conference date: 02-09-2009 Through 04-09-2009",

}

Mirza-Mohammadi, M, Escalera, S & Radeva, P 2009, Contextual-guided Bag-of-visual-words model for Multi-class object categorization. in Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5702 LNCS, pp. 748-756, 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009, Munster, Germany, 02/09/2009. https://doi.org/10.1007/978-3-642-03767-2_91

TY - GEN

T1 - Contextual-guided Bag-of-visual-words model for Multi-class object categorization

AU - Mirza-Mohammadi, Mehdi

AU - Escalera, Sergio

AU - Radeva, Petia

PY - 2009

Y1 - 2009

N2 - Bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of contained words. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are performed after dictionary construction. However, close object regions can have far descriptions in the feature space, being grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.

AB - Bag-of-words model (BOW) is inspired by the text classification problem, where a document is represented by an unsorted set of contained words. Analogously, in the object categorization problem, an image is represented by an unsorted set of discrete visual words (BOVW). In these models, relations among visual words are performed after dictionary construction. However, close object regions can have far descriptions in the feature space, being grouped as different visual words. In this paper, we present a method for considering geometrical information of visual words in the dictionary construction step. Object interest regions are obtained by means of the Harris-Affine detector and then described using the SIFT descriptor. Afterward, a contextual-space and a feature-space are defined, and a merging process is used to fuse feature words based on their proximity in the contextual-space. Moreover, we use the Error Correcting Output Codes framework to learn the new dictionary in order to perform multi-class classification. Results show significant classification improvements when spatial information is taken into account in the dictionary construction step.

UR - http://www.scopus.com/inward/record.url?scp=70349335789&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-03767-2_91

DO - 10.1007/978-3-642-03767-2_91

M3 - Article in proceeding

AN - SCOPUS:70349335789

SN - 3642037666

SN - 9783642037665

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 748

EP - 756

BT - Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings

T2 - 13th International Conference on Computer Analysis of Images and Patterns, CAIP 2009

Y2 - 2 September 2009 through 4 September 2009

ER -

Mirza-Mohammadi M, Escalera S, Radeva P. Contextual-guided Bag-of-visual-words model for Multi-class object categorization. In Computer Analysis of Images and Patterns - 13th International Conference, CAIP 2009, Proceedings. 2009. p. 748-756. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 5702 LNCS). doi: 10.1007/978-3-642-03767-2_91
