Abstract
Combining spatio-temporal interest points with Bag-of-Words models achieves state-of-the-art performance in action recognition. However, existing methods based on bag-of-words models are either too local to capture variance in space and time or fail to resolve the ambiguity problem in the spatial and temporal dimensions. Instead, we propose a salient vocabulary construction algorithm that selects visual words from a global point of view and forms compact descriptors representing discriminative histograms in local neighborhoods. These salient neighboring histograms are then used to train models of the different actions. Our approach achieves results competitive with state-of-the-art methods on the KTH dataset. On the more challenging UCF Sports dataset, we obtain 95.21%, approximately 4% better than the current best published results.
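For readers unfamiliar with the Bag-of-Words baseline the abstract builds on, here is a minimal sketch of its core step. Each local spatio-temporal descriptor is assigned to its nearest visual word, and the assignments are counted into a normalised histogram. The sketch assumes a precomputed vocabulary (for example from k-means) and plain Euclidean distance. The function names `nearest_word` and `bow_histogram` are hypothetical. It illustrates only the generic baseline, not the salient vocabulary construction or the neighboring-histogram descriptors proposed in the paper.

```python
from typing import List, Sequence


def nearest_word(descriptor: Sequence[float], vocabulary: List[Sequence[float]]) -> int:
    """Return the index of the visual word closest to the descriptor (squared Euclidean distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def bow_histogram(descriptors: List[Sequence[float]], vocabulary: List[Sequence[float]]) -> List[float]:
    """Build an L1-normalised histogram of visual-word assignments for one video.

    Generic bag-of-words quantisation only (an assumption for illustration);
    it does not implement the paper's salient vocabulary selection.
    """
    counts = [0] * len(vocabulary)
    for desc in descriptors:
        counts[nearest_word(desc, vocabulary)] += 1
    total = sum(counts)
    if total == 0:
        # No interest points detected: return an all-zero histogram.
        return [0.0] * len(vocabulary)
    return [c / total for c in counts]
```

With a toy two-word vocabulary `[[0.0, 0.0], [1.0, 1.0]]` and descriptors `[[0.1, 0.0], [0.9, 1.2], [1.0, 0.8], [0.0, 0.2]]`, two descriptors fall on each word, so the histogram is `[0.5, 0.5]`. Per-video histograms like this are what a classifier would then be trained on in the baseline setting.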
Original language | English |
---|---|
Title of host publication | IEEE International Conference on Image Processing |
Number of pages | 5 |
Publisher | IEEE Signal Processing Society |
Publication date | 2013 |
Pages | 2807-2811 |
ISBN (Print) | 978-1-4799-2341-0 |
Publication status | Published - 2013 |
Event | ICIP 2013: The International Conference on Image Processing 2013 (ICIP), Melbourne, Australia (15 Sept 2013 → 18 Sept 2013) |
Conference
Conference | ICIP 2013 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 15/09/2013 → 18/09/2013 |
Keywords
- Salient visual words
- Neighboring histograms
- Action recognition