Abstract
In this paper we show a new method of using automatic age
and gender recognition to recommend a sequence of multimedia items to a home TV audience comprising multiple viewers.
Instead of relying on explicitly provided demographic data for
each user, we define an audio-based demographic group profile
that captures the age and gender for all members of the audience. A 7-class age and gender classifier employing a fusion
of acoustic and prosodic features determines the probability of
each speaker belonging to each class. The information for all
speakers is then combined to form the group profile, which itself is the input to a recommender system. The recommender
system finds the content items whose demographics best match
the group profile. We tested the effectiveness of the system for
several typical home audience configurations. In a survey, users
were given a configuration and asked to rate a set of advertisements on how well each advertisement matched the configuration. Unbeknown to the subjects, half of the adverts were recommended using the derived audio demographics and the other
half were randomly chosen. The recommended adverts received
a significantly higher median rating of 7.75, as opposed to 4.25
for the randomly selected adverts.
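The pipeline described above (per-speaker class probabilities, combined into a group profile, matched against content demographics) can be illustrated with a minimal sketch. The abstract does not state the combination rule or the matching measure, so the averaging step, the cosine similarity, the class labels, and all example values below are assumptions made for illustration, not the authors' method.

```python
import math

# Hypothetical labels: the abstract only says "7-class age and gender".
CLASSES = [
    "child", "young_female", "young_male",
    "adult_female", "adult_male", "senior_female", "senior_male",
]


def group_profile(speaker_posteriors):
    """Combine per-speaker class probabilities into one group profile.

    Assumption: a plain average over speakers; the paper may use another rule.
    """
    if not speaker_posteriors:
        raise ValueError("at least one speaker is required")
    if any(len(p) != len(CLASSES) for p in speaker_posteriors):
        raise ValueError("each speaker needs one probability per class")
    n = len(speaker_posteriors)
    return [sum(p[i] for p in speaker_posteriors) / n for i in range(len(CLASSES))]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_items(profile, item_demographics):
    """Return (item, score) pairs, best match to the group profile first."""
    scored = [(name, cosine(profile, demo)) for name, demo in item_demographics.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up audience: one likely adult female and one likely child.
speakers = [
    [0.05, 0.05, 0.05, 0.70, 0.05, 0.05, 0.05],
    [0.70, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05],
]
# Made-up target demographics for two adverts.
items = {
    "family_holiday": [0.4, 0.0, 0.0, 0.4, 0.2, 0.0, 0.0],
    "shaving_razor": [0.0, 0.0, 0.3, 0.0, 0.6, 0.0, 0.1],
}

profile = group_profile(speakers)
ranking = rank_items(profile, items)
```

With these made-up inputs, the advert aimed at children and adult women ranks above the one aimed at adult men. A weighted combination, or a different similarity measure, would slot into the same structure.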
Original language | English |
---|---|
Title of host publication | 14th Annual Conference of the International Speech Communication Association (INTERSPEECH 2013) : Speech in Life Sciences and Human Societies |
Editors | F. Bimbot, C. Cerisara, G. Gravier, L. Lamel, F. Pellegrino, P. Perrier |
Number of pages | 5 |
Volume | 1 |
Publisher | Curran Associates, Inc |
Publication date | 2013 |
Pages | 2827-2831 |
ISBN (Print) | 978-1-62993-443-3 |
Publication status | Published - 2013 |
Event | Interspeech 2013 - Lyon, France Duration: 25 Aug 2013 → 29 Aug 2013 http://www.interspeech2013.org/ |
Conference
Conference | Interspeech 2013 |
---|---|
Country/Territory | France |
City | Lyon |
Period | 25/08/2013 → 29/08/2013 |
Internet address | http://www.interspeech2013.org/ |
Series | Proceedings of the International Conference on Spoken Language Processing |
---|---|
ISSN | 2308-457X |