Introduction Speech intelligibility in realistic environments is directly correlated with the ability of focusing attention on the sounds of interest while discarding the background noise and other competing stimuli. This work investigates task-driven auditory attention in noisy environments. Specifically, this study focuses on the ability to successfully execute a word spotting task while speech perception has to cope with the presence of music playing in the background. Methods The executed behavioural experiments consider different types of songs and explore how their distinct characteristics (such as dynamics or presence of distortion sound effects) affect the subjects’ task performance and, thus, the distribution of attention. Results Our results show that the ability of correctly separating the target sound from the background noise has a major impact on the performance of the subjects. Indeed, songs not presenting any distortion effect result in being more distracting than the ones with distortion, whose frequency spectrum envelop differentiates more from the one of the narrative. Furthermore, subjects performed the worst with songs characterised by high dynamics playing in the background, due to the unexpected changes capturing the attention of the listener.