Abstract
We propose a new algorithm for time stretching
music signals based on the theory of nonstationary Gabor
frames (NSGFs). The algorithm extends the techniques of the
classical phase vocoder (PV) by incorporating adaptive time-frequency
(TF) representations and adaptive phase locking. The
adaptive TF representations imply good time resolution for the
onsets of attack transients and good frequency resolution for
the sinusoidal components. We estimate the phase values only
at peak channels and the remaining phases are then locked to
the values of the peaks in an adaptive manner. During attack
transients we keep the stretch factor equal to one and we propose
a new strategy for determining which channels are relevant
for reinitializing the corresponding phase values. In contrast to
previously published algorithms, we use a non-uniform NSGF to
obtain a low redundancy of the corresponding TF representation.
We show that with just three times as many TF coefficients
as signal samples, artifacts such as phasiness and transient
smearing can be greatly reduced compared to the classical PV.
The proposed algorithm is tested on both synthetic and real-world
signals and compared with state-of-the-art algorithms in
a reproducible manner.
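
As background for the phase-locking step mentioned in the abstract, the following is a minimal sketch of the classical PV baseline on a uniform STFT, not the NSGF-based algorithm proposed in the paper. It propagates phases only at peak channels and locks every other channel to its nearest peak by applying the same phase rotation. The function names, the local-maximum peak picker, and the nearest-peak assignment are illustrative assumptions and are not taken from the paper.

```python
import cmath
import math


def princarg(phi):
    """Wrap an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def locked_synthesis_phases(prev_frame, cur_frame, prev_out_phase,
                            hop_a, hop_s, n_fft):
    """Synthesis phases for one frame of a uniform-STFT phase vocoder.

    prev_frame, cur_frame: lists of complex STFT coefficients per channel
    prev_out_phase: synthesis phases of the previous output frame
    hop_a, hop_s: analysis and synthesis hop sizes in samples
    n_fft: DFT length, so channel k sits at 2*pi*k/n_fft rad/sample
    """
    n_ch = len(cur_frame)
    mag = [abs(c) for c in cur_frame]
    cur_phase = [cmath.phase(c) for c in cur_frame]

    # Peak channels: local maxima of the magnitude spectrum (assumed picker).
    peaks = [k for k in range(1, n_ch - 1)
             if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]]
    if not peaks:
        return cur_phase  # nothing to lock to; keep the analysis phases

    # Phase propagation, evaluated at the peak channels only.
    rotation = {}
    for p in peaks:
        omega = 2.0 * math.pi * p / n_fft          # nominal channel frequency
        dphi = princarg(cur_phase[p] - cmath.phase(prev_frame[p])
                        - omega * hop_a)           # deviation from nominal
        inst_freq = omega + dphi / hop_a           # instantaneous frequency
        rotation[p] = prev_out_phase[p] + hop_s * inst_freq - cur_phase[p]

    # Lock each channel to its nearest peak by reusing that peak's rotation,
    # which preserves the analysis phase differences around the peak.
    out = []
    for k in range(n_ch):
        nearest = min(peaks, key=lambda p: abs(p - k))
        out.append(cur_phase[k] + rotation[nearest])
    return out
```

With `hop_s == hop_a` (a stretch factor of one, as the abstract describes for attack transients) and synthesis phases initialized to the analysis phases, the returned phases equal the analysis phases up to multiples of 2π, so the frame passes through unchanged.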
Original language | English |
---|---|
Journal | IEEE Transactions on Audio, Speech and Language Processing |
Volume | 25 |
Issue number | 11 |
Pages (from-to) | 2199-2208 |
Number of pages | 10 |
ISSN | 1558-7916 |
DOIs | |
Publication status | Published - Sept 2017 |
Keywords
- Phase vocoder
- nonstationary Gabor frames
- Time-frequency analysis
- Gabor theory
- Time stretching