Timescale Modification of Speech

doi:10.1201/9781439864869-11

ABSTRACT

Timescale modification is the general term for methods that alter the temporal structure of speech without altering other aspects of the signal, such as its pitch or the voice quality of the speaker. Applied globally, timescale modification changes the perceived speaking rate of an utterance by uniformly increasing or decreasing the durations of all the syllables in the utterance. It can also be applied locally within an utterance to alter the duration of only a particular region, for example, to lengthen a single syllable. The two approaches can be combined to mimic the nonuniform duration and rate changes that talkers actually produce when they speed up or slow down their speech (e.g., [1]).

Several low-distortion methods for timescale modification have been developed. Griffin and Lim [3] presented a method that iteratively approximates the timescale-modified speech in the spectral domain. This method worked quite well, but it was computationally very intensive and required substantial time to compute even a single modified sentence. More recently, Moulines and Charpentier [5] published PSOLA (pitch-synchronous overlap-add) [4] and related approaches, which can modify both the timescale and the pitch of an utterance using computationally simple time-domain operations. For more information about the PSOLA algorithm, see my article “Pitch Modification of Speech Using PSOLA” (page 187). The timescale modification algorithm presented in this article exploits the temporal structure of the speech signal in much the same way as PSOLA, but it is simpler because, unlike PSOLA, it is not intended for pitch modification. Before reading either this article or the PSOLA article, the reader might find it helpful to review my article on general speech acoustics, “Introduction to Speech Acoustics” (page 159).
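The combination of global and local duration changes described above can be expressed as a piecewise-linear map from time in the modified utterance back to time in the original. The sketch below is a minimal illustration, not the algorithm presented in this article: it assumes the utterance has already been divided into contiguous regions (for example, syllables) with known boundaries in seconds, and the helper names `build_time_map` and `analysis_times` and the per-region `stretch` factor are hypothetical. A stretch factor above 1 lengthens a region and one below 1 shortens it; a global rate change applies the same factor everywhere, and a local change multiplies it for one region. Frame-based time-domain methods could consult such a map to decide where in the original signal to read each output frame.

```python
from bisect import bisect_right


def build_time_map(regions):
    """Build a piecewise-linear map from output (modified) time to input time.

    regions: contiguous, sorted (input_start, input_end, stretch) tuples in
    seconds. stretch > 1 lengthens a region; stretch < 1 shortens it.
    Returns (out_to_in, total_output_duration).
    """
    if not regions:
        raise ValueError("at least one region is required")
    out_starts = []  # where each region begins in the modified utterance
    t_out = 0.0
    prev_end = None
    for start, end, stretch in regions:
        if stretch <= 0 or end <= start:
            raise ValueError("each region needs end > start and stretch > 0")
        if prev_end is not None and abs(start - prev_end) > 1e-12:
            raise ValueError("regions must be contiguous")
        out_starts.append(t_out)
        t_out += (end - start) * stretch  # modified duration of this region
        prev_end = end
    total_out = t_out

    def out_to_in(t):
        if t < 0 or t > total_out:
            raise ValueError("time outside the modified utterance")
        i = bisect_right(out_starts, t) - 1  # region whose output span holds t
        start, _, stretch = regions[i]
        return start + (t - out_starts[i]) / stretch

    return out_to_in, total_out


def analysis_times(out_to_in, total_out, hop):
    """Input times to read from, one per output frame spaced `hop` s apart."""
    if hop <= 0:
        raise ValueError("hop must be positive")
    n_frames = int(total_out // hop) + 1
    return [out_to_in(min(k * hop, total_out)) for k in range(n_frames)]


# Hypothetical example: speed up a 1.0 s utterance globally (durations x0.8),
# but lengthen the syllable at 0.4-0.6 s by a further 1.5x (0.8 * 1.5 = 1.2).
regions = [(0.0, 0.4, 0.8), (0.4, 0.6, 1.2), (0.6, 1.0, 0.8)]
out_to_in, total = build_time_map(regions)
print(round(total, 3))            # 0.88
print(round(out_to_in(0.44), 3))  # 0.5
```

Keeping the map separate from any signal processing lets the same duration plan drive whichever overlap-add or spectral method performs the actual modification.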