ABSTRACT

In broad terms, respeaking may be defined as the production of subtitles by means of speech recognition. A more thorough definition would present it as ‘a technique in which a respeaker listens to the original sound of a (live) programme or event and respeaks it, including punctuation marks and some specific features for the deaf and hard of hearing audience, to a speech recognition software, which turns the recognized utterances into subtitles displayed on the screen with the shortest possible delay’ (Romero-Fresco 2011: 1). Thus, in many ways, respeaking is to subtitling what interpreting is to translation, namely a leap from the written to the oral without the safety net of time. It is, in effect, a form of (usually intralingual) computer-aided simultaneous interpreting with the addition of punctuation marks and features such as the identification of different speakers with colours or name tags. Although respeakers are normally encouraged to repeat the original soundtrack, and hence produce verbatim subtitles, the fast-paced delivery of speech in media content often makes this difficult. The challenges arising from high speech rates are compounded by other constraints, including the need to dictate punctuation marks while the respeaking of the original soundtrack is unfolding, and the expectation that respoken output will abide by viewers’ standard reading rates. Consequently, respeakers often end up paraphrasing, rather than repeating or shadowing, the original soundtrack (Romero-Fresco 2009).