ABSTRACT

A look at the abundant research on formulaic language reveals a surprising number of different labels for the items studied. Researchers have clearly not all been examining the same phenomenon: they have worked with different motivations in quite separate areas of linguistics and in fields outside linguistics as diverse as social anthropology and neurology. In fact, the terms formulaic language and formulaic sequence were not in common use until Wray (1999) put things in perspective after examining the growing body of research on multiword units. The categories usually identified are collocation, idiom, lexical phrase, lexical bundle, construction, formulaic expression, phrasal verb, n-gram, and concgram. A survey of the main categories in recent literature shows that there is considerable common ground among researchers regarding the target of their investigations, and that the category labels exist for good reason. At the same time, there is a certain amount of vagueness, overlap, and reliance on interpretation. For instance, the difference between a collocation and an idiom is difficult to pin down, at least in part because researchers have studied them from different perspectives over a very long time. The distinction between a lexical bundle and a multiword construction is delicate because of small differences in the processes used to extract them from corpora. Certain types of formulaic sequences almost seem to defy classification; sequences such as "and then" or "sooner or later", for example, present definite challenges. One might question the value of these categorizations to researchers and teachers. Why should an educator or researcher be concerned whether a sequence is, for example, a phrasal verb or a collocation? An examination of the major categories of formulaic language will help to put the phenomenon into perspective for applied linguists and those involved in language education.