ABSTRACT

Corpus linguistics most commonly refers to the study of machine-readable spoken and written language samples that have been assembled in a principled way for the purpose of linguistic research. At the heart of empirically based linguistics and data-driven language description, corpus linguistics is concerned with language use in real contexts. For this reason, it is often contrasted with Chomskyan linguistics, which emphasises language competence and often relies on invented examples as the basis of its exploration of language. Access to ever larger spoken and written corpora has already revolutionised the description of language in use; however, the impact of corpus linguistics has reached far beyond the disciplines concerned purely with the description of language. As an approach, corpus linguistics continues to gain recognition and popularity, with an increasing number of researchers across different disciplines exploring innovative ways of including corpus-based research in their methods toolkit. This chapter provides a brief overview of the different types of corpora available and of some of the methods used in corpus linguistics, including the generation of frequency lists, concordance outputs and keyword analyses. It then moves on to a discussion of selected current issues in corpus linguistics. We focus on three issues that, in our view, stand out both for the sustained attention they have received in the field and for their prominence among researchers and end-users: an area of language description (phraseology and corpus research), an area of application (English language teaching and corpus research), and an area of resource development (the Web as corpus). The chapter concludes with a discussion of the impact that technological developments may have on the discipline. All the corpus resources mentioned in this chapter are listed after the ‘Further reading’ section.
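The three basic methods named in this abstract (frequency lists, concordances and keyword analysis) can be illustrated with a minimal Python sketch. It is not the chapter's own procedure: it assumes a deliberately naive lower-casing regex tokenizer, and it uses the log-likelihood statistic to measure keyness, which is one common choice among several. The function names are hypothetical and the toy texts are invented purely for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lower-cased word tokens from a naive regex tokenizer (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens):
    """Word types ranked by raw frequency, most frequent first."""
    return Counter(tokens).most_common()

def concordance(tokens, node, span=3):
    """Key-word-in-context (KWIC) lines: `span` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            # right-align the left context so the node words line up
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines

def log_likelihood(a, b, c, d):
    """Log-likelihood for a word occurring a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, top=5):
    """Words relatively more frequent in `study` than in `reference`, ranked by LL."""
    fs, fr = Counter(study), Counter(reference)
    c, d = len(study), len(reference)
    scored = []
    for word, a in fs.items():
        b = fr.get(word, 0)
        # keep positive keywords only (higher relative frequency in the study corpus)
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

study = tokenize("the corpus is a corpus of texts and the corpus is searchable")
reference = tokenize("the cat sat on the mat and the dog sat too")
print(frequency_list(study)[:2])
print("\n".join(concordance(study, "corpus", span=2)))
print(keywords(study, reference))
```

In this toy comparison, corpus heads both the frequency list and the keyword list, since it occurs three times in the study text and never in the reference text; a real keyword analysis would of course compare large corpora and apply a significance threshold.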