Introduction

Corpora are collections of naturally occurring language data, stored in electronic form, designed to be representative of particular types of text and analysed with the aid of computer software tools. Corpora are now common in English for academic purposes (EAP) research and practice, both to provide quantitative information about discourse, and to corroborate insights derived from more qualitative studies. They also play an increasingly important role in EAP pedagogy, providing syllabus items, examples to illustrate accepted usage, and opportunities for data-driven learning.