ABSTRACT

Data mining is the process of turning raw data into useful information by uncovering hidden patterns, relationships and trends. It is a data-driven technique in which data are examined to determine which variables and their values are important and to understand how variables are related to each other. It involves applying algorithms to extract hidden information, with the aim of building an effective predictive or descriptive model of the data for explanation and/or generalisation. The focus is on data sourcing, pre-processing, data warehousing, data transformation, aggregation and statistical modelling. The availability of big data (extremely large and complex datasets, some of which are free to use, re-use, build on and redistribute, subject to stated conditions and licence: see Chapter 3), together with technological advances in software and tools (see below), has led to rapid growth in the use of data mining as a research method.