## ABSTRACT

The concept of distance is at the heart of many methods for dealing with multivariate data. For example, in the scatterplots shown in Chapter 2 the relative “distances” between points are obviously central to making interpretations about the plots; this is clear without giving an explicit definition of distance but simply appealing to its everyday usage. But in this chapter, we will need to consider the distance concept in more detail as we consider methods which aim to give useful low-dimensional “maps” of particular types of data. We have, of course, already met one such technique, principal components analysis, in the previous chapter. As mentioned briefly there, this technique finds the required low-dimensional map in such a way that Euclidean distances between the observations in the space of the original variables are preserved as far as possible by the points representing the observations in the principal component plot. For the observations in the data matrix $X$ (see Chapter 12), the Euclidean distance between observation $i$ with variable values $x_{i1}, x_{i2}, \ldots, x_{iq}$, and observation $j$ with variable values $x_{j1}, x_{j2}, \ldots, x_{jq}$ is defined as

$$d_{ij} = \sqrt{\sum_{k=1}^{q} (x_{ik} - x_{jk})^2}$$
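
As an illustration only, the Euclidean distance between two observations (the square root of the sum of squared differences over the q variables) can be sketched in a few lines of Python. The function names `euclidean_distance` and `distance_matrix` and the small data matrix `X` are made up for this example and are not taken from the text.

```python
import math


def euclidean_distance(x_i, x_j):
    """Euclidean distance between two observations with q variable values each."""
    if len(x_i) != len(x_j):
        raise ValueError("observations must have the same number of variables")
    # Square root of the sum of squared differences over the q variables.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def distance_matrix(X):
    """Symmetric n x n matrix of pairwise Euclidean distances between the rows of X."""
    n = len(X)
    return [[euclidean_distance(X[i], X[j]) for j in range(n)] for i in range(n)]


# Hypothetical data matrix: 3 observations (rows) on q = 2 variables (columns).
X = [
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
]

D = distance_matrix(X)
for row in D:
    print(row)
```

With these made-up values, the distance between the first and second observations is 5, and each observation is at distance 0 from itself. A matrix of this kind is the sort of input that distance-based mapping methods start from.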