ABSTRACT

The multivariate normal distribution remains a central choice for modelling continuous multivariate data. This is partly because it provides a useful representation of the underlying data-generating mechanism in many applications, and partly because it is convenient to interpret and easy to compute with. One way to add flexibility is to use parametric distributions that allow for skewness and heavy tails, which robustify inference while keeping interpretation and computation manageable. We review some of the available options and, by way of illustration, use a novel formulation based on two-piece families. This formulation contains the normal and Student t distributions as particular cases, adds a small set of easily interpretable parameters, leads to simple model fitting, and is unimodal, a convenient property for clustering. As others have noted before us, our examples illustrate that in the presence of asymmetry or heavy tails, normal mixtures may yield biased parameter estimates and poor clustering, or suggest spurious additional components, whereas more flexible mixtures tend to capture the underlying subpopulations better. The methodology is implemented in the R package twopiece, freely available at https://r-forge.r-project.org/projects/twopiece.
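To make the two-piece idea concrete, the following is a minimal univariate sketch. It assumes the common scale parameterization s(x) = 2/(σ1 + σ2) · [f((x − μ)/σ1) if x < μ, else f((x − μ)/σ2)], where f is a symmetric unimodal base density. This is not necessarily the multivariate formulation proposed here, and it is not the API of the twopiece package. All function names are hypothetical. Setting σ1 = σ2 recovers the symmetric base (normal or Student t), and the density has its single mode at μ.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def student_t_pdf(z, nu):
    """Standard Student t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1.0 + z * z / nu) ** (-(nu + 1) / 2)

def two_piece_pdf(x, mu, sigma1, sigma2, base_pdf):
    """Hypothetical univariate two-piece density (illustrative sketch only).

    The symmetric base density is scaled by sigma1 to the left of the mode mu
    and by sigma2 to the right; the factor 2 / (sigma1 + sigma2) makes it
    integrate to one and keeps it continuous at mu.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("scale parameters must be positive")
    scale = sigma1 if x < mu else sigma2
    return 2.0 / (sigma1 + sigma2) * base_pdf((x - mu) / scale)
```

For a skewed, heavy-tailed example, one could pass `lambda z: student_t_pdf(z, 4)` as `base_pdf` with `sigma1 != sigma2`. A single skewness mechanism (unequal scales) combined with a tail parameter (`nu`) keeps the added parameters few and interpretable.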