Healthcare systems worldwide are entering a new phase: ever-increasing quantities of complex, massively multivariate data concerning all aspects of patient care are now routinely acquired and stored throughout the life of a patient [1]. This exponential growth in data quantities far outpaces the capacity of clinical experts to cope, resulting in a so-called data deluge, in which the data remain largely unexploited. There is huge potential for advances in large-scale machine learning methodologies to exploit the contents of these complex data sets by performing robust, scalable, automated inference, significantly improving healthcare outcomes through patient-specific probabilistic models. This is a field in which there is little existing research [2], and one which promises to develop into a new industry supporting the next generation of healthcare technology. Data integration across spatial scales, from the molecular to the population level, and across temporal scales, from fixed genomic data to a beat-by-beat electrocardiogram (ECG), will be one of the key challenges in exploiting these massive, disparate data sets.