Neural networks are a broad class of predictive models that have enjoyed considerable popularity over the past decade. A neural network consists of a collection of objects, known as neurons, organized into an ordered set of layers. Directed connections pass signals between neurons in adjacent layers. Prediction proceeds by passing an observation $X_i$ to the first layer; the output of the final layer gives the predicted value $\hat{y}_i$. Training a neural network involves updating the parameters describing the connections in order to minimize some loss function over the training data. Typically, the number of neurons and their connections are held fixed during this process. The concepts behind neural networks date to the early 1940s and the work of Warren McCulloch and Walter Pitts [119]. The motivation for their work, as well as the method's name, drew from the way that neurons use activation potentials to send signals throughout the central nervous system. These ideas were later revised and implemented throughout the 1950s [55, 139]. Neural networks, however, did not become a general-purpose learning technique until the early 2000s, because they require large datasets and extensive computing power.
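The prediction-and-training loop described above can be sketched in a few lines of NumPy. The specifics here (two weight matrices, eight hidden neurons, a tanh activation, a squared-error loss, and the toy dataset) are illustrative choices, not anything fixed by the text; the point is only the shape of the procedure: observations flow forward through the layers, and training adjusts the connection weights while the architecture stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer weights: the "directed connections" between adjacent layers.
W1 = rng.normal(scale=0.5, size=(2, 8))   # input layer -> hidden layer
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden layer -> output layer

def predict(X):
    """Pass observations forward; the final layer's output is y_hat."""
    hidden = np.tanh(X @ W1)              # hidden-layer activations
    return hidden @ W2                    # predicted values

# A tiny illustrative training set: learn y = x1 + x2.
X = rng.normal(size=(200, 2))
y = X.sum(axis=1, keepdims=True)

lr = 0.1
for _ in range(2000):
    hidden = np.tanh(X @ W1)
    y_hat = hidden @ W2
    err = y_hat - y                       # gradient of 0.5 * squared error
    # Backpropagate: gradients of the loss w.r.t. each weight matrix.
    grad_W2 = hidden.T @ err / len(X)
    grad_W1 = X.T @ ((err @ W2.T) * (1 - hidden**2)) / len(X)
    W2 -= lr * grad_W2                    # update the connection weights...
    W1 -= lr * grad_W1                    # ...while the architecture is fixed

mse = float(np.mean((predict(X) - y) ** 2))
```

After training, `mse` should be far below the variance of `y`, reflecting that gradient descent on the weights has minimized the loss over the training data.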