In many domains involving expert judgment, the base rate of errors is very low. In Air Traffic Control (ATC), for instance, operational errors involving en route aircraft are rare. Yet even after their error rates approach zero, controllers continue to improve with experience. In such cases, traditional performance measures based on the percentage of answers that match a ‘gold standard’ may be insensitive to these improvements.