For the past 30 years, education systems around the world have widely used large-scale assessments (LSAs) to inform and monitor their education policies (Lafontaine & Simon, 2008). These include national assessments, such as the National Assessment of Educational Progress (NAEP) in the United States and the Pan-Canadian Assessment Program (PCAP) in Canada, as well as international assessments, such as the Trends in International Mathematics and Science Study (TIMSS) and the Programme for International Student Assessment (PISA). Results from these assessments inform decisions about education practices, resource allocation, and the effectiveness of education systems. Meaningful interpretation of the scores depends critically on their comparability across all of the groups being compared.