Notater 2014/27
Imputation of missing data among immigrants in the Register of the Population's Level of Education (BU)
In Norway, the Register of the Population’s Level of Education (BU) contains information on all residents aged 16 and older. While missing data is generally minimal in this register (around 3 percent), increases in immigration to Norway in recent years are creating knowledge gaps about the level of education of its new residents. Census-surveys have filled in some of the missing data; however, the level of education is still unknown for around 20 percent of immigrants. With non-response rising in these census-surveys, the problem is unlikely to disappear.
An imputation method is proposed to address the non-response bias from these surveys and to create a “complete” dataset. The method assumes that data are missing at random (MAR) and imputes values using predictive mean matching, a nearest neighbour technique. The auxiliary variables used to find a matching donor include gender, age, occupation, income, length of time living in Norway, citizenship and country of origin.
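The general idea of predictive mean matching can be illustrated with a minimal sketch. This is not the implementation used in the note: it assumes the education level is coded as an ordinal number, uses an ordinary least-squares model with a single numeric auxiliary variable, and the function name `pmm_impute`, its parameters and the toy data are hypothetical. A model is fitted on the units with an observed value (the donors), a predicted mean is computed for every unit, and each unit with a missing value receives the observed value of a donor whose predicted mean is closest to its own.

```python
import numpy as np


def pmm_impute(X, y, k=1, rng=None):
    """Impute NaN entries of y by predictive mean matching (illustrative sketch).

    X   : 2-D array of numeric auxiliary variables, one row per unit.
    y   : 1-D array of outcomes, with np.nan marking missing values.
    k   : number of nearest donors to draw from (k=1 takes the closest donor).
    rng : seed or numpy Generator used when k > 1.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)

    # Design matrix with an intercept column.
    A = np.column_stack([np.ones(len(X)), X])

    # Fit the prediction model on donors only (units with observed y).
    beta, *_ = np.linalg.lstsq(A[observed], y[observed], rcond=None)

    # Predicted means for all units, donors and recipients alike.
    pred = A @ beta
    donor_pred = pred[observed]
    donor_vals = y[observed]

    imputed = y.copy()
    for i in np.flatnonzero(~observed):
        # Distance between this recipient's predicted mean and each donor's.
        dist = np.abs(donor_pred - pred[i])
        nearest = np.argsort(dist, kind="stable")[:k]
        # Copy the *observed* value of a donor chosen among the k nearest.
        imputed[i] = donor_vals[rng.choice(nearest)]
    return imputed


if __name__ == "__main__":
    # Hypothetical toy data: one auxiliary variable (age) and an ordinal
    # education code; the last two units have an unknown level.
    age = np.array([[20.0], [30.0], [40.0], [50.0], [31.0], [49.0]])
    education = np.array([1.0, 2.0, 3.0, 4.0, np.nan, np.nan])
    print(pmm_impute(age, education, k=1, rng=0))
```

Because the imputed value is always copied from a real donor, only values that actually occur in the observed data are imputed. Categorical auxiliary variables such as citizenship or country of origin would need to be encoded (for example as indicator columns) before being passed to a sketch like this one.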
Results from the imputed dataset show some small overall changes. In general, imputation reduces the percentage of both the highest and the lower education levels. A comparison of the imputation results for Swedish immigrants with data from Statistics Sweden shows that the proposed imputation adjusts the data in the right direction.
Author: Susie Jentoft