Discussion
Main results
Only one-third of the 53 000 persons invited to LOFUS participated. Nevertheless, participation rates differed only modestly across subgroups when invited persons were classified by six sociodemographic variables available for all invited persons (figure 1). Furthermore, the subgroups with high participation were not particularly healthy ones; participation was, for instance, high in elderly persons. Our study thus showed that between-group selection did not seriously affect the LOFUS data. However, among the 53 000 invited persons, 3516 died during the average follow-up period of 4.68 years, and the death rate was three times higher among the two-thirds who did not participate than among the one-third who did. This pattern of excess mortality in non-participants was found across all subgroups classified by the six sociodemographic variables. Our study therefore showed considerable health selection in participation within groups. In other words, healthy persons participated in LOFUS to more or less the same extent independently of their sociodemographic background, but unhealthy persons were under-represented in all groups.
Figure 1. Relative risk of non-participation and mortality rate ratio for non-participants compared with participants by sociodemographic variables. Relative risks adjusted for sex and 5-year age group at invitation. Mortality rate ratios adjusted for sex, 5-year age group at follow-up and calendar year. Age groups were not adjusted for age. Economic status available only for persons aged 30–64 years at invitation.
Previous studies
Differences in survey participation across population subgroups have been seen in virtually all surveys. In the UK Biobank, participation of men was lower than that of women, and participation was lower among people in their 40s than among those in their 60s.17 In the Dutch LifeLines cohort, women constituted a larger proportion of participants than they did in the source population, and so did married/cohabitating persons.18
When the source population is well-known, weighting of responses with the population distribution is an established method to estimate the population representative response pattern.19 The Danish National Health Profile is a questionnaire-based survey of the citizens’ health with the aim of providing data for ‘targeted health promotion, prevention and treatment’.20 In the last survey in 2021, 324 000 persons were invited, and 57% participated. Among men, 37% aged 16–24 years participated compared with 74% aged 65–74 years. To obtain data representative of the Danish population, responses were weighted by sex, age, education, income, socioeconomic group, family type, ethnicity, owner/renter of dwelling, number of visits to general practitioner (GP) and number of hospital admissions in 2019.21 So, when for instance a prevalence of severe sleeping disorders of 15% was reported, this estimate was supposed to reflect the general population.
Less studied, although well established, is the excess mortality in non-participants compared with participants. In the Whitehall II cohort, non-participants compared with participants who were followed up for 10 years had an adjusted HR of all-cause mortality of 2.10 (95% CI 1.72 to 2.57); after 15 years, it was 2.00 (95% CI 1.64 to 2.45) and after 20 years, 1.83 (95% CI 1.56 to 2.16).22 A similar pattern was seen in the Norwegian HUNT Study where non-participants followed for 3 years had an HR of 2.80 (95% CI 2.54 to 3.09); after 14 years, 1.72 (95% CI 1.66 to 1.78); and after 25 years, 1.50 (95% CI 1.44 to 1.57).23 A decrease in the excess mortality of non-participants with increasing length of follow-up was also seen in our data.
The excess mortality of non-participants compared with participants may also vary across socioeconomic groups. During 10 years of follow-up of the FINRISK Surveys from 1972 to 1992, the all-cause mortality of non-participating compared with participating men was 1.87 (95% CI 1.22 to 2.86) for upper-level non-manual employees; 2.46 (95% CI 1.69 to 3.59) for low-level non-manual employees; and 3.18 (95% CI 2.52 to 4.01) for manual workers, with a narrower range for women (1.95, 2.54 and 2.64, respectively).24 In a 15-year follow-up of the Danish Diet, Cancer and Health cohort, the excess mortality of non-participants compared with participants was 1.73 (95% CI 1.66 to 1.79) for men and 2.10 (95% CI 2.01 to 2.20) for women. This excess mortality was seen across all educational groups. Using higher-educated participating men as baseline, higher-educated non-participating men had an MRR of 1.77 (95% CI 1.63 to 1.94), while basic/high school educated non-participating men had an MRR of 3.68 (95% CI 3.41 to 3.98).25
The FINRISK and Diet, Cancer and Health data were in accordance with what we found in LOFUS. All non-participants had an excess mortality compared with participants, but the excess was more marked for persons on public support than for self-supported persons.
Strengths and limitations
A strength of the study was the complete follow-up for vital status for all people invited to LOFUS. Furthermore, all estimates were numerically stable with narrow CIs. Invitation of household members meant that single households were slightly under-represented. We used log-binomial analyses instead of the traditional logistic regression as we wanted to know not only the direction of associations but also the size. For example, the non-participation proportion of invited persons was 78% in in-migrants and 62% in long-term residents, giving an RR of 1.26, but with an OR of 2.17 (=(4055×17 336)/(28 067×12 153)).
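The difference between the two measures in the example above can be reproduced from the rounded non-participation proportions alone. The following is a minimal sketch in Python; the function names are illustrative, and the inputs are the rounded percentages quoted in the text (78% and 62%), not the underlying counts.

```python
# Illustrative helpers (names are hypothetical, not from the study's analysis code).

def risk_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of two proportions (relative risk)."""
    return p_exposed / p_reference


def odds_ratio(p_exposed: float, p_reference: float) -> float:
    """Ratio of the odds p / (1 - p) in the two groups."""
    odds_exposed = p_exposed / (1 - p_exposed)
    odds_reference = p_reference / (1 - p_reference)
    return odds_exposed / odds_reference


# Rounded non-participation proportions quoted in the text.
p_in_migrants = 0.78
p_long_term_residents = 0.62

rr = risk_ratio(p_in_migrants, p_long_term_residents)
odds_r = odds_ratio(p_in_migrants, p_long_term_residents)
print(f"RR = {rr:.2f}, OR = {odds_r:.2f}")  # RR = 1.26, OR = 2.17
```

Because non-participation was common, the OR lies much further from 1 than the RR, so reading the OR as if it were a relative risk would overstate the difference between the two groups.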
RRs were adjusted for sex and 5-year age group at invitation. MRRs were adjusted for sex, 5-year age group at follow-up, and calendar year, because the full model including all covariates did not converge.
Interpretation
The starting point for the present study was the paradoxical observation of an excess mortality in Lolland-Falster, while LOFUS health data were in line with levels in other parts of Denmark. Although we found only modest differences in participation across subgroups, selective participation may explain part of the paradox. In the weighted data from the Danish National Health Survey, chronic obstructive pulmonary disease was reported by 8% in Lolland-Falster and 5% nationwide; osteoarthritis by 34% vs 23%; obesity by 25% vs 19%; daily smoking by 21% vs 14%; and excess alcohol intake by 15% vs 16%.20 The pattern in the weighted results was overall in line with what we would expect from the population-based mortality data. Weighting, however, rests on the assumption that the characteristics of non-participants in a given subgroup equal those of participants from the same subgroup. This may not be true for health characteristics, as non-participants had a considerably higher mortality than participants in the years following invitation. So, for instance, in the Danish National Health Survey, the weighted prevalence of severe sleeping disorders was 15%, but the true prevalence may be higher because weighting is unlikely to account for the within-group health selection, even when the weighting included visits to GPs and hospitalisation.21
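The limitation of weighting described above can be illustrated with a small numerical sketch in Python. All numbers below are invented for illustration and are not LOFUS or Danish National Health Survey data. The sketch assumes two hypothetical subgroups and assumes that persons with a condition participate half as often as persons without it, regardless of subgroup.

```python
# Hypothetical population: subgroup name -> (size, true prevalence of a condition).
# These values are invented for illustration only.
strata = {
    "group_a": (6000, 0.10),
    "group_b": (4000, 0.30),
}


def true_prevalence(strata):
    """Prevalence in the whole hypothetical population."""
    total = sum(n for n, _ in strata.values())
    cases = sum(n * prev for n, prev in strata.values())
    return cases / total


def weighted_prevalence(strata, p_healthy, p_unhealthy):
    """Survey estimate after weighting each subgroup back to its population share.

    p_healthy and p_unhealthy are the assumed participation probabilities
    for persons without and with the condition, respectively.
    """
    total = sum(n for n, _ in strata.values())
    estimate = 0.0
    for n, prev in strata.values():
        participating_cases = n * prev * p_unhealthy
        participating_non_cases = n * (1 - prev) * p_healthy
        observed = participating_cases / (participating_cases + participating_non_cases)
        estimate += (n / total) * observed  # weight by population share
    return estimate


print(f"true prevalence:     {true_prevalence(strata):.3f}")
print(f"weighted estimate:   {weighted_prevalence(strata, 0.40, 0.20):.3f}")
print(f"no health selection: {weighted_prevalence(strata, 0.30, 0.30):.3f}")
```

In this sketch the weighting restores the correct subgroup shares, yet the weighted estimate stays below the true prevalence whenever unhealthy persons participate less often than healthy persons within each subgroup. The two coincide only when participation is unrelated to health.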
We found the measured prevalence of airway obstructions in LOFUS participants in line with the prevalence in other parts of Denmark.11 In the Danish National Health Survey data, the weighted, self-reported prevalence of chronic obstructive pulmonary disease was 60% (=8.4%/5.2%) higher in Lolland-Falster than nationally.20 So, selection between groups could have played a role. But even more important was probably the selection within groups; the mortality from respiratory diseases was 4.54 times higher in non-participants than in participants in LOFUS. One should therefore be cautious about using health survey data for prioritising health interventions.
Health surveys have not only served the purpose of mapping population profiles as a basis for health policy, but also of providing prospective cohort data for identification of risk factors. For this purpose, selective participation of healthy individuals may actually be seen as an advantage. In order to exclude reverse causation deriving from behavioural patterns caused by prevalent disease, it is standard in the analysis of prospective cohort studies to exclude participants with prevalent diseases at the time of recruitment. The rationale is that the purpose of prospective cohort studies is to identify associations between risk factors and disease occurrence, and that these associations will be valid on the population level even when derived from selected subpopulations.26
Lately, this understanding of the validity of findings from prospective cohort studies has been challenged.27 The UK Biobank had a response rate of 5.45%.16 In order to test for the potential impact of selective responses, a post-stratification was undertaken using the Health Survey of England 2008 (HSE). The HSE had a response rate of 64% and used age, sex, household type, geographical region and social class as non-responder weights. The general population estimates from the HSE were used as input for weighting of the UK Biobank data by age, sex, education, smoking, physical activity and body mass index, where weighted totals from the UK Biobank were summed to the totals of the UK population.
Weighted UK Biobank data had younger, less educated and less physically active persons than those in the unweighted data. Excluding persons with cancer or cardiovascular disease at baseline, 302 009 UK Biobank participants remained, of whom 11 875 died during follow-up. In the unweighted UK Biobank data, a protective effect was seen on death from cardiovascular disease of current as compared with never drinking (HR of 0.60–0.66). These associations disappeared in the weighted data (HR 0.93–1.00), indicating an over-representation of healthy lives among UK Biobank participants reporting current drinking. This finding is an important reminder of the risk of generating spurious associations from incomplete data, but the weighting might in itself have generated problems as one-fourth of the data had been excluded due to missing values. Furthermore, the fact that all other studied associations remained largely unchanged indicated some resilience of associations generated from selected populations.