Review

Are in-market control systems for sunscreens adequate for consumer protection? A review of the legal framework regulating sun protection factor labels in Europe

Abstract

Public health authorities and regulations in Europe protect the population against the damaging effects of excessive solar ultraviolet radiation through, among other means, monitoring marketed sunscreens and enforcing compliance with sun protection factor labelling requirements. In-market control processes are fundamental and complementary to other public health initiatives in a context of suboptimal sunscreen use in real-world settings. However, the laboratory testing method used for determining the sun protection factor of sunscreens is known to produce variable results. Combining an inherently variable testing method with the necessary rigidity of regulations generates volatility in the decision-making process followed by regulators during official in-market controls and leaves sunscreens exposed to accidental mislabelling challenges. This leads to a paradoxical situation in which most sunscreens on the market may be incorrectly labelled, and to a potential dilemma for authorities. The issue may be further amplified when non-official sources echo and broadcast uncontrolled messages about sunscreens to the public. Amending current regulation with a tolerance level for comparing results, one that accommodates the variability of sun protection factor tests, would ease decision-making, bring robustness to an uncertain legal landscape, make efforts to convey consistent public health messages about the benefits of sunscreen use more efficient and better protect users. There are precedents for using tolerance levels in regulatory decision-making in other fields, and a review of the applicable legal landscape in Europe reveals that implementing one for sunscreens would require only a single change to current cosmetics law.

Introduction

Excessive exposure to solar ultraviolet radiation (UVR) has long been known to have detrimental effects on human health, including skin cancer, photoaging and the worsening of photosensitive dermatoses.1–4 To tackle this modifiable risk factor, public health efforts and educational campaigns promote a variety of sun protection measures,4–7 such as avoiding unnecessary sun exposure, wearing protective clothing and applying sunscreen.1 2 5

In the case of sunscreens, because the actual protection they afford in real-world settings is recognised to be lower than that determined in laboratory tests,2 8–10 guidelines emphasise the importance of applying sunscreen regularly, adequately and in sufficient amounts.1 2 5 Among the different features that characterise sunscreens (eg, texture, type of UV filters, water resistance), the sun protection factor (SPF) has emerged as the metric most familiar to sunscreen users.11–14 The SPF value, which is determined in a laboratory setting following standardised testing procedures, is also relied on by the medical and public health communities to produce guidelines and make recommendations.1 5 10 15

Besides public health efforts, the regulation of the SPF labels displayed on sunscreen products allows the public to make safe and informed choices about sunscreens.5 13 14 16 Conformity with these regulations is monitored and enforced through in-market control systems that routinely identify commercialised sunscreens with mislabelled SPF values and bring them into compliance.14 17 18 Yet, judging by the sheer number of sunscreens with mislabelled SPFs reported in the press, often leading to legal disputes,19–22 such cases seem all too common, affecting users,9 23–25 regulators and manufacturers26–29 around the world.30–33

Enforcing compliance can be challenging when the tests used to label the product’s SPF are methodologically correct and the only reason for questioning the labelled value is a different SPF result from another methodologically correct test. In such cases, a decision must be made on whether the test results differ enough to require relabelling or recalling the product.14 17 The consequences of these decisions range from economic and reputational damage to the manufacturer if the product is recalled from the market, to the decision being perceived as unethical and against the interests of public health if it is not. Either way, the message that reaches the public, often amplified by the media, does not help build confidence in sunscreens.14 15 34

In-market control processes play an important role within the broader public health efforts to protect consumers. Because regulation that unequivocally answers whether two different valid SPF results can be considered the same for labelling purposes would ease decision-making, the objective of this article is to review the legal framework that regulates SPF labelling in the European Union (EU), whose recommendations are often also followed by countries outside the EU, and to propose improvements to make it more robust.

Monitoring and enforcing SPF labelling compliance in the EU

In the EU, manufacturers of sunscreen products typically, and lawfully, rely on just one SPF test to label a sunscreen.17 35 The labelled SPF value on the product is established using a correspondence table that maps the result obtained in a valid test with the ISO24444 method to pre-established SPF values (see table 1).5 35 This mapping simplifies test results into standardised SPF values and categories for consumer information.35

Table 1
Correspondence of measured SPF test values to labelled SPF and categories according to EU recommendation

The ISO24444 method, which uses the production of erythema (measured 24 hours after exposure) on the skin of healthy volunteers as its clinical end-point, is known to produce variable results in repeat tests.13 16 36 37 This is due to factors such as differences in the equipment used by testing laboratories, the skill of the operators, environmental conditions and differences in the volunteers’ sensitivity to erythema, to name a few.16 37 In addition, these discrepancies may be amplified at higher SPF values.38 39

Among the means the method uses to control these factors, such as specifying the characteristics of the solar simulator, setting inclusion and exclusion criteria for test participants and applying statistical criteria to validate tests, each test performed with the ISO24444 method is also controlled by using validated reference standard products to verify the test procedure (see table 2). If the SPF measured for the reference standard product does not fall within its acceptance limits, the entire test is rejected. Reference standard products have been validated numerous times, including in the method’s ring tests, and are therefore considered reliable and to produce reproducible results.36 40

Table 2
Mean, upper and lower bound acceptance limits for the SPF of reference standard products

The acceptance limits for reference standard products make the method’s variability apparent. For instance, regardless of the root causes of the variability involved, the acceptance limits for the SPF of reference standard P6 range from SPF 31.0 to 54.9 and those for P5 from SPF 23.7 to 37.4.40 If these reference standard products were commercialised, labelling regulations would thus allow P6 to be mapped to either SPF 30 or SPF 50, and P5 to fall into either the ‘medium protection’ or the ‘high protection’ category, depending on the results obtained in the tests (see tables 1 and 2).35 This variability of SPF results, in the context of the current legal landscape, creates a problematic situation: figure 1 represents the flow of events that would unfold if P6 were commercialised and the regulation were followed literally.
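The mapping from measured SPF to label, and the way the P5 and P6 acceptance limits straddle its bands, can be sketched in Python. Because the body of table 1 is not reproduced in this text, the band thresholds below are an assumption taken from Commission Recommendation 2006/647/EC; the function name `label_for` is hypothetical.

```python
from bisect import bisect_right

# ASSUMPTION: bands follow Commission Recommendation 2006/647/EC.
# Each entry: (lower bound of measured SPF, labelled SPF, category).
BANDS = [
    (6.0, "6", "low protection"),
    (10.0, "10", "low protection"),
    (15.0, "15", "medium protection"),
    (20.0, "20", "medium protection"),
    (25.0, "25", "medium protection"),
    (30.0, "30", "high protection"),
    (50.0, "50", "high protection"),
    (60.0, "50+", "very high protection"),
]
LOWER_BOUNDS = [lower for lower, _, _ in BANDS]


def label_for(measured_spf: float) -> tuple[str, str]:
    """Return the (labelled SPF, category) pair for a measured SPF."""
    # Index of the last band whose lower bound is <= measured_spf.
    index = bisect_right(LOWER_BOUNDS, measured_spf) - 1
    if index < 0:
        raise ValueError("a measured SPF below 6 cannot be labelled")
    _, label, category = BANDS[index]
    return label, category


# Acceptance limits (lower, upper) of two reference standards, from table 2.
REFERENCE_LIMITS = {"P5": (23.7, 37.4), "P6": (31.0, 54.9)}

for name, (lower, upper) in REFERENCE_LIMITS.items():
    print(name, label_for(lower), label_for(upper))
# P5 ('20', 'medium protection') ('30', 'high protection')
# P6 ('30', 'high protection') ('50', 'high protection')
```

Both ends of each acceptance range are valid results for the same product, yet under these assumed bands they map to different labels (P6) or different categories (P5).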

Figure 1

Flow chart of events for monitoring and enforcing compliance with sunscreen SPF labelling regulations according to EU law (case example of reference standard P6). EU, European Union; SPF, sun protection factor.

Suppose a manufacturer launched P6 on the market with an SPF 50 label (steps 1–2)17 35 40 and a competent authority (in the EU, competent authorities are normally national regulatory agencies),41 after reviewing the product information file, decided to conduct an SPF test.17 35 40 42 Owing to the method’s variability, the authority’s test could lead it to conclude that the label should be either 30 or 50 (steps 4–5). If it were 30, the product would effectively have to be relabelled (steps 7–8) or recalled from the market (steps 10–11).17 These events unfold without invoking the safeguard clause, so the competent authority does not even need to have ‘reasonable grounds for concern, that a product […] could present a serious risk to human health’17 (an absence of concern that seems justified in this example, given that P6 is a reference standard product); instead, they follow from invoking the clause on non-conformity with labelling requirements.17 35 42 Therefore, under a strict application of the regulation and purely by chance, P6 could end up not being allowed to carry an SPF 50 label.

The same flow of events would apply if the tests conducted by the manufacturer and the competent authority were swapped, that is, if the manufacturer had labelled the product SPF 30 in the first place (steps 1–2)17 35 40 and the competent authority’s test had indicated a label of SPF 50 (steps 4–5).17 35 40 42 This would also represent a non-conformity, ending with the product being relabelled, to SPF 50 in this case (steps 7–8), or recalled (steps 10–11).17 Therefore, again purely by chance, P6 could end up not being allowed to carry an SPF 30 label either.

Because the conclusion the competent authority reaches in its decision-making process is subject to chance, a robust product such as P6 could, paradoxically, carry a mislabelled SPF on the market. A similar mislabelling issue could occur for P5, in this case involving different categories rather than different SPF labels (see tables 1 and 2). If reference standard products can be mislabelled, so can many (or most) other products, casting serious doubt on the overall correctness of SPF labels and categories on the market.

The impact of in-market controls and public health initiatives to the public

Beyond, and regardless of, the variability of SPF laboratory tests, several factors make the SPF value determined in a laboratory an overestimate of the SPF actually achieved in real conditions.2 9–11 For instance, the amount of product applied in real-world settings (estimated at 0.4–1.2 mg/cm²) is less than the amount used in laboratory tests (2 mg/cm²).8–11 A 50% reduction in the amount of product applied can lower the actual SPF value by a factor of four.2 9 10 Sweat, rubbing and environmental factors also diminish the efficacy of a sunscreen over time,6 9 and users often fail to cover certain parts of exposed areas adequately.8 9 Besides the SPF, which is primarily a measure of a sunscreen’s capacity to protect against UVB radiation (290–320 nm), protection against UVA (320–400 nm) is likewise important.9 10 16 40 Multiple other factors (eg, the contribution of visible light to erythema, the time of day and the solar zenith angle) further cause the SPF achieved under natural sun exposure to differ from that measured in the laboratory. Consequently, public health efforts aim to improve sun protection by, among other measures (eg, seeking shade and using protective clothing and sunglasses), promoting the correct use of sunscreens and conveying messages about it.4–7

The consistency of public health messages through mainstream programmes and advertising campaigns, as well as the systematic use of sunscreen at a population level, have been shown to be highly effective in improving the population’s sun protection43–46 and saving healthcare costs.44 45 47 48 These efforts rely on the robustness of sunscreen labels and are undertaken by bodies such as governmental agencies, healthcare institutions and medical associations.8 11 12 15

Nonetheless, official public health messages compete with (sometimes conflicting) messages about sunscreen use delivered by other stakeholder groups.8 15 34 In fact, with the uptake of social media and other content delivery platforms, there has been an increase in commentary from non-official sources promoting sunscreen-related content without scientific support. The average consumer receives so many messages from such a diverse range of sources that consumers are often confused, may have misconceptions about which factors matter in a sunscreen and may not always understand the meaning of sunscreen labels.8 11 12 34

While several recommendations have been made to improve the effectiveness of the public health message and counteract misleading content from non-official sources, delivering a consistent message is fundamental both to changing people’s behaviour towards better sun protection and to making the best use of public health resources.3 8 34 43

Once a sunscreen is available in the marketplace, anyone with the appropriate resources, such as a consumer organisation, can perform an SPF test on the product (steps 14–15 in figure 1). The same variability that may lead a competent authority to conclude that a sunscreen is mislabelled also applies here, compounded by the fact that the (potentially misleading) conclusions reached are typically shared with the general public.15 16 Comments from such groups, as well as those from other non-official sources, have today become more popular than messages from trustworthy official sources. As a result, the effects of a recall or relabelling action by the competent authority may echo to the general public through a message that is no longer controlled by, or necessarily aligned with, the official public health message, potentially contradicting the competent authority’s original intention to protect the public.15 34

It would be perverse to deliberately reduce (or even abandon) monitoring and compliance enforcement activities, as there are public health benefits in removing sunscreens with mislabelled SPFs from the market, especially where tests are methodologically flawed or products could represent a safety concern. However, a legal landscape that can accidentally lead to relabelling or recalling (possibly many of the) audited sunscreens, combined with the amplification of that message to the general public, does not help reinforce the official public health message about trust in sunscreens.

Resolving the paradox with a tolerance level to compare SPFs

If establishing SPF labels using valid results from correctly administered tests may lead to mislabelled values, what changes could be made within regulation to remove randomness in the decision-making process followed by competent authorities when confronted with the result of an SPF test (steps 4–5 in figure 1) that challenges the label of a commercialised sunscreen (step 1)?

Following the example of what would happen if P6 were commercialised (figure 1), possible solutions to this paradox might be either to perform a ‘decisive test’ in steps 5–6 (ie, a new test that would dismiss and overrule the results of any previous tests and whose results would be considered final for labelling purposes) or to carry out a ‘pooled analysis’ (ie, combining the data from the different tests, using statistical approaches such as Fisher’s F test, to try to reach a conclusion as to what the SPF label should be).38 However, these solutions might not be optimal. For instance, the outcome of the ‘decisive test’ would still be bound to the variability of the method, so whether the label should be SPF 30 or 50 would still be left to chance. And the outcome of the ‘pooled analysis’ could not be used for labelling purposes, as regulation requires the SPF label to be based on the result of a valid test.17 35

Moreover, if either the ‘decisive test’ or the ‘pooled analysis’ solution were implemented within regulation, manufacturers, to minimise risk, might be led to perform multiple SPF tests before products are even placed on the market (step 1 in figure 1). While this could be a prudent thing to do anyway, conducting more than one SPF test before a product is launched is also uneconomical, possibly unfair, as not all manufacturers have the same financial means, and unethical, as human subjects are involved in those tests.36

Given that the labelling non-conformity identified by the competent authority in the example of figure 1 is independent of the characteristics of the product and is not due to any incorrect application of the method, another solution might be to use a ‘tolerance level’ (ie, an acceptable amount of variation in SPF test results) to compare the results of the two tests before concluding whether the product’s labelling is non-conforming.

The benefit of incorporating within regulation an additional step (between steps 5 and 6 in figure 1), with a tolerance level to determine whether different SPF results can be considered different or the same for labelling purposes, is that it would avoid exposing the public to (unintentional) mislabelling and manufacturers to legal disputes, which is what today’s legal framework could otherwise lead to, as highlighted in steps 12 and 13 in figure 1.17 35 42 Additionally, a tolerance level would be simple to implement within current regulations, requiring only one change to cosmetics law for the case of non-compliance with sunscreen SPF labels (ie, chapter VIII of Regulation (EC) No 1223/2009),17 and it would not create an incentive for manufacturers to conduct multiple tests before products are commercialised, since the risk of a mislabelled SPF due to the variability of the method would already be accounted for within the tolerance level.
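A minimal sketch of the proposed additional step between steps 5 and 6 of figure 1 follows. It assumes, hypothetically, that the tolerance is expressed as a multiplicative factor on the ratio between the two valid SPF results; the function names and factor values are illustrative only, since the text leaves the actual tolerance level to be defined by consensus.

```python
def same_for_labelling(spf_a: float, spf_b: float, tolerance_factor: float) -> bool:
    """Hypothetical rule: two valid SPF results are equivalent for labelling
    if the larger is at most `tolerance_factor` times the smaller."""
    if min(spf_a, spf_b) <= 0 or tolerance_factor < 1:
        raise ValueError("SPF values must be positive and the factor >= 1")
    return max(spf_a, spf_b) / min(spf_a, spf_b) <= tolerance_factor


def decision(manufacturer_spf: float, authority_spf: float,
             tolerance_factor: float) -> str:
    """Outcome of the competent authority's comparison (illustrative)."""
    if same_for_labelling(manufacturer_spf, authority_spf, tolerance_factor):
        return "conforming: keep label"
    return "non-conformity: relabel or recall"


# Two valid P6 results at the edges of its acceptance limits (table 2),
# compared with purely illustrative tolerance factors.
print(decision(54.9, 31.0, tolerance_factor=2.0))  # conforming: keep label
print(decision(54.9, 31.0, tolerance_factor=1.5))  # non-conformity: relabel or recall
```

Expressing the tolerance as a ratio rather than an absolute SPF difference is one design choice among several; it reflects the observation that discrepancies tend to grow with the SPF value.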

It is not uncommon for competent authorities to use tolerance levels for assessing compliance with labelling regulations. In other fields, such as the labelling of vitamin contents in food products, the EU has issued guidance precisely to help competent authorities with a similar issue. Specifically, that guidance says that ‘Tolerances for nutrition labelling purposes are important as it is not possible for foods to always contain the exact nutrient levels labelled, due to natural variations and variations from production and during storage. However, the nutrient content of foods should not deviate substantially from labelled values to the extent that such deviations could lead to consumers being misled’ and that ‘Tolerances mean the acceptable differences between the nutrient values declared on a label and those established in the course of official controls’.49

While the sources of variability of SPF results from the ISO24444 method have nothing to do with the sources of variability of vitamin concentrations in food products, the parallel between these two distinct fields is noteworthy: inherently variable test results must ultimately be translated into a labelled value for consumer information. The precedent of using tolerance levels for regulatory compliance and decision-making in the field of food products could make it easier to do the same for sunscreen SPF.

How could the tolerance level be defined?

Whereas in the case of vitamin concentrations in food products the tolerance level is set, after rounding considerations, at 50% on the upper side and at the measurement uncertainty on the lower side, defining a tolerance level for sunscreen SPF requires reaching consensus on what should be considered ‘acceptable differences’.49
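Under a simplified reading of the food-labelling tolerance described above, ignoring the rounding rules, compliance can be pictured as an asymmetric interval around the declared value: up to 50% above it and down to the measurement uncertainty below it. The function name and the example values are hypothetical and only illustrate the shape of such a rule.

```python
def within_asymmetric_tolerance(declared: float, measured: float,
                                lower_uncertainty: float,
                                upper_fraction: float = 0.50) -> bool:
    """Hypothetical check: the measured value may exceed the declared value
    by up to `upper_fraction` and fall below it by up to `lower_uncertainty`."""
    if declared <= 0 or lower_uncertainty < 0 or upper_fraction < 0:
        raise ValueError("invalid declared value or tolerance")
    lower_limit = declared - lower_uncertainty        # measurement uncertainty below
    upper_limit = declared * (1.0 + upper_fraction)   # +50% above by default
    return lower_limit <= measured <= upper_limit


# Illustrative values only: a declared content of 10 units with a
# hypothetical measurement uncertainty of 1.5 units.
print(within_asymmetric_tolerance(10.0, 14.0, 1.5))  # True
print(within_asymmetric_tolerance(10.0, 16.0, 1.5))  # False
print(within_asymmetric_tolerance(10.0, 8.0, 1.5))   # False
```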

The ISO24444 method itself could be a source of evidence for what such ‘acceptable differences’ should be, for instance, through the 95% CI of the SPF of the tests.40 However, for a test to be valid its 95% CI must lie within ±17% of the mean SPF, while the acceptance limits for a reference standard product such as P6 span beyond any 95% CI that a valid test for that product could yield; the 95% CI is therefore too narrow to be used as a discriminator when comparing different tests. Note that the lower acceptance limit for P6 is 31.0, so for such a result to be valid its 95% CI must range at most from 25.7 to 36.3; likewise, the upper acceptance limit for P6 is 54.9, which yields a 95% CI ranging at most from 45.6 to 64.2.40 Since the ranges 25.7–36.3 and 45.6–64.2 do not even overlap, two valid results from the method (ie, one test for P6 yielding an SPF of 31.0 and another yielding an SPF of 54.9) would be considered different if the 95% CI were used as a tolerance level.
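The interval arithmetic in the paragraph above can be checked with a short sketch. It assumes that the widest admissible 95% CI for a valid test with mean SPF m is [0.83m, 1.17m], which follows from the ±17% validity criterion; the helper names are hypothetical.

```python
CI_HALF_WIDTH = 0.17  # validity criterion: 95% CI within +/-17% of the mean


def widest_valid_ci(mean_spf: float) -> tuple[float, float]:
    """Widest 95% CI that a valid test with this mean SPF could report."""
    return mean_spf * (1 - CI_HALF_WIDTH), mean_spf * (1 + CI_HALF_WIDTH)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


low_ci = widest_valid_ci(31.0)   # P6 lower acceptance limit
high_ci = widest_valid_ci(54.9)  # P6 upper acceptance limit
print([round(x, 1) for x in low_ci])   # [25.7, 36.3]
print([round(x, 1) for x in high_ci])  # [45.6, 64.2]
print(overlaps(low_ci, high_ci))       # False
```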

There is also evidence beyond the method that could inform how the tolerance level is defined. Given that the statistical criterion the ISO24444 method uses to determine the validity of a test is the same regardless of whether the number of subjects is 10 or 20, Bacardit studied how the 95% CI, expressed as a percentage of the mean, changes with the number of subjects and was thus able to establish the intrinsic ability of the method to differentiate SPF results as ×1.73.50 While this ratio does not account for experimental variability, and might therefore be an underestimate if used as a tolerance level, it provides a wider range of values than the method’s 95% CI.
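To put the ×1.73 figure in context, the sketch below compares three multiplicative spreads: the largest ratio between two mean SPFs whose widest valid 95% CIs (±17%) still overlap, the Bacardit ratio, and the ratio between the upper and lower P6 acceptance limits. Reading ×1.73 as the maximum ratio between two SPF results that the method cannot tell apart is an assumption.

```python
# Ratio between two means whose widest valid 95% CIs (+/-17%) just touch:
# 0.83 * high = 1.17 * low  =>  high / low = 1.17 / 0.83
ci_overlap_ratio = 1.17 / 0.83
bacardit_ratio = 1.73          # discriminating ability reported by Bacardit
p6_span_ratio = 54.9 / 31.0    # P6 upper / lower acceptance limit (table 2)

print(round(ci_overlap_ratio, 2))  # 1.41
print(round(p6_span_ratio, 2))     # 1.77
print(ci_overlap_ratio < bacardit_ratio < p6_span_ratio)  # True
```

Under that reading, the ×1.73 ratio is indeed wider than the CI-based criterion but still slightly narrower than the P6 acceptance span, which is in line with the caveat that it may underestimate the tolerance needed.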

In proposing an in vitro alternative to the current ISO24444 in vivo method, Pissavini et al defined an acceptance funnel for comparing in vitro and in vivo test results.18 39 The criteria consist of a check for minimal method bias and a requirement that at least 95% of the paired SPF values (from a pool of 24 sunscreen products tested concurrently in blinded in vivo and in vitro ring tests across several European laboratories) fall within the upper and lower limits of a funnel (defined by upper and lower 95% CIs) across the full range of labelled SPF categories, SPF 6–50+. Although that funnel is meant to validate the in vitro method, it is not unreasonable to think that a repurposed version of it, assuming that the variability of the in vivo tests can be isolated from that of the in vitro tests, could inform a tolerance level for comparing the results of two in vivo tests.
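The acceptance criterion described above, that at least 95% of the paired SPF values fall inside the funnel, can be sketched as follows. The funnel bounds and the paired data are made up for illustration, and the separate bias check used by Pissavini et al is not modelled.

```python
from typing import Callable, Sequence, Tuple

Pair = Tuple[float, float]  # (in vivo SPF, SPF from the compared test)


def funnel_pass_rate(pairs: Sequence[Pair],
                     lower: Callable[[float], float],
                     upper: Callable[[float], float]) -> float:
    """Fraction of pairs whose second value lies inside the funnel."""
    if not pairs:
        raise ValueError("no paired results")
    inside = sum(1 for ref, other in pairs if lower(ref) <= other <= upper(ref))
    return inside / len(pairs)


def funnel_accepts(pairs: Sequence[Pair],
                   lower: Callable[[float], float],
                   upper: Callable[[float], float],
                   required: float = 0.95) -> bool:
    """True if at least `required` of the pairs fall inside the funnel."""
    return funnel_pass_rate(pairs, lower, upper) >= required


# Hypothetical funnel bounds and data, for illustration only.
def hypothetical_lower(ref: float) -> float:
    return 0.7 * ref


def hypothetical_upper(ref: float) -> float:
    return 1.4 * ref


pairs = [(10.0, 11.0), (20.0, 18.0), (30.0, 45.0), (50.0, 52.0)]
print(funnel_pass_rate(pairs, hypothetical_lower, hypothetical_upper))  # 0.75
print(funnel_accepts(pairs, hypothetical_lower, hypothetical_upper))    # False
```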

Miksa et al studied interlaboratory variability by analysing a population of 44 commercially available sunscreens from the EU market (SPF 15–50). By compounding a coefficient of variation (%CV) according to the number of in vivo SPF values from different laboratories (each sunscreen was tested in at least three different laboratories), they determined that using the average result of 3 to 4 independent SPF tests would reduce variability significantly.37 This approach could be considered an alternative to a tolerance level. However, as with the ‘decisive test’ and ‘pooled analysis’ options discussed above, it would be uneconomical, possibly unfair and unethical, and it would also require changes to current regulations in several places, affecting not only competent authorities’ decision-making but also the labelling of products before launch.17 35 36
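Why averaging several tests reduces variability can be illustrated with the textbook standard-error argument: for n independent tests with the same coefficient of variation, the %CV of their mean is %CV/√n. This is a simplification resting on assumptions (independence, equal %CV across laboratories), not the compounding procedure used by Miksa et al, and the 20% per-test %CV below is hypothetical.

```python
import math


def cv_of_mean(single_test_cv: float, n_tests: int) -> float:
    """%CV of the mean of n independent tests sharing the same %CV."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return single_test_cv / math.sqrt(n_tests)


# Hypothetical per-test %CV of 20%.
for n in (1, 2, 3, 4):
    print(n, round(cv_of_mean(20.0, n), 1))
# 1 20.0
# 2 14.1
# 3 11.5
# 4 10.0
```

The gain per additional test shrinks as n grows, and every extra test involves more human volunteers.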

Table 3 outlines the advantages and disadvantages of various possible solutions to aid decision-making by competent authorities.

Table 3
Possible solutions to remove randomness from the competent authority’s decision-making process when confronted with the result of an SPF test that challenges the label of a commercialised sunscreen

Other sources of evidence that could inform the tolerance level may exist, or additional efforts may be needed to produce that evidence. Whichever current or new evidence is used, all relevant stakeholders must reach a consensus on what should be considered ‘acceptable differences’.

Conclusions

With a recognised suboptimal use of sunscreens in real-world settings and a dilution of the official public health message in the mainstream media, complementary measures such as in-market control systems become even more crucial. However, sunscreen labelling regulations in the EU may not be fit for purpose, since reliable sunscreens (eg, the very reference standards of the ISO24444 method) could be mislabelled if they were commercialised. A legal landscape that leaves competent authorities’ decision-making to chance is consistent with neither robust in-market control systems nor meaningful SPF labels and categories, and it thus compromises both the provision of meaningful consumer information and the promotion of effective public health campaigns about the benefits of sunscreen use.

Amending current regulations with a tolerance level for discriminating between SPF results as they relate to labelling would eliminate volatility in the decision-making process. This could help solve many of the SPF mislabelling problems seen around the world and could become an instrument to indirectly moderate messages about sunscreen use delivered by non-official sources. The EU has already implemented tolerance levels to aid regulatory decision-making in other fields, and implementing one for sunscreen SPF would require only one change to current cosmetics law, without altering the way sunscreens are routinely labelled by manufacturers.

Given the importance of having a robust legal landscape that provides certainty and safety to all stakeholders and that sets solid foundations upon which public health initiatives can reliably be built, future research should aim at generating the necessary evidence to inform what an acceptable tolerance level should be so that legislators can incorporate it into current in-market control processes.