Purpose-oriented classification of PHSS
Distribution of disease
Distribution maps represent a spatially refined assessment of a particular public health issue or disease. They provide a starting point for public health interventions, both for developing control strategies and for assessing disease burden. Every surveillance system holds data that can be used to represent distribution, but some surveillance systems and health maps have the explicit function of showing disease distribution; these systems collect, refine and analyse data primarily for that purpose. Such maps can show either the global distribution of a disease or its distribution within a particular geographic region.
A publicly available health map at www.healthmap.org, developed by Freifeld et al10 and Brownstein et al11, is a web-based tool that also offers other resources. The map is built from online content such as news reports, blogs, alerts and other online tools to show the distribution of 87 disease categories in 89 countries. The map in Freifeld et al10 was constructed by analysing 778 online reports about disease outbreaks. The version described by Brownstein et al11 provides real-time information on disease outbreaks, but the major purpose of the map, in its publicly available form, is to show disease distribution.
Global disease distribution maps can also be academic ventures that apply different analytical tools to analyse and visualise the global burden of a disease. One example is the set of global distribution maps of the leishmaniases produced by Pigott et al.12 To understand the global distribution of the disease, information from various sources such as published literature, online reports, strain archives and genetic data from GenBank was aggregated. The result was a set of detailed distribution maps, together with the estimate that around 1.7 billion people live in areas where they are potentially at risk of leishmaniasis. Surveillance systems with distribution as their primary focus can yield insights such as the affected population, the level of risk and the distribution of risk, which support earlier intervention and control. Another example is the yellow fever distribution map developed by Shearer et al13 for worldwide infection risk zones. Geographical records were analysed to identify 5×5 km regions across all the risk zones. The regression model also took into consideration environmental and biological factors, vaccination coverage and spatial variability of the disease. From the vaccination data, it was estimated that vaccination in the risk zones averts between 94 336 and 118 500 cases of yellow fever annually.
Global distribution maps are not limited to diseases; they are also used to estimate the factors and causes associated with a particular disease. Messina et al14 carried out a global environmental suitability study for Zika virus. Species distribution modelling showed that tropical and subtropical regions worldwide have environmental conditions suitable for the spread of the virus. Species distribution models that identify the environmental niche of vectorborne diseases have also been developed for other diseases, including dengue,15 leishmaniasis12 and Crimean-Congo haemorrhagic fever.14 Distribution maps are also used to measure attitudes towards a public health issue. Twitter activity and sentiment analysis were used to gauge public attitudes towards immunisation and awareness of vaccination campaigns.16 Twitter and Facebook were analysed to determine the attitudes of the public and of public leaders, while analysis of news showed a different set of actors. This variation in the preferred social media platform, by user and by topic, gives insights into the demographic that prefers one type of social media over another. Similar studies of public attitudes towards health-related issues can extend to attitudes towards the use of a pharmaceutical product or to recording adverse events associated with a medication.17
Monitoring of disease
While surveillance and monitoring can mean the same thing, in the context of this paper surveillance is the broader term: it includes public health maps and systems that are meant to determine the distribution of disease and to make predictions about the future based on the collected data. For the purpose of this review, we classify under monitoring those surveillance systems and health maps that use real-time data to update the system, whether by demographic, by geographical distribution or by risk level. Data collection and geographical or demographic distribution are part of this category, but here data collection is a continuous process, with various methods used to sift and verify the data. Over the last few years, web-based and social media-based maps that are regularly updated have been developed under this category. The real-time data can also come from medically validated sources such as EHRs and disease databases, but this is not a requirement.
The surveillance systems in this category are often larger systems initiated by governments or large global public health organisations. Real-time web-based surveillance systems were meant to strengthen global disease surveillance. The first system developed through such an approach was the Programme for Monitoring Emerging Diseases (ProMED-Mail), established in 1994.18 It was chartered by the Federation of American Scientists with the aim of disseminating information to a wide audience in real time. WHO subsequently created the Global Outbreak Alert and Response Network (GOARN), an infrastructure that builds capacity in partner networks to coordinate the response to global disease outbreaks. Following the initial news-based monitoring systems such as ProMED-Mail, the advent of social media has led to the adoption of real-time social media-based monitoring systems. These are often adopted at the national level; for example, the Generating Epidemiological Trends from Web Logs, Like (GET WELL) system has been officially accepted by the Swedish government.19 It has been used by epidemiologists as a complementary tool for daily surveillance.
Within this category of monitoring systems, some do not update automatically; instead, the collected data are first analysed and vetted by a human expert. ProMED-Mail and GOARN20 are such systems. Governments and public health organisations such as WHO also collaborate in developing PHSS. One example is the Global Public Health Intelligence Network,21 a collaboration between Health Canada and WHO for early warning of potential public health threats, including chemical, biological, radiological and nuclear threats. Some social media-based surveillance systems add functionality or alerting to previously implemented PHSS. For example, EpiSPIDER22 extracts emerging infectious disease information from ProMED, combines it with the CIA Factbook, extracts locations using natural language processing and then posts the results on Google Maps. Google Trends is another tool widely used as a news aggregator, giving topic-wise mentions and their frequency. It has been used in tracing epidemics, disease outbreaks and the distribution of diseases.
COVID-19 has led to a surge in surveillance maps built specifically for monitoring COVID-19 cases. Almost every government has some type of surveillance system in place to monitor the number of COVID-19 cases. While it is beyond the scope of this paper to cover even the most efficient and informative surveillance systems for COVID-19, the global dashboard developed by the Johns Hopkins University Center for Systems Science and Engineering is a comprehensive one.23 For modelling the outbreak and spread of COVID-19, a stochastic metapopulation epidemic simulation tool is used to simulate global outbreak dynamics. The raw data for the simulation tool are also available, and the user interface is an interactive GUI. Similarly, WHO has a comprehensive global dashboard with an interactive user interface to track the COVID-19 outbreak and report various data associated with the epidemic.24 Other national-level surveillance maps are available for various purposes. These maps provide access to national-level health data and are regularly updated; one example is the health map of the Australian health department.25
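The idea of a stochastic metapopulation epidemic simulation, mentioned above in connection with COVID-19 modelling, can be illustrated with a minimal sketch. This is not the tool used for the dashboard; it is a generic discrete-time SIR model over connected patches, written under simplifying assumptions. The function names (`binomial`, `simulate`), the parameters `beta` and `gamma`, the travel matrix and the rule that only infectious individuals travel are all hypothetical choices made for illustration.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate(patches, travel, beta, gamma, days, seed=0):
    """Run a discrete-time stochastic SIR model over connected patches.

    patches: list of dicts with integer "S", "I" and "R" counts.
    travel[i][j]: assumed daily probability that an infectious person
                  in patch i moves to patch j.
    Returns one snapshot (list of patch dicts) per simulated day.
    """
    rng = random.Random(seed)
    state = [dict(p) for p in patches]  # copy so the inputs are not mutated
    history = []
    for _ in range(days):
        # Within-patch infection and recovery.
        for p in state:
            n = p["S"] + p["I"] + p["R"]
            # Chance that a susceptible is infected by at least one infectious person.
            p_inf = 1.0 - (1.0 - beta / n) ** p["I"] if n > 0 else 0.0
            new_inf = binomial(rng, p["S"], p_inf)
            new_rec = binomial(rng, p["I"], gamma)
            p["S"] -= new_inf
            p["I"] += new_inf - new_rec
            p["R"] += new_rec
        # Between-patch movement (simplification: only infectious people travel).
        arrivals = [0] * len(state)
        for i, p in enumerate(state):
            for j in range(len(state)):
                if i != j:
                    moved = binomial(rng, p["I"], travel[i][j])
                    p["I"] -= moved
                    arrivals[j] += moved
        for j, p in enumerate(state):
            p["I"] += arrivals[j]
        history.append([dict(p) for p in state])
    return history


# Two hypothetical regions; the outbreak starts in the first one.
patches = [{"S": 990, "I": 10, "R": 0}, {"S": 1000, "I": 0, "R": 0}]
travel = [[0.0, 0.01], [0.01, 0.0]]
history = simulate(patches, travel, beta=0.4, gamma=0.1, days=60, seed=1)
```

Because infection, recovery and travel are all random draws, repeated runs with different seeds give a spread of possible outbreak trajectories rather than a single curve, which is the main reason such models are used for outbreak scenarios.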
Web-based and social media-based surveillance systems have some limitations. The biggest issue is internet penetration and asymmetrical global access. Surveillance systems based on social media or online news content are skewed towards developed or developing countries, while major portions of the world where internet access is not as pervasive as in the developed world may be left out of the data collection process. A second problem is the reliability of data that are self-reported or come from news sources, which may not be as reliable as clinically validated data. A third issue, specific to automated surveillance technologies, is the analysis of language. The machine learning algorithms used for sentiment analysis or reporting may not capture nuances of language such as cultural tone, language shifts and colloquialisms. These language barriers may affect the accuracy of detecting or reporting a disease outbreak.
Prediction of disease
Predicting disease outbreaks in risk zones, identifying the risk zones and tracing the trajectory of a disease is the third purpose of surveillance maps. Monitoring is part of prediction, as the data collected are used for prediction through different tools. The introduction of AI and machine learning tools into PHS has given surveillance systems the ability to follow a disease accurately and has enabled policy-makers to take pre-emptive action. Besides prediction, AI is also used to collect and analyse the data. AI provides modelling tools that can assess the pattern of disease transmission and spread, as well as public attitudes and responses towards the disease. The predictions of AI are context-based, relying on quantification of variables, responses and other factors in the interacting environment.26
The evolution of an epidemic or a disease across space, time and a particular demographic is a complex process involving a degree of uncertainty and non-linearity. Aggregated statistics and linear interactions are therefore of limited use in predicting the pattern of transmission or outbreak of a disease in a risk zone. AI tools are also used to determine the distribution of complex diseases; for example, they were employed to simulate the global distribution of mosquitoborne infections.27 In Singapore, the risk of dengue transmission was predicted using random forest, an AI tool, by taking into consideration dengue, environmental, entomological and population data. Similarly, deep learning was used to predict the risk of Zika virus outbreaks in the Americas.28
Prediction using non-linear, unstructured and heterogeneous sources of data has also been carried out through surveillance systems over the years. Thapen et al combined Twitter data with news sources to detect and predict outbreaks.29 Using Twitter and other social media data for outbreak prediction can be challenging, because tweets and social media posts are informal and often incomplete, which makes it difficult to extract their semantic meaning. Luo et al30 proposed a long short-term memory RNN structure to classify tweets containing infection-related information and showed that the model outperformed conventional prediction systems. However, as noted previously, prediction-oriented surveillance systems can suffer from problems such as geographically unequal availability of data, heterogeneity of users and language barriers.
The FluSight task, hosted by the US Centers for Disease Control and Prevention, carries out seasonal influenza forecasting at the national and regional levels using weighted influenza-like illness (wILI) data. Other AI-based tools have been developed to carry out prediction from wILI data, such as the framework developed by Adhikari et al.31 There is also presyndromic surveillance, in which novel disease outbreaks that cannot be placed in existing categories are predicted using different machine learning tools.32 33 Prediction-oriented surveillance systems also model disease transmission. Scarpino and Petri34 used dynamic approaches such as permutation entropy, Markov chain simulations and epidemic simulations. Tripathi et al35 used machine learning methods to predict the controllability of disease on complex networks.
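Permutation entropy, one of the dynamic approaches mentioned above, can be sketched in a few lines. The code below follows the standard ordinal-pattern definition of permutation entropy and is not taken from Scarpino and Petri; the function name `permutation_entropy`, the `order` parameter and the example series are illustrative assumptions. Lower values indicate a more regular, and hence potentially more predictable, time series.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, normalize=True):
    """Permutation entropy of a time series, in bits.

    Each window of `order` consecutive values is reduced to its ordinal
    pattern (the ranking of its values); the entropy is computed over the
    relative frequencies of those patterns. Ties are broken by position.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    n_windows = len(series) - order + 1
    if n_windows < 1:
        raise ValueError("series is shorter than the chosen order")
    patterns = Counter()
    for i in range(n_windows):
        window = series[i:i + order]
        # Indices of the window sorted by value give the ordinal pattern.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    entropy = sum(-(c / n_windows) * math.log2(c / n_windows)
                  for c in patterns.values())
    if normalize:
        # Divide by the maximum possible entropy, log2(order!), to scale to [0, 1].
        entropy /= math.log2(math.factorial(order))
    return entropy


# Hypothetical weekly case counts: a steady rise versus an alternating series.
steady = [1, 2, 3, 4, 5, 6, 7, 8]
alternating = [1, 2, 1, 2, 1]
print(permutation_entropy(steady, order=3))       # single pattern: 0.0
print(permutation_entropy(alternating, order=2))  # two equally likely patterns: 1.0
```

A steadily rising series contains only one ordinal pattern and therefore has zero entropy, whereas a series in which all patterns are equally frequent reaches the normalised maximum of 1.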
Predicting outbreaks or patterns of disease transmission involves non-linear modelling and simulation and is therefore a non-trivial task. The costs associated with a false positive or a false negative are also high, as these may result in wasted resources or a neglected threat. Prediction-oriented surveillance systems and health maps will benefit further from advanced AI tools such as deep learning for collecting the data, making the data operationalisable and analysing them, as well as for modelling the pattern of disease transmission.