Managing Fitness-for-Use

Management refers to the strategies and methods adopted to mitigate uncertainty in spatial databases and to reduce the uncertainty absorption required of the user (Bedard, 1986; Hunter and Goodchild, 1993; Frank, 2008). Any strategy must anticipate the entire GIS process, from the input of data to the output of analytical products, and is in effect a quality assurance process. Beginning with the identification of suitable data sets for a project, there is a need from the outset to assess suitability and reliability. This is the role of metadata: they provide further descriptions pertaining to the objects in the database and ideally consist of a series of standardized attributes (Canadian General Standards Board, 1991). Coming under this umbrella, then, are definitions of entities and attributes, measurement and coding practice, rules used for spatial delimitation, data sources, and data quality. This gives rise to the notion of a spatial data audit.

Thus, the theoretical purpose of metadata is to allow a user to know the nature of the data and their compilation, in particular the physical and conceptual compatibility of the data for integration and use with other data sets (Hootsman and van der Wel, 1993). From this, their value and reliability can be judged. Not surprisingly, then, metadata have mainly been considered in the context of data transfer standards. For example, the U.S. Spatial Data Transfer Standard (SDTS) (National Institute of Standards and Technology, 1992) requires a data quality report containing information on lineage, positional accuracy, attribute accuracy, logical consistency, and completeness. The use of all five data quality modules in a data transfer is mandatory. Key standards for metadata are: the ISO/TC 211 Geographic Information — Metadata standard (ISO 19115), FGDC-STD-001-1998 Content Standard for Digital Geospatial Metadata, and ISO 15836 Dublin Core Metadata Element Set.
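The five mandatory SDTS data quality modules can be pictured as a simple record structure. The sketch below is purely illustrative: the field names, units, and example values are assumptions, not the standard's actual encoding.

```python
from dataclasses import dataclass

@dataclass
class DataQualityReport:
    """Illustrative container for the five quality modules an SDTS
    transfer must report. Field names here are hypothetical."""
    lineage: str                 # sources and processing history
    positional_accuracy: float   # e.g. RMSE in metres (assumed unit)
    attribute_accuracy: float    # e.g. proportion correctly classified
    logical_consistency: str     # topological/structural checks performed
    completeness: str            # selection criteria and known omissions

# A hypothetical report accompanying a digitized hydrography layer:
report = DataQualityReport(
    lineage="Digitized from 1:24,000 quadrangle, field-checked 1995",
    positional_accuracy=12.0,
    attribute_accuracy=0.85,
    logical_consistency="Topology validated; no dangling arcs",
    completeness="All watercourses longer than 100 m included",
)
print(report.attribute_accuracy)
```

Keeping the five modules together as one record mirrors the standard's requirement that none of them may be omitted from a transfer.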

While metadata will allow a user to assess the reliability of the data for use in a particular application and their compatibility for integration with other data sets, they may not be sufficient to assist users in assessing fitness-for-use after the propagation of uncertainty during analyses, particularly because such an assessment is context-specific. It was in response to issues such as these that Hunter and Goodchild (1993) developed an overall strategy for managing uncertainty in spatial databases (Figure 8.29). Data and the system (hardware and software) are first evaluated for error separately and then evaluated again when combined to form an analytical product. These errors then need to be judged in context and communicated, so that the user can choose between error reduction (data or system upgrading) and error absorption (accepting the risk in some way).
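The choice between error reduction and error absorption can be sketched as a simple decision rule. Everything below is an assumption for illustration: the root-sum-of-squares combination of data and system error, and the idea of a single user-supplied tolerance, are not prescribed by Hunter and Goodchild's strategy.

```python
def evaluate_product(data_error: float, system_error: float,
                     tolerance: float) -> str:
    """Combine independently evaluated data and system errors (here by
    root-sum-of-squares, an assumed rule for independent errors) and
    compare the result with the user's tolerance for the application."""
    combined = (data_error ** 2 + system_error ** 2) ** 0.5
    if combined <= tolerance:
        return "absorb"   # accept the residual risk in some way
    return "reduce"       # upgrade the data or the system before use

# A product whose combined error exceeds a 10 m tolerance:
print(evaluate_product(data_error=9.0, system_error=6.0, tolerance=10.0))
# A product comfortably within tolerance:
print(evaluate_product(data_error=3.0, system_error=4.0, tolerance=10.0))
```

The point of the sketch is the structure of the decision, not the combination rule: the same errors may be absorbable in one application context and demand reduction in another.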

This type of model, while structurally useful, leaves the specific methodology to the user. What is needed is a conceptual framework that can both act as a strategy and direct specific methodology. The framework presented in Figure 8.30 is based on a communications model (Shannon, 1948; Bedard, 1986) in which there must be sufficient flow of information in order to reduce uncertainty. In this case, not only do data about the real world need to be converted and communicated as information (by means of GIS) to a user/decision maker, but there must also be sufficient communication about the quality of that information to reduce uncertainty in evaluating its fitness-for-use. The initial focus of the framework is on “context zero,” the original context in which data are collected, presumably in response to, and as specified for, a specific use or related range of uses.

In surveying the real world (or perceived reality), the observers are expected to record or generate measures of positional, thematic, and temporal uncertainty of their data in ways appropriate to the nature of the data being collected and the technology in use. Observers are likely to have their own professionally or culturally conditioned view of the real world, which may well differ distinctly from that of the eventual users of the data. Truth in quality reporting at the highest possible resolution and the recording of metadata are thus important elements in judging reliability. Aggregate measures of quality may be produced and, where recorded in the metadata, will need to be referred to by the user.

Nevertheless, these may not be sufficiently discriminating to give an indication of the spatial variability in quality. For extensive thematic layers compiled possibly by several observers or on different occasions, such variability is likely to be an important component in judging reliability and ultimate evaluation of fitness-for-use of GIS products. Therefore, observers should preferably record quality measures pertaining to individual objects or entities within a thematic layer, such as in the fuzzy set example above.

Where a number of thematic layers are to be used in an analysis, it may be necessary to integrate a range of quality measures in the propagation of uncertainty. Current research has focused on propagating a specific individual quality measure (such as PCC, variances, or probabilities) common to all the thematic layers. Instead, the framework provides for a mapping, in the mathematical sense of f: X → Y, from a range of quality measures into a common propagation metric M. Prime candidates for M are fuzzy measures (fuzzy sets, fuzzy numbers), into which measures of possibility, plausibility, belief, certainty, and probability can be transformed (Graham and Jones, 1988). The use of M and accepted mappings into it would provide observers with greater flexibility in their choice of appropriate quality measures.
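The idea of mapping heterogeneous quality measures into a common metric M can be sketched as follows. The two mapping functions and the min-operator overlay are illustrative assumptions only; the framework itself does not prescribe any particular transformation.

```python
def pcc_to_membership(pcc: float) -> float:
    """Treat a percentage-correctly-classified (PCC) figure as a fuzzy
    membership grade in [0, 1] (an assumed, simplistic mapping f: X -> M)."""
    return pcc / 100.0

def variance_to_membership(variance: float, worst: float) -> float:
    """Map a positional variance onto [0, 1]: zero variance -> 1.0,
    `worst` or beyond -> 0.0 (again, a purely illustrative mapping)."""
    return max(0.0, 1.0 - variance / worst)

# Two layers carrying different kinds of quality measure, unified in M:
m_landcover = pcc_to_membership(85.0)            # thematic quality
m_elevation = variance_to_membership(4.0, 25.0)  # positional quality

# Propagate through an overlay using the fuzzy intersection (min operator):
m_output = min(m_landcover, m_elevation)
print(m_output)
```

Once both measures live in the same metric, any fuzzy propagation rule can be applied uniformly, which is precisely the flexibility the common metric is meant to buy.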

Thus, the overall mapping from a domain of inputs to a set of real decisions becomes

Ωu → Ms → Mo → Ru

where Ωu = uncertainty in the domain inputs, Ms = common propagation metric at the start of analysis, Mo = output metric at the end of analysis, and Ru = level of uncertainty in the set of real decisions. As we have seen with fuzzy sets, the propagation of uncertainty through GIS analyses can result in a propagation metric that is not easily intelligible to a user. It needs to be rendered intelligible through a second mapping, from M to a fitness-for-use statement or indicator that corresponds to the user’s real-world model pertinent to the application. In this way, meaningful, application-specific visualizations of information quality can be generated and incorporated into decision making. Sensitivity analysis (SA) allows the user to assess the robustness of the information quality and to explore the contribution of individual thematic layers, with specification for upgrade where necessary.
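The second mapping, from the output metric Mo to a fitness-for-use statement, together with a simple sensitivity analysis, can be sketched as below. The verdict thresholds, layer names, and membership values are hypothetical and would in practice be set per application.

```python
def fitness_statement(m: float) -> str:
    """Map the output metric Mo to an application-specific verdict.
    The thresholds here are hypothetical, chosen per application."""
    if m >= 0.8:
        return "fit for use"
    if m >= 0.5:
        return "use with caution"
    return "unfit: upgrade inputs"

def overlay(memberships) -> float:
    """Fuzzy-intersection propagation (min operator), as one
    illustrative choice of propagation rule."""
    return min(memberships)

# Hypothetical layers already mapped into the common metric M:
layers = {"soils": 0.90, "slope": 0.60, "landcover": 0.85}
print(fitness_statement(overlay(layers.values())))

# Sensitivity analysis: upgrade each layer in turn to perfect quality
# and see which upgrade would change the overall verdict.
for name in layers:
    upgraded = {**layers, name: 1.0}
    print(name, "->", fitness_statement(overlay(upgraded.values())))
```

In this sketch, only upgrading the weakest layer ("slope") changes the verdict, which is exactly the kind of layer-by-layer contribution and upgrade specification the framework asks SA to provide.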

The user is then able to evaluate overall fitness-for-use and take responsibility both for the use of the base data and for the use of the analytical outputs. Because the framework provides the basis for handling uncertainty and evaluating fitness-for-use, users can continue to take this responsibility in all subsequent contexts in which the base data are made available for use. Data can have a long shelf life, and evaluations of their fitness-for-use in applications should be possible over that entire life. The framework is intended for implementation with existing data structures in GIS software; for software developers, it provides a basis on which to build suitable functionality for the handling of uncertainty.