We have already touched on this in looking at natural variation. While some methods of data collection may appear highly objective (e.g., laboratory measurement of soil pH values from field sampling), the interpretation of what these results mean is often a matter of judgment. Other forms of data collection are often highly subjective and dependent upon expert knowledge. Let us take aerial photographic interpretation (API) as an example. A great deal of spatial data is collected in this way. A number of authors have studied the consistency and correctness of API. Congalton and Mead (1983) tested five interpreters on classes of tree cover. Although the results differed, they were found not to differ significantly at the 95% confidence level. Yet Drummond (1987), in a more varied test of nine land use classes carried out by five experienced, midcareer professionals, found that the superimposed results had considerable variability. Where contrasting land uses were juxtaposed (e.g., an area of agriculture in a woodland clearing), boundary conformity was high.
Villages, on the other hand, which in this area tend to have diffuse boundaries, as well as classes such as “fallow bush” (which can easily be confused with other cover types), tended to have low boundary conformity. Fookes et al. (1991) were able to qualitatively compare eight interpretations carried out for the Ok Ma dam site, Papua New Guinea, where a 35-million m3 landslide occurred as a result of construction work. The interpretations, which required a high level of skill, were very different both in style of presentation and in their conclusions. Fookes et al. went on to note that the more correct and informative conclusions were based on interpretations that deduced the active processes rather than relying solely on the recognition of individual features. Carrara et al. (1992) found some 50% discrepancy between interpretations of individual landslide features, mostly due to uncertainty in mapping old, inactive landslide bodies. However, once the individual features were extrapolated over landform units, the results were felt to be acceptable (83%) despite the loss of resolution. One could conclude from these studies that when you ask n experts, you will get n somewhat different opinions, but that if you combine these opinions you may well have a model of uncertainty from which to work. Finding n experts is not always feasible, is likely to be time consuming, and is, above all, an expensive way to collect data. This leads us to consider the contexts within which such experts work.
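The idea of combining n expert opinions into a working model of uncertainty can be sketched computationally. The following is a minimal illustration (not a method described in the text): given several independent classifications of the same area, it derives a per-cell consensus class and an agreement score, with low agreement flagging cells such as the “fallow bush” confusions noted above. All class names and data here are hypothetical.

```python
from collections import Counter

def agreement_map(interpretations):
    """Combine n expert classifications of the same grid into a
    per-cell majority (consensus) class and an agreement score:
    the fraction of experts assigning that majority class."""
    n = len(interpretations)
    rows = len(interpretations[0])
    cols = len(interpretations[0][0])
    consensus, agreement = [], []
    for r in range(rows):
        c_row, a_row = [], []
        for c in range(cols):
            votes = Counter(interp[r][c] for interp in interpretations)
            cls, count = votes.most_common(1)[0]  # majority class and its vote count
            c_row.append(cls)
            a_row.append(count / n)
        consensus.append(c_row)
        agreement.append(a_row)
    return consensus, agreement

# Three hypothetical interpreters classifying the same 2x2 area
experts = [
    [["forest", "village"], ["fallow", "forest"]],
    [["forest", "fallow"],  ["fallow", "forest"]],
    [["forest", "village"], ["forest", "forest"]],
]
cls, agr = agreement_map(experts)
```

Cells where all interpreters agree score 1.0; contested cells score lower, giving a simple spatial picture of where the data should be treated with caution.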
GIS and environmental modeling can be both a research and a professional activity. It is, however, predominantly an applied activity, often with the analyst working in a consultant–client relationship, which has important ramifications. Whereas a researcher’s primary concern is with understanding, the goal of a consultant is action on his or her recommendations. To achieve this goal, the consultant should exercise judgment, based on experience and intuition, and focus predominantly on the principal variables that are under the client’s control (Block, 1981). Blockley and Henderson (1980) consider the major dissimilarity between science and, say, engineering to lie in the consequences of incorrect prediction. Whereas the scientist is concerned with precision, objectivity, and truth, and attempts to falsify conjectures as best he or she can, for the engineer falsification of conjectures and decisions means failure, which must be avoided. The engineer strives to produce an artifact (road, dam, bridge) of quality (safe, economic, aesthetic) and, therefore, is “primarily interested in dependable information … is interested in accuracy only to the extent that it is necessary to solve the problem effectively” (Blockley and Robinson, 1983; see also Frank, 2008). Thus, context (science versus engineering) has important pragmatic quality implications with regard to the collection and use of data in GIS and environmental modeling.
Salgé (1995) defines semantic accuracy as the “pertinence of the meaning of the geographical object” as separate from its geometrical representation. The exact meaning of common words used to characterize classes of objects is a frequent problem in database integration. Let me provide a real example. In Chapter 6, we looked at a basin management planning project in Hong Kong. As part of this project, I mapped the land cover over large areas of the territory, and one of the classes was “village.” The definition of “village” here was a cluster of predominantly residential buildings within a rural environment and was constructed as a separate class to differentiate the runoff characteristics from those of the surrounding fields. The Planning Department, on hearing that villages had been mapped digitally, approached me to explore acquisition of the mapping. However, I was aware that their definition of “village” was quite different and referred to traditional settlements (as opposed to more
recent informal settlements) with the boundary extending 100 m beyond the outer buildings of the settlement. This is not what had been mapped, and there was considerable scope for confusion, inadvertent misuse of the data, and eventual dissatisfaction with the data provider. Fortunately, this situation was avoided, but such confusion can easily arise where commonly used words are ambiguous and precise definitions of feature classes are not available.
Finding a Way Forward
Following on from the above discussion, we can now flesh out Figure 8.2 with much more detail (though by no means exhaustive) on the specific causes of the four main types of uncertainty (Figure 8.6). This covers all aspects of primary data collection, deployment of secondary data, processing by hardware and software, and the final use of the analytical products. It’s a veritable minefield. The concept of fitness-for-use has already been mentioned. The data quality debate has been too narrowly focused on error in data rather than the wider consideration of uncertainty. Certainly, information on data quality “provides the basis to assess the fitness of the spatial data for a given purpose” (Chrisman, 1983a); however, this is closer to our definition of the reliability of data inputs. For the products of analysis, the user needs to evaluate, within the specific context, how the initial reliability translates into fitness-for-use depending on the propagation of uncertainty (its accentuation or dampening) for the methods of analysis (or simulation) used. Thus, fitness-for-use is not a fixed attribute of analytical products, but is context specific.
So, for example, data errors in a site analysis carried out to identify suitability for growing potatoes may not detract from the usefulness of the analytical products for making decisions. The same data sets used in the same way, but this time as part of a site suitability analysis for the disposal of toxic wastes, may be deemed unfit for use because of the risks associated with the level of resulting uncertainty. This would imply a need to pay attention, through a managed process, to data reliability, modeling the propagation of uncertainty and, where necessary, taking action to reduce the levels of uncertainty. This is embodied in Veregin’s (1989a) “hierarchy of needs” illustrated in Figure 8.7. The first step—source identification—we have already explored in the paragraphs above, culminating in Figure 8.6. The following sections will progressively move up the hierarchy.
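The context dependence of fitness-for-use can be made concrete with a small Monte Carlo sketch. Here, a hypothetical soil-pH reading with known measurement uncertainty is passed through a simple threshold-based suitability rule; the same data yield very different confidence depending on how strict the decision context is. The thresholds, standard deviation, and rule are all illustrative assumptions, not values from the text.

```python
import random

def suitability_confidence(measured_ph, sd, threshold, trials=10000, seed=42):
    """Monte Carlo sketch of uncertainty propagation: perturb a
    measured soil-pH value by its measurement error (normal, sd)
    and return the fraction of trials meeting a suitability
    threshold (pH >= threshold). Purely illustrative."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials)
               if rng.gauss(measured_ph, sd) >= threshold)
    return hits / trials

# Same measurement (pH 6.0 +/- 0.3), two decision contexts:
# a lenient threshold versus a much stricter one.
lenient = suitability_confidence(6.0, 0.3, threshold=5.5)
strict = suitability_confidence(6.0, 0.3, threshold=6.2)
```

Under the lenient rule the decision is robust to the measurement error, whereas under the strict rule the same data leave the outcome genuinely uncertain, which is the sense in which fitness-for-use is context specific rather than a fixed property of the data.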