Modeling Error and Uncertainty in GIS

It should be noted at the outset that there is no single best approach to modeling uncertainty for the various data handling and transformation functions to which source data might be subjected in GIS. This is partly because we do not yet have a single, generally accepted theory of uncertainty in GIS (Heuvelink, 1998) and partly because different GIS functions operate on uncertainty in different ways. Hence, we shall first approach the problem of modeling error and uncertainty from the perspective of specific GIS functionality (topological overlay and interpolation) and then go on to some wider generic issues (fuzzy concepts and uncertainty analysis). Given the wide range of possible GIS functions, I have been selective here. For other overviews, see Goodchild and Gopal (1989), Heuvelink (1998), and Hunter (1999). This is an area of ongoing research, and the reader is urged to consult the relevant journals regularly.

Topological Overlay

Topological overlay is one of the functions that distinguish GIS from other types of software. The overlay operation requires that two or more data layers be superimposed or combined to produce a new, composite map. Identification of co-location of objects or feature classes through overlay is fundamental to many forms of spatial analysis. An example is the spatial co-existence approach to environmental modeling discussed in Chapter 6. The overlay operation can be carried out on both raster and vector data and may take a number of forms. For example, layers having numerical attributes can be combined using arithmetic operators (map algebra), while categorical attributes can have Boolean operators applied to them. For vector polygon overlay there are three fundamental components: (1) the determination of geometrical intersection, (2) reconstruction of topology, and (3) the assignment of attributes. Implementations may differ between vendors; ArcGIS and ArcView, for example, combine overlay and Boolean selection in one command (IDENTITY, INTERSECT, UNION). Where individual data layers contain geometric discrepancies, polygon overlay results in the creation of spurious polygons commonly known as slivers (Figure 8.11).
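The two raster forms of overlay mentioned above can be illustrated with a minimal sketch. It assumes both layers are already registered to the same grid and are held as nested Python lists; the helper names (`weighted_sum`, `boolean_and`), the layer values, and the weights are all hypothetical and serve only to show a cell-by-cell arithmetic combination (map algebra) alongside a Boolean combination of categorical classes.

```python
from typing import List

# Hypothetical types: a layer is a list of rows; both layers are assumed
# to be registered to the same grid (same rows, columns and cell size).
NumGrid = List[List[float]]
CatGrid = List[List[str]]


def check_same_shape(a, b) -> None:
    """Raise if the two layers are not on the same grid."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("layers must be registered to the same grid")


def weighted_sum(a: NumGrid, b: NumGrid, wa: float, wb: float) -> NumGrid:
    """Map algebra: combine two numerical layers cell by cell with weights."""
    check_same_shape(a, b)
    return [[wa * x + wb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def boolean_and(a: CatGrid, b: CatGrid, class_a: str, class_b: str) -> List[List[bool]]:
    """Boolean overlay: True where layer a is class_a AND layer b is class_b."""
    check_same_shape(a, b)
    return [[x == class_a and y == class_b for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Hypothetical 2 x 2 layers, for illustration only.
slope = [[2.0, 5.0], [10.0, 1.0]]
rainfall = [[1.0, 3.0], [4.0, 0.0]]
soil = [["clay", "sand"], ["clay", "clay"]]
landuse = [["forest", "forest"], ["urban", "forest"]]

print(weighted_sum(slope, rainfall, 0.5, 2.0))       # [[3.0, 8.5], [13.0, 0.5]]
print(boolean_and(soil, landuse, "clay", "forest"))  # [[True, False], [False, True]]
```

The weights 0.5 and 2.0 are arbitrary here; as the text notes later, choosing them is a matter of professional judgment made outside the GIS.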

These small, often numerous polygons tend not to reflect reality and are derived mostly from data processing. The original discrepancies may arise from digitizing, numerical rounding, generalization, changing map projection, and from poor conflation (matching of common boundaries in different layers). Numerical errors introduced by geometrical operations on objects represented by floating-point numbers (Hoffman, 1989) will result in perturbations and creep in vertices and edges during the overlay process.
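The effect of floating-point arithmetic on overlay geometry can be seen in a single line intersection. The sketch below, using hypothetical coordinates, computes the intersection of two boundary segments once in ordinary floating point and once exactly with Python's `fractions.Fraction` on the same float-valued inputs. The exact point lies exactly on both lines (zero cross product); the floating-point point is close but may not, which is the kind of small vertex creep described above. The formula is the standard two-line intersection, not any particular vendor's overlay routine.

```python
from fractions import Fraction


def line_intersection(p1, p2, p3, p4):
    """Intersection of the infinite lines p1-p2 and p3-p4.

    Works with floats or Fractions; with Fractions the arithmetic is exact.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("lines are parallel")
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    px = (a * (x3 - x4) - (x1 - x2) * b) / denom
    py = (a * (y3 - y4) - (y1 - y2) * b) / denom
    return px, py


def side_of_line(p1, p2, p):
    """Cross product: zero exactly when p lies on the line through p1 and p2."""
    (x1, y1), (x2, y2), (x, y) = p1, p2, p
    return (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)


# Hypothetical boundary segments from two layers (float coordinates).
A, B = (0.1, 0.3), (2.7, 1.9)
C, D = (0.2, 2.2), (3.1, 0.4)

# Floating-point result, as a typical overlay routine would compute it.
pf = line_intersection(A, B, C, D)

# Exact result for the same float-valued inputs, using rational arithmetic.
exact = [tuple(Fraction(v) for v in pt) for pt in (A, B, C, D)]
pe = line_intersection(*exact)

print("float intersection:", pf)
print("exact residual on A-B:", side_of_line(exact[0], exact[1], pe))  # 0
print("float residual on A-B:", side_of_line(A, B, pf))  # may be tiny but non-zero
print("coordinate creep:", abs(float(pe[0]) - pf[0]), abs(float(pe[1]) - pf[1]))
```

Repeated over thousands of intersections, and with new vertices feeding later operations, such discrepancies can leave edges that should coincide slightly apart.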

Strategies for managing or reducing slivers from overlay must first be able to determine which polygons are truly spurious. The most popular approach relies on an epsilon band (ε), a buffer zone used to represent the possible error around a point or line (Figure 8.3). The concept was introduced into GIS by Chrisman (1982, 1983b) and Blakemore (1984) based on a formulation by Perkal (1956). Smith and Campbell (1989) found that when using ε on two geomorphological factor maps, an average of 30% error could occur in simple area measurements after overlay. Caspary and Scheuring (1993) give further consideration to the shape of the ε band and present a sagging or pinched buffer zone using ε/√2 in the middle of the line, merging with a disk of radius ε at the end points of the line. Zhang and Tulip (1990) use ε as a fuzzy tolerance for automatic removal of spurious polygons, while Law and Brimicombe (1994) use ε to classify mismatches when combining primary and secondary data sources in GIS, thereby allowing a decision tree to be used to reconcile the two data sets. Problems naturally arise in quantifying ε. Law (1994) provides a summary of sources of error and, assuming independence, the range of ε for 1:1000 scale mapping in Hong Kong (Table 8.2). Many vendors, however, merely allow users to specify whatever tolerance they consider appropriate for the removal of slivers during the overlay process.
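A minimal sketch of the basic ε band test follows. A point is treated as lying within the band of a line if its distance to any segment of that line is at most ε, which gives a uniform buffer with rounded ends (not the pinched ε/√2 variant of Caspary and Scheuring). Two further helpers are labeled assumptions: `is_candidate_sliver` is a hypothetical heuristic that flags an overlay polygon when all its vertices fall inside the other layer's ε band (it is not the method of Zhang and Tulip), and `combined_epsilon` combines independent error components by root-sum-of-squares, one common reading of "assuming independence". The component values used are hypothetical and are not those of Table 8.2.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    if len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line, clamped to the segment ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def in_epsilon_band(p: Point, line: Sequence[Point], eps: float) -> bool:
    """True if p lies within eps of any segment of the polyline
    (uniform band with rounded ends)."""
    return any(point_segment_distance(p, line[i], line[i + 1]) <= eps
               for i in range(len(line) - 1))


def is_candidate_sliver(polygon: Sequence[Point], other_line: Sequence[Point],
                        eps: float) -> bool:
    """Hypothetical heuristic: flag a polygon as a possible sliver when every
    vertex falls inside the eps band of the other layer's boundary."""
    return all(in_epsilon_band(v, other_line, eps) for v in polygon)


def combined_epsilon(components: Sequence[float]) -> float:
    """Root-sum-of-squares of independent error components (an assumption)."""
    return math.sqrt(sum(c * c for c in components))


boundary = [(0.0, 0.0), (10.0, 0.0)]  # hypothetical boundary from a second layer
print(in_epsilon_band((5.0, 0.3), boundary, 0.5))   # True
print(in_epsilon_band((5.0, 0.8), boundary, 0.5))   # False
print(in_epsilon_band((10.4, 0.0), boundary, 0.5))  # True (rounded end)
print(is_candidate_sliver([(2.0, 0.0), (8.0, 0.2), (8.0, 0.0)], boundary, 0.5))  # True
print(combined_epsilon([3.0, 4.0]))                 # 5.0
```

The sketch makes the text's practical problem visible: every result depends on the value chosen for `eps`, which is exactly the quantity that is hard to establish.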

The types of error evident in the results of vector overlay are not always easy to disentangle, as they lie along a continuum (Chrisman, 1989). Although Chrisman considers that there is no workable theory, he has devised a framework (Figure 8.12) that considers the influence of scale on how errors might be classified. At the extremes of the diagonal, slivers are clearly distinct from attribute error. It is also useful to distinguish between slivers and more serious positional errors. The intermediate ground, however, is less clear and persists as a sizeable grey area in vector overlay.

Overlay operations on raster data assume that the data layers are registered to the same grid. In many ways this avoids the geometric problems of vector overlay, which explains why raster is considered the easier data model for modeling error propagation (Goodchild, 1990b). Where positional error and attribute error occur during compilation, both tend to manifest themselves in a raster data layer as attribute errors (i.e., a grid cell assigned the wrong attribute), and the two are then difficult to distinguish. In synthetic tests by Arbia et al. (1998), positional errors consequent on, say, geo-rectification of satellite imagery, vector-to-raster conversion, and resampling to a new cell size or cell orientation were found to play a prominent role, accounting for a third of the propagated error. Where arithmetic operators are used in map algebra, the user's uncertainty may also center on the weightings and on the actual operators used to combine the layers. Decisions on weightings and arithmetic operators are made outside the GIS and are more a matter of professional competence.
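Why positional error surfaces as attribute error in a raster can be shown with a small sketch. A hypothetical categorical layer is shifted one cell to the east to mimic a misregistration (the helper `shift_right` and the land-cover classes are assumptions for illustration, not a model of any particular rectification or resampling step), and the shifted layer is then compared cell by cell with the original, ignoring the vacated edge cells.

```python
from typing import List, Tuple

CatGrid = List[List[str]]


def shift_right(grid: CatGrid, k: int, fill: str) -> CatGrid:
    """Hypothetical helper: simulate a misregistration of k cells to the east.
    Vacated cells on the western edge are filled with `fill`."""
    if not 0 <= k < len(grid[0]):
        raise ValueError("shift must be smaller than the grid width")
    return [[fill] * k + row[:len(row) - k] for row in grid]


def attribute_mismatches(reference: CatGrid, test: CatGrid,
                         ignore: str) -> Tuple[int, int]:
    """Count cells whose category differs, skipping cells marked `ignore`.

    Returns (mismatched cells, compared cells).
    """
    mismatched = compared = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            if t == ignore:
                continue
            compared += 1
            if r != t:
                mismatched += 1
    return mismatched, compared


# Hypothetical land-cover layer with classes A, B and C.
reference = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
    ["C", "C", "C", "C"],
]
shifted = shift_right(reference, 1, fill="?")
print(attribute_mismatches(reference, shifted, ignore="?"))  # (2, 9)
```

The one-cell shift produces wrong attributes only where a class boundary is crossed; in the homogeneous bottom row it is invisible. Nothing in the two mismatched cells records that they arose from a positional rather than a classification error, which is the difficulty described above.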