Digital Representation of Phenomena

In Chapter 2, we saw how the digital representation of reality is driven by the data model, which is then translated into a data structure and finally into a file structure. In GIS, we have basically two ways of representing spatial data: raster or vector. In the raster approach, accuracy is limited by cell size, and while the concept of accuracy is essentially independent of the issue of resolution, cell size will limit the minimum error that can be measured (Chrisman, 1991). In vector, the points that subtend lines will be recorded with high precision regardless of any rounding at the time of measurement. Thus, a line ending at a point whose x coordinate was measured as 12.3 to the nearest 0.1 may nevertheless be stored implicitly as 12.300000 and would be considered not to join another line ending at 12.300001, even though such a distinction is not warranted by the initial precision of measurement.
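To make this concrete, here is a minimal Python sketch (the coordinate values are hypothetical, chosen only for illustration) of how a value rounded to 0.1 at measurement time is nonetheless stored and compared at full floating-point precision:

```python
# Hypothetical values: a coordinate measured to the nearest 0.1 is stored as a
# full-precision float, so two endpoints that are indistinguishable within the
# precision of measurement can still fail an exact-equality test.
measured = round(12.34, 1)    # field measurement recorded as 12.3
stored_a = 12.300000          # endpoint of line A as held by the GIS
stored_b = 12.300001          # endpoint of line B, differing below measurement precision

print(measured)                            # 12.3
print(stored_a == stored_b)                # False: the lines are treated as not joining
print(abs(stored_a - stored_b) <= 0.05)    # True: identical within the 0.1 measurement precision
```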

When lines are snapped to form polygons, the software forces the point of snap to be exactly the same number for both lines to the nth decimal place. However, different hardware may handle floating-point arithmetic differently, so, for example, I have found in moving data sets from an IBM server to a Sun server and vice versa (both using the same GIS software under a UNIX operating system) that some polygons no longer snap: the discrepancy lies in the nth decimal place, as the sketch below illustrates. The same goes for attribute data.

Whether vector or raster is used, space needs to be partitioned into discrete chunks in order to be handled digitally. A cell or a polygon needs defined boundaries, whether implicit or explicit. Each cell or polygon is homogeneously defined as belonging to a particular class, leading to abrupt change from one class to another at the boundaries. This reflects the reductionist nature of GIS already mentioned above, where there is a tendency to reduce the complexity of the real world to discrete spaces characterized by a number of discrete classes. Another effect of digital representation is that you can zoom in, and zoom in farther still, and all the lines remain pixel-thin, implying a level of accuracy that is again not warranted. I call this the “infinite zoom” problem. It leads users to believe that they can overlay data surveyed at, say, 1:1,000 scale with data surveyed at 1:100,000 scale, whereas they would never attempt this with paper maps.
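The snapping failure described above comes down to exact floating-point comparison. A short Python sketch (the tolerance value is an assumption for illustration; real GIS software sets its own snap tolerance) shows how two arithmetic routes to the same coordinate can fail an exact test yet pass a tolerance-based one:

```python
import math

# Two routes to "the same" coordinate differ in the last binary digits,
# so a node that was snapped on one platform may not test equal on another.
x1 = 0.1 + 0.2    # 0.30000000000000004
x2 = 0.3

print(x1 == x2)                              # False: exact comparison fails
print(math.isclose(x1, x2, abs_tol=1e-9))    # True: comparison within a snap tolerance

def snapped(p, q, tol=1e-9):
    """Treat two nodes as coincident if both coordinates agree within a tolerance."""
    return math.isclose(p[0], q[0], abs_tol=tol) and math.isclose(p[1], q[1], abs_tol=tol)

print(snapped((x1, 5.0), (x2, 5.0)))         # True: the nodes are considered to join
```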

Natural Variation

Natural variation exists in our landscapes because of the complexity of the systems at work and the multiplicity of causal factors that operate. These causal factors vary both singly and in combination along environmental gradients. They are also reinforced or dampened by internal feedback loops. Temporal variability exists due to fluctuations in the external environment. Our tendency is to handle such complexity through inductive generalization into discrete, mutually exclusive, homogeneous classes (the dominant mode of GIS data modeling discussed above), but we are merely creating an interpretation of reality. Just as models are simplifications of reality, so too are our mapped representations. Hence, Burrough’s (1986a) statement that “many soil scientists and geographers know from field experience that carefully drawn boundaries and contour lines on maps are elegant misrepresentations of changes that are often gradual, vague, or fuzzy.” It is our intuitive need to distinguish boundaries within continua that leads to many spatial data problems.

Suppose we had decided to map vegetation in three classes: woodland, shrub, and grassland. In the upper part of Figure 8.4, our job is easy; there
are homogeneous groups of the three vegetation types to which we can affix boundaries with reasonable certainty. In my 30 years of mapping vegetation, such “convenient” natural landscapes are infrequent. Most of the time it looks like the lower part of Figure 8.4, but the mapping task still needs to be done. We could introduce more classes, such as “shrub with trees” and “shrub with grass,” and all the other possible combinations that exist. However, this adds a level of complexity that doesn’t necessarily ease our problem, because how many trees do we need before we abruptly change from “shrub” to “shrub with trees”? Our maps may end up with a myriad of small polygons that make analysis orders of magnitude more difficult. Some would say it is a matter of scale: you just need a finer resolution. I defy anybody to go out and unambiguously peg on the ground the boundary of some woodland: Do you take the tree trunks, the extent of the crowns, what about the roots? No, more often than not we just have to deal with it and interpret some boundaries. This then leads to a series of polygons that users interpret as homogeneous woodland, shrub, and grassland when in reality there is a degree of heterogeneity. Figure 8.5 gives a contingency table of uncertainties that can arise as a consequence of our treatment of natural variation.
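The arbitrariness of such class thresholds can be sketched in a few lines of Python (the 10% tree-cover cut-off is invented purely for illustration): a crisp rule forces an abrupt change of class between two patches that are all but identical on the ground.

```python
# Hypothetical crisp classification rule: more than 10% tree cover turns
# "shrub" into "shrub with trees", however marginal the difference.
def crisp_class(tree_cover_pct, threshold=10.0):
    return "shrub with trees" if tree_cover_pct > threshold else "shrub"

print(crisp_class(9.9))     # 'shrub'
print(crisp_class(10.1))    # 'shrub with trees', despite a 0.2% difference in cover
```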

As Bouille (1982) states: “Most of the phenomena that we deal with … are imperfectly organized, incompletely structured, not exactly accurate, etc.
In a word, the phenomena are ‘fuzzy.’ However, we must not reject fuzzy data, we must not transform them into more exact data; we must keep them as fuzzy data and process them as fuzzy information by fuzzy operators producing fuzzy results.” We will return to the issue of fuzziness later in this chapter.
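In the spirit of Bouille’s remark, a brief Python sketch (the membership grades are invented for illustration) keeps each location’s class assignments as membership grades in [0, 1] and combines them with the common min/max fuzzy operators rather than forcing a crisp class:

```python
# Fuzzy membership grades for one location in the three vegetation classes.
location = {"woodland": 0.6, "shrub": 0.3, "grassland": 0.1}

def fuzzy_and(a, b):    # fuzzy intersection (minimum operator)
    return min(a, b)

def fuzzy_or(a, b):     # fuzzy union (maximum operator)
    return max(a, b)

# e.g. evaluate "woodland AND NOT grassland" fuzzily: the result is itself a grade.
result = fuzzy_and(location["woodland"], 1.0 - location["grassland"])
print(result)           # 0.6, a fuzzy result rather than a forced crisp class
```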