Issues of Algorithm for Modeling

Any user of GIS and environmental models often has a bewildering choice of algorithms to choose from. Of course, not all competing algorithms are
implemented in every software package and, for some proprietary software, information on the exact algorithms being used may be sparse, sometimes no better than a vague hint. Nevertheless, algorithm choice does often present itself and making the right choice can be important for successful modeling. As we saw in Chapter 4, the success or otherwise in using inverse distance weighted (IDW) interpolation (as one algorithm) rested on choice of parameters and even the data sampling pattern, so choice between competing algorithms is not a clear-cut case, as it also depends on how you use them in relation to the application and the available data.

Nevertheless, professionals should acquaint themselves not only with the range of algorithms that are available together with their advantages and disadvantages for use in particular situations, but should also be able to understand at least the broad consequences of fitness-for-use. Staying with the interpolation example, a popular alternative to IDW is triangulated irregular network and, in Chapter 8, we used kriging on the rainfall data. We can briefly compare the performance of these three data sets. To recall, Figure 9.7(a) shows the mathematically computed topography with 10-m contours and 50-m shading. Figure 9.7(b) shows the purposive sampling of 25 points plus the four corner points to bring interpolation to the edge of the study area without extrapolation outside the convex hull subtended by the sample points. Figure 9.7(c/d) shows the results we previously gained for IDW where r = 4 (shaded every 50 m with original contours superimposed for comparison) together with the distribution of residual errors giving a root mean square error (RMSE) of ±19.5 m. Figure 9.7(e) shows the results of an interpolation using TIN (again shaded every 50 m with original contours superimposed for comparison) with residual errors in Figure 9.7(f) giving an RMSE of ±26.9 m. For this purposive sample where points have been placed to pick out as best as possible local maxima and minima on the surface, TIN for the most part is quite good.

The problem that has arisen and is contributing to the high RMSE is with very slim triangles being formed along the boundary truncating the ridgeline. For this reason, the convex hull resulting from the triangulation of sample points should preferably fall well outside the study area boundary so that this effect can be ignored. TIN tends to produce rather rectilinear-looking results because each triangle is treated as a plane. For this reason, TIN is not very good on sparse data sets. Figure 9.7(g) shows the results of kriging using a spherical variogram model with residual errors in Figure 9.7(h) giving an RMSE of ±15.7 m. Although still not a very good result in relation to the original map, kriging has given the best result of the three and providing that the variogram can be properly constructed and fitted, this geostatistical approach is finding increasing favor among modelers. The point to be made, however, is that all interpolations are approximations and depending on the nature of the sample data and the phenomenon to be modeled by interpolation, the modeler needs to exercise considerable skill in achieving a best fit particularly where (and unlike the above examples) no “truth” model is available for immediate comparison. For a review of interpolation methods, see Lam (1983), Burrough and McDonnell (1998), or de Smith et al. (2007).

Another area of algorithm choice worth looking at in relation to quite a number of environmental simulation models that use routing over a DEM, is calculation of maximum gradient. Within the mapping sciences and GIS, such algorithm choice has been an ongoing debate. The problem is that there are so many variant algorithms—at least 12 to my knowledge—that objective comparison becomes a lengthy task. Calculation of gradient is usually carried out from a grid DEM and, because it cannot be based on a single grid cell value of elevation, a surrounding group of cells need to be used. This is usually restricted to a neighborhood 3 × 3 matrix of cells centered on the cell for which gradient is being calculated. Not all methods use all the neighboring cells within this matrix. A larger neighborhood could be used, but this might introduce unacceptable smoothing. Based on this 3 × 3 matrix, there are two broad groups of approaches: (1) those based on finite difference techniques (e.g., Sharpnack and Akin, 1969; Horn, 1981; Ritter, 1987) and (2) those based on calculating a best fit surface (e.g., Evans, 1980; Zevenbergen and Thorne, 1987).

This is only a selection of methods, and there are other variants. Skidmore (1989) tested six methods including all of the above. Srinivasan and Engel (1991) tested four methods including Horn’s, best-fit plane, and quadratic surface. Jones (1998) tested eight methods including all of the above except the best-fit plane. Skidmore, and Srinivasan and Engel both use 30 m DEM of natural terrain. Skidmore checked computed gradients against hand calculations from map contours. Srinivasan and Engel evaluated their computed gradients through its use to calculate the length of
slope and steepness of slope factors in the Universal Soil Loss Equation for which observed values were available. Jones used a synthetic surface (similar to our example topography, but generated from a 49-term polynomial) and different relative grid sizes for which true values could be known by numerical methods. The results on the synthetic surface for Formula (9.11) to Formula (9.18) above showed that the second-order finite difference (Ritter) performed best overall followed by Horn and Sharpnack and Akin, and then the quadratic surface.

Increasing accuracy followed decreasing grid size. When the synthetic surface was rotated, Ritter’s showed greater sensitivity to rotation angle than the other methods, which becomes more pronounced as grid size is reduced. Nevertheless, the RMSE remained lower at all times for Ritter’s than for the other methods. Skidmore found a statistically significant correlation between all methods used and the “true” gradient. However, the finite difference approaches yielded lower correlations with Ritter and Sharpnack and Akin giving the worst results. Skidmore also noted that spurious gradients could be calculated regardless of method in areas of flat or near flat terrain. This has important implications, for example, for routing in floodplain areas. Finally, Srinivasan and Engel found that Horn’s finite difference approach was more accurate on flatter slopes than on steeper slopes where the 3 × 3 matrix covered too great an area in relation to the length of slope on steeper sections, and was nevertheless the most accurate overall.

Differences in algorithm can make a difference of up to 286% on the calculation of the length of slope and steepness of slope factors with considerable variability all around. They conclude that “careful selection of slope prediction method is recommended.” These three studies from a comparative perspective have some contrasting results on which method might be better, but from another aspect there is a close agreement, like the interpolation algorithms, each method of calculating gradient will yield a different result. For some simulation models the effect may be negligible, for others it may be more important. Where there is likely to be model sensitivity, that sensitivity should be tested.