Bringing Data Issues and Modeling Issues Together

Digital spatial data sets have grown rapidly in scope, coverage, and volume over the last decade. We have moved from data-poverty to data-richness. Meanwhile, environmental models have steadily grown more complex, are used more frequently, and are expected to handle larger data volumes and give better predictions over a wider range of issues from local to global scales. The abundance of digital data is leading to its own set of problems in identifying and locating relevant data and in evaluating choices of resolution, coverage, provenance, cost, and conformance with the models to be used. Furthermore, for environmental models it may not be so much the characteristics of the raw data that are the most critical, but their characteristics once converted, aggregated, and implemented in the model. Given that a modeling task may access data from multiple sources, there is the added difficulty of assessing their combined performance in relation to the implementation of the simulation, so that outputs are fit for use. Then there are the modeling issues of choosing an appropriate space–time discretization, data transformations, and algorithms where a choice presents itself.

Finally, as we have seen, there are difficulties in achieving an adequate calibration. Clearly, tools are required in order to help resolve the data and modeling issues discussed over the last two chapters. Should such tools be part of GIS or built into the environmental simulation model? Given the trajectory that we are on toward tool coupling strategies where the network is the core technology (Chapter 7, Figure 7.5 and Figure 7.10), neither approach need be the solution. Instead, in order to meet the diversity of requirements just listed with sufficient flexibility, a wide range of functionality from different sources could be tightly coupled to form a quality analysis engine (QAE), as suggested by Li et al. (2000) and as an agent-based implementation by Li (2006). There is now a rich supply of public domain and proprietary software that can be used for various aspects of quality analysis. Varekamp et al. (1996), for example, have already demonstrated the availability and efficacy of public domain geostatistical software, such as GSLIB (Deutsch and Journel, 1992), PCRaster, and GEO-EAS, to which could be added GSTAT, further tools such as GLUE (discussed above), and spatial analysis tools such as GeoDa. A wealth of proprietary software is also available.

Given the speed, sophistication, and growing interoperability of these tools, there is little reason to invest in reimplementing them within GIS or environmental models; instead, they can be coupled within a process architecture, as suggested in Figure 9.11. Here GIS and environmental models are portrayed largely in their de facto relationship, whereby GIS integrate and preprocess spatial data inputs for simulation models and then postprocess and present the simulation outputs for visualization. The QAE is a series of tightly coupled tools, which, depending on the type of modeling being undertaken, might include exploratory data analysis, statistics, geostatistics, interpolators, zone designers, cluster detectors, MC analysis, and tools for simulating synthetic data sets and error surfaces.

GIS would have a role in initializing the QAE with spatial data and assisting in the visualization of results. Much of the interaction is then between the tools of the QAE and the environmental simulation model in a stimulation and response mode in order to carry out both uncertainty analysis and sensitivity analysis of both data and model components. By using simulated synthetic data, for example, key issues around data quality, discretization, and model performance can be studied and understood at the project inception stage, as illustrated above in establishing minimum sampling requirements using fractalized shorelines (see Figure 9.3). Using the tools of the QAE, the effects of algorithm choice could also be explored, as could analysis in support of calibration. Although research in this area is ongoing, there are already sufficient QAE components on the market and in the public domain, together with programming languages such as Visual Basic that allow the creation of wrappers, for environmental modeling and GIS professionals to proceed with their own implementations in support of their specific requirements.
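The stimulation and response mode described above can be illustrated with a minimal Python sketch. It is not the QAE of Li et al. (2000), only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for an environmental simulation model, the elevation grid is synthetic, and the error surface is spatially uncorrelated Gaussian noise, which real data errors usually are not. All function names are invented for this example.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[float]]


def toy_model(elevation: Grid, water_level: float = 1.0) -> float:
    """Hypothetical stand-in for a simulation model.

    Returns the fraction of cells lying below water_level.
    """
    cells = [z for row in elevation for z in row]
    return sum(1 for z in cells if z < water_level) / len(cells)


def perturb(grid: Grid, sd: float, rng: random.Random) -> Grid:
    """Add independent zero-mean Gaussian error to every cell.

    Assumption: errors are spatially uncorrelated.
    """
    return [[z + rng.gauss(0.0, sd) for z in row] for row in grid]


def monte_carlo(model: Callable[[Grid], float], grid: Grid,
                sd: float, runs: int, seed: int = 0) -> List[float]:
    """Stimulate the model with perturbed inputs and collect its responses."""
    rng = random.Random(seed)
    return [model(perturb(grid, sd, rng)) for _ in range(runs)]


def sensitivity(model: Callable[[Grid], float], grid: Grid,
                sds: Sequence[float],
                runs: int = 200) -> Dict[float, Tuple[float, float]]:
    """Mean and standard deviation of the output for each input error level."""
    results = {}
    for sd in sds:
        outputs = monte_carlo(model, grid, sd, runs)
        results[sd] = (statistics.mean(outputs), statistics.stdev(outputs))
    return results


if __name__ == "__main__":
    # Synthetic elevation grid (metres), invented for illustration only.
    dem = [[0.5, 0.8, 1.2, 1.5],
           [0.6, 0.9, 1.1, 1.6]]
    for sd, (mean, spread) in sensitivity(toy_model, dem, [0.0, 0.1, 0.3]).items():
        print(f"input error sd={sd}: output mean={mean:.3f}, sd={spread:.3f}")
```

In this arrangement the model is treated as a black box behind a simple callable, so the same wrapper could in principle drive a different model or a different error generator. Comparing the output spread across input error levels gives a first indication of how sensitive the model is to data quality.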

In Chapter 8, we looked at the issues surrounding data uncertainties and the evolving strategies for knowing and reducing the level of uncertainty in spatial data, the analytical products of GIS, and inputs to environmental simulation models. However, as Frank (2008) has observed, it is “difficult to observe directly the effect of data quality on decisions.” While good data = good decisions is a common-sense belief, there are, as we have seen, so many intermediate steps between data collection and model output—the transformation of data into information and understanding—that better data do not necessarily lead to better decisions. In Chapter 9, we considered a range of issues in model uncertainty and, again, the evolving strategies for knowing and reducing the level of uncertainty in the outputs of GIS and environmental simulation modeling.

Here again, higher resolution, more detailed models are not necessarily the path to better decisions. In any case, as we saw from Equation (9.3), there will inevitably be some residual uncertainty. However, having gotten to this stage, a decision needs to be made by somebody: Is there a problem or isn’t there? What are the risks? Should something be done about them and, if so, what? Is it technically the most appropriate solution? Will the majority agree with it? How much is it going to cost? Can we afford it? Should we afford it? Does it represent value for money? These questions typify the decision space that needs to be explored and navigated. Many GIS analysts and professional modelers may well say it is not their decision; they just ascertain and present the facts as they see them. But as we discussed in Chapter 5 (Figure 5.7), there has to be communication with the policy makers and the public in an iterative process that should ideally bring about an informed consensus.