VISUALIZATION, GIS AND GEOSTATISTICS FOR INTERPRETATION OF GEOCHEMICAL DATA.

 

Katrin Grünfeld

Division of Geoinformatics, Dept. of Geodesy and Photogrammetry, Royal Institute of Technology, 100 44 Stockholm, Sweden

katring@geomatics.kth.se

 

ABSTRACT

This paper presents a new approach for visual exploration and analysis of multivariable regional geochemical data. We study techniques from geostatistics, Geographic Information Systems (GIS) and visualization, and evaluate all three methods with respect to their advantages and limitations. We suggest a combination of selected statistical, geostatistical, GIS and visualization tools for analysis and integration of geochemical data. Emphasis is put on applying interactive visualization techniques to multivariable geochemical data sets. Examples of visual displays are provided for data on heavy metal contents in bedrock, till and biogeochemical samples as well as time-related moss monitoring data.

 

INTRODUCTION

In Sweden, much information on the distribution of chemical elements in rocks, as well as their dispersion and concentration in soil and water, has been obtained by regional geochemical sampling of rocks, sediments, and plant roots from streams. These three sample types form an integrated strategy in environmental research. Interpretation of geochemical data is of great importance and includes the application of different statistical, geostatistical and GIS techniques to describe and analyse the data, and to produce geochemical maps.

 

We consider that the above-mentioned data analysis and display methods are not sufficient to extract and present the information in geochemical data with the desired level of detail. None of those methods can explicitly describe the relations and associations of elements in multivariate data, or be used to study the pathways of elements through different sample media. The missing pieces of information could be acquired using available visualization techniques for multidimensional data.

 

The selected study area of 100 x 100 kilometers is located in southern Sweden, and the data consist of the contents of cobalt (Co), copper (Cu), lead (Pb), zinc (Zn) and vanadium (V) in 91 bedrock samples, 1411 till samples, and 1530 biogeochemical samples. In addition, moss monitoring data from three surveys (conducted in 1985, 1990 and 1995), 521 records in total, were selected for an extended area of 300 x 300 kilometers. The data (vector data and text files) were extracted from the databases of the Geological Survey of Sweden (SGU). Composite bedrock samples represent all major types of bedrock. The glacial till samples were taken from below the zone of weathering, and element contents were analysed in the fine fraction. Biogeochemical samples consist mostly of plant roots from stream banks, and reflect both the natural and the anthropogenic load of metals in stream water.

 

METHODS

We present four complementary methods used to analyse our geochemical data.

 

1. Statistical methods are commonly used to describe data. In classical statistics, outliers (which are often present in geochemical data) are treated as not belonging to the data, so data transformation or removal of outliers is common practice. However, that approach is not sufficient to make geochemical data meet the assumptions of independent variables, normality of distribution and independent observations. At the same time, histograms and percentiles may be considered quite robust and therefore suitable techniques for describing untransformed geochemical data. Histograms were plotted for till, biogeochemical, and moss monitoring data. For one metal in the till data, histograms of 9 overlapping sub-areas of equal size were produced to study the influence of the spatial component on the histogram shape. As the ranges of concentration values vary substantially, percentiles were calculated and plotted for both till and root sample data so that the metal contents in the two sampled media could be compared. The sensitivity of the calculated percentiles was assessed by deleting 1 % of the samples from the high end of the value range.

 

2. GIS provides a collection of tools for display and analysis of geo-referenced data. At the same time, the use of visual tools has some drawbacks, for example in the selection of appropriate display methods and in estimating the quality of outcomes. A raster GIS - Idrisi for Windows version 2 - was used for visual displays, interpolations, and integration of the different data types. Till and plant root data were interpolated to a grid with a resolution of 100 meters, using inverse distance weighting and Thiessen polygon techniques, respectively. Percentile maps were produced from the interpolated data, and bedrock geochemistry was visualized with the help of point symbols.

 

3. Geostatistics offers a collection of tools to quantify spatial relationships. Geostatistical techniques provided by the GSTAT version 2.0 geostatistical package (Pebesma and Wesseling, 1998) were used both for studying spatial structures in the till sample data and for interpolation. Experimental variograms of the heavy metal data were calculated and plotted for different lag distances. A data transformation was needed to improve the shape of the experimental variograms so that theoretical variogram models could be fitted to them. The suitable transformation type was chosen using histograms of the original data. A cross-validation procedure was used to test the variogram models for self-consistency, and ordinary block kriging (block size 100 meters) was performed to estimate the values at unsampled locations.

 

4. Visualization is an approach to data analysis that stresses a penetrating look at the structure of data, and offers several interactive tools for data exploration. Visualization techniques, such as scatterplot, parallel coordinate and glyph displays - provided by visualization package XmdvTool version 2 (Ward, 1994) - were applied to all available data sets to investigate the relations between elements, and find patterns and element associations within the data. Moss monitoring data were integrated into one data file to visualize time-related trends in data. Till and biogeochemical data sets were integrated in order to study the pathways of elements from soil to water.
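The way several surveys can be merged into one file and shown on parallel coordinate axes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not XmdvTool's implementation: the function name `scale_columns`, the record layout and all concentration values are hypothetical. Each variable is min-max scaled to [0, 1] so that every axis shares the same vertical range, and each scaled row is then the polyline of one sample across the axes.

```python
def scale_columns(records, names):
    """Min-max scale each variable to [0, 1] so that all parallel axes
    share one vertical range. Returns one list of axis positions per record."""
    lows = {n: min(r[n] for r in records) for n in names}
    highs = {n: max(r[n] for r in records) for n in names}
    scaled = []
    for r in records:
        row = []
        for n in names:
            span = highs[n] - lows[n]
            # A constant variable has no spread; place it mid-axis.
            row.append((r[n] - lows[n]) / span if span else 0.5)
        scaled.append(row)
    return scaled


# Hypothetical moss records from three surveys merged into one table.
# The survey year is kept as an extra dimension, so time-related trends
# show up as a pattern along the "year" axis.
moss = [
    {"year": 1985, "Pb": 30.0, "Cu": 8.0},
    {"year": 1990, "Pb": 18.0, "Cu": 7.0},
    {"year": 1995, "Pb": 9.0, "Cu": 6.5},
]
axes = ["year", "Pb", "Cu"]
for polyline in scale_columns(moss, axes):
    print([round(v, 2) for v in polyline])
```

Min-max scaling also makes the limitation of such displays visible: a single extreme value compresses all other samples towards one end of its axis.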

 

RESULTS AND DISCUSSION

Histograms give a good overview of the minimum and maximum values and their distribution in the data. Sub-area histograms differed in shape and thus carried information about the spatial component in the data, even if the division of the study area was subjective. In many cases where the ranges of element contents extend into thousands of measurement units (biogeochemical sample data), histograms tend to become visually less informative. Percentiles give reliable information about the distribution of elements, as the removal of 1 % of the samples from the high end of the value range did not have any considerable influence on the percentile values. Plots of percentiles for till and root data showed the presence of local anomalies, and a relative enrichment of element contents in stream water compared to the natural geochemical background levels in till.
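The percentile sensitivity check can be illustrated with a minimal Python sketch. It assumes that the removed 1 % are the highest-valued samples and that percentiles are linearly interpolated; the function names and the synthetic lognormal concentrations are hypothetical and are not the SGU data.

```python
import random


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in 0..100) of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def percentile_sensitivity(values, qs=(5, 25, 50, 75, 90, 95), drop_fraction=0.01):
    """Percentiles of all samples and of the samples left after the
    highest-valued drop_fraction has been deleted."""
    ordered = sorted(values)
    n_drop = int(round(drop_fraction * len(ordered)))
    trimmed = ordered[: len(ordered) - n_drop]
    full = [percentile(ordered, q) for q in qs]
    reduced = [percentile(trimmed, q) for q in qs]
    return full, reduced


# Synthetic, right-skewed concentrations standing in for one metal in till.
random.seed(0)
till_metal = [random.lognormvariate(3.5, 0.6) for _ in range(1411)]
full, reduced = percentile_sensitivity(till_metal)
for q, a, b in zip((5, 25, 50, 75, 90, 95), full, reduced):
    print(f"P{q}: all samples {a:.1f}, top 1 % removed {b:.1f}")
```

Because deleting the top 1 % only shifts each percentile position by a few ranks, the lower and central percentiles barely move, while the mean or maximum of the same data would change much more.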

 

Interpolated geochemical maps of metals in till showed the presence of high-frequency noise. Biogeochemical data should ideally be interpolated within the stream network or watersheds, but such information was not available with the required level of detail. Interpolation using Thiessen polygons made it possible to display percentile maps that could then be integrated with till and bedrock geochemical data. Time-related data were not analysed with GIS, because the sampled locations differed for each year and the data were too sparse to interpolate. Regarding the outcomes of GIS, most of the results are obtained from visual displays, while confidence limits cannot be routinely estimated and communicated on the maps.
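The two interpolators mentioned above can be sketched in plain Python as follows. This is not the Idrisi implementation: the distance exponent of 2, the function names and the sample coordinates are assumptions made for illustration only.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).
    samples is a list of (x, y, value); power=2 is an assumed exponent."""
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # the target coincides with a sample point
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def thiessen_estimate(samples, x, y):
    """Value of the nearest sample, i.e. of the Thiessen polygon containing (x, y)."""
    return min(samples, key=lambda s: math.hypot(x - s[0], y - s[1]))[2]


def make_grid(samples, estimator, xmin, ymin, ncols, nrows, cell=100.0):
    """Evaluate an estimator at the cell centres of a regular grid.
    Row 0 is the southernmost row; cell is the resolution in meters."""
    return [
        [estimator(samples, xmin + (c + 0.5) * cell, ymin + (r + 0.5) * cell)
         for c in range(ncols)]
        for r in range(nrows)
    ]


# Hypothetical sample points: (easting in m, northing in m, concentration).
points = [(50.0, 50.0, 12.0), (450.0, 150.0, 30.0), (250.0, 450.0, 18.0)]
till_grid = make_grid(points, idw_estimate, 0.0, 0.0, ncols=5, nrows=5)
root_grid = make_grid(points, thiessen_estimate, 0.0, 0.0, ncols=5, nrows=5)
print(till_grid[0][0], root_grid[4][4])
```

The inverse distance estimate honours every sample exactly, which is one reason isolated high values appear as small-scale noise on the resulting maps; the Thiessen estimate produces piecewise constant areas and makes no assumption of continuity between sample sites.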

 

Kriging interpolation produced maps of estimated values and variances, and the presence of noise was reduced, resulting in relatively smooth element maps. Experimental variograms showed the presence of outliers, which could have influenced the variogram shapes and model parameters even after the data transformations. The quality of the kriging results may also be considered uncertain because of the subjective decisions made about lag distance, type of transformation, and model fitting. At the same time, variograms are undoubtedly helpful in studying the continuity of spatial variables.
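The dependence of the experimental variogram on the chosen lag distance can be seen from a minimal sketch of the classical estimator, in which the semivariance of a lag class is half the mean squared difference of all sample pairs whose separation falls in that class. The code below is an illustration and not GSTAT itself; the function name, the synthetic points and the use of a logarithmic transformation are assumptions.

```python
import math
import random


def experimental_variogram(points, lag_width, n_lags):
    """Classical experimental semivariogram for points given as (x, y, z).
    Returns one (mean distance, semivariance, number of pairs) tuple per lag
    class; empty classes are returned as (None, None, 0)."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            k = int(d // lag_width)  # index of the lag class
            if k < n_lags:
                sq_sums[k] += (zi - zj) ** 2
                dist_sums[k] += d
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k]:
            result.append((dist_sums[k] / counts[k],
                           sq_sums[k] / (2 * counts[k]),
                           counts[k]))
        else:
            result.append((None, None, 0))
    return result


# Synthetic points in a 100 x 100 km square; a log transformation is assumed
# here only as an example of reducing the influence of high values.
random.seed(1)
samples = [(random.uniform(0, 100000), random.uniform(0, 100000),
            math.log(random.lognormvariate(3.0, 0.8))) for _ in range(200)]
for lag, gamma, n in experimental_variogram(samples, lag_width=5000.0, n_lags=6):
    print(f"{lag:9.0f} m  gamma = {gamma:.3f}  pairs = {n}")
```

Because squared differences are averaged, a single outlying sample contributes to every lag class it takes part in, and changing `lag_width` regroups the pairs; both effects alter the points to which a theoretical model is fitted.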

 

Visualization techniques proved to be excellent tools for discovering hidden structures in the data. The original level of detail is preserved and the trends are also visible. Glyph displays were used for identifying and grouping similar samples in multivariate data sets (Figure 1). Parallel coordinates made it possible to follow individual samples through all dimensions (Figure 2), and to identify the true multivariate outliers to be removed or replaced. Geochemical anomalies could be located and separated using their “multivariate fingerprints” - associations of metals. Scatterplots were valuable in studying the visual correlations between the metals in multidimensional space. Exploration of time-related data showed the advantage of visualization techniques over GIS in integrating and analysing sparse data from different surveys. Integration of the till and biogeochemical data sets gave interesting results about the element cycles. Visualization techniques work with sparse and dense, quantitative and qualitative data, and are therefore very promising for the interpretation of geochemical, geological and other environmental data. The only limitations of the visualization techniques used relate to the visual appearance of the displays when concentration ranges differ greatly.

 

A combination of statistical, geostatistical, GIS and visualization techniques for geochemical data interpretation enhances the information obtained with each method separately, and gives a more complete description of the distribution and associations of metals.

 

REFERENCES

Pebesma EJ, Wesseling CG (1998), Gstat, a program for geostatistical modelling, prediction and simulation. Computers and Geosciences. 24(1): 17-31.

Ward MO (1994), XmdvTool: Integrating multiple methods for visualizing multivariate data. Proceedings of Visualization ’94, pp. 326-333.

 

 

 


 


Figure 1. Glyph display of bedrock geochemical data.

 

 

 


 

 


Figure 2. Parallel coordinate display of bedrock geochemical data.