VISUALIZATION, GIS AND GEOSTATISTICS FOR INTERPRETATION OF GEOCHEMICAL DATA
Katrin Grünfeld
Division of Geoinformatics, Dept. of Geodesy and Photogrammetry,
Royal Institute of Technology, 100 44 Stockholm, Sweden
This paper presents a new approach for the visual exploration and analysis of
multivariable regional geochemical data. We study techniques from geostatistics,
Geographic Information Systems (GIS) and visualization, and evaluate each of the
three approaches with respect to its advantages and limitations. We suggest a
combination of selected statistical, geostatistical, GIS and visualization tools
for the analysis and integration of geochemical data. Emphasis is placed on
applying interactive visualization techniques to multivariable geochemical data
sets. Examples of visual displays are provided for heavy metal contents in
bedrock, till and biogeochemical samples, as well as for time-related moss
monitoring data.
In Sweden, much information on the distribution of chemical elements in rocks,
and on their dispersion and concentration in soil and water, has been obtained
through regional geochemical sampling of rocks, sediments, and plant roots from
streams. Together, these three sample types form an integrated strategy in
environmental research. The interpretation of geochemical data is of great
importance and involves applying various statistical, geostatistical and GIS
techniques to describe and analyse the data and to produce geochemical maps.
We consider the above-mentioned data analysis and display methods insufficient
to extract and present the information in geochemical data at the desired level
of detail. None of these methods can explicitly describe the relations and
associations of elements in multivariate data, or trace the pathways of
elements through different sample media. This missing information could be
obtained with existing visualization techniques for multidimensional data.
The selected study area, 100 x 100 kilometers, is located in southern Sweden.
The data comprise the contents of cobalt (Co), copper (Cu), lead (Pb), zinc (Zn)
and vanadium (V) in 91 bedrock samples, 1411 till samples and 1530
biogeochemical samples. In addition, moss monitoring data from three surveys
(conducted in 1985, 1990 and 1995), 521 records in total, were selected for an
extended area of 300 x 300 kilometers. The data (vector data and text files)
were extracted from the databases of the Geological Survey of Sweden (SGU). The
composite bedrock samples represent all major bedrock types. The glacial till
samples were taken from below the zone of weathering, and element contents were
analysed in the fine fraction. The biogeochemical samples consist mostly of
plant roots from stream banks and reflect both the natural and the
anthropogenic metal load in stream water.
We present four
complementary methods used to analyse our geochemical data.
1. Statistical methods are commonly used to describe data. In classical
statistics, outliers (which are often present in geochemical data) are regarded
as not belonging to the data, so transforming the data or removing outliers is
common practice. However, this is not sufficient to make geochemical data meet
the assumptions of independent variables, normally distributed values and
independent observations. Histograms and percentiles, on the other hand, may be
considered fairly robust and therefore suitable for describing untransformed
geochemical data. Histograms were plotted for the till, biogeochemical and moss
monitoring data. For one metal in the till data, histograms of 9 overlapping
sub-areas of equal size were produced to study the influence of the spatial
component on the histogram shape. Because the ranges of concentration values
vary substantially, percentiles were calculated and plotted for both till and
root sample data so that the metal contents in the two sampled media could be
compared. The sensitivity of the calculated percentiles was assessed by
deleting the 1 % of samples with the highest values.
2. GIS provide a collection of tools for the display and analysis of
geo-referenced data. At the same time, their visual tools have some drawbacks,
for example the need to select appropriate display methods and the difficulty
of estimating the quality of the outcomes. A raster GIS, Idrisi for Windows
version 2, was used for visual displays, interpolation, and integration of the
different data types. Till and plant root data were interpolated to a grid with
a resolution of 100 meters using inverse distance weighting and Thiessen
polygons, respectively. Percentile maps were produced from the interpolated
data, and bedrock geochemistry was visualized with point symbols.
3. Geostatistics offers a collection of tools to quantify spatial
relationships. Geostatistical techniques provided by the GSTAT version 2.0
package (Pebesma and Wesseling, 1998) were used both to study spatial
structures in the till sample data and for interpolation. Experimental
variograms of the heavy metal data were calculated and plotted for different
lag distances. A data transformation was needed to improve the shape of the
experimental variograms so that theoretical variogram models could be fitted
to them; the suitable transformation type was chosen from histograms of the
original data. Cross-validation was used to test the variogram models for
self-consistency, and ordinary block kriging (block size 100 meters) was
performed to estimate values at unsampled locations.
4. Visualization is an approach to data analysis that stresses a penetrating
look at the structure of data and offers several interactive tools for data
exploration. Visualization techniques such as scatterplots, parallel
coordinates and glyph displays, provided by the visualization package XmdvTool
version 2 (Ward, 1994), were applied to all available data sets to investigate
the relations between elements and to find patterns and element associations
within the data. The moss monitoring data were integrated into one data file to
visualize time-related trends. The till and biogeochemical data sets were
integrated to study the pathways of elements from soil to water.
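Merging the three moss surveys amounts to stacking their records and adding the survey year as an extra variable, so that time can be shown as one more axis in a multivariate display. Records need not be matched by location, which matters because the sampled sites differed between surveys. The following is a minimal Python sketch under that assumption; the field names (x, y, Pb) are hypothetical, and the CSV output is a generic table, not XmdvTool's own input format.

```python
import csv
import io

def merge_surveys(surveys):
    """Stack survey records into one table with an added 'year' field.

    `surveys` maps a survey year to a list of records (dicts). Records are
    copied, not modified, and are not matched by location across years.
    """
    rows = []
    for year in sorted(surveys):
        for record in surveys[year]:
            row = dict(record)   # copy so the input records stay unchanged
            row["year"] = year   # time becomes an ordinary variable
            rows.append(row)
    return rows

def to_csv(rows, fields):
    """Write the merged rows as CSV text with the given column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical example: two surveys with different sampling sites.
moss = {
    1990: [{"x": 1, "y": 2, "Pb": 5.0}],
    1985: [{"x": 3, "y": 4, "Pb": 9.0}, {"x": 5, "y": 6, "Pb": 7.0}],
}
print(to_csv(merge_surveys(moss), ["year", "x", "y", "Pb"]))
```

Keeping the year as a plain column, rather than joining surveys site by site, is what lets sparse and non-coincident surveys share one display.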
Histograms give a good overview of the minimum and maximum values and of their
distribution in the data. The sub-area histograms differed in shape and thus
carried information about the spatial component of the data, even though the
division of the study area was subjective. When the ranges of element contents
extend into thousands of measurement units, as in the biogeochemical sample
data, histograms tend to become visually less informative. Percentiles give
reliable information about the distribution of elements: removing the
highest-valued 1 % of the samples did not have any considerable influence on
the percentile values. Plots of percentiles for till and root data showed the
presence of local anomalies and a relative enrichment of element contents in
stream water compared with the natural geochemical background levels in till.
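The percentile sensitivity check can be sketched as follows: compute a set of percentiles on the untransformed values, drop the highest-valued 1 % of the samples, recompute, and report the relative change at each level. This is a minimal NumPy sketch, not the procedure actually used in the study; the percentile levels in LEVELS are an assumption, since the paper does not list them, and the lognormal data are synthetic.

```python
import numpy as np

# Percentile levels to compare (assumed; the study does not list its levels).
LEVELS = [5, 10, 25, 50, 75, 90, 95, 98]

def percentiles(values, levels=LEVELS):
    """Percentiles of untransformed data (NumPy's default linear interpolation)."""
    return np.percentile(np.asarray(values, dtype=float), levels)

def drop_top_fraction(values, fraction=0.01):
    """Return the sorted values without the highest-valued `fraction` of samples."""
    v = np.sort(np.asarray(values, dtype=float))
    n_drop = int(round(fraction * v.size))
    return v[: v.size - n_drop]

def percentile_sensitivity(values, fraction=0.01, levels=LEVELS):
    """Relative change of each percentile after dropping the top `fraction`.

    Values must be positive (as concentrations are) to avoid division by zero.
    Removing the largest values can only lower a percentile or leave it
    unchanged, so results are <= 0 up to rounding.
    """
    full = percentiles(values, levels)
    trimmed = percentiles(drop_top_fraction(values, fraction), levels)
    return (trimmed - full) / full

# Synthetic, lognormally distributed example data (not the survey data).
rng = np.random.default_rng(0)
synthetic_cu = rng.lognormal(mean=3.0, sigma=1.0, size=1411)
for level, change in zip(LEVELS, percentile_sensitivity(synthetic_cu)):
    print(f"P{level}: {100 * change:+.1f} %")
```

Because only the upper tail is removed, the low and middle percentiles are expected to move least; the highest levels are where a lack of robustness would show first.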
Interpolated geochemical maps of metals in till showed the presence of
high-frequency noise. Biogeochemical data should ideally be interpolated within
the stream network or watersheds, but such information was not available at the
required level of detail. Interpolation using Thiessen polygons made it possible
to display percentile maps that could then be integrated with the till and
bedrock geochemical data. Time-related data were not analysed with GIS, because
the sampled locations differed from year to year and the data were too sparse
to interpolate. As for the GIS outcomes, most results are obtained from visual
displays, while confidence limits cannot be routinely estimated and
communicated on the maps.
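The two interpolation techniques can be contrasted in a few lines. Inverse distance weighting gives each grid cell a weighted mean of the samples with weights 1/d^p, whereas Thiessen polygons on a raster give each cell the value of its nearest sample, so Thiessen maps keep the original sample values as discrete patches. The sketch below is a brute-force NumPy illustration, not Idrisi's implementation; the power p = 2 and the use of all samples (no search radius or neighbour limit) are assumptions. It builds a full cells-by-samples distance array, so a 100 m grid over the whole 100 x 100 km area would need chunking or a spatial index.

```python
import numpy as np

def grid_centres(xmin, ymin, xmax, ymax, cell=100.0):
    """Cell-centre coordinates of a regular raster (cell size in metres)."""
    xs = np.arange(xmin + cell / 2, xmax, cell)
    ys = np.arange(ymin + cell / 2, ymax, cell)
    return np.meshgrid(xs, ys)

def _distances(px, py, gx, gy):
    """Distances from every grid cell (leading axes) to every sample (last axis)."""
    px, py = np.asarray(px, dtype=float), np.asarray(py, dtype=float)
    gx, gy = np.asarray(gx, dtype=float), np.asarray(gy, dtype=float)
    return np.hypot(gx[..., None] - px, gy[..., None] - py)

def idw(px, py, pv, gx, gy, power=2.0):
    """Inverse distance weighting over all samples, weights 1 / d**power."""
    pv = np.asarray(pv, dtype=float)
    d = _distances(px, py, gx, gy)
    exact = d < 1e-9                              # cell centre coincides with a sample
    w = 1.0 / np.maximum(d, 1e-9) ** power
    est = (w * pv).sum(axis=-1) / w.sum(axis=-1)
    hit = exact.any(axis=-1)
    est[hit] = pv[exact.argmax(axis=-1)[hit]]     # honour coincident samples exactly
    return est

def thiessen(px, py, pv, gx, gy):
    """Thiessen (Voronoi) polygons on a raster: nearest-sample value per cell."""
    pv = np.asarray(pv, dtype=float)
    return pv[_distances(px, py, gx, gy).argmin(axis=-1)]

# Two hypothetical samples 1 km apart and a one-row strip of 100 m cells.
px, py, pv = np.array([0.0, 1000.0]), np.array([0.0, 0.0]), np.array([10.0, 30.0])
gx, gy = grid_centres(0.0, 0.0, 1000.0, 100.0)
print(np.round(idw(px, py, pv, gx, gy), 1))
print(thiessen(px, py, pv, gx, gy))
```

The IDW row changes gradually between the two sample values, while the Thiessen row jumps from one value to the other halfway between the samples.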
Kriging interpolation produced maps of estimated values and variances, and the
noise was reduced, resulting in relatively smooth element maps. The
experimental variograms revealed outliers, which could have influenced the
variogram shapes and model parameters even after the data were transformed.
The quality of the kriging results may also be considered uncertain because of
the subjective decisions made about lag distance, transformation type and model
fitting. Nevertheless, variograms are undoubtedly helpful in studying the
continuity of spatial variables.
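The experimental semivariogram at lag h is half the mean squared difference between all sample pairs whose separation falls in the lag class of h: gamma(h) = 1/(2 N(h)) * sum of (z_i - z_j)^2 over those N(h) pairs. A brute-force NumPy sketch of an omnidirectional variogram follows. It is not GSTAT's code; the equal-width lag classes [k * lag, (k + 1) * lag) are an assumption, and the optional natural-log transform merely stands in for whichever transformation the histograms suggest.

```python
import numpy as np

def experimental_variogram(x, y, z, lag, n_lags, log_transform=False):
    """Omnidirectional experimental semivariogram with equal-width lag classes.

    Pairs with separation d fall in class k = floor(d / lag); classes k >= n_lags
    are ignored. Returns (class centres, gamma, pair counts); empty classes
    give gamma = nan.
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    if log_transform:
        z = np.log(z)                          # requires strictly positive values
    i, j = np.triu_indices(z.size, k=1)        # every pair counted once
    dist = np.hypot(x[i] - x[j], y[i] - y[j])
    sq = (z[i] - z[j]) ** 2
    bins = np.floor(dist / lag).astype(int)
    keep = bins < n_lags
    counts = np.bincount(bins[keep], minlength=n_lags)
    sums = np.bincount(bins[keep], weights=sq[keep], minlength=n_lags)
    with np.errstate(invalid="ignore", divide="ignore"):
        gamma = sums / (2.0 * counts)          # 0 / 0 -> nan for empty classes
    centres = (np.arange(n_lags) + 0.5) * lag
    return centres, gamma, counts

# Tiny hypothetical transect: four samples 1 m apart with alternating values.
centres, gamma, counts = experimental_variogram(
    [0, 1, 2, 3], [0, 0, 0, 0], [0, 1, 0, 1], lag=1.5, n_lags=3)
print(counts, gamma)
```

Changing `lag` or `log_transform` changes the resulting curve, which is exactly why these choices count among the subjective decisions behind the fitted models.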
Visualization techniques proved to be excellent tools for discovering hidden
structures in data. The original level of detail is preserved, and trends
remain visible. Glyph displays were used to identify and group similar samples
in multivariate data sets (Figure 1). Parallel coordinates made it possible to
follow individual samples through all dimensions (Figure 2) and to identify the
true multivariate outliers to be removed or replaced. Geochemical anomalies
could be located and separated by their “multivariate fingerprints”, that is,
their associations of metals. Scatterplots were valuable for studying visual
correlations between the metals in multidimensional space. Exploration of the
time-related data showed that visualization techniques outperform GIS in
integrating and analysing sparse data from different surveys. Integration of
the till and biogeochemical data sets gave interesting results about element
cycles. Visualization techniques work with sparse and dense, quantitative and
qualitative data, and are therefore very promising for interpreting
geochemical, geological and other environmental data. The only limitations of
the visualization techniques used relate to the visual appearance of the
displays when concentration ranges differ greatly.
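In a parallel coordinate display each variable is drawn as an axis scaled from its own minimum to its maximum, and each sample becomes a polyline across the axes. When one metal spans several orders of magnitude, linear min-max scaling presses most samples against one end of that axis, which is the display limitation noted above; log-scaling that axis is one common remedy. The following NumPy sketch computes such axis positions and, as a crude stand-in for interactive brushing, labels each sample by which metals exceed a chosen percentile, a simple version of a multivariate fingerprint. Both the log option and the 90th-percentile rule are illustrative assumptions, not a description of how XmdvTool scales its axes or of the criteria used in the study.

```python
import numpy as np

def axis_positions(data, log_axes=()):
    """Map each column of `data` (samples x variables) onto a [0, 1] axis.

    Columns whose indices are in `log_axes` are log10-transformed first
    (values must be positive). A constant column maps to 0 everywhere.
    """
    a = np.array(data, dtype=float)          # a copy; the input is not modified
    for k in log_axes:
        a[:, k] = np.log10(a[:, k])
    lo, hi = a.min(axis=0), a.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid dividing by zero
    return (a - lo) / span

def fingerprints(data, q=90.0):
    """Label each sample by which variables exceed their q-th percentile."""
    a = np.asarray(data, dtype=float)
    high = a > np.percentile(a, q, axis=0)
    return [tuple(bool(flag) for flag in row) for row in high]

# Hypothetical samples with columns Co, Cu, Zn; Zn spans four orders of magnitude.
samples = [[1.0, 10.0, 1000.0], [2.0, 20.0, 100000.0], [3.0, 30.0, 10.0]]
print(axis_positions(samples))                # two Zn values pressed against 0
print(axis_positions(samples, log_axes=[2]))  # Zn values spread along the axis
print(fingerprints(samples))
```

Samples sharing the same tuple of flags share the same pattern of elevated metals and can be grouped, much as one would group polylines by brushing in an interactive display.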
Combining statistical, geostatistical, GIS and visualization techniques for
geochemical data interpretation enhances the information obtained with each
method separately and gives a more complete description of the distribution
and associations of metals.
Pebesma EJ, Wesseling CG (1998) Gstat: a program for geostatistical modelling,
prediction and simulation. Computers & Geosciences 24(1): 17-31.
Ward MO (1994) XmdvTool: integrating multiple methods for visualizing
multivariate data. Proceedings of Visualization ’94, pp. 326-333.

Figure 1. Glyph display of bedrock geochemical data.

Figure 2. Parallel coordinate display of bedrock geochemical data.