SAMPLING ERROR: THE NEGLECTED COMPONENT OF MEASUREMENT
UNCERTAINTY IN TRACE ELEMENT ANALYSES
Sharon Squire1, Michael H. Ramsey2 and Michael J. Gardner3
1Environmental Toxicology, Applied Sciences, University of California, Santa Cruz, CA 95064.
E-mail: ssquire@es.ucsc.edu
2Centre for Environmental Research, School of Chemistry, Physics and Environmental
Science, University of Sussex, Falmer, Brighton, UK, BN1 9QJ
3WRc NSF, Henley Road, Medmenham, Marlow, Bucks, UK, SL7 2HD
Sampling is the first step in the measurement process
required to obtain the primary sample from the sampling target, such as soil in
a field, filtered water in an estuary, or atmospheric gases in a building.
Following sample preparation, the test materials can be analyzed by analytical
instrumentation. Modern analytical instruments, such as ICP-AES, are
sophisticated enough to measure analyte concentrations at trace levels (i.e. parts per trillion range) with
acceptable accuracy and reproducibility. However, improvements in analytical
measurements have raised issues of concern with regard to the collection of
primary samples, as the quality of the conclusion that can be made from a
geochemical investigation will depend on the quality of both sampling and
analysis. This implies that sampling protocols should be capable of identifying
(i) the heterogeneity of the target
analyte in both space and time and (ii)
potential contamination, loss of analyte, or an incorrect sampling
procedure. Methodologies for estimating
the systematic and random components of uncertainty are therefore necessary to
assess the fitness for purpose of geochemical measurements. A system
of internal quality control and inter-organizational sampling trials has
been described for both spatially and temporally variable
environmental sampling targets.
The measurement process comprises
the collection of samples, the preparation of test materials in the laboratory
and the analysis of the derived test portions. The informal definition of
measurement uncertainty is “the interval around the result of a measurement
that contains the true value with high probability”[1]. Rather than
estimating individual analytical and sampling uncertainties separately, a
holistic approach is to consider sampling and analysis as the same measurement
process and to quantify their combined contribution to uncertainty by the
addition of their variances. Thus, measurement uncertainty can be considered to
have four component sources of error. These are the sampling and analytical
random errors quantified as precision, and the sampling and analytical
systematic errors quantified as bias. Internal quality control and
inter-organizational trials address aspects of precision, proficiency tests
address the accuracy of single results, and reference materials address bias.
Such parameters have been quantified in recent research for an estimation of
uncertainty in delineating a hot spot of contaminated soil on a specially
created synthetic reference sampling target (RST), and also for estimating the
uncertainty in sampling temporally variable emissions of methane, carbon dioxide
and oxygen gases from a landfill site. This paper therefore reviews the
methodologies that can be applied to quantify measurement uncertainty using
examples from specific case studies.
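As a minimal worked form of the variance addition described above, and assuming the sampling and analytical random errors are independent, the combined random component can be sketched as follows. The symbols s_samp, s_anal and s_meas are labels introduced here for illustration, and the coverage factor of 2 (an expanded uncertainty at roughly 95% confidence) is a common convention rather than something stated in this paper. Systematic components (bias) are not included in this expression.

```latex
% Assumed notation (not from the source): s_samp, s_anal, s_meas, U
s_{\mathrm{meas}}^{2} = s_{\mathrm{samp}}^{2} + s_{\mathrm{anal}}^{2},
\qquad
U = 2\, s_{\mathrm{meas}},
\qquad
U\% = \frac{200\, s_{\mathrm{meas}}}{\bar{x}}

% Hypothetical numbers: s_samp = 8, s_anal = 3 (same concentration units)
s_{\mathrm{meas}} = \sqrt{8^{2} + 3^{2}} = \sqrt{73} \approx 8.5
```

With these made-up values the sampling term contributes 64 of the 73 variance units, which illustrates why improving the analysis alone would do little to reduce the measurement uncertainty.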
SAMPLING PRECISION AND SAMPLING BIAS
Quantifying sampling precision
requires that primary samples be collected according to a defined protocol, but
randomized in some way for each sample (in either space or time)[1]. For example, characterizing the
analyte concentration of trace metals in a field of soil may involve the
replication of a sampling protocol in a different, randomly selected
orientation. Alternatively, approximately 10% of the total number of samples
can be collected in duplicate, separated from the nominal sampling location by
a distance representative of the surveying technology employed. This is a
quicker, cheaper method and assumes an average variance across the sampling
target. Duplicate analysis is made on both the nominal and duplicated sample
test portions using a balanced experimental design[2].
The geochemical, sampling and analytical variances can then be separated using
classical or robust analysis of variance.
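The separation of variances in the balanced design can be sketched in Python as below. This is a minimal illustration of classical nested ANOVA only (the robust variant is not shown); the function name, the data layout and the returned keys are assumptions made for this example, not part of the published method.

```python
from statistics import mean


def nested_anova(data):
    """Classical nested ANOVA for a balanced duplicate design (sketch).

    data[i][j][k] is the result for location i, duplicate sample j and
    duplicate analysis k. This layout is a hypothetical convention.
    Returns variance estimates; negative estimates are set to zero.
    """
    n_loc = len(data)
    n_samp = len(data[0])
    n_anal = len(data[0][0])
    if any(len(loc) != n_samp or any(len(s) != n_anal for s in loc)
           for loc in data):
        raise ValueError("design must be balanced")

    sample_means = [[mean(s) for s in loc] for loc in data]
    loc_means = [mean(sm) for sm in sample_means]
    grand_mean = mean(loc_means)

    # Sums of squares at each level of the hierarchy
    ss_anal = sum((x - sample_means[i][j]) ** 2
                  for i, loc in enumerate(data)
                  for j, s in enumerate(loc)
                  for x in s)
    ss_samp = n_anal * sum((sample_means[i][j] - loc_means[i]) ** 2
                           for i in range(n_loc)
                           for j in range(n_samp))
    ss_loc = n_samp * n_anal * sum((m - grand_mean) ** 2 for m in loc_means)

    # Mean squares (sum of squares divided by degrees of freedom)
    ms_anal = ss_anal / (n_loc * n_samp * (n_anal - 1))
    ms_samp = ss_samp / (n_loc * (n_samp - 1))
    ms_loc = ss_loc / (n_loc - 1)

    # Variance components from the expected mean squares
    var_anal = ms_anal
    var_samp = max(0.0, (ms_samp - ms_anal) / n_anal)
    var_geo = max(0.0, (ms_loc - ms_samp) / (n_samp * n_anal))

    return {
        "geochemical": var_geo,
        "sampling": var_samp,
        "analytical": var_anal,
        "measurement": var_samp + var_anal,  # sampling + analysis combined
    }
```

For the made-up data `[[[10, 12], [14, 16]], [[20, 22], [24, 26]]]` (two locations, each sampled and analyzed in duplicate) the analytical, sampling and geochemical variances come out as 2, 7 and 46. A real survey would use many more duplicated locations than this toy case.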
For moving targets such as gas, sediment and water,
samples can be replicated within the frequency of variability. Ideally, the
frequency of sampling should be at least twice the frequency of the variation
for monitoring purposes[3].
For example, while characterizing the methane concentration of gas emitted from
a landfill the replication of samples could be based on different
meteorological conditions within the sampling frequency specified in a
monitoring protocol. Each sample is then analyzed in duplicate and the
measurements treated statistically to determine the sampling and analytical
components of variance. In this case the sampling component of variance
represents the natural variability of the sampling target as well as any errors
in the sample collection and preparation. Where in situ measurement techniques are used (e.g. an infrared gas analyzer), both the sampling and analysis are
addressed at the same time.
Sampling bias, which may be defined as the difference between the mean of several
measurements and the true value, is a contentious issue, being questioned by some
authors[4] but recognized by others[5]. Such biases could arise from several causes,
such as contamination from the sampling tools, inappropriate handling, or selective
sampling[1]. The use of field blanks, control sites and field spike samples was an
initial step in tackling this problem. However, the ‘true’ concentration is never
known unless a reference sampling target (RST) has been constructed. Such an RST was
constructed in Ascot, UK, and allowed the first estimates of sampling bias to be
obtained that were traceable to a known mass of pure analyte[6].
An alternative reference point for the estimation of the sampling bias
is the consensus value from a substantial number of measurements made by
different protocols and/or independent samplers[5]. The limitation of this method
is that all the participants could be equally biased, giving an underestimate of the
uncertainty. Internal quality control is therefore important to identify such
errors.
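The definition of bias given above (mean of several measurements minus the reference value) can be sketched as follows. The function names are hypothetical, and using the median as the consensus value is an assumption made here for robustness against outliers; the paper does not specify how the consensus is computed.

```python
from statistics import mean, median


def sampling_bias(measurements, reference_value):
    """Return (absolute bias, relative bias in %) against a reference value.

    reference_value may be the known value of a reference sampling target
    or a consensus value; it must be non-zero for the relative figure.
    """
    bias = mean(measurements) - reference_value
    return bias, 100.0 * bias / reference_value


def consensus_value(results):
    """Consensus of many independent results (median: an assumed choice)."""
    return median(results)
```

As the text notes, a bias estimated against a consensus value inherits any bias shared by all participants, so the figure returned here is only as good as the reference it is compared with.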
INTER-ORGANISATIONAL SAMPLING TRIALS
The collaborative trial in sampling (CTS),
also called a method performance study, is a ‘top down’ approach to uncertainty
estimation used to validate a sampling protocol. The approach requires a number
of participants (called samplers) to take two sets of samples from a target
using various interpretations of the same sampling
protocol. Each sample is then analyzed in duplicate, under randomized
repeatability conditions (to avoid confusing sampling and analytical
variations). Hierarchical analysis of variance (ANOVA) is then used to estimate
precision (as standard deviations) between-samplers, within-samplers, and
between analytical duplicates. The within-sampler variation refers to one sampler
using the same procedure and equipment over a short period of time.
The reproducibility standard deviation is the square root of the sum of the
squared within- and between-sampler standard deviations, and refers to
measurements made on a single or composite sample collected by different
participants using the same
sampling protocol. If the uncertainty is found to be too large for particular
investigations (i.e. not
fit-for-purpose) then modifications to the protocol would be required, e.g. collecting composite rather than
single samples. The sampling bias of each individual sampler
can be considered as a component of random error when viewed as such in an
inter-organizational trial.
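The within-sampler, between-sampler and reproducibility standard deviations described above can be sketched with a one-way ANOVA across samplers. This is a simplified illustration: it assumes each sample result has already been averaged over its analytical duplicates (so a share of the analytical variance remains inside the within-sampler term), and the function name and data layout are hypothetical.

```python
import math
from statistics import mean


def sampler_precision(results):
    """Estimate precision from a balanced collaborative trial (sketch).

    results[i][j] is the result for sampler i and sample set j.
    Returns (s_within, s_between, s_reproducibility).
    """
    n_samplers = len(results)
    n_sets = len(results[0])
    if n_samplers < 2 or n_sets < 2:
        raise ValueError("need at least two samplers and two sets each")
    if any(len(r) != n_sets for r in results):
        raise ValueError("design must be balanced")

    sampler_means = [mean(r) for r in results]
    grand_mean = mean(sampler_means)

    # Within-sampler mean square: spread of sets around each sampler's mean
    ms_within = sum((x - m) ** 2
                    for r, m in zip(results, sampler_means)
                    for x in r) / (n_samplers * (n_sets - 1))
    # Between-sampler mean square: spread of sampler means
    ms_between = n_sets * sum((m - grand_mean) ** 2
                              for m in sampler_means) / (n_samplers - 1)

    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / n_sets)

    s_within = math.sqrt(var_within)
    s_between = math.sqrt(var_between)
    # Reproducibility: root of the sum of the two squared components
    s_reproducibility = math.sqrt(var_within + var_between)
    return s_within, s_between, s_reproducibility
```

A reproducibility value judged too large for the intended use would, as the text suggests, point toward modifying the protocol, for example by compositing.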
The sampling proficiency test (SPT)
requires many organizations (n ≥ 8) to each use a sampling protocol
and equipment that they regard as fit-for-purpose.
The participants can analyze the samples themselves, or where the analytical
component of variance requires minimizing, the organizers can analyze all of
the samples under randomized repeatability conditions. The SPT
coordinator converts the results to scores that are reported to the
participants by a pre-determined date. Each participant can then compare their
result with other participants and with the assigned value. The z score value
that forms the criterion of performance is given as:
z = ( x – X ) / s
where x is the measured value, X is
the assigned value and s is the target value for standard
deviation[7].
A fitness-for-purpose criterion of 10% was considered realistic for relatively
stable gas emissions in landfill boreholes, but was found to be unrealistic for
concentrations that changed significantly over time. In such cases it may be
better to have a different criterion based on natural variability. However, an
external assessment of the fitness-for-purpose (FFP) criterion could also be
considered, one that reflects the confidence that regulators require from such
measurements. Scores would be expected to fall outside of the range −3 < z < 3
in about 0.3% of instances. Therefore scores of |z| ≥ 3 are
classified as unsatisfactory performances.
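The scoring rule above can be sketched in a few lines of Python. The function names are hypothetical, and expressing the target standard deviation as 10% of the assigned value is only one possible reading of the fitness-for-purpose criterion, labeled here as an assumption.

```python
def z_score(measured, assigned, target_sd):
    """z = (x - X) / s, with s the target value for standard deviation."""
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    return (measured - assigned) / target_sd


def classify(z, limit=3.0):
    """Scores with |z| >= limit are classed as unsatisfactory."""
    return "unsatisfactory" if abs(z) >= limit else "satisfactory"


def target_sd_from_percent(assigned, percent=10.0):
    """Assumed convention: target SD as a percentage of the assigned value."""
    return assigned * percent / 100.0
```

For example, with a hypothetical assigned value of 50 and a 10% criterion the target standard deviation is 5, so a reported value of 65 gives z = 3 and would be classed as unsatisfactory.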
ESTIMATING SPATIAL UNCERTAINTY
The fitness-for-purpose scoring system for spatial
delineation can be based on a cost-effectiveness approach[8].
Figure 1 gives an example of an estimated hot spot area relative to the
assigned or ‘true’ area from which the ‘excess cost’ score can be derived. The
true hot spot area is either known from spiking or can be estimated using
consensus.

‘Excess cost’ = α(E − i) + β(T − i)
Figure 1: Schematic diagram showing the
false positive and false negative delineations of a hot spot, which are factors
influencing the spatial scoring system for the CTS and SPT.
T represents the assigned hot spot area, E is the estimated hot spot area and i
is the area of overlap of the two regions, T and E. The cost of unnecessary
remediation (false positive) is (E − i) and the cost of not remediating areas
that should be remediated (false negative) is (T − i). Proportionality constants
α and β are used to give weight to the importance of each classification. The
value of the proportionality constants can be changed depending on the type of
contaminant and the remediation techniques used on an individual site. The
value of the area E is expressed relative to the value of T, which is
taken as unity; hence E when T = 1 is E / T. Likewise, the value of i is
expressed as a fraction of T, so that i when T = 1 is i / T. Scores range upwards from zero,
with a score of zero indicating perfect spatial delineation with no excess
cost. A larger score reflects greater ‘excess cost’.
For trials undertaken with the aim of spatially delineating a hot spot
of contaminated soil on a synthetic RST, a fitness-for-purpose criterion was
set at a score of ≤3 and the constants were set at α = 1 and β = 4, both based
on professional judgment (leaving contaminated soil was considered to be
4 times more costly than removing uncontaminated soil). Nine samplers performed
a herringbone sampling protocol (n=25) in two
orientations. All but one of the protocol designs had a fitness-for-purpose
score of ≤3. This indicated that
the herringbone sampling protocol was fit for the purpose of identifying the
true hot spot location and dimensions on this RST, with minimal
misclassification. There was no significant difference in within-sampler scores
compared with between-sampler scores using one-way analysis of variance.
However, a particular protocol orientation did tend to produce a distinctive
shape of the measured hot spot. For an SPT applied to the same site, two
participants failed (scores >3) as a result of mis-orienting their sampling
locations when producing their final maps.
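The spatial score can be sketched as a weighted sum of the false positive and false negative areas, each expressed relative to the true hot spot area. The placement of the weights (one on the false positive term, four on the false negative term) follows the values quoted above, but the exact functional form, the function names and the default threshold of 3 are assumptions made for this illustration.

```python
def excess_cost(estimated_area, true_area, overlap_area,
                alpha=1.0, beta=4.0):
    """Weighted 'excess cost' score for a delineated hot spot (sketch).

    Areas are normalized so that the true area T equals 1.
    alpha weights the false positive area (E - i);
    beta weights the false negative area (T - i).
    """
    if true_area <= 0:
        raise ValueError("true_area must be positive")
    if not 0 <= overlap_area <= min(estimated_area, true_area):
        raise ValueError("overlap cannot exceed either area")

    e = estimated_area / true_area   # E relative to T = 1
    i = overlap_area / true_area     # i relative to T = 1
    false_positive = e - i           # remediated unnecessarily
    false_negative = 1.0 - i         # contaminated area left in place
    return alpha * false_positive + beta * false_negative


def is_fit_for_purpose(score, threshold=3.0):
    """A score at or below the threshold meets the criterion."""
    return score <= threshold
```

With hypothetical areas of 150 (estimated), 100 (true) and 80 (overlap), the score is 1 × 0.7 + 4 × 0.2 = 1.5, which would meet a criterion of ≤3. Missing the hot spot entirely with an estimate of the same size gives 1 × 1 + 4 × 1 = 5, which would not.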
CONCLUSIONS
Sampling uncertainty is a relatively new concept, the importance of which is slowly beginning to be recognized within analytical chemistry. Recent research has shown that sampling uncertainty is often far greater than analytical uncertainty. Therefore, combining sampling and analytical uncertainties to provide an estimate of measurement uncertainty can enable the end-user to improve classifications and decisions within geochemical investigations of environmental media.
REFERENCES
[1] Thompson, M. (1999). Journal of Environmental Monitoring. 1: 19N-21N.
[2] Garrett, R.G. (1969). Economic Geology. 64: 568.
[3] Barcelona, M.J. (1988). In Principles of Environmental Sampling (Keith, L.H., editor), American Chemical Society, Chapter 1.
[4] Gy, P.M. (1992). Sampling of Heterogeneous and Dynamic Materials. Elsevier, Amsterdam, 26.
[5] Ramsey, M.H. and Argyraki, A. (1997). Science of the Total Environment. 198: 243-257.
[6] Ramsey, M.H., Squire, S. and Gardner, M.J. (1999). Analyst. 124: 1701-1706.
[7] Thompson, M. and Wood, R. (1993). Pure and Applied Chemistry. 65: 2123-2144.
[8] Squire, S., Ramsey, M.H. and Gardner, M.J. (2000). Analyst. 125: 139-145.