SAMPLING ERROR: THE NEGLECTED COMPONENT OF MEASUREMENT UNCERTAINTY IN TRACE ELEMENT ANALYSES

 

Sharon Squire1, Michael H. Ramsey2 and Michael J. Gardner3

1Environmental Toxicology, Applied Sciences, University of California, Santa Cruz, CA 95064.

E-mail: ssquire@es.ucsc.edu

2Centre for Environmental Research, School of Chemistry, Physics and Environmental Science, University of Sussex, Falmer, Brighton, UK, BN1 9QJ

3WRc NSF, Henley Road, Medmenham, Marlow, Bucks, UK, SL7 2HD

 

ABSTRACT

Sampling is the first step in the measurement process, in which the primary sample is obtained from the sampling target, such as soil in a field, filtered water in an estuary, or atmospheric gases in a building. Following sample preparation, the test materials can be analyzed by analytical instrumentation. Modern analytical instruments, such as ICP-AES, are sophisticated enough to measure analyte concentrations at trace levels (i.e. parts per trillion range) with acceptable accuracy and reproducibility. However, improvements in analytical measurements have raised concerns about the collection of primary samples, as the quality of the conclusions that can be drawn from a geochemical investigation depends on the quality of both sampling and analysis. This implies that sampling protocols should be capable of identifying (i) the heterogeneity of the target analyte in both space and time and (ii) potential contamination, loss of analyte, or an incorrect sampling procedure. Methodologies for estimating the systematic and random components of uncertainty are therefore necessary in order to assess the fitness for purpose of geochemical measurements. A system of internal quality control and inter-organizational sampling trials has been described for both spatially and temporally variable environmental sampling targets.

 

INTRODUCTION

The measurement process comprises the collection of samples, the preparation of test materials in the laboratory and the analysis of the derived test portions. The informal definition of measurement uncertainty is “the interval around the result of a measurement that contains the true value with high probability”[1]. Rather than estimating analytical and sampling uncertainties separately, a holistic approach is to consider sampling and analysis as parts of the same measurement process and to quantify their combined contribution to uncertainty by the addition of their variances. Thus, measurement uncertainty can be considered to have four component sources of error. These are the sampling and analytical random errors, quantified as precision, and the sampling and analytical systematic errors, quantified as bias. Internal quality control and inter-organizational trials address aspects of precision, proficiency tests address the accuracy of single results and reference materials address bias. Such parameters have been quantified in recent research, both for estimating the uncertainty in delineating a hot spot of contaminated soil on a specially created synthetic reference sampling target (RST) and for estimating the uncertainty in sampling temporally variable emissions of methane, carbon dioxide and oxygen from a landfill site. This paper therefore reviews the methodologies that can be applied to quantify measurement uncertainty, using examples from specific case studies.
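The addition of variances mentioned above can be illustrated with a minimal Python sketch. The function name, the numerical values and the coverage factor of 2 (a common choice for roughly 95% confidence) are assumptions made for illustration and are not taken from the case studies. Only the random components are combined here; any sampling or analytical bias would need to be assessed separately.

```python
import math


def combined_measurement_uncertainty(s_sampling, s_analytical, coverage_factor=2.0):
    """Combine sampling and analytical standard deviations by adding variances.

    Returns the measurement standard deviation and an expanded uncertainty.
    The coverage factor of 2 is an assumed, commonly used value.
    """
    s_measurement = math.sqrt(s_sampling ** 2 + s_analytical ** 2)
    return s_measurement, coverage_factor * s_measurement


# Hypothetical standard deviations in mg/kg, chosen only to show the arithmetic.
s_meas, expanded = combined_measurement_uncertainty(s_sampling=12.0, s_analytical=5.0)
print(f"s_meas = {s_meas:.1f} mg/kg, expanded uncertainty = {expanded:.1f} mg/kg")
```

With these illustrative numbers the sampling term dominates: the combined standard deviation (13 mg/kg) is only slightly larger than the sampling standard deviation alone.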

 

SAMPLING PRECISION AND SAMPLING BIAS

Quantifying sampling precision requires that primary samples be collected according to a defined protocol, but randomized in some way for each sample (in either space or time)[1]. For example, characterizing the analyte concentration of trace metals in a field of soil may involve the replication of a sampling protocol in a different, randomly selected orientation. Alternatively, approximately 10% of the total number of samples can be collected in duplicate, separated from the nominal sampling location by a distance representative of the surveying technology employed. This is a quicker, cheaper method and assumes an average variance across the sampling target. Duplicate analysis is made on both the nominal and duplicated sample test portions using a balanced experimental design[2]. The geochemical, sampling and analytical variances can then be separated using classical or robust analysis of variance.
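The separation of variances for the balanced design can be sketched as follows. This is a minimal classical (not robust) nested ANOVA, assuming every duplicated location has the same number of samples and every sample the same number of analyses. The function name and the data are hypothetical, and negative variance estimates are set to zero by assumption.

```python
from statistics import mean


def nested_anova(data):
    """Classical ANOVA for a balanced design: data[location][sample][analysis].

    Returns estimated geochemical (between-location), sampling and
    analytical variances.
    """
    n_loc = len(data)
    n_samp = len(data[0])
    n_anal = len(data[0][0])

    sample_means = [[mean(sample) for sample in loc] for loc in data]
    loc_means = [mean(row) for row in sample_means]
    grand_mean = mean(loc_means)

    # Sums of squares at each level of the hierarchy.
    ss_anal = sum((x - sample_means[i][j]) ** 2
                  for i, loc in enumerate(data)
                  for j, sample in enumerate(loc)
                  for x in sample)
    ss_samp = n_anal * sum((sample_means[i][j] - loc_means[i]) ** 2
                           for i in range(n_loc) for j in range(n_samp))
    ss_loc = n_samp * n_anal * sum((m - grand_mean) ** 2 for m in loc_means)

    # Mean squares (sum of squares divided by degrees of freedom).
    ms_anal = ss_anal / (n_loc * n_samp * (n_anal - 1))
    ms_samp = ss_samp / (n_loc * (n_samp - 1))
    ms_loc = ss_loc / (n_loc - 1)

    # Variance components; negative estimates are truncated to zero.
    return {
        "analytical": ms_anal,
        "sampling": max(0.0, (ms_samp - ms_anal) / n_anal),
        "geochemical": max(0.0, (ms_loc - ms_samp) / (n_samp * n_anal)),
    }


# Hypothetical concentrations (mg/kg): 4 locations x 2 samples x 2 analyses.
duplicates = [
    [[100, 104], [110, 108]],
    [[200, 196], [190, 194]],
    [[150, 152], [160, 158]],
    [[120, 118], [112, 114]],
]
components = nested_anova(duplicates)
print(components)
```

Classical ANOVA is sensitive to outlying values, which is one reason a robust alternative may be preferred for real survey data.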

For moving targets such as gas, sediment and water, samples can be replicated within the frequency of variability. Ideally, the frequency of sampling should be at least twice the frequency of the variation for monitoring purposes[3]. For example, when characterizing the methane concentration of gas emitted from a landfill, the replication of samples could be based on different meteorological conditions within the sampling frequency specified in a monitoring protocol. Each sample is then analyzed in duplicate and the measurements treated statistically to determine the sampling and analytical components of variance. In this case the sampling component of variance represents the natural variability of the sampling target as well as any errors in the sample collection and preparation. Where in situ measurement techniques are used (e.g. an infrared gas analyzer), both the sampling and analysis are addressed at the same time.
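The rule that the sampling frequency should be at least twice the frequency of the variation translates into a maximum interval between samples of half the period of the variation. A minimal sketch, with a hypothetical function name and a hypothetical 24-hour cycle used purely as an example:

```python
def max_sampling_interval(variation_period):
    """Longest interval between samples that still samples at twice
    the frequency of the variation (half of its period)."""
    if variation_period <= 0:
        raise ValueError("variation_period must be positive")
    return variation_period / 2.0


# Hypothetical example: a 24-hour cycle in gas concentration.
interval_hours = max_sampling_interval(24.0)
print(f"Sample at least every {interval_hours:.0f} hours")
```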

Sampling bias, which may be defined as the difference between the mean of several measurements and the true value, is a contentious issue, being questioned by some authors[4], but recognized by others[5]. Such biases could arise from several causes such as contamination from the sampling tools, inappropriate handling, or selective sampling[1]. The use of field blanks, control sites and field spike samples was an initial step in tackling this problem. However, the ‘true’ concentration is never known unless a reference sampling target (RST) has been constructed. Such an RST was constructed in Ascot, UK and allowed the first estimates of the bias from sampling to be obtained that were traceable to a known mass of pure analyte[6]. An alternative reference point for the estimation of the sampling bias is the consensus value from a substantial number of measurements made by different protocols and/or independent samplers[5]. The limitation of this method is that all the participants could be equally biased, giving an underestimate of the uncertainty. Internal quality control is therefore important to identify such errors.
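Following the definition given above, a bias estimate is simply the mean of the measurements minus the reference value, where the reference value may be a known RST concentration or a consensus value. The sketch below uses a hypothetical function name and hypothetical concentrations.

```python
from statistics import mean


def sampling_bias(measurements, reference_value):
    """Return the absolute bias and the relative bias (%) of the mean
    of the measurements against a reference value."""
    bias = mean(measurements) - reference_value
    return bias, 100.0 * bias / reference_value


# Hypothetical concentrations (mg/kg) against a hypothetical reference of 100 mg/kg.
bias, relative_bias = sampling_bias([92.0, 95.0, 90.0, 91.0], reference_value=100.0)
print(f"bias = {bias:.1f} mg/kg ({relative_bias:.1f} %)")
```

If the reference value is a consensus rather than a known spike, a bias shared by all participants would not be detected by this calculation.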

 

INTER-ORGANISATIONAL SAMPLING TRIALS

The collaborative trial (or method performance study) is a ‘top down’ approach to uncertainty estimation used to validate a sampling protocol. The approach requires a number of participants (called samplers) to take two sets of samples from a target using various interpretations of the same sampling protocol. Each sample is then analyzed in duplicate, under randomized repeatability conditions (to avoid confusing sampling and analytical variations). Hierarchical analysis of variance (ANOVA) is then used to estimate precision (as standard deviations) between-samplers, within-samplers, and between analytical duplicates. The within-sampler variation refers to one sampler using the same procedure and equipment over a short period of time. The reproducibility standard deviation is derived as the square root of the sum of squares of the within- and between-sampler standard deviations and refers to measurements made on a single or composite sample, collected by different participants using the same sampling protocol. If the uncertainty is found to be too large for particular investigations (i.e. not fit-for-purpose) then modifications to the protocol would be required, e.g. collecting composite rather than single samples. The sampling bias of each individual sampler can be considered as a component of random error when viewed as such in an inter-organizational trial.

The sampling proficiency test (SPT) requires many organizations (n ≥ 8) to each use a sampling protocol and equipment that they regard as fit-for-purpose. The participants can analyze the samples themselves or, where the analytical component of variance needs to be minimized, the organizers can analyze all of the samples under randomized repeatability conditions. The SPT coordinator converts the results to scores that are reported to the participants by a pre-determined date. Each participant can then compare their result with those of other participants and with the assigned value. The z score that forms the criterion of performance is given as:

z = (x − X) / s

where x is the measured value, X is the assigned value and s is the target value for standard deviation[7]. A fitness-for-purpose (FFP) criterion of 10% was considered realistic for relatively stable gas emissions in landfill boreholes, but was found to be unrealistic for concentrations that changed significantly over time. In such cases it may be better to have a different criterion based on natural variability. However, an external assessment of the FFP criterion could also be considered, which reflects the confidence that regulators require from such measurements. For normally distributed results, scores would be expected to fall outside of the range −3 < z < 3 in about 0.3% of instances. Therefore scores of |z| ≥ 3 are classified as unsatisfactory performances.
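The z score and the |z| ≥ 3 classification can be sketched as below. The function names and the example values are hypothetical. Taking the target standard deviation as 10% of the assigned value is one possible reading of the 10% criterion and is an assumption here.

```python
def z_score(measured, assigned, target_sd):
    """z = (x - X) / s, with s the target value for standard deviation."""
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    return (measured - assigned) / target_sd


def is_unsatisfactory(z):
    """Scores with |z| >= 3 are classified as unsatisfactory."""
    return abs(z) >= 3.0


# Hypothetical methane results (% v/v) against a hypothetical assigned value of 50.
assigned_value = 50.0
target_sd = 0.10 * assigned_value  # assumed: 10% of the assigned value
for measured in (52.0, 68.0):
    z = z_score(measured, assigned_value, target_sd)
    print(f"x = {measured}: z = {z:.1f}, unsatisfactory = {is_unsatisfactory(z)}")
```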

 

ESTIMATING SPATIAL UNCERTAINTY

The fitness-for-purpose scoring system for spatial delineation can be based on a cost-effectiveness approach[8]. Figure 1 gives an example of an estimated hot spot area relative to the assigned or ‘true’ area from which the ‘excess cost’ score can be derived. The true hot spot area is either known from spiking or can be estimated using consensus.

 

 

‘Excess cost’ = α(E − i) + β(T − i)

Figure 1: Schematic diagram showing the false positive and false negative delineations of a hot spot, which are factors influencing the spatial scoring system for the collaborative trial in sampling (CTS) and the sampling proficiency test (SPT).

T represents the assigned hot spot area, E is the estimated hot spot area and i is the area of overlap of the two regions, T and E. The cost of unnecessary remediation (false positive) is α(E − i) and the cost of not remediating areas that should be remediated (false negative) is β(T − i). The proportionality constants α and β are used to give weight to the importance of each classification. Their values can be changed depending on the type of contaminant and the remediation techniques used on an individual site. The area E is expressed relative to T, which is taken as unity, so that the normalized value of E is E / T. Likewise, i is expressed as a fraction of T, so that its normalized value is i / T. Scores range upwards from zero, with a score of zero indicating perfect spatial delineation with no excess cost. A larger score reflects greater ‘excess cost’.

For trials undertaken with the aim of spatially delineating a hot spot of contaminated soil on a synthetic RST, a fitness-for-purpose criterion was set at a score of ≤ 3. The proportionality constants were set at α = 1 (false positive) and β = 4 (false negative), as leaving contaminated soil in place was considered to be 4 times more costly than removing uncontaminated soil. Both the criterion and the constants were based on professional judgment. Nine samplers performed a herringbone sampling protocol (n = 25) in 2 orientations. All but one of the protocol designs had a fitness-for-purpose score of ≤ 3. This indicated that the herringbone sampling protocol was fit for the purpose of identifying the true hot spot location and dimensions on this RST, with minimal misclassification. One-way analysis of variance showed no significant difference between within-sampler and between-sampler scores. However, a particular protocol orientation did tend to produce a distinctive shape of the measured hot spot. For an SPT applied to the same site, 2 participants failed (scores > 3) as a result of mis-orienting their sampling locations when producing their final maps.
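The ‘excess cost’ score, with the false positive weight of 1 and the false negative weight of 4 used in these trials, can be sketched as follows. The function name and the example areas are hypothetical, and the input checks are an added assumption rather than part of the published scoring system.

```python
def excess_cost(estimated_area, true_area, overlap_area, alpha=1.0, beta=4.0):
    """Score = alpha * (E - i) + beta * (T - i), with E and i expressed
    relative to the true area T, which is taken as unity.

    alpha weights false positives (unnecessary remediation) and beta
    weights false negatives (contamination left in place).
    """
    if true_area <= 0:
        raise ValueError("true_area must be positive")
    if overlap_area < 0 or overlap_area > min(estimated_area, true_area):
        raise ValueError("overlap_area must lie between 0 and min(E, T)")
    e = estimated_area / true_area
    i = overlap_area / true_area
    false_positive_cost = alpha * (e - i)
    false_negative_cost = beta * (1.0 - i)
    return false_positive_cost + false_negative_cost


# Hypothetical areas in square metres.
score = excess_cost(estimated_area=150.0, true_area=100.0, overlap_area=80.0)
print(f"excess cost = {score:.2f}, fit for purpose = {score <= 3.0}")

# Perfect delineation (E = T = i) gives a score of zero.
perfect = excess_cost(estimated_area=100.0, true_area=100.0, overlap_area=100.0)
```

Because the false negative weight is 4 times the false positive weight, missing 20% of the true hot spot in this example costs slightly more than over-estimating its area by 70%.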

 

CONCLUSIONS

Sampling uncertainty is a relatively new concept, the importance of which is slowly beginning to be recognized within analytical chemistry. Recent research has shown that sampling uncertainty is often far greater than analytical uncertainty. Therefore, combining sampling and analytical uncertainties to provide an estimate of measurement uncertainty can enable the end-user to improve classifications and decisions within geochemical investigations of environmental media.

 

REFERENCES



[1] Thompson, M. (1999). Journal of Environmental Monitoring. 1: 19N-21N.

[2] Garrett, R.G. (1969). Economic Geology. 64: 568.

[3] Barcelona, M.J. (1988). In Principles of Environmental Sampling (Keith, L.H., editor), American Chemical Society, Chapter 1.

[4] Gy, P.M. (1992). Sampling of Heterogeneous and Dynamic Materials. Elsevier, Amsterdam, 26.

[5] Ramsey, M.H. and Argyraki, A. (1997). Science of the Total Environment. 198: 243-257.

[6] Ramsey, M.H., Squire, S. and Gardner, M.J. (1999). Analyst. 124: 1701-1706.

[7] Thompson, M. and Wood, R. (1993). Pure and Applied Chemistry. 65: 2123-2144.

[8] Squire, S., Ramsey, M.H. and Gardner, M.J. (2000). Analyst. 125: 139-145.