3. Data Quality Control
3.1 Cruise identification
The quality analysis procedure used in this study requires that the data be identified by cruise. Unfortunately, in many cases the NODC collections still do not provide the information needed to link data to a particular oceanographic cruise of a single ship. The historical profiles of the composite data set selected for the analysis are distributed among about 10000 NODC archive codes, which are often linked to data from many different cruises of the same ship. Within each NODC archive cruise number all profiles were ordered by time, and a demarcation between a pair of new "cruises" was set whenever the time span between two consecutive stations exceeded 7 days. This time separation criterion was chosen rather arbitrarily, and it does not exclude the possibility that a gap of 7 or more days occurred within a single hydrographic cruise, or that the gap between two cruises was less than a week. As a result, all selected hydrographic profiles were ascribed to a total of 41757 new cruises.
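The demarcation rule described above can be sketched as follows; the function name and example dates are illustrative, not taken from the original study:

```python
from datetime import datetime, timedelta

def split_into_cruises(station_times, max_gap_days=7):
    """Split time-ordered station times into 'cruises': a new cruise
    starts whenever the gap between two consecutive stations exceeds
    max_gap_days (7 days in the study)."""
    station_times = sorted(station_times)
    cruises = [[station_times[0]]]
    for prev, cur in zip(station_times, station_times[1:]):
        if cur - prev > timedelta(days=max_gap_days):
            cruises.append([cur])   # gap too large: begin a new cruise
        else:
            cruises[-1].append(cur)
    return cruises

# A 17-day gap between June 3 and June 20 splits the record in two.
times = [datetime(1985, 6, d) for d in (1, 2, 3, 20, 21)]
print(len(split_into_cruises(times)))  # → 2
```

The same logic applies per NODC archive cruise number; only the grouping of stations within each archive code is shown here.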
3.2 Random errors
The evolution of measurement techniques and methods, together with very different quality standards, has made the historical hydrographic data set highly inhomogeneous and has generated a large literature on the quality control of oceanographic data. Most quality control procedures (Levitus et al., 1994; Olbers et al., 1992; Curry, 1996; Gouretski and Jancke, 1999) aim to identify random errors in the data.
The quality evaluation of the composite dataset used in this study benefited from the fact that
all of the source data had already been validated to a certain degree. A description of the
validation procedure applied to the WOA01 data was given by Levitus et al. (1994) and
Conkright et al. (1994). All WOCE data have been checked for their quality both by
respective principal investigators and (in many cases) by independent experts. However, the
investigation of the historical hydrographic data quality for the South Atlantic (Gouretski and
Jancke, 1995) and for the North Atlantic (Lozier et al., 1995) showed that some highly
questionable data in the WOD98 database had evidently passed the quality checks, indicating the need for more rigorous quality control.
In order to validate the data further, we used a method developed by Gouretski and Jancke (1999) and tested on the South Pacific historical data set. Quality checking is done in density-parameter space. The method is based on the empirical fact that relations between potential temperature (or density) and other parameters are locally well defined in the World Ocean and are relatively tight below the thermocline. The density-parameter curve for a group of neighbouring stations is approximated by subdividing the profiles vertically into small density bins and connecting the mean point of each bin with straight lines. The mean value and standard deviation of each parameter about this mean curve are then computed within each bin, and any value differing from the mean curve by more than a prescribed number of standard deviations (2.5 in our case) is rejected (flagged).
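As a simplified sketch of this flagging scheme (assuming, for illustration, a fixed density bin width and deviation measured from the per-bin mean rather than from the piecewise-linear mean curve of the full method):

```python
import numpy as np

def flag_outliers(density, param, bin_width=0.05, n_std=2.5):
    """Flag parameter values that deviate from the mean of their
    density bin by more than n_std standard deviations.
    bin_width is an illustrative choice, not the study's value."""
    density = np.asarray(density, dtype=float)
    param = np.asarray(param, dtype=float)
    bins = np.floor(density / bin_width).astype(int)
    flags = np.zeros(param.size, dtype=bool)
    for b in np.unique(bins):
        idx = bins == b
        vals = param[idx]
        if vals.size < 3:
            continue  # too few points to estimate a reliable spread
        mean, std = vals.mean(), vals.std()
        if std > 0:
            flags[idx] = np.abs(vals - mean) > n_std * std
    return flags

# Nine consistent values and one gross outlier in the same density bin:
flags = flag_outliers([27.0] * 10, [1.0] * 9 + [100.0])
print(flags.sum())  # → 1 (only the outlier is flagged)
```

In the full method the reference is a mean curve interpolated between bin mid-points, so deviations are measured against a continuous density-parameter relation rather than a bin-constant mean.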