A. Valente et al.: A compilation of global bio-optical in situ data
remote-sensing reflectance, concentration of chlorophyll-a, spectral inherent optical properties, spectral diffuse
attenuation coefficient, and total suspended matter. Data were obtained from multi-project archives acquired via
open internet services or from individual projects acquired directly from data providers. Methodologies were
implemented for homogenization, quality control, and merging of all data. Minimal changes were made on the
original data, other than conversion to a standard format, elimination of some points after quality control, and
averaging of observations that were close in time and space. The result is a merged table available in text format.
Overall, the size of the data set grew to 148 432 rows, with each row representing a unique station in space
and time (cf. 136 250 rows in previous version; Valente et al., 2019). Observations of remote-sensing reflectance
increased to 68 641 (cf. 59 781 in previous version; Valente et al., 2019). There was also a near tenfold increase
in chlorophyll data since 2016. Metadata of each in situ measurement (original source, cruise or experiment,
principal investigator) are included in the final table. By making the metadata available, provenance is better
documented and it is also possible to analyse each set of data separately. The compiled data are available at
https://doi.org/10.1594/PANGAEA.941318 (Valente et al., 2022).
Introduction
Data collected by satellite ocean colour sensors provide synoptic
observations on ocean productivity and the variability of the
marine environment at high spatial and temporal resolutions.
Ocean colour data, recognized as Essential Climate
Variables by the Global Climate Observing System,
are invaluable to address key issues, such as the detection
of marine ecosystem modifications due to climate change,
the study of the global carbon cycle, and the assessment
of coastal water quality degradations (IOCCG, 2008; Mc-
Clain, 2009). A main goal of the ESA Ocean Colour Climate
Change Initiative (OC-CCI) was to generate a suite of ocean
colour products for use in climate studies (Sathyendranath et
al., 2019). For this purpose, the existing major data streams
for ocean colour were blended into a coherent ocean colour
data record. Currently, data from five ocean colour sensors
are being merged: the Sea-viewing Wide Field-of-view Sen-
sor (SeaWiFS) of NASA, the Medium Resolution Imaging
Spectrometer (MERIS) of ESA, the MODerate resolution
Imaging Spectro-radiometer (MODIS) of NASA, the Visible
Infrared Imaging Radiometer Suite (VIIRS) of NASA and
NOAA, and the Ocean and Land Colour Instrument (OLCI)
of ESA. For the validation of the ESA OC-CCI satellite prod-
ucts, a compilation of in situ bio-optical data was produced.
This paper presents that compilation.
There are several sets of in situ bio-optical data worldwide
suitable for validation of ocean colour satellite data. While
some are managed by the data producers, others are in international
repositories with contributions from multiple scientists.
Many have rigid quality controls and are built specifically
for ocean colour validation. Using only one of
these data sets would limit the amount of data in validation
exercises. It is therefore vital to merge all these in situ data
sets to maximize the number of matchups available for validation,
with wider distribution in time and space, and consequently
to reduce uncertainties in the validation exercise.
However, merging several data sets together can be a com-
plicated task. First, it is necessary to acquire and harmonize
all data sets into a single standard format. Second, during the
merging, duplicates between data sets must be identified and
removed. Third, the metadata should be propagated through-
out the process and made available in the final merged data
set. Ideally, the compiled merged data set would be made
available as a simple text table to facilitate ease of access
and manipulation. In this work, such unification of multiple
data sets is presented. This was done for the validation of the
ESA OC-CCI ocean colour products, but with the intent to
also serve the broader user community.
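
The three merging steps described above (harmonization to one format,
removal of duplicates between data sets, and propagation of metadata into
a text table) can be illustrated with a minimal Python sketch. It is not
the procedure used for this compilation: the field names, the example
records, the duplicate rule, and the time and distance thresholds are all
assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical records, assumed to be already harmonized to one format.
# Each row carries its own provenance metadata (source, principal investigator).
records = [
    {"time": datetime(2010, 5, 1, 12, 0), "lat": 35.00, "lon": -20.00,
     "chla": 0.21, "source": "archive_A", "pi": "PI_1"},
    {"time": datetime(2010, 5, 1, 12, 2), "lat": 35.00, "lon": -20.00,
     "chla": 0.21, "source": "archive_B", "pi": "PI_1"},
    {"time": datetime(2011, 7, 9, 8, 30), "lat": -10.50, "lon": 60.25,
     "chla": 0.08, "source": "archive_B", "pi": "PI_2"},
]


def is_duplicate(a, b, max_dt=timedelta(minutes=5), max_deg=0.01):
    """Illustrative rule: two rows are the same station if they are
    close in time and position. The thresholds are assumptions."""
    return (abs(a["time"] - b["time"]) <= max_dt
            and abs(a["lat"] - b["lat"]) <= max_deg
            and abs(a["lon"] - b["lon"]) <= max_deg)


def merge(rows):
    """Keep the earliest row of each station; metadata stay attached."""
    merged = []
    for row in sorted(rows, key=lambda r: r["time"]):
        if not any(is_duplicate(row, kept) for kept in merged):
            merged.append(row)
    return merged


def to_text(rows, columns):
    """Write the merged rows as a simple tab-separated text table."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row[c]) for c in columns))
    return "\n".join(lines)


merged = merge(records)
table = to_text(merged, ["time", "lat", "lon", "chla", "source", "pi"])
print(table)
```

Because every row keeps its source and investigator fields, a user of such
a table could still select or exclude individual contributing data sets
after the merge.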
A merged data set is not without drawbacks: it is likely to
be large (with hundreds of thousands of observations) and so
not always easy to manipulate; because the merging is done
on pre-existing, processed databases, it is not possible to have
full control of the whole processing chain. Hence, the data
set would be a collection of observations collected by several
investigators using different instruments, sampling methods,
and protocols, which might eventually have been modified by
the processing routines used by the repositories or archives.
To minimize these potential drawbacks, we have, for the
most part, incorporated only data sets that have emerged
from the long-term efforts of the ocean colour and biological
oceanographical communities to provide scientists with
high-quality in situ data, and implemented additional quality
checks on the data to enhance confidence in the quality of the
merged product. Nevertheless, it is still recognized that dif-
ferent and unpredictable uncertainties may affect data from
the diverse sources due to the use of a variety of field/labora-
tory instruments, methods, and data reduction schemes.
Methodologies used for data harmonization and integra-
tion as well as a description of the acquired individual data
sets are provided in Sect. 2. Geographic distribution and
other characteristics of the final merged data set are shown
in Sect. 3, while Sect. 4 provides an overview of the data.
https://doi.org/10.5194/essd-14-5737-2022
Earth Syst. Sci. Data, 14, 5737–5770, 2022