2 Data and methods
2.1 Pre-processing and merging
The compiled global in situ bio-optical data set described
in this work has an emphasis, though not exclusively,
on open-ocean data. It comprises the following variables:
remote-sensing reflectance (“rrs”), chlorophyll-a concentration
(“chla”), algal pigment absorption coefficient (“aph”),
detrital and coloured dissolved organic matter absorption
coefficient (“adg”), particle backscattering coefficient
(“bbp”), diffuse attenuation coefficient for downward irradiance
(“kd”) and total suspended matter (“tsm”). The variables
“rrs”, “aph”, “adg”, “bbp”, and “kd” are spectrally
dependent, and this dependence is hereafter implied. The
data were compiled from 27 sources (MOBY, BOUSSOLE,
AERONET-OC, SeaBASS, NOMAD, MERMAID, AMT,
ICES, HOT, GeP&CO, AWI, ARCSSPP, BARENTSSEA,
BATS, BIOCHEM, BODC, CALCOFI, CCELTER, CIMT,
COASTCOLOUR, ESTOC, IMOS, MAREDAT, PALMER,
SEADATANET, TPSS, and TARA), each one described in
Sect. 2.2. The data sources in this work should also be viewed
as groups of data that were acquired from a specific source,
standardized with a specific method, and later merged into
the compilation. The compiled in situ observations are essen-
tially surface (i.e. no information depending on depth), have
a global distribution, and cover the period 1997 to 2021. The
listed variables, with the exception of total suspended matter,
were chosen as they are the operational satellite ocean colour
products of the ESA OC-CCI project.
The compilation is provided in the format of three 2-
dimensional main tables that relate to each other via one
unique key identifying each row. The format of the tables
is described in Appendix B. Despite being provided in three
main tables, the compilation should still be viewed conceptually
as one unique table, and as such it is still described
in that way. The data set contains two flags: “flag_time” and
“flag_chl_method”. The first is needed because three of the data
sources used (ESTOC, MAREDAT, and TPSS) provide no information
on time (hour of the day). The
time for these observations was set to 12:00:00 (UTC),
and the observations were flagged with “1” in column
“flag_time”. A second flag was necessary because in two data
sources (ARCSSPP and SEADATANET) there was uncertainty
about whether the compiled chlorophyll concentrations
were measured using fluorometric, spectrophotometric, or
HPLC (high-performance liquid chromatography) methods.
The compiled chlorophyll observations from these two data
sources were flagged with “1” in column “flag_chl_method”
and were marked as “chla_fluor”.
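The two flagging rules described above can be illustrated with a minimal Python sketch. Only the source names, the default time of 12:00:00 (UTC), and the two flag columns are taken from the text; the record layout (a dictionary with the keys “source”, “time”, and “chla”) and the function name apply_flags are assumptions made for this illustration and do not reflect the table format given in Appendix B.

```python
# Source names and flag columns follow the text; everything else here
# (record layout, function name) is a hypothetical illustration.
NO_TIME_SOURCES = {"ESTOC", "MAREDAT", "TPSS"}
UNCERTAIN_CHL_SOURCES = {"ARCSSPP", "SEADATANET"}
DEFAULT_TIME_UTC = "12:00:00"


def apply_flags(record):
    """Return a flagged copy of one observation (a plain dict)."""
    out = dict(record)
    source = out["source"]

    # Sources without hour-of-day information: set noon UTC and flag it.
    if source in NO_TIME_SOURCES:
        out["time"] = DEFAULT_TIME_UTC
        out["flag_time"] = 1
    else:
        out["flag_time"] = 0

    # Sources with an uncertain chlorophyll method: store the value as
    # "chla_fluor" and flag it.
    if source in UNCERTAIN_CHL_SOURCES and "chla" in out:
        out["chla_fluor"] = out.pop("chla")
        out["flag_chl_method"] = 1
    else:
        out["flag_chl_method"] = 0

    return out


if __name__ == "__main__":
    # Example values are invented for illustration only.
    print(apply_flags({"source": "TPSS", "time": None, "chla": 0.21}))
    print(apply_flags({"source": "ARCSSPP", "time": "08:15:00", "chla": 1.4}))
```

In this sketch a flag value of 0 simply denotes "not flagged"; the text only specifies the value “1” for flagged observations.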
This is the third version of the compilation. The first
and second versions were described in Valente et al. (2016,
2019), respectively. Compared to the previous version (Va-
lente et al., 2019), the present version contains more mea-
surements of “rrs”, “chla”, and “aph”. The “rrs” stations
Earth Syst. Sci. Data, 14, 5737–5770, 2022
A. Valente et al.: A compilation of global bio-optical in situ data
increased by ∼ 15 % (i.e. from 59 781 to 68 641), resulting
from updates of AERONET-OC, BOUSSOLE, MOBY,
MERMAID, and AWI. The new stations are mainly for the
period of 2019–2021 (previous version had “rrs” data until
2018). Regarding “chla”, a major increase in the number of
recent observations was obtained. The previous version had
“chla” data until 2017, with 533 stations for the period
2016–2017. The current version has 5140 stations for 2016–2021,
which constitutes a near tenfold increase (to 964 % of the earlier count) since 2016.
The new “chla” data originate from updates of BOUSSOLE,
MERMAID, SeaBASS, HOT, AMT, PALMER, CCELTER,
CALCOFI, AWI, and IMOS. As for the number of “aph”
stations, it increased by ∼ 30 % (i.e. from 3293 to 4265),
with most of the data between 2012 and 2020 (previous version
finished in 2012). The new “aph” data come from updates
of SeaBASS and AWI. Overall, the main objective of the
present version was to populate the compilation with more
recent data. Methodologies for data harmonization and inte-
gration (described below) have not been altered relative to
the last version.
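The station counts quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name percent_increase is an assumption, and the inputs are the station counts given in the text.

```python
def percent_increase(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old


# Station counts as quoted in the text.
rrs_change = percent_increase(59781, 68641)  # "rrs" stations
aph_change = percent_increase(3293, 4265)    # "aph" stations
chla_ratio = 5140 / 533                      # "chla": 2016-2021 vs 2016-2017

if __name__ == "__main__":
    print(f"rrs:  +{rrs_change:.1f} %")
    print(f"aph:  +{aph_change:.1f} %")
    print(f"chla: x{chla_ratio:.2f} ({100 * chla_ratio:.0f} % of the earlier count)")
```

The “rrs” and “aph” changes evaluate to about 14.8 % and 29.5 %, consistent with the rounded values of 15 % and 30 %. The figure of 964 % for “chla” corresponds to the ratio of the two counts (5140/533 ≈ 9.64) expressed in percent.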
Remote-sensing reflectance is a primary ocean colour
product defined as “rrs = Lw/Es”, where “Lw” is the upward
water-leaving radiance and “Es” is the total downward irradi-
ance at sea level. Another quantity that is often required is the
“normalized” water-leaving radiance (“nLw”) (Gordon and
Clark, 1981), which is related to remote-sensing reflectance
via “rrs = nLw/Fo”, where “Fo” is the top-of-the-atmosphere
solar irradiance. If not directly available, remote-sensing re-
flectance was calculated through the equations described
above, depending on the format of the original data. The
original data were acquired in an advanced form (e.g. time-
averaged, extrapolated to surface) from nine data sources de-
signed for ocean colour validation and applications (MOBY,
BOUSSOLE, AERONET-OC, SeaBASS, NOMAD, MERMAID,
COASTCOLOUR, TARA, AWI), therefore only requiring
the conversion to a common format. In processing by
space agencies, the quantity “rrs” is normalized to a single
Sun-viewing geometry (Sun at zenith and nadir viewing), taking
into account the bidirectional effects as described in Morel
and Gentili (1996) and Morel et al. (2002). Thus, for consistency
with the satellite “rrs” product, the latter normalization
was applied to the in situ “rrs”.
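The two conversions defined above, “rrs = Lw/Es” and “rrs = nLw/Fo”, can be sketched in Python as follows. The function names, the list-based representation of a spectrum, and the input checks are assumptions made for this illustration; the bidirectional normalization of Morel and Gentili (1996) and Morel et al. (2002) is not implemented here.

```python
def _ratio(numerator, denominator):
    """Element-wise ratio of two spectra sampled at the same wavelengths."""
    if len(numerator) != len(denominator):
        raise ValueError("spectra must share the same wavelength grid")
    if any(d <= 0 for d in denominator):
        raise ValueError("irradiance values must be positive")
    return [n / d for n, d in zip(numerator, denominator)]


def rrs_from_lw(lw, es):
    """rrs = Lw / Es (water-leaving radiance over downward irradiance)."""
    return _ratio(lw, es)


def rrs_from_nlw(nlw, fo):
    """rrs = nLw / Fo (normalized radiance over solar irradiance)."""
    return _ratio(nlw, fo)


if __name__ == "__main__":
    # Invented example values at two wavelengths; units are assumed to be
    # consistent so that the ratio is in sr^-1.
    print(rrs_from_lw([0.5, 1.0], [100.0, 200.0]))
```

Which of the two functions applies depends on whether the original data provide “Lw” and “Es” or “nLw”; data already supplied as “rrs” need no conversion.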
Chlorophyll-a concentration is a proxy measure for phytoplankton
biomass and one of the most widely used satellite
ocean colour products (IOCCG, 2008). To validate
satellite-derived chlorophyll-a concentration, two different
variables were compiled: one of these represents chlorophyll-a
measurements made through fluorometric or spectrophotometric
methods, referred to hereafter as “chla_fluor”, and
the other is the chlorophyll concentration derived from
HPLC (high-performance liquid chromatography) measurements,
referred to hereafter as “chla_hplc”. The chlorophyll
data were compiled from the following 25 data
sources: BOUSSOLE, SeaBASS, NOMAD, MERMAID,
AMT, ICES, HOT, GeP&CO, AWI, ARCSSPP, BAR-
https://doi.org/10.5194/essd-14-5737-2022