Full text: A compilation of global bio-optical in situ data for ocean-colour satellite applications

A. Valente et al.: A compilation of global bio-optical in situ data 
ENTSSEA, BATS, BIOCHEM, BODC, CALCOFI, CCEL- 
TER, CIMT, COASTCOLOUR, ESTOC, IMOS, MARE- 
DAT, PALMER, SEADATANET, TPSS, and TARA. One re- 
quirement for “chla_fluor” measurements was that they were 
made using in vitro methods (i.e. based on extractions of 
chlorophyll-a). Although this severely decreased the number
of observations, since in vivo fluorometry (e.g. fluorometers
mounted on CTDs) is widely available in oceanographic
databases, it was decided to exclude such data because of 
potential problems with the calibration of in situ fluorometer
data. The variable “chla_hplc” was calculated by summing
all reported chlorophyll-a derivatives, including divinyl
chlorophyll-a, epimers, allomers, and chlorophyllide-a.
The two chlorophyll variables are retained separately in
the database to facilitate their use. HPLC measurements
could be considered of higher quality, but fluorometric mea- 
surements are more numerous. Thus, one option for users is 
to use “chla_fluor” only when there are no “chla_hplc” mea- 
surements available. To be consistent with satellite-derived 
chlorophyll values, which are derived from the light emerging
from the upper layer of the ocean, all chlorophyll observations
in the top 10 m (replicates at the same depth, or
measurements at multiple depths) were averaged if the coefficient
of variation among observations was less than 50 %;
otherwise they were discarded. The averages were then assigned
to the surface. The depth of 10 m was chosen as a
compromise between clear oligotrophic and turbid eutrophic 
waters. Other methods, such as chlorophyll depth-averages 
using local attenuation conditions (Morel and Maritorena, 
2001), require observations at multiple depths, which, given 
our decision to use only in vitro measurements, would have 
considerably reduced the final number of observations. 
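The surface-averaging rule and the suggested HPLC-first choice can be sketched in a few lines of Python. This is a minimal illustration, not the authors' processing code: the function names `surface_average` and `preferred_chla`, the tuple layout of the input, the inclusive 10 m limit, and the use of the sample standard deviation for the coefficient of variation are all assumptions made here.

```python
from statistics import mean, stdev

MAX_DEPTH_M = 10.0   # depth limit stated in the text
MAX_CV = 0.5         # coefficient-of-variation threshold (50 %)

def surface_average(observations):
    """Average chlorophyll values from the top 10 m of one station.

    `observations` is a list of (depth_m, chla) tuples (hypothetical layout).
    Returns the mean assigned to the surface, or None when there are no
    values in the top 10 m or the values are too variable.
    """
    values = [chla for depth, chla in observations if depth <= MAX_DEPTH_M]
    if not values:
        return None
    if len(values) == 1:
        return values[0]          # nothing to average
    avg = mean(values)
    if avg <= 0:
        return None               # coefficient of variation undefined
    cv = stdev(values) / avg      # assumption: sample standard deviation
    return avg if cv < MAX_CV else None

def preferred_chla(chla_hplc, chla_fluor):
    """Use the fluorometric value only when no HPLC value is available."""
    return chla_hplc if chla_hplc is not None else chla_fluor

# Two shallow samples agree, the 20 m sample is ignored.
print(surface_average([(2.0, 1.0), (5.0, 1.2), (20.0, 9.0)]))
```

Returning `None` for discarded stations keeps the "discarded" case distinct from a valid zero-like value; a real pipeline might use NaN or a flag column instead.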
Regarding the inherent optical properties (“aph”, “adg”, 
“bbp”), if not already calculated and provided in the con- 
tributed data sets, they were computed from related variables 
that were available: particle absorption (“ap”), detrital ab- 
sorption (“ad”), coloured dissolved organic matter (CDOM) 
absorption (“ag”), and total backscattering (“bb”). The following
equations were used: “adg = ad + ag”, “ap = aph + ad”,
and “bb = bbp + bbw”. For the latter equation, the
variable “bbw” was computed using “bbw = bw/2”, where
“bw” is the scattering coefficient of seawater derived from
Zhang et al. (2009). The diffuse attenuation coefficient for
downward irradiance (“kd”) did not require any conversion
and was compiled as originally acquired. Observations of in- 
herent optical properties (surface values) and diffuse atten- 
uation coefficient for downward irradiance were acquired in 
total from six data sources designed for ocean colour vali- 
dation and applications (SeaBASS, NOMAD, MERMAID, 
AWI, COASTCOLOUR, TPSS), thus already subject to the 
processing routines of these data sets. Concerning total sus- 
pended matter, these data were compiled as originally avail- 
able from MERMAID and COASTCOLOUR. 
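As an illustration of how the three inherent optical properties follow from the related variables, the sketch below rearranges the additive relations, treating particle absorption as the sum of the phytoplankton and detrital parts. It is an assumption-laden example rather than the authors' code: the function name `derive_iops` and the dictionary output are hypothetical, all inputs are taken to be at a single wavelength in m^-1, and the seawater scattering coefficient `bw` is supplied by the caller instead of being computed from Zhang et al. (2009).

```python
def derive_iops(ap=None, ad=None, ag=None, bb=None, bw=None):
    """Derive aph, adg and bbp (m^-1) at one wavelength.

    Any output whose inputs are missing is returned as None.
    """
    # Phytoplankton absorption: particle absorption minus detrital part.
    aph = ap - ad if ap is not None and ad is not None else None
    # Combined detrital and CDOM absorption.
    adg = ad + ag if ad is not None and ag is not None else None
    # Particle backscattering: total minus seawater backscattering,
    # with seawater backscattering taken as half of seawater scattering.
    if bb is not None and bw is not None:
        bbw = bw / 2.0
        bbp = bb - bbw
    else:
        bbp = None
    return {"aph": aph, "adg": adg, "bbp": bbp}

result = derive_iops(ap=0.05, ad=0.01, ag=0.02, bb=0.004, bw=0.002)
print(result)
```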
The merged data set was compiled from 27 sets of in 
situ data, which were obtained individually either from 
archives that incorporate data from multiple contribu- 
tors (SeaBASS, NOMAD, MERMAID, ICES, ARCSSPP, 
BIOCHEM, BODC, COASTCOLOUR, MAREDAT, SEA- 
DATANET), or from particular contributors, measurement 
programs, or projects (MOBY, BOUSSOLE, AERONET- 
OC, HOT, GeP&CO, AMT, AWI, BARENTSSEA, BATS, 
CALCOFI, CCELTER, CIMT, ESTOC, IMOS, PALMER, 
TPSS, TARA), and were subsequently homogenized and
merged. Data contributors are listed in Table 2 and in the aux- 
iliary material. There were methodological differences be- 
tween data sets. Therefore, after acquisition, and prior to any 
merging, each set of data was pre-processed for quality con- 
trol and converted to a common format. During this process, 
data were discarded if they had: (1) unrealistic or missing 
date and geographic coordinate fields; (2) poor quality (e.g. 
original flags) or method of observation that did not meet the 
criteria for the data set (e.g. in situ fluorescence for chloro- 
phyll concentration); and (3) spuriously high or low data. For 
the last, the following limits were imposed: for “chla_fluor” 
and “chla_hplc” [0.001-100] mg m⁻³; for “rrs” [0-0.15]
sr⁻¹; for “aph”, “adg”, and “bbp” [0.0001-10] m⁻¹; for
“tsm” [0-1000] g m⁻³; and for “kd” [aw(λ)-10] m⁻¹, where
“aw” is the pure water absorption coefficient derived from
Pope and Fry (1997). Also, during this stage, three metadata 
strings were attributed to each observation: “dataset”,
“subdataset”, and “contributor”. The “dataset” contains the name
of the original set of data and can only be one of the following:
“aoc”, “boussole”, “mermaid”, “moby”, “nomad”,
“seabass”, “hot”, “ices”, “amt”, “gepco”, “arcsspp”, “awi”,
“barentssea”, “bats”, “biochem”, “bodc”, “calcofi”, “cec”,
“ccelter”, “cimt”, “estoc”, “imos”, “maredat”, “palmer”,
“seadatanet”, “tpss”, and “tara”. The “subdataset” starts with
the “dataset” identifier and is followed by additional information
about the data, as <dataset>_<cruise/station/site>
(e.g. “seabass_car81”). The “contributor” contains the name
of the data contributor. An effort was made to homogenize 
the names of data contributors from the different sets of data. 
These three metadata are the link to trace each observation 
to its origin and were propagated throughout the processing. 
Finally, this processing stage ended with each set of data being
scanned for replicate variable data and replicate station
data, which, when found, were averaged if the coefficient of
variation was less than 50 %; otherwise they were discarded.
Replicates were defined as multiple observations of the same 
variable, with the same date, time, latitude, longitude, and 
depth. Replicate station data were defined as multiple mea- 
surements of the same variable, with the same date, time, lat- 
itude, and longitude. For the latter case, a search window of 
5 min in time and 200 m in distance was given to account for 
station drift. A small number of observations that were identified
as replicates had different “subdataset” identifiers (i.e.
different cruise names). These observations were considered
suspicious and discarded if the values were different. If the
values were the same, one of the observations was retained. 
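The range limits and the station-replicate search window described above could be checked along the following lines. This is a hedged sketch, not the authors' implementation: the names `LIMITS`, `within_limits`, `kd_within_limits`, `distance_m`, and `same_station`, the dictionary layout of an observation, the inclusive comparisons, and the haversine distance with a mean Earth radius of 6371 km are all assumptions introduced for illustration. The pure-water absorption `aw` at the relevant wavelength must be supplied by the caller.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

# Fixed limits quoted in the text. "kd" is handled separately because
# its lower bound depends on wavelength through aw.
LIMITS = {
    "chla_fluor": (0.001, 100.0),  # mg m^-3
    "chla_hplc": (0.001, 100.0),   # mg m^-3
    "rrs": (0.0, 0.15),            # sr^-1
    "aph": (0.0001, 10.0),         # m^-1
    "adg": (0.0001, 10.0),         # m^-1
    "bbp": (0.0001, 10.0),         # m^-1
    "tsm": (0.0, 1000.0),          # g m^-3
}

def within_limits(variable, value):
    """True when `value` lies inside the limits for `variable`."""
    low, high = LIMITS[variable]
    return low <= value <= high

def kd_within_limits(kd, aw):
    """kd (m^-1) must lie between pure-water absorption aw and 10."""
    return aw <= kd <= 10.0

def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (haversine formula)."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))

def same_station(a, b, max_minutes=5.0, max_metres=200.0):
    """True when two observations fall inside the 5 min / 200 m window."""
    minutes = abs((a["time"] - b["time"]).total_seconds()) / 60.0
    metres = distance_m(a["lat"], a["lon"], b["lat"], b["lon"])
    return minutes <= max_minutes and metres <= max_metres

t0 = datetime(2010, 6, 1, 12, 0)
first = {"time": t0, "lat": 38.000, "lon": -9.500}
second = {"time": t0 + timedelta(minutes=3), "lat": 38.001, "lon": -9.500}
print(within_limits("rrs", 0.2), same_station(first, second))
```

Observations matched by `same_station` would then be averaged or discarded with the same coefficient-of-variation test used for replicates.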
Earth Syst. Sci. Data, 14, 5737-5770, 2022
	        