Full text: A compilation of global bio-optical in situ data for ocean-colour satellite applications

A. Valente et al.: A compilation of global bio-optical in situ data 
Earth Syst. Sci. Data, 8, 235-252, 2016
www.earth-syst-sci-data.net/8/235/2016/ 
observations at multiple depths, which, given our decision to
use only in vitro measurements, would have reduced considerably
the final number of observations.
With regard to the inherent optical properties (aph, adg,
bbp), if not already calculated and provided in the contributed
datasets, they were computed from related variables
that were available: particle absorption (ap), detrital
absorption (ad), coloured dissolved organic matter (CDOM)
absorption (ag) and total backscattering (bb). The following
equations were used: adg = ad + ag, ap = aph + ad and
bb = bbp + bbw. For the latter equation, the variable bbw
was computed using bbw = bw / 2, where bw is the scattering
coefficient of seawater derived from Zhang et al. (2009).
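The relations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `derive_iops` and its argument names are hypothetical, and the seawater scattering coefficient bw is taken as an input rather than computed from Zhang et al. (2009).

```python
from typing import Dict, Optional


def derive_iops(ap: Optional[float] = None,
                ad: Optional[float] = None,
                ag: Optional[float] = None,
                bb: Optional[float] = None,
                bw: Optional[float] = None) -> Dict[str, Optional[float]]:
    """Derive aph, adg and bbp (m^-1) from related variables.

    Hypothetical helper: a quantity is returned as None when the
    inputs needed to compute it are missing.
    """
    # ap = aph + ad, rearranged to aph = ap - ad
    aph = ap - ad if ap is not None and ad is not None else None
    # adg = ad + ag
    adg = ad + ag if ad is not None and ag is not None else None
    # bbw = bw / 2 and bb = bbp + bbw, rearranged to bbp = bb - bw / 2
    bbp = bb - bw / 2.0 if bb is not None and bw is not None else None
    return {"aph": aph, "adg": adg, "bbp": bbp}
```

Values already provided in a contributed dataset would be used directly; a helper of this kind would apply only when they are absent.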
The diffuse attenuation coefficient for downward irradiance 
(kd) did not require any conversion and was compiled as 
originally acquired. Observations of inherent optical properties
(surface values) and diffuse attenuation coefficient for
downward irradiance were acquired from three data sources
particularly designed for ocean-colour validation (SeaBASS, 
NOMAD, MERMAID) and were thus already subject to the 
processing routines of these datasets. 
The merged dataset was compiled from 10 sets of in 
situ data, which were obtained individually either from 
archives that incorporate data from multiple contributors 
(SeaBASS, NOMAD, MERMAID and ICES) or from particular
measurement programs or projects (MOBY, BOUSSOLE,
AERONET-OC, HOT, GeP&CO, AMT) and were
subsequently homogenised and merged. Data contributors 
are listed in Table 2. There were methodological differences 
between datasets. Therefore, after acquisition, and prior to 
any merging, each set of data was preprocessed for quality
control and conversion to a common format. During this
process, data were discarded if they had (1) unrealistic or 
missing date, time and geographic coordinate fields; (2) poor 
quality (e.g. original flags) or a method of observation that 
did not meet the criteria for the dataset (e.g. in situ fluorescence
for chlorophyll concentration); and (3) spuriously
high or low data. For the latter, the following limits were
imposed: for chla_fluor and chla_hplc, [0.001-100] mg m^-3;
for rrs, [0-0.15] sr^-1; for aph, adg and bbp, [0.0001-10] m^-1;
for kd, [aw(λ)-10] m^-1, where aw is the pure-water absorption
coefficient derived from Pope and Fry (1997). Also during
this stage, three metadata strings were attributed to each
observation: dataset, subdataset and pi. The dataset contains 
the name of the original set of data, and can only be one 
of the following: “aoc”, “boussole”, “mermaid”, “moby”, 
“nomad”, “seabass”, “hot”, “ices”, “amt” or “gepco”. The 
subdataset starts with the dataset identifier and is followed 
by additional information about the data, in the format
<dataset>_<cruise/station/site> (e.g. seabass_car71). The pi
contains the name of the principal investigator(s). An effort 
was made to homogenise the names of principal investigators 
from the different sets of data. These three metadata are the 
link to trace each observation to its origin and were propagated
throughout the processing. Finally, this processing
stage ended with each set of data being scanned for replicate 
variable data and replicate station data, which when found, 
were averaged if the coefficient of variation was less than 
50%; otherwise they were discarded. Replicate variable data were
defined as multiple observations of the same variable, with the
same date, time, latitude, longitude and depth. Replicate station
data were defined as multiple measurements of the same
variable, with the same date, time, latitude and longitude. For
the latter case, a search window of 5 min in time and 200 m 
in distance was given, to account for station drift. A small 
number of observations that were identified as replicates had 
different subdataset identifiers (i.e. a different cruise name). 
These observations were considered suspicious if the values 
were different and were discarded. If the values were the 
same, one of the observations was retained. This possibly 
originated from the same group of data being contributed to 
an archive by two different principal investigators. 
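The range limits and the replicate-averaging rule described above can be sketched as follows. This is a minimal illustration, not the authors' code: the names `LIMITS`, `within_limits` and `merge_replicates` are hypothetical, the bounds are assumed to be inclusive, and the sample standard deviation is assumed for the coefficient of variation (the text does not specify either). The kd limit is omitted because its lower bound, aw(λ), depends on wavelength.

```python
from statistics import mean, stdev
from typing import List, Optional

# Limits quoted in the text (assumed inclusive).
LIMITS = {
    "chla_fluor": (0.001, 100.0),  # mg m^-3
    "chla_hplc": (0.001, 100.0),   # mg m^-3
    "rrs": (0.0, 0.15),            # sr^-1
    "aph": (0.0001, 10.0),         # m^-1
    "adg": (0.0001, 10.0),         # m^-1
    "bbp": (0.0001, 10.0),         # m^-1
}


def within_limits(variable: str, value: float) -> bool:
    """Return True if the value lies inside the limits for its variable."""
    low, high = LIMITS[variable]
    return low <= value <= high


def merge_replicates(values: List[float],
                     max_cv: float = 0.5) -> Optional[float]:
    """Average replicates if their coefficient of variation is below
    max_cv (50% in the text); return None to signal a discard."""
    if not values:
        return None
    if len(values) == 1:
        return values[0]
    avg = mean(values)
    if avg == 0:
        return None  # coefficient of variation is undefined
    cv = stdev(values) / abs(avg)  # assumption: sample standard deviation
    return avg if cv < max_cv else None
```

For example, `merge_replicates([1.0, 1.2])` yields the mean of the two values, whereas two replicates differing by two orders of magnitude are rejected.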
Once each set of data was homogenised, all data were 
integrated into a unique table. This final merging focused 
on the removal of duplicates between the sets of data. Although
some duplicates are known (e.g. MOBY, BOUSSOLE,
AERONET-OC and NOMAD data are found in
SeaBASS and MERMAID sets of data), others are unknown
(e.g. how much of GeP&CO, ICES, AMT, HOT is
within NOMAD, SeaBASS and MERMAID). Therefore, duplicates
were identified using the metadata (dataset and subdataset)
when possible and temporal-spatial matches as an
additional precaution. For temporal-spatial matches, several 
thresholds were used, but typically 5 min and 200 m were 
taken to be enough to identify most duplicated data, which 
reflected small differences in time, latitude and longitude, 
between the different sets of data. Larger thresholds were 
used in some cases as a cautionary procedure. This was the 
case when searching for NOMAD data in other datasets because
NOMAD includes a few cases where merging of radiometric
and pigment data was done with large spatial-temporal
thresholds (Werdell and Bailey, 2005). With regard
to all data, if duplicates were found, data from the NOMAD 
dataset were selected first, followed by data from individual
projects (MOBY, BOUSSOLE, AERONET-OC, AMT,
HOT and GeP&CO) and finally by data from the remaining datasets
(SeaBASS, MERMAID and ICES). This procedure was chosen
to preserve the NOMAD dataset as a whole, since it is
widely used in ocean-colour validation. After all data were 
free of duplicates, they were merged consecutively by variable
in the final table. During this process, we also searched
for rows (stations) that were separated from each other by 
time differences of less than 5 min and horizontal spatial differences
of less than 200 m. When such rows were found, the
observations in those rows were merged into a single row. 
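The duplicate removal between sets of data can be sketched as below. This is a simplified illustration under stated assumptions, not the procedure actually used: the record layout (dictionaries with `dataset`, `variable`, `time`, `lat` and `lon` keys) and all function names are hypothetical, distance is computed with a haversine formula on a spherical Earth, only the typical 5 min and 200 m thresholds are applied, and the metadata-based matching is left out.

```python
import math
from datetime import datetime, timedelta
from typing import List

# Selection order from the text: NOMAD first, then individual
# projects, then the remaining archives (lower value = preferred).
PRIORITY = {"nomad": 0,
            "moby": 1, "boussole": 1, "aoc": 1,
            "amt": 1, "hot": 1, "gepco": 1,
            "seabass": 2, "mermaid": 2, "ices": 2}


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres (haversine, spherical Earth)."""
    radius = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def is_duplicate(a: dict, b: dict,
                 max_dt: timedelta = timedelta(minutes=5),
                 max_dist_m: float = 200.0) -> bool:
    """Same variable, different dataset, close in time and space."""
    return (a["dataset"] != b["dataset"]
            and a["variable"] == b["variable"]
            and abs(a["time"] - b["time"]) < max_dt
            and distance_m(a["lat"], a["lon"],
                           b["lat"], b["lon"]) < max_dist_m)


def drop_duplicates(observations: List[dict]) -> List[dict]:
    """Keep the highest-priority observation of each duplicate group."""
    kept: List[dict] = []
    # sorted() is stable, so ties keep their input order.
    for obs in sorted(observations, key=lambda o: PRIORITY[o["dataset"]]):
        if not any(is_duplicate(obs, k) for k in kept):
            kept.append(obs)
    return kept
```

Processing observations in priority order means a NOMAD record is always retained over a matching record from another set, which mirrors the stated aim of preserving NOMAD as a whole.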
The compiled merged data were compared with the original 
sets to certify that no errors occurred during the merging. As 
a final step, a water-column (station) depth was recorded for 
each observation, which was the closest water-column depth 
from the ETOPO1 global relief model (National Geophys
	        