Considerable effort has been spent in recent years attempting to
identify and rectify these group-to-group differences. Second,
while many of the CTD data returned by the floats are of good
quality, there has been an alarming increase in salinity sensor
drift in recent years that is still not well understood, as noted
in the section Problems Encountered for Salinity. Work has been
underway for several years between float deployers, data users,
and manufacturers to characterize and fix this problem, and
these efforts continue. In this case, as in the past with the CTD
pressure sensor issues, it has been shown to be essential to
monitor data quality as closely as possible and for scientists and
manufacturers to remain in close communication. It is likely that
there will always be sporadic problems with some components in
the float and sensor supply chain that lead to compromised data
quality, but the effects of such problems can be minimized with
continued vigilance.
A third challenge for Argo is the delayed-mode quality
control (DMQC) process of the ever-growing and diversifying
dataset produced by the float array. To date, DMQC has been
conducted by operators at various institutions examining each
profile and determining what adjustments are necessary based
on comparison with reference data. With an existing dataset
comprising over 2 million profiles, the burden placed on
human resources is considerable. Argo now has an
opportunity to make use of recent developments in machine
learning (ML) techniques in order to improve quality control
procedures, as outlined by Maze and Morrow (2017). An initial
fruitful approach might be to develop automated DMQC checks
on all new data, and then to direct the flow of data with problems
(i.e., those not passing the explicit tests) to an ML algorithm for a
second check, prior to human intervention. In such a scenario,
ML could be used to sort the data, identify problems, and
suggest the necessary changes, effectively reducing the workload
of human operators. Such an approach has been successfully
tested at Ifremer, resulting in a 25% reduction in DMQC operator
workload (Maze et al., 2020). ML algorithms such as decision
trees, neural networks, and Gaussian mixture models can also be
used to determine the best combination of existing DMQC tests,
to improve the quality of reference data (this is already in use
for QC of biogeochemical Argo measurements, as noted by Bittig
et al., 2018), and to improve the selection of historical profiles to
be used to evaluate new, incoming data (Maze et al., 2017). In the
future, advances in ML algorithms should provide an important
resource to the Argo community to help to meet the challenge
of maintaining the quality of its data from ever more floats and
diversified missions.
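To make the triage idea concrete, the following minimal sketch (in Python, using synthetic temperature-salinity data and scikit-learn's GaussianMixture) shows how a mixture model fitted to reference data could score observations that pass the explicit tests and refer only statistically unlikely ones to an operator. The features, thresholds, and function names are illustrative assumptions, not the Ifremer implementation described above.

# Illustrative sketch only: route observations that fail simple automated checks
# directly to a flag, and score the remainder against a Gaussian mixture model
# (GMM) fitted to reference T/S data before a human DMQC operator intervenes.
# All data are synthetic; the features and thresholds are placeholders.

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)

# Synthetic "reference" temperature/salinity pairs at one pressure level,
# standing in for a curated climatology such as the Argo reference database.
ref_temp = rng.normal(loc=10.0, scale=1.5, size=5000)                      # deg C
ref_psal = 34.5 + 0.05 * (ref_temp - 10.0) + rng.normal(0.0, 0.02, size=5000)
reference = np.column_stack([ref_temp, ref_psal])

# Fit a GMM to the reference T/S distribution.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
gmm.fit(reference)

# Referral threshold: 1st percentile of reference log-likelihoods (illustrative).
threshold = np.percentile(gmm.score_samples(reference), 1.0)

def automated_checks(temp, psal):
    """Simple explicit range tests, standing in for the real-time QC tests."""
    return -2.5 < temp < 40.0 and 2.0 < psal < 41.0

def triage_observation(temp, psal):
    """Return a routing decision for a single (temp, psal) observation."""
    if not automated_checks(temp, psal):
        return "fail_explicit_tests"      # obvious problem, flag immediately
    score = gmm.score_samples([[temp, psal]])[0]
    if score < threshold:
        return "refer_to_operator"        # plausible values, unlikely T/S combination
    return "pass"

# Example: a typical observation, a salty outlier, and an out-of-range value.
for t, s in [(10.2, 34.51), (10.2, 35.4), (55.0, 34.5)]:
    print(t, s, "->", triage_observation(t, s))

In practice such a model would presumably be fitted to full profile features within a region rather than single T/S pairs, and the referral threshold would be tuned against past operator decisions.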
DISCUSSION AND CONCLUSION
The original goal of the Argo Program was to provide a
description of the mean state and variability in the upper
2 kilometers of the global ocean on sub-seasonal to decadal
timescales. This aspiration was motivated by the success of
WOCE in the 1990s in providing a first estimate of the state of the
global ocean. The evolution of the float program from WOCE to
Argo was not without technical challenges. The basic operation
of the extant float buoyancy engine was problematic and needed
redesign. The first CTD units in use performed poorly, and an
effective alternative needed to be found. There was no systematic
way to disseminate and manage real-time data. Furthermore,
there were no agreed methods to compare data to reference
datasets in order to make adjustments to measurements from
floats once they were deployed. All of these issues represented
daunting challenges at the turn of the 21st century.
Yet within a few years of the beginning of the Argo Program,
all these technical challenges were addressed in ways that were
adequate to make Argo successful. Continuing on from those
early years, the Argo Program has overcome two decades of
challenges because it has been supported by a multi-national
team of dedicated scientists, engineers, and data experts, working
in a collaborative manner. The clear goals of the Argo Program,
the commitment to develop the necessary infrastructure, and the
willingness to share innovative improvements in both technology
and data methodology have allowed Argo to revolutionize the
way large-scale oceanographic data are collected, disseminated,
and analyzed. Today, Argo is an international collaborative
project that involves 34 countries. As of September 2019, data
holdings at the Argo GDACs from 11 national DACs amounted
to 338 gigabytes of data from 15,231 floats. The seasonal and
spatial coverage of Argo is unprecedented, increasing the total
available number of observed profiles in many regions from < 10
per 1° square to over 50 nearly everywhere (Figure 14).
From its inception, Argo has made its data freely available
to the operational and research communities and the general
public and, in doing so, has led to a new paradigm in ocean
data sharing. This open-data policy, coupled with the exceptional
data coverage, has driven an explosion in ocean and climate
research (over 4,000 papers and 250 PhD theses have used Argo
data). Argo’s nearly global coverage makes it particularly useful
for detection of climate change signals, for estimation of the
ocean’s heat content, and for observation of the intensification
of the global hydrological cycle (Riser et al., 2016). Argo data
also underpin ocean and climate forecasting services, through
their now dominant role in ocean model initialization at most
forecasting centers. After 20 years, Argo has exceeded its original
aspirations. Science writer Justin Gillis of the New York Times
has described Argo as “one of the scientific triumphs of the age”
(Gillis, 2014).
In this paper, we have aimed to describe the core Argo
dataset collected over the first 20 years of the program.
When Argo was first conceived, aspirational uncertainties for
the measurements of pressure, temperature, and salinity were
based on experience with other ocean observing programs,
such as hydrographic cruises and moorings, and on lessons
learned during WOCE. Today, we have been
able to estimate accuracies of 0.002°C for temperature, 2.4
dbar for pressure, and 0.01 PSS-78 for salinity, after delayed-
mode adjustments. As of 2019, the manufacturer calibration
specification of salinity from the SBE-41/41CP CTDs is
0.0035 PSS-78. In reality, however, the achieved accuracy
for float salinity to 2,000 dbar is closer to 0.01 PSS-78,
as assessed by using independent observations from GO-
SHIP measurements.