
REPORT OF THE MARKET SURVEY OF MARITIME ELECTRO-OPTICAL SENSORS AND AI ASSISTANCE FOR NAUTICAL CREW

Which threshold for mAP or IoU is acceptable will be discussed within the IMO and among related experts at a later stage on the journey towards Maritime Autonomous Surface Ships (MASS) (IMO, 2024). The maritime scientific community already faces many opportunities and challenges in this field (Felski and Zwolak, 2020).
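Since the discussion turns on mAP and IoU thresholds, a minimal sketch of how IoU is computed for two axis-aligned bounding boxes may help. The function name and the (x1, y1, x2, y2) box convention are assumptions for illustration, not something the report specifies.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2; this
    coordinate convention is an assumption for illustration.
    """
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive if its IoU with a ground-truth
# box exceeds the chosen threshold, e.g. the widely used 0.5:
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 0.1428...
```

mAP then aggregates precision over recall levels across classes, with exactly such an IoU threshold deciding which detections count as correct.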
Models for object detection 
The study by Koch et al. (2024) compares five commonly used architectures for object detection. Table 2 summarises the properties of the five architectures investigated. The computational cost of an architecture plays an important role in the suitability of AI models used on vessels. The metric Floating Point Operations (FLOPs) is established to quantify computational cost and refers to how many floating point operations are required to run a single instance of a given model. It is used to evaluate performance and resource requirements and provides information about the computational complexity of a model as well as its efficiency. The higher the number of FLOPs, the more computing power is required, which increases latency and makes a model less suitable for real-time applications. Additionally, the European AI Act uses the FLOPs metric to determine the criticality of an AI system and to define the regulatory threshold.
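As a concrete illustration of how such operation counts arise, the sketch below tallies the multiply-add operations of a single convolutional layer. The formula is the standard per-layer estimate; the layer dimensions are arbitrary example values, and real profilers count further operation types.

```python
def conv2d_flops(c_in, c_out, k_h, k_w, h_out, w_out):
    """Approximate FLOPs of one 2-D convolution layer.

    Each output element needs k_h * k_w * c_in multiply-accumulate
    pairs, i.e. 2 floating point operations per pair. Bias,
    activation and normalisation layers are ignored here.
    """
    return 2 * k_h * k_w * c_in * c_out * h_out * w_out

# Example: a 3x3 convolution mapping 64 to 128 channels on a
# 224x224 feature map (arbitrary illustrative numbers).
flops = conv2d_flops(64, 128, 3, 3, 224, 224)
print(f"{flops / 1e9:.1f} GFLOPs")  # ~7.4 GFLOPs for this one layer
```

Summing such per-layer counts over a full network yields the GFLOPs figures reported in Table 2.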
In Table 2, five selected architectures, additionally trained with maritime data, are compared on the COCO dataset in order to evaluate their performance potential.
Architecture        Type                Parameters   FLOPs   mAP (COCO)
Faster R-CNN        CNN                 42 million   134 G   37.0
RetinaNet           CNN                 38 million   152 G   36.4
DETR                Vision Transformer  41 million    87 G   42.0
Focal Transformer   Vision Transformer  39 million   265 G   45.5
YOLOv7              CNN                 37 million   105 G   51.4
Table 2: Comparison of popular architectures in the object detection domain in general applications. The aim is a high mAP and a low number of FLOPs (Koch et al., 2024). Vision Transformer and CNN are the deep learning architectures used.
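To make the mAP-versus-FLOPs trade-off in Table 2 tangible, the following sketch selects, from the tabulated values, the architecture with the highest mAP under a given FLOPs budget. The budget value is a made-up example, not a figure from the report.

```python
# (architecture, GFLOPs, mAP on COCO), taken from Table 2.
CANDIDATES = [
    ("Faster R-CNN", 134, 37.0),
    ("RetinaNet", 152, 36.4),
    ("DETR", 87, 42.0),
    ("Focal Transformer", 265, 45.5),
    ("YOLOv7", 105, 51.4),
]

def best_under_budget(candidates, gflops_budget):
    """Return the highest-mAP architecture within the FLOPs budget."""
    feasible = [c for c in candidates if c[1] <= gflops_budget]
    return max(feasible, key=lambda c: c[2]) if feasible else None

# Example budget of 150 GFLOPs (illustrative only).
print(best_under_budget(CANDIDATES, 150))  # ('YOLOv7', 105, 51.4)
```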
Models for semantic segmentation 
Semantic segmentation is the task of assigning each pixel in an image to a class, thereby segmenting the scene into meaningful regions. It is particularly useful for detecting areas rather than discrete objects. This finer-grained understanding of the environment enables additional applications such as estimating the horizon line and the inclination angles of ships, and it provides context information for other systems (Koch et al., 2024).
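One of the applications named above, horizon line estimation, can be sketched directly from a semantic mask: for every image column, find the boundary between sky and water pixels and fit a line through these points. The class indices and the NumPy-based mask format below are assumptions for illustration.

```python
import numpy as np

SKY, WATER = 0, 1  # assumed class indices of the segmentation model

def estimate_horizon(mask):
    """Fit a horizon line y = m*x + b to a per-pixel class mask.

    For each column, the first row from the top labelled WATER is
    taken as a sky/water boundary point; a least-squares line is
    fitted through all such points.
    """
    xs, ys = [], []
    for x in range(mask.shape[1]):
        rows = np.nonzero(mask[:, x] == WATER)[0]
        if rows.size:  # skip columns with no water at all
            xs.append(x)
            ys.append(rows[0])
    m, b = np.polyfit(xs, ys, 1)  # the slope also hints at the roll angle
    return m, b

# Toy mask: sky in the upper half, water in the lower half.
mask = np.full((100, 200), SKY)
mask[50:, :] = WATER
print(estimate_horizon(mask))  # slope ~0.0, intercept ~50
```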
Table 3 compares several architectures using metrics evaluated on the generic ADE20K dataset and the Cityscapes dataset from the automotive sector (Koch et al., 2024).
Architecture   Type                Parameters     FLOPs   mIoU (ADE20K)
DeepLabV3      CNN                 42 million     106 G   44.1
OCR            Vision Transformer  10.5 million   340 G   45.3
SegFormer      Vision Transformer  44 million      79 G   50.0
ViT-Adapter    Vision Transformer  —              403 G   52.5
Table 3: Comparison of common architectures in the area of semantic segmentation in general applications, evaluated using the ADE20K dataset (Koch et al., 2024).
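Since Table 3 reports mIoU, a short sketch of how this metric is computed from predicted and ground-truth masks may help. It assumes integer class masks of equal shape and averages the per-class IoU over all classes that appear in either mask.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over the classes present.

    pred and target are integer class masks of identical shape;
    this interface is an assumption for illustration.
    """
    ious = []
    for cls in range(num_classes):
        pred_c, target_c = pred == cls, target == cls
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:  # class absent in both masks: skip it
            continue
        inter = np.logical_and(pred_c, target_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy example with two classes; prediction misses part of class 1.
target = np.array([[0, 0, 1, 1]])
pred   = np.array([[0, 0, 0, 1]])
print(mean_iou(pred, target, num_classes=2))  # (2/3 + 1/2) / 2 = 0.583...
```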