Tasks on Surg-3M:

Based on the annotations of the Surg-3M dataset, we propose two novel downstream surgical tasks:

  1. Multi-label (35 classes) video classification of procedure types.
  2. Binary video classification of surgery types.

Leaderboard for the proposed tasks
To establish baselines for the two tasks proposed on Surg-3M, we evaluated several state-of-the-art (SotA) video models. The results below serve as a benchmark for future research.

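| Method | Procedure type: mAP (%) | Procedure type: F1-score (%) | Surgery type: Accuracy (%) | Surgery type: F1-score (%) |
|---|---|---|---|---|
| SlowFast | 22.0 | 23.9 | 88.5 | 87.5 |
| TimeSformer | 42.1 | 37.5 | 93.2 | 92.7 |
| MViTv2 | 49.5 | 41.8 | 95.8 | 94.6 |
| Video Swin Transformer | 51.4 | 47.9 | 98.8 | 98.7 |
| SurgFM-Vid (ours) | 57.8 | 49.3 | 98.9 | 98.9 |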
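
For readers who want to score their own predictions, the following is a minimal sketch of how the four reported metrics could be computed. It is not the evaluation code behind the leaderboard. The text above does not specify the averaging scheme, so this sketch assumes non-interpolated average precision per class, macro-averaged F1 for the multi-label task, and a decision threshold of 0.5. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of precision@k over the ranks k that hold a positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(Y_true: List[List[int]], Y_score: List[List[float]]) -> float:
    """Mean of per-class AP; classes without any positive example are skipped (assumption)."""
    aps = []
    for c in range(len(Y_true[0])):
        col_true = [row[c] for row in Y_true]
        if sum(col_true) == 0:
            continue
        aps.append(average_precision(col_true, [row[c] for row in Y_score]))
    return sum(aps) / len(aps) if aps else 0.0


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive class: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 (macro averaging is an assumption here)."""
    n_classes = len(Y_true[0])
    per_class = [
        f1_binary([row[c] for row in Y_true], [row[c] for row in Y_pred])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of videos whose predicted label matches the ground truth."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy multi-label example: 3 videos, 2 procedure classes (the real task has 35).
Y_true = [[1, 0], [0, 1], [1, 1]]
Y_score = [[0.9, 0.6], [0.5, 0.8], [0.4, 0.3]]
Y_pred = [[int(s >= 0.5) for s in row] for row in Y_score]  # threshold 0.5 (assumption)

proc_map = mean_average_precision(Y_true, Y_score)  # 5/6, about 0.833
proc_f1 = macro_f1(Y_true, Y_pred)                  # 0.5

# Toy binary example: 4 videos with labels 0/1 for the two surgery types.
s_true = [1, 0, 1, 1]
s_pred = [1, 0, 0, 1]

surg_acc = accuracy(s_true, s_pred)  # 0.75
surg_f1 = f1_binary(s_true, s_pred)  # 0.8
```

Multiplying each value by 100 gives percentages on the same scale as the leaderboard. If a different F1 averaging (for example micro or per-sample) or a different threshold is used, the numbers will not be directly comparable to those reported above.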