Challenges
Tasks on Surg-3M:
Based on the annotations in the Surg-3M dataset, we propose two new downstream tasks for surgical video:
- Multi-label (35 classes) video classification of procedure types.
- Binary video classification of surgery types.
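To make the difference between the two targets concrete, here is a minimal sketch of how the labels for each task might be encoded. The function names, the integer class indices, and the 0/1 convention for the binary task are illustrative assumptions; only the count of 35 procedure classes comes from the task description above.

```python
from typing import Iterable, List

NUM_PROCEDURE_CLASSES = 35  # stated in the task description


def encode_procedure_labels(
    class_ids: Iterable[int], num_classes: int = NUM_PROCEDURE_CLASSES
) -> List[int]:
    """Build a multi-hot target: 1 at every procedure class present in the video.

    Hypothetical helper; the class indexing scheme is an assumption.
    """
    target = [0] * num_classes
    for cid in class_ids:
        if not 0 <= cid < num_classes:
            raise ValueError(f"class id {cid} is outside [0, {num_classes})")
        target[cid] = 1
    return target


def encode_surgery_label(surgery_type_index: int) -> int:
    """Binary target: a single 0/1 label per video.

    Which surgery type maps to 0 and which to 1 is an assumption here.
    """
    if surgery_type_index not in (0, 1):
        raise ValueError("binary task expects a label of 0 or 1")
    return surgery_type_index


if __name__ == "__main__":
    # A video annotated with two procedure types (indices are made up).
    print(encode_procedure_labels([2, 17]))
    print(encode_surgery_label(1))
```

In the multi-label task a video can carry several positive classes at once, so the target is a 35-dimensional vector; in the binary task each video carries exactly one of two labels.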
Leaderboard for the proposed tasks
To establish baselines for the two tasks proposed on Surg-3M, we evaluated state-of-the-art (SotA) video models. The results below serve as a reference point for future work.
Method | Procedure type: mAP (%) | Procedure type: F1-score (%) | Surgery type: Accuracy (%) | Surgery type: F1-score (%) |
---|---|---|---|---|
SlowFast | 22.0 | 23.9 | 88.5 | 87.5 |
TimeSformer | 42.1 | 37.5 | 93.2 | 92.7 |
MViTv2 | 49.5 | 41.8 | 95.8 | 94.6 |
Video Swin Transformer | 51.4 | 47.9 | 98.8 | 98.7 |
SurgFM-Vid (ours) | 57.8 | 49.3 | 98.9 | 98.9 |
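For readers who want to reproduce numbers in the same format as the leaderboard, the following is a rough sketch of how the reported metric types could be computed from model outputs. It is not the authors' evaluation code: macro averaging over classes, the 0.5 decision threshold, skipping classes without positives, and all function names are assumptions, since the page does not specify them.

```python
from typing import List, Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AP for one class: mean of the precision measured at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(
    y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]
) -> float:
    """mAP: unweighted mean of per-class AP (classes with no positives are skipped)."""
    aps = []
    for c in range(len(y_true[0])):
        col_true = [row[c] for row in y_true]
        if sum(col_true) == 0:
            continue  # assumption: AP is undefined without positives
        aps.append(average_precision(col_true, [row[c] for row in y_score]))
    return sum(aps) / len(aps) if aps else 0.0


def f1_score(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Binary F1 = 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(labels, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(labels, preds) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(
    y_true: Sequence[Sequence[int]],
    y_score: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed decision threshold
) -> float:
    """Macro F1: threshold the scores, then average per-class F1."""
    per_class: List[float] = []
    for c in range(len(y_true[0])):
        col_true = [row[c] for row in y_true]
        col_pred = [1 if row[c] >= threshold else 0 for row in y_score]
        per_class.append(f1_score(col_true, col_pred))
    return sum(per_class) / len(per_class)


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of videos whose predicted label matches the ground truth."""
    return sum(1 for t, p in zip(labels, preds) if t == p) / len(labels)


if __name__ == "__main__":
    # Toy data: 4 videos, 3 classes (the real procedure task has 35 classes).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    y_score = [[0.9, 0.2, 0.25], [0.1, 0.8, 0.3], [0.7, 0.6, 0.2], [0.3, 0.1, 0.9]]
    print(f"mAP:      {100 * mean_average_precision(y_true, y_score):.1f}%")
    print(f"macro F1: {100 * macro_f1(y_true, y_score):.1f}%")
    print(f"accuracy: {100 * accuracy([1, 0, 1, 1], [1, 0, 0, 1]):.1f}%")
```

mAP is computed from ranked scores and needs no threshold, whereas F1 and accuracy depend on hard predictions, so the choice of threshold and averaging scheme can shift those columns noticeably.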