Medical SAM 2: Segment Medical Images as Video via Segment Anything Model 2

Jiayuan Zhu1, Abdullah Hamdi1, Yunli Qi1, Yueming Jin2, Junde Wu1
1University of Oxford 2National University of Singapore

Given a prompt on one slice of a 3D volume, MedSAM-2 can segment all subsequent slices, which it treats as spatio-temporal frames. Given a prompt on one 2D image, MedSAM-2 can also accurately segment other 2D images that have no temporal relationship to it, applying the same criteria; this emergent ability is what we call One-Prompt Segmentation.

Abstract

Medical image segmentation plays a pivotal role in clinical diagnostics and treatment planning, yet existing models often face challenges in generalization and in handling both 2D and 3D data uniformly. In this paper, we introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation. The core concept is to leverage the Segment Anything Model 2 (SAM2) pipeline to treat all 2D and 3D medical segmentation tasks as a video object tracking problem. To put it into practice, we propose a novel self-sorting memory bank mechanism that dynamically selects informative embeddings based on confidence and dissimilarity, regardless of temporal order. This mechanism not only significantly improves performance in 3D medical image segmentation but also unlocks a One-Prompt Segmentation capability for 2D images, allowing segmentation across multiple images from a single prompt without temporal relationships. We evaluated MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwannoma, mediastinal lymph nodules, cerebral artery, inferior alveolar nerve, and abdominal organs, comparing it against state-of-the-art (SOTA) models in task-tailored, general, and interactive segmentation settings. Our findings demonstrate that MedSAM-2 surpasses a wide range of existing models and sets new SOTA results on several benchmarks.

MedSAM-2 Framework

Building on the SAM2 framework, we propose treating 3D medical images and 2D medical image flows as videos to facilitate memory-enhanced medical image segmentation. This approach not only improves performance in 3D medical image segmentation but also unlocks One-Prompt Segmentation capability for 2D medical image flows. This is achieved by incorporating our proposed Self-Sorting Memory Bank, which selects the most confident embeddings based on the confidence predictions (α, β, γ) from the mask decoder.
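
To make the selection idea concrete, here is a minimal sketch of a confidence- and dissimilarity-based memory selection. It is an illustration under assumptions, not the MedSAM-2 implementation: the function name `select_memory`, the `capacity` and `min_confidence` parameters, the use of cosine similarity, and the greedy selection order are all hypothetical choices made for this example.

```python
import numpy as np

def select_memory(embeddings, confidences, capacity=3, min_confidence=0.5):
    """Pick up to `capacity` entries that are confident and mutually dissimilar.

    Hypothetical sketch: `embeddings` is an (N, D) array of per-frame
    embeddings and `confidences` is a length-N array of predicted
    confidence scores. Temporal order is ignored on purpose.
    Returns the indices of the selected entries.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    confidences = np.asarray(confidences, dtype=float)

    # Step 1: keep only candidates whose confidence passes the threshold.
    eligible = [i for i in range(len(confidences)) if confidences[i] >= min_confidence]
    if not eligible:
        return []

    # Normalise rows so a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)

    # Step 2: seed the bank with the most confident candidate.
    selected = [max(eligible, key=lambda i: confidences[i])]

    # Step 3: greedily add the candidate least similar to anything already kept.
    while len(selected) < capacity and len(selected) < len(eligible):
        remaining = [i for i in eligible if i not in selected]

        def redundancy(i):
            return max(float(unit[i] @ unit[j]) for j in selected)

        selected.append(min(remaining, key=redundancy))
    return selected


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [1.0, 1.0]]
    conf = [0.9, 0.8, 0.7, 0.2]
    print(select_memory(emb, conf, capacity=2))
```

In this toy input, the fourth embedding is excluded by the confidence threshold, and the second is passed over in favour of the third because it is nearly a duplicate of the first. A real memory bank would store feature maps rather than short vectors, and the actual selection rule may differ.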

3D Medical Images Segmentation Performance & Visualization

We compare MedSAM-2 with task-tailored models, interactive generalized models, and auto-tracking generalized models, evaluated on 11 unseen tasks by Dice Score (%). We also show a comparison of MedSAM, our MedSAM-2, and the ground truth for sequential 3D medical image segmentation on the BTCV dataset. Note how MedSAM-2 produces more consistent 3D predictions than MedSAM by leveraging the 3D context while maintaining high generalization capability.
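
For readers unfamiliar with the metric, the Dice Score (%) between a predicted binary mask and a ground-truth mask is 2|P ∩ G| / (|P| + |G|), scaled to a percentage. The sketch below is a generic NumPy version, not the evaluation code used for the reported results; the function name `dice_score` and the convention of returning 100 when both masks are empty are assumptions of this example.

```python
import numpy as np

def dice_score(pred, target):
    """Dice Score in percent between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 100.0
    return float(100.0 * 2.0 * intersection / total)


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(f"Dice: {dice_score(pred, target):.2f}%")
```

The same function works on 2D images and 3D volumes, since it only counts overlapping foreground voxels regardless of array dimensionality.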

2D Medical Images Segmentation Performance & Visualization

We compare MedSAM-2 with SOTA segmentation methods on the BTCV dataset, evaluated by Dice Score (%). Task-tailored models, interactive generalized models, and auto-tracking generalized models are marked in yellow, green, and blue, respectively. We also show several examples of 2D segmentation on diverse datasets.

BibTeX

@misc{zhu_medical_2024,
      title={Medical SAM 2: Segment medical images as video via Segment Anything Model 2},
      author={Jiayuan Zhu and Abdullah Hamdi and Yunli Qi and Yueming Jin and Junde Wu},
      year={2024},
      eprint={2408.00874},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}