MedBLIP: Multimodal Medical Image Captioning Using BLIP

EasyChair Preprint 13501

7 pages. Date: May 31, 2024

Abstract

Medical image captioning is an important AI task in healthcare: it automates the generation of text descriptions that support the management and interpretation of medical images. Our team participated in the second task of the ImageCLEFmedical-Caption 2024 challenge, applying the BLIP approach to the ROCOv2 dataset. Methods: Our approach uses the BLIP architecture for multimodal medical image captioning. This architecture employs a ViT (Vision Transformer) as the image encoder and a BERT (Bidirectional Encoder Representations from Transformers) model as the text model. Results: We ranked 5th on BERTScore, 3rd on ROUGE, BLEURT, and RefCLIPScore, and 2nd on BLEU-1, METEOR, and CIDEr. Notably, we ranked 1st on CLIPScore with a score of 0.827074. Conclusion: Our results in the ImageCLEFmedical-Caption 2024 challenge indicate that the BLIP architecture is effective for medical image captioning. The model shows potential to generate accurate and informative textual descriptions from medical images, thereby aiding diagnosis and helping non-experts understand medical images.
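
For readers unfamiliar with the n-gram metrics named above, the following is a minimal sketch of BLEU-1 for a single caption and a single reference: clipped unigram precision multiplied by a brevity penalty. It is an illustration only. The function name `bleu1`, the lowercasing, and the whitespace tokenization are assumptions made here, and the challenge's official evaluation may apply different preprocessing and aggregation.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Single-reference BLEU-1 (illustrative sketch, not the official scorer)."""
    # Assumed preprocessing: lowercase and split on whitespace.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    cand_counts = Counter(cand)
    ref_counts = Counter(ref)

    # Clipped counts: a candidate token is credited at most as many
    # times as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))

    return bp * precision
```

For example, `bleu1("chest x-ray", "chest x-ray showing pneumonia")` has perfect unigram precision but is penalized for brevity, while a caption that repeats one reference word many times gains nothing from the repeats because of clipping.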

Keyphrases: BERT, BLIP, CLEF 2024, Image Captioning, medical image processing, pre-trained models

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:13501,
  author    = {Van Thien Phan and Khanh Trinh Nguyen and Anh Duc Dang Quang Hoang and Tien Quan Phan and Bao Thien Nguyen Tat},
  title     = {MedBLIP: Multimodal Medical Image Captioning Using BLIP},
  howpublished = {EasyChair Preprint 13501},
  year      = {EasyChair, 2024}}