Image Caption Generation With Adaptive Transformer
EasyChair Preprint 1046, 6 pages. Date: May 28, 2019

Abstract: Image captioning based on the encoder-decoder framework has made promising progress, and the application of various attention mechanisms has greatly improved the performance of captioning models. Improving any part of the framework, or employing a more effective attention mechanism, benefits the final performance. Based on this idea, we make improvements in two aspects. First, we use a more powerful decoder: recent work shows that the Transformer is superior to the LSTM in efficiency and performance on some NLP tasks, so we substitute a Transformer for the traditional LSTM decoder to accelerate training. Second, we combine spatial attention and adaptive attention into the Transformer, which lets the decoder determine where and when to use image region information. We evaluate this method on the Flickr30k dataset and achieve better results.

Keyphrases: adaptive attention, image caption, Transformer
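The adaptive attention described above can be illustrated with a minimal sketch. This is an assumption about the mechanism, not the authors' code: following the common "visual sentinel" formulation of adaptive attention, the decoder attends over the k image region features plus one learned sentinel vector, and the attention weight placed on the sentinel acts as the gate deciding when not to look at the image. The function name `adaptive_attention`, the scaled dot-product scoring, and the NumPy setting are all illustrative choices.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(regions, sentinel, query):
    """Attend over k image regions plus a visual sentinel (hypothetical sketch).

    regions:  (k, d) image region features from the encoder
    sentinel: (d,)   learned sentinel vector (language-model fallback)
    query:    (d,)   current decoder state
    Returns (context, beta), where beta is the weight on the sentinel,
    i.e. how much the decoder ignores the image at this step.
    """
    cand = np.vstack([regions, sentinel])        # (k+1, d) attention candidates
    scores = cand @ query / np.sqrt(query.size)  # scaled dot-product scores
    alpha = softmax(scores)                      # distribution over regions + sentinel
    beta = alpha[-1]                             # sentinel weight: "when not to look"
    context = alpha @ cand                       # convex mix of regions and sentinel
    return context, beta
```

In a Transformer decoder this would replace (or gate) the cross-attention over encoder outputs, so each decoding step produces both a spatial context and a signal of how strongly the image was used.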