
Listening Head Motion Generation for Multimodal Dialog System

EasyChair Preprint 15113

6 pages
Date: September 27, 2024

Abstract

This paper addresses listening head generation (LHG), i.e., the generation of avatar head motion in dialogue systems. In face-to-face conversation, head motion is a modality frequently used by both listeners and speakers. Listeners, in particular, tend to use head motion together with other backchanneling cues to react to the speaker and regulate the flow of the conversation. The type of head motion produced during dialogue varies across cultures and individuals, which implies that generating head motion for natural communication requires accounting for these differences. In addition, existing work on head motion generation has primarily targeted speaker head generation, with limited work on listeners. In this study, we created a multimodal dataset of casual Japanese conversation and a scalable, real-time LHG model that adapts to individual differences in head motion. We also developed an LHG model that reflects individual tendencies by fine-tuning the base model. The proposed models were evaluated through subjective experiments rated by four testers. The results show that the proposed models generate natural head motion and that focusing on individual tendencies improves the appropriateness of the generated motion. A further analysis compares the differences between our method and actual human motion.

Keyphrases: Japanese casual dialogue dataset, listening head generation, dialogue system, motion synthesis
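
The abstract does not give implementation details, so purely as an illustration of what "fine-tuning the model to individual tendencies" could look like, here is a minimal PyTorch-style sketch of adapting a pretrained listening-head-generation model to one listener's recorded head motion. All names (LHGModel, finetune_on_listener, the feature and pose dimensions) are assumptions for illustration and do not describe the authors' actual architecture or data format.

# Hedged sketch: fine-tune a pretrained LHG model on a single listener's clips.
# All class/function names and dimensions are hypothetical.
import torch
import torch.nn as nn

class LHGModel(nn.Module):
    """Hypothetical base model: maps speaker audio features to head-pose angles."""
    def __init__(self, feat_dim=80, hidden=256, pose_dim=3):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, pose_dim)  # e.g., yaw, pitch, roll per frame

    def forward(self, audio_feats):               # (batch, time, feat_dim)
        h, _ = self.rnn(audio_feats)
        return self.head(h)                       # (batch, time, pose_dim)

def finetune_on_listener(model, listener_clips, epochs=5, lr=1e-4):
    """Adapt a pretrained model to one listener's head-motion tendencies.

    listener_clips: iterable of (audio_feats, head_pose) tensor pairs
    taken from that listener's recordings.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    model.train()
    for _ in range(epochs):
        for audio_feats, head_pose in listener_clips:
            opt.zero_grad()
            pred = model(audio_feats)
            loss = loss_fn(pred, head_pose)
            loss.backward()
            opt.step()
    return model

The key design point this sketch illustrates is that personalization reuses the pretrained weights and only continues training on a small, listener-specific subset, rather than training a separate model per person from scratch.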

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:15113,
  author    = {Tamon Mikawa and Yasuhisa Fujii and Yukoh Wakabayashi and Kengo Ohta and Ryota Nishimura and Norihide Kitaoka},
  title     = {Listening Head Motion Generation for Multimodal Dialog System},
  howpublished = {EasyChair Preprint 15113},
  year      = {EasyChair, 2024}}