Listening Head Motion Generation for Multimodal Dialog System

EasyChair Preprint 15113
6 pages • Date: September 27, 2024

Abstract

This paper addresses listening head generation (LHG), i.e., the generation of avatar head motion in dialogue systems. In face-to-face conversation, head motion is a modality used frequently by both listeners and speakers. Listeners, in particular, tend to combine head motion with other backchanneling cues to react to the speaker and regulate the flow of the conversation. The type of head motion used during dialogue varies across cultures and individuals, which implies that generating head motion for natural communication requires accounting for these differences. Moreover, existing work on head motion generation has primarily tackled speaker head generation, with limited attention to listeners. In this study, we created a multimodal dataset of casual Japanese conversation and a scalable, real-time LHG model that adapts to individual differences in head motion. We also developed an LHG model that reflects individual tendencies by fine-tuning the base model. The proposed models were evaluated through subjective experiments rated by four testers. The results show that the proposed models generate natural head motion and that focusing on individual tendencies improves the appropriateness of the generated motion. Further analysis compared the differences between our method and actual human motion.

Keyphrases: Japanese Casual Dialogue Dataset, Listening Head Generation, dialogue system, motion synthesis