cross-attention, MOOC learning, multimodal emotion recognition, physiological signals, video semantic information.