Interaction With Gaze, Gesture, and Speech in a Flexibly Configurable Augmented Reality System

THMS-2021

Zhimin Wang, Haofei Wang, Huangyue Yu, and Feng Lu

PDF


Abstract

Multimodal interaction has become a recent research focus since it offers a better user experience in augmented reality (AR) systems. However, most existing works only combine two modalities at a time, e.g., gesture and speech. Multimodal interactive systems that integrate the gaze cue have rarely been investigated. In this article, we propose a multimodal interactive system that integrates gaze, gesture, and speech in a flexibly configurable AR system. Our lightweight head-mounted device supports accurate gaze tracking, hand gesture recognition, and speech recognition simultaneously. The system can be easily configured into various modality combinations, which enables us to investigate the effects of different interaction techniques. We evaluate the efficiency of these modalities using two tasks: the lamp brightness adjustment task and the cube manipulation task. We also collect subjective feedback on using such systems. The experimental results demonstrate that the Gaze+Gesture+Speech modality is superior in terms of efficiency, while the Gesture+Speech modality is preferred by users. Our system opens the pathway toward a multimodal interactive AR system that enables flexible configuration.
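
To illustrate the idea of a flexibly configurable modality combination, the following is a minimal sketch (not the authors' implementation; all names and structure are hypothetical) in which each modality feeds a shared interaction manager, and a per-condition configuration decides which inputs are acted upon:

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ModalityConfig:
    # Toggle each modality independently to form a study condition,
    # e.g., Gaze+Speech, Gesture+Speech, or Gaze+Gesture+Speech.
    gaze: bool = False
    gesture: bool = False
    speech: bool = False

class InteractionManager:
    def __init__(self, config: ModalityConfig):
        self.config = config
        self.handlers: Dict[str, Callable[[dict], None]] = {}

    def register(self, modality: str, handler: Callable[[dict], None]) -> None:
        self.handlers[modality] = handler

    def dispatch(self, modality: str, event: dict) -> None:
        # Events from disabled modalities are ignored, so the same pipeline
        # runs unchanged under any modality combination.
        if getattr(self.config, modality, False) and modality in self.handlers:
            self.handlers[modality](event)

# Example: a Gaze+Gesture+Speech condition, e.g., for the lamp brightness task.
manager = InteractionManager(ModalityConfig(gaze=True, gesture=True, speech=True))
manager.register("gaze", lambda e: print("select target at", e["point"]))
manager.register("speech", lambda e: print("set brightness to", e["value"]))
manager.dispatch("gaze", {"point": (0.4, 0.7)})
manager.dispatch("speech", {"value": 0.8})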

Paper Video

Related Work


Our related work:

Gaze-Vergence-Controlled See-Through Vision in Augmented Reality

Edge-Guided Near-Eye Image Analysis for Head Mounted Displays

Bibtex


@ARTICLE{wang_21THMS,
  author={Wang, Zhimin and Wang, Haofei and Yu, Huangyue and Lu, Feng},
  journal={IEEE Transactions on Human-Machine Systems},
  title={Interaction With Gaze, Gesture, and Speech in a Flexibly Configurable Augmented Reality System},
  year={2021},
  volume={51},
  number={5},
  pages={524-534},
  doi={10.1109/THMS.2021.3097973}
}