
Multi-modal Control Schemes

The most recent generation of HCI devices can provide more than one mode of input. Multiple input modes are very useful when creating new high-bandwidth control schemes and rich natural user interfaces. Combining them into multimodal control schemes provides parallel input channels that can be used on demand, enabling valuable hands-free control methods at run time as well as augmenting existing hand-held modes.

Accelerometer Control

When creating a control scheme for an Android cell phone or tablet, the built-in accelerometer and gyroscope (or IMU) sensors can be used to enable smart-device tilt gestures. These gestures can be mapped to walking, driving, or flight controls so that users can hold the device in the normal fashion, use the touch controls, and at the same time subtly steer a character.
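
The sketch below illustrates one way device tilt could be mapped to steering and throttle values. It is a minimal example, not a platform implementation: the commented-out read_accelerometer() call is a placeholder for whatever sensor API the platform exposes (on Android this would typically be a SensorManager listener), and the dead-zone and maximum-tilt values are illustrative.

<code python>
import math

def tilt_angles(ax, ay, az):
    """Estimate roll and pitch (radians) from the gravity components of one accelerometer sample."""
    roll = math.atan2(ay, az)                     # rotation about the device's long axis
    pitch = math.atan2(-ax, math.hypot(ay, az))   # rotation toward/away from the user
    return roll, pitch

def tilt_to_controls(roll, pitch, dead_zone=0.08, max_tilt=0.6):
    """Map tilt angles to normalized steering/throttle values in [-1, 1] with a small dead zone."""
    def scale(angle):
        if abs(angle) < dead_zone:
            return 0.0
        clipped = max(-max_tilt, min(max_tilt, angle))
        return clipped / max_tilt
    return {"steer": scale(roll), "throttle": scale(pitch)}

# Hypothetical usage: feed each accelerometer sample through the mapping.
# ax, ay, az = read_accelerometer()   # placeholder for the platform sensor API
roll, pitch = tilt_angles(0.5, 2.1, 9.4)
print(tilt_to_controls(roll, pitch))
</code>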

Some devices have more than one built-in accelerometer. These advanced devices can detect additional gestures from different sides of the device. For example, some devices are sensitive enough to detect taps on the sides of the device and to distinguish a tap on the left side from a tap on the right side without confusing either with a device tilt. Using device-based “accelerometer tap gestures” adds functionality to cellphone-based virtual gamepads and brings the user experience closer to a traditional game controller. When accelerometer tilt gestures are used in conjunction with touch controls, control schemes can be created that mimic Wiimote-style interactions.
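
A minimal sketch of how side taps might be separated from slow tilts: a tap shows up as a short spike on the x axis relative to a slowly moving baseline, and the sign of the spike indicates which side was struck. The threshold, window size, and synthetic sample stream are assumptions for illustration only; a production detector would be tuned per device.

<code python>
from collections import deque

class SideTapDetector:
    """Toy detector that separates side taps from slow tilts using a high-pass x-axis signal."""
    def __init__(self, tap_threshold=6.0, window=5):
        self.history = deque(maxlen=window)   # recent x-axis samples (m/s^2)
        self.tap_threshold = tap_threshold

    def update(self, ax):
        self.history.append(ax)
        if len(self.history) < self.history.maxlen:
            return None
        baseline = sum(self.history) / len(self.history)
        spike = ax - baseline                 # high-frequency component; tilts change slowly
        if spike > self.tap_threshold:
            return "tap_left"                 # impulse pushes the device toward +x
        if spike < -self.tap_threshold:
            return "tap_right"                # impulse pushes the device toward -x
        return None

detector = SideTapDetector()
for sample in [0.1, 0.2, 0.1, 0.15, 0.1, 9.0, 0.2]:   # synthetic x-axis stream
    event = detector.update(sample)
    if event:
        print(event)
</code>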

Wearable Controls

Most wearables have built-in accelerometers and further extend their functionality by natively fusing data from accelerometers, gyroscopes, and magnetometers to create “inertial measurement units,” or IMUs. Wearables that contain 9DOF IMUs (nine “degrees of freedom”: three axes each of acceleration, rotation, and magnetic heading) allow for more advanced controls using 3D tilt gestures (roll, pitch, yaw) along with more reliable 3D path analysis (x, y, z).
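
As a rough illustration of the kind of fusion an IMU performs natively, the sketch below blends gyroscope integration (smooth but drifting) with accelerometer-derived angles (noisy but absolute) using a simple complementary filter to estimate roll and pitch. Yaw would additionally require the magnetometer, and the blend factor, sample rate, and sample values are assumptions.

<code python>
import math

class ComplementaryFilter:
    """Blend gyroscope integration with accelerometer angles to estimate roll and pitch."""
    def __init__(self, alpha=0.98):
        self.alpha = alpha
        self.roll = 0.0
        self.pitch = 0.0

    def update(self, accel, gyro, dt):
        ax, ay, az = accel                       # m/s^2
        gx, gy, _ = gyro                         # rad/s
        accel_roll = math.atan2(ay, az)
        accel_pitch = math.atan2(-ax, math.hypot(ay, az))
        # Integrate the gyro, then gently pull the estimate toward the accelerometer angles.
        self.roll = self.alpha * (self.roll + gx * dt) + (1 - self.alpha) * accel_roll
        self.pitch = self.alpha * (self.pitch + gy * dt) + (1 - self.alpha) * accel_pitch
        return self.roll, self.pitch

f = ComplementaryFilter()
# Assumed 100 Hz sample stream; wearables with IMUs deliver fused values like this natively.
roll, pitch = f.update(accel=(0.0, 1.2, 9.7), gyro=(0.02, -0.01, 0.0), dt=0.01)
print(math.degrees(roll), math.degrees(pitch))
</code>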

The Nod Ring

Depending on the desired function of the wearable, most have additional sensors and inputs beyond accelerometers and IMUs. For example, the Nod Ring has a capacitive touch pad, two capacitive touch buttons, and two mechanical buttons, as well as a 6DOF IMU, all packed into a light, low-profile Bluetooth wireless ring that can operate for up to 8 hours between charges.

Images: a Nod Ring device; Ideum dual Nod controls

Multiple Nod rings can be connected via Bluetooth to create rich bi-manual control schemes. Left-hand controls can be mapped directly to button-qualified motion gestures, such as tilt, to allow for character control. Independent right-hand controls can let the user drive the main camera view with joystick-like motion drag gestures. This mimics the familiar left- and right-hand divisions seen in traditional keyboard controls and gamepads but maps them to immersive hand motion gestures.
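
One way the two independent ring streams could be routed is sketched below: left-hand tilt, qualified by a held button, drives character movement, while right-hand relative drag drives the camera. The event dictionaries and field names are illustrative assumptions, not the Nod SDK.

<code python>
# Hypothetical per-frame event dictionaries; field names are illustrative, not the Nod SDK.
def route_ring_event(event, state):
    hand = event["hand"]
    if hand == "left":
        # Button-qualified tilt: only steer the character while the ring button is held.
        if event["button_held"]:
            state["character"]["move_x"] = event["tilt_roll"]
            state["character"]["move_y"] = event["tilt_pitch"]
        else:
            state["character"]["move_x"] = state["character"]["move_y"] = 0.0
    elif hand == "right":
        # Joystick-like drag: relative motion drives the camera view.
        state["camera"]["yaw"] += event["delta_x"] * 0.1
        state["camera"]["pitch"] += event["delta_y"] * 0.1
    return state

state = {"character": {"move_x": 0.0, "move_y": 0.0}, "camera": {"yaw": 0.0, "pitch": 0.0}}
left = {"hand": "left", "button_held": True, "tilt_roll": 0.4, "tilt_pitch": -0.1}
right = {"hand": "right", "delta_x": 12.0, "delta_y": -3.0}
for ev in (left, right):
    state = route_ring_event(ev, state)
print(state)
</code>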

The Myo Armband

The Myo armband is a unique wearable device that uses an embedded IMU and a fully integrated electromyography (EMG) sensor suite to track the motion of the arm and specific poses of the hand. This allows a wide range of 3D motion hand gestures to be created using the Bluetooth armband.

Images: a Myo armband device; Ideum dual Myo controls

Multiple Myo devices can be connected via Bluetooth to a single view, creating rich multi-user or bi-manual control schemes. In the example shown above, the left-hand pose and motion are mapped to D-pad controls and the right-hand pose and motion are mapped to weapons fire. Using this multi-device method, more than one gesture can be processed at the same time, so users can move and shoot while still using mutually exclusive pose-qualified 3D motion gestures. This provides more versatile configuration options and elegantly avoids gesture conflict, as each device can process input controls independently.
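
The sketch below shows the general idea of per-device, pose-qualified processing: each armband's detected pose gates which action its motion drives, so a left-hand move and a right-hand fire can be recognized in the same frame without conflict. The pose names and event shape are assumptions for illustration, not the Myo SDK.

<code python>
# Each device is processed independently, so gestures from both arms can coexist in one frame.
LEFT_POSE_ACTIONS = {"fist": "dpad", "rest": None}
RIGHT_POSE_ACTIONS = {"fingers_spread": "fire", "rest": None}   # illustrative pose names

def process_device(device_id, pose, pitch, commands):
    actions = LEFT_POSE_ACTIONS if device_id == "left_arm" else RIGHT_POSE_ACTIONS
    action = actions.get(pose)
    if action == "dpad":
        commands.append(("move", "up" if pitch > 0.2 else "down" if pitch < -0.2 else "idle"))
    elif action == "fire":
        commands.append(("fire", "primary_weapon"))
    return commands

frame = [("left_arm", "fist", 0.35), ("right_arm", "fingers_spread", 0.0)]
commands = []
for device_id, pose, pitch in frame:
    process_device(device_id, pose, pitch, commands)
print(commands)   # both gestures recognized in the same frame
</code>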

* Voice and Touch Controls
* Voice and Motion Gesture Controls
* Head and Voice Controls

Tobii Eye Tracking

Eye tracking systems have recently developed into much more compact, precise, and robust peripheral tracking devices. Most of these new-generation eye trackers sense infrared light reflected from the eyes and surrounding facial features to identify and trace relative pupil positions and calculate gaze locations. Knowing where, and at what, a user is looking can add unprecedented real-time context to any interaction.

Images: Tobii EyeX desktop eye tracking device; gaze tracking in Assassin's Creed

Game developers have begun to use eye tracking to determine what dynamic content is being examined by the user. If the user looks at the face of a character, it can turn and look back at the user; alternatively, if the user is not paying attention to areas of the screen, enemies may jump out for a dynamic surprise effect. Eye tracking can be used for virtual control schemes in a number of ways. For example, multimodal controls can be created that couple voice commands to eye tracking so that the application or game will only “listen” for commands when the user is looking at the display.
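
A minimal sketch of the gaze-qualified voice idea: a recognized phrase is only acted upon if the most recent gaze sample falls inside the display bounds and is fresh enough to trust. The display rectangle, timeout, gaze coordinates, and dispatch callback are all assumed inputs rather than any particular eye tracking or speech API.

<code python>
import time

DISPLAY_BOUNDS = (0, 0, 1920, 1080)       # assumed screen rectangle in pixels
GAZE_TIMEOUT = 0.5                        # seconds a gaze sample stays valid

def gaze_on_display(gaze, now):
    x, y, timestamp = gaze
    left, top, right, bottom = DISPLAY_BOUNDS
    return (now - timestamp) < GAZE_TIMEOUT and left <= x <= right and top <= y <= bottom

def handle_phrase(phrase, gaze, now, dispatch):
    """Only 'listen' when the user is actually looking at the display."""
    if gaze_on_display(gaze, now):
        dispatch(phrase)
    # Otherwise the phrase is ignored, avoiding accidental activation.

now = time.time()
handle_phrase("pause video", gaze=(960, 540, now), now=now, dispatch=print)   # dispatched
handle_phrase("pause video", gaze=(2500, 540, now), now=now, dispatch=print)  # ignored (off-screen)
</code>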


Multi-modal Desktop Control Schemes

Some devices, like the Intel RealSense camera, can provide three distinct input modalities. Using parallel gesture processing methods, these input streams can create rich context-driven control schemes that can be used to engage an array of desktop, gaming, and IoT applications.

Voice commands can be added to any control scheme on a system with access to a microphone. The RealSense camera has an integrated microphone, which allows simple phrases and commands to be picked up directly by the camera device. Using middleware such as GestureWorks Fusion, voice commands can be mapped to existing Windows key commands or system events and used to globally control applications or the operating system.
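
The sketch below shows the shape of such a mapping: recognized phrases are looked up in a table and translated into key shortcuts. The phrase table and the send_keys callback are placeholders for illustration only; they are not the GestureWorks Fusion API or a Windows input API, and a real deployment would inject actual OS key events.

<code python>
# Placeholder phrase-to-shortcut table; a real deployment would emit OS key events
# (e.g. via a platform input-injection API) instead of printing.
VOICE_COMMANDS = {
    "play": ["SPACE"],
    "pause": ["SPACE"],
    "volume up": ["CTRL", "UP"],
    "new tab": ["CTRL", "T"],
}

def on_phrase_recognized(phrase, send_keys):
    keys = VOICE_COMMANDS.get(phrase.lower().strip())
    if keys is None:
        return False           # unrecognized phrase: do nothing
    send_keys(keys)
    return True

on_phrase_recognized("Volume Up", send_keys=lambda keys: print("+".join(keys)))
</code>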

In addition to voice input, the RealSense camera is capable of head, face, and bi-manual hand tracking. Because of this, (head) gaze-qualified 3D motion gestures can be applied to applications along with (head) gaze-qualified voice commands. This enables users to target application windows and UI elements only when looking in the direction of the display, ensuring that only objects of interest are manipulated by motion gestures, or that the system only listens to voice commands when the user looks at the window.

Images: a dynamic GUI overlay on Chrome; a simple static hand pose gesture

In the example shown above, a virtual overlay is created on the Chrome web browser when operating the YouTube website with the Fusion utility. The overlay provides dynamic feedback to show the user which gestures are available in this (website) context and when they are recognized. This is extremely powerful when operating a “hands-free” interface from a distance, as existing browser UI elements and feedback are designed for close-proximity mouse or touch operation.
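
A minimal sketch of the context-driven overlay idea: given the current context (here, the site being viewed), look up which gestures are available and surface that set to an overlay for feedback. The context names and gesture labels are purely illustrative, not the Fusion configuration format.

<code python>
# Illustrative context-to-gesture map; a real overlay would render icons and
# highlight a gesture when the recognizer reports it.
CONTEXT_GESTURES = {
    "youtube.com": ["open_palm: play/pause", "swipe_left: previous", "swipe_right: next"],
    "default":     ["open_palm: click", "pinch: scroll"],
}

def overlay_for(url):
    for domain, gestures in CONTEXT_GESTURES.items():
        if domain != "default" and domain in url:
            return gestures
    return CONTEXT_GESTURES["default"]

print(overlay_for("https://www.youtube.com/watch"))
</code>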

Images: prototype RealSense camera; Ideum GestureWorks Fusion proof of concept

In addition to content targeting and rich multimodal gesture combinations, eye tracking can be used to automatically switch application focus.

When using multiple application windows that each need a unique control scheme, it is important to determine which application or window is under immediate user control (i.e., has operating system focus).

Eye tracking can reliably determine which application window the user is looking at, as it provides an implicit real-time selection based on attention. This can then be used to reliably define the context for multimodal interactions, which allows the correct controls to be dynamically mapped to the relevant application.
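
The sketch below illustrates gaze-driven focus switching: find the window whose bounds contain the current gaze point and make its control scheme the active context. The window rectangles and the set_focus callback are assumed placeholders, not an OS or eye tracker API.

<code python>
# Hypothetical window list; rectangles are (left, top, right, bottom) in screen pixels.
WINDOWS = {
    "media_player": (0, 0, 960, 1080),
    "flight_sim":   (960, 0, 1920, 1080),
}

def window_under_gaze(gaze_x, gaze_y):
    for name, (left, top, right, bottom) in WINDOWS.items():
        if left <= gaze_x < right and top <= gaze_y < bottom:
            return name
    return None

def update_focus(gaze_x, gaze_y, current_focus, set_focus):
    target = window_under_gaze(gaze_x, gaze_y)
    if target and target != current_focus:
        set_focus(target)          # remap controls to the newly focused application
        return target
    return current_focus

focus = None
focus = update_focus(1400, 500, focus, set_focus=lambda name: print("focus ->", name))
</code>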

Multi-surface Control Schemes
