Mixed Controls for Mixed Reality

Advanced blended motion controls can also be combined with existing surface touch interactions, tangibles and traditional peripheral device controls such as joysticks or styli. Custom mixed controls can be created to build rich auto-haptic feedback from real-world items and bring it into VR and AR worlds. Mixed Reality (MR) controls will likely need a mix of discrete wearable devices (with active haptics), motion tracking and active object tracking to reduce input latency and provide high-bandwidth natural user interfaces. Interactive tangible objects can add a very important dimension to interactions in MR, as the real-world context and the virtual scene context are naturally aligned. The more types of tangibles users can leverage, the more refined and varied the use of virtual-real tools becomes.

[Figures: Tangible Control Cube | Custom Smart Props & Tools | Phree, Smart Virtual Stylus]

Simple passive tangibles can be easily created by users with a 3D printer. For example, a cube can be printed with well-known geometry and clean, diffuse white faces. The mesh of the cube can be reliably recognized when it is placed on a table-top surface or grasped in the hand, and the faces of the cube can be used to present visual feedback. User interaction with the cube can be easily mapped to UI controls to create low-cost configurable tangible controls.

Props can be created with 3D printing methods that use highly customized 3D models and associated object meshes. These objects can be used as configurable tools which can have integrated moving parts, wireless sensor modules (Intel Curie) or simple buttons that can provide more personalized form-factors or more nuanced controls.

Other active devices can be developed that specialize in specific and familiar types of interactions. For example, the Phree digital stylus can be used to write on any surface. As the device has Bluetooth connectivity, the stroke data generated from drawing and writing can be placed correctly in a 3D scene on a known surface. The device would then enable high-fidelity free-form surface interaction in VR and AR modeled space.


Unlike virtual reality, augmented reality and mixed reality present users with a real-time one-to-one view of the “real world” scene, either as a digital background or by letting light pass naturally into the user's eyes. The higher the resolution and fidelity of the display, the greater the fidelity interactions need in order to be considered “natural”. Some devices require the user to perform exaggerated actions or simplified motion gestures to avoid showing visible latency or imprecision, but this can lead to high-energy or relatively slow-paced interactions. Any hand pose and motion gesture interaction used for AR and MR applications should support the full scope of hand dexterity and allow for low-energy micro-gestures that leverage a rich, expressive motion gesture language, so that interaction fidelity matches display fidelity.

[Figures: CamBoard, micro-flick gesture | Nod, micro-tap gesture | Soli, micro-slide gesture]

Examples of micro-gestures can be seen above: the PMD CamBoard Nano, the Nod ring and Project “Soli”. The PMD CamBoard Nano can be placed close to a keyboard and enables a workflow that blends traditional key input with a powerful set of point, swipe and flick micro-gestures. Used in this combination, micro-gestures provide an additional complementary input channel. They add valuable bandwidth by reducing the need to fully break from a typing task in order to take control of the cursor, and they reduce the cognitive load of using keyboard shortcuts by providing global gestures with simple low-energy actions.

The Nod ring is a great example of a discrete wearable device that can be used for rich micro-gesture support. The Nod ring uses precise IMU tracking along with a set of five buttons. When low-energy touch buttons are used together with motion analysis, a wide array of low-energy gestures can be recognized with strong delimiting capabilities that avoid accidental input. Additionally, Nod can be used along with a Leap Motion device (without interference) to create rich interactive gesture spaces that support full hand pose and motion tracking along with complementary sets of (occlusion-free) finger micro-gestures.

Project Soli presents a compelling use of finger micro-gestures that leverage rich “auto-haptic feedback”. Using short-range radar methods, the Soli sensor can detect minute changes in the pose and motion of hands in close proximity. The Soli sensor can distinguish between subtle complementary motions of fingers on the same hand using robust pattern matching and directly map these micro-gestures to predefined controls. Auto-haptic feedback can be seen in a number of Soli micro-gestures: when two fingertips touch or when two finger surfaces rub against each other. This type of dynamic surface feedback (of fingertips) is especially powerful, as it creates vivid (self-managed) feedback and can be performed when the hand is in mid-air or in a variety of convenient poses that do not require other haptic surfaces or devices.

One of the most challenging aspects of creating fluid micro-gesture interactions is managing the distinction of, and dynamic transition between, gestures. Gesture conflict of this type can be handled using a combination of gesture priority, associative ranking and nested, context-driven null gesture spaces. Personalized micro-motion gestures enable a host of subtle fingertip interactions that reveal fundamentally new interactive sub-spaces and levels of immersion that have previously been inaccessible to HCI.
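The conflict handling described above could be sketched roughly as follows. This is a minimal illustration, not a GestureML or VCML implementation: the `Candidate` fields, the `resolve` function, the thresholds and the gesture and context names are all hypothetical. Candidates that are not valid in the active context, or that score too low, fall into a null gesture space (no event fires). Among the remaining near-tied candidates, the higher priority wins.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One recognizer hypothesis for the current motion (hypothetical structure)."""
    name: str       # gesture label, e.g. "micro_flick"
    score: float    # recognizer confidence in [0, 1]
    priority: int   # higher value wins when scores are close
    context: str    # interaction context in which the gesture is valid


def resolve(candidates: Iterable[Candidate], active_context: str,
            min_score: float = 0.6, tie_margin: float = 0.1) -> Optional[Candidate]:
    """Pick one gesture, or None when the motion falls in the null gesture space."""
    # Context gating: gestures from other contexts are ignored entirely.
    valid = [c for c in candidates
             if c.context == active_context and c.score >= min_score]
    if not valid:
        return None  # null gesture: nothing fires, which avoids accidental input

    # Candidates within tie_margin of the best score are treated as conflicting.
    best_score = max(c.score for c in valid)
    contenders = [c for c in valid if best_score - c.score <= tie_margin]

    # Priority breaks the conflict; score is only the secondary key.
    return max(contenders, key=lambda c: (c.priority, c.score))


candidates = [
    Candidate("micro_tap", 0.82, priority=1, context="typing"),
    Candidate("micro_flick", 0.78, priority=2, context="typing"),
    Candidate("full_hand_grab", 0.95, priority=3, context="scene"),
]
winner = resolve(candidates, active_context="typing")
print(winner.name if winner else "null gesture")
```

Associative ranking is not modeled here. In a fuller sketch it could adjust `priority` based on which gesture was recognized immediately beforehand.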

For more information on micro-gestures and how they can be used in next-generation HCI, see the Micro-Gesture Index.

Scene & Surface Interaction

Developing open-world surface and 3D object interactions will be a crucial part of removing perception barriers in mixed reality interactions. Detailed surface tracking and modeling can be created using dense real-time mapping methods. Once the precise position, orientation and extent of a surface are known, it can be uniquely tagged and qualified for specific interaction analysis. This could enable rich surface interaction using mesh intercept and object collision tracking methods to determine when fingertips encounter registered surfaces.
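As a rough illustration of the intercept test described above, the sketch below checks whether a tracked fingertip touches a registered planar surface. It assumes the mapping stage has already produced a surface centre, a unit normal, two unit in-plane axes and half-extents. The `PlanarSurface` type, the `touch_point` function and the 1 cm contact threshold are hypothetical choices, not part of any specific tracking SDK.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


@dataclass(frozen=True)
class PlanarSurface:
    """A tagged rectangular surface (hypothetical registration record)."""
    tag: str
    origin: Vec3    # centre of the surface, in metres
    normal: Vec3    # unit normal
    u_axis: Vec3    # unit in-plane axis
    v_axis: Vec3    # unit in-plane axis, perpendicular to u_axis
    half_u: float   # half extent along u_axis, in metres
    half_v: float   # half extent along v_axis, in metres


def touch_point(surface: PlanarSurface, fingertip: Vec3,
                max_gap: float = 0.01) -> Optional[Tuple[float, float]]:
    """Return the (u, v) contact coordinates, or None if there is no contact."""
    rel = sub(fingertip, surface.origin)
    gap = dot(rel, surface.normal)   # signed distance from the plane
    u = dot(rel, surface.u_axis)     # position within the surface
    v = dot(rel, surface.v_axis)
    if abs(gap) > max_gap or abs(u) > surface.half_u or abs(v) > surface.half_v:
        return None
    return (u, v)


desk = PlanarSurface("desk", origin=(0.0, 0.0, 0.75), normal=(0.0, 0.0, 1.0),
                     u_axis=(1.0, 0.0, 0.0), v_axis=(0.0, 1.0, 0.0),
                     half_u=0.6, half_v=0.4)
print(touch_point(desk, (0.2, -0.1, 0.755)))  # fingertip resting on the desk
print(touch_point(desk, (0.2, -0.1, 0.80)))   # fingertip hovering above it
```

The returned (u, v) pair can be treated like a 2D touch coordinate on the tagged surface. A curved or irregular surface would need a mesh-based test instead of a single plane.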

[Figures: Dense Planar Surface Mapping | Close Range Indoor Object Mesh Classifiers | Long Range Outdoor Scene Segmentation]

Passive everyday 3D objects will need to have a minimum level of context associated with them, even if the system has never encountered them before. Most ambiguity about object identification within scenes can be resolved with (preexisting) object knowledge; however, low-cost, real-time methods exist that support (supervised) learning and can identify and track new objects. This can take the form of automated semantic classifiers based on object mesh cues, cloud-supported computer vision identification techniques and community-supported classification methods. Once classified, any object registered by the MR system should develop an interaction history relative to the user and the constructed MR world space, which can begin to anticipate interactive behaviors and optimize for commonly traveled pathways.

[Figures: Simple Kinect (Z Threshold) Surface Extrusion Tracking | Dense Mesh Reconstruction Intersect Tracking | Common High Fidelity Touch Surfaces]

Qualified interactive surfaces and classified interactive 3D objects will need to have a consistent global context. This can be enabled using medium and long range RGB-D computer vision methods to perform scene segmentation and semantic classification to build a real-time modeled world. Common paths that a user tracks in this model become more refined over time. As classified object libraries become larger and local object mesh models receive more complete data from multiple perspectives, new objects or items that move throughout normal interactions become easier and more reliable to identify and track.

Significant improvements in local scene awareness and intelligent classification can be achieved by leveraging shared resources and online learning systems. Pools of open world scene data can be passed between users in close proximity or when connected to the cloud to update local context maps with a mix of relevant, auto-classified and curated content. Both the content and context of items will be reinforced as users dynamically update scenes in step with object interactions. This will notably improve classification methods and continually expand in scope to provide increasingly sophisticated global scene awareness. In turn this provides an integrated mechanism for cultivating and expanding robust 3D scene and object interactions.

Universal Interaction Primitives

Certain basic VR/AR/MR world interactions can be thought of as core interactions or interaction primitives. These fundamental gestures represent the minimum set of gestures needed to perform the most common UI tasks. Core interactions are frequently performed and critical to the primary functionality of the MR operation. Each primitive should be crafted to present as part of a target “natural user interface” and should require a critically low cognitive load (for the user) to perform.

Within applications and operating systems it is common to create a set of “seeded” gestures that are easily “discoverable” and yet still provide high-fidelity standard mapped controls that most users can operate with little or no learning curve. These core gestures can be used as standard entry points that give users a reliable first experience with a device or interface. The associated gesture recognition system can then be configured to learn the nuances of specific users and personalize gestures to fit a custom style. The more the interface is used, the better the gesture engine becomes at recognizing each gesture and at separating similar poses or motions. As this occurs, key gesture (and micro-gesture) fidelity options can emerge that leverage the smaller error margin and enable a greater number of robust interaction styles.

For example, a simple “pinch” 3D hand motion gesture is commonly performed in several different ways. When using an interface for the first time, an index-thumb-open-hand-pinch may yield the same result as an index-middle-thumb-pinch or even a full-hand-grab gesture. However, as the user operates the system for a longer period of time, clear patterns of preferred use emerge that can indicate which exact pose (or set of poses) will be most effective for the user. As a result, the weighted probability of the other poses can be lowered so that they have a significantly lower chance of triggering a pinch event. This would automatically re-open these alternate poses (and gestures) as a refined subset of gestures which can be reserved for other discrete tasks or provide reliable micro-gesture transition paths. To learn more about elegant pose degradation and transition pathways see: GestureML-Advanced Gesture Mapping.

In summary this “gesture cross-mapping” approach can be described as providing “strong redundant default pose and motion mapping” that ensures elegant gesture degradation channels. When used as part of a supervised learning or training system, cross-mapping can over time provide an expanded, fully separable set of poses and gestures that can be used to confidently define distinct gesture subsets as clear patterns of natural user preference emerge.
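One possible way to picture cross-mapping is as a table of usage counts per pose. The sketch below is an assumption-laden illustration rather than the GestureML mechanism: the `CrossMappedGesture` class, the pose labels, the 15% share threshold and the 20-sample warm-up are all hypothetical. Every seeded pose triggers the event at first (the strong redundant default). Once enough usage has been observed, rarely used poses stop triggering it and are released for other tasks.

```python
from typing import Dict, Iterable, List


class CrossMappedGesture:
    """Several poses redundantly mapped to one event, narrowed by observed use."""

    def __init__(self, event: str, poses: Iterable[str],
                 min_share: float = 0.15, min_samples: int = 20) -> None:
        self.event = event
        self.counts: Dict[str, int] = {pose: 0 for pose in poses}
        self.min_share = min_share      # usage share a pose needs to stay mapped
        self.min_samples = min_samples  # observations required before narrowing

    def observe(self, pose: str) -> None:
        """Record that the user performed the event with this pose."""
        if pose not in self.counts:
            raise KeyError(f"unknown pose: {pose}")
        self.counts[pose] += 1

    def triggers(self, pose: str) -> bool:
        """True if this pose currently fires the event."""
        if pose not in self.counts:
            return False
        total = sum(self.counts.values())
        if total < self.min_samples:
            return True  # redundant default: every seeded pose works at first
        return self.counts[pose] / total >= self.min_share

    def released_poses(self) -> List[str]:
        """Poses no longer bound to the event, free for other discrete tasks."""
        return sorted(p for p in self.counts if not self.triggers(p))


pinch = CrossMappedGesture("pinch", ["index_thumb_pinch",
                                     "index_middle_thumb_pinch",
                                     "full_hand_grab"])
usage = ["index_thumb_pinch"] * 27 + ["index_middle_thumb_pinch"] * 2 + ["full_hand_grab"]
for pose in usage:
    pinch.observe(pose)
print(pinch.released_poses())
```

A hard threshold keeps the example short. A recognizer that outputs probabilities could instead scale each pose's likelihood by its usage share, which is closer to the weighted probability described in the text.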

Interaction Recording & Gesture Training

As surfaces and objects are identified and categorized as potentially interactive, surfaces can be converted to dedicated virtual displays or touch screens, and everyday desktop objects such as coffee cups can be made interactive by training MR systems in real time. Once object features have been well defined, an object's state changes can be recorded by the user to define a required action (which can then be directly linked as an input control). The next time the action is performed on the coffee cup, it will be recognized as a tangible object gesture. As the user builds more interactions in a constructed MR environment, the system will be able to provide predictions and suggestions for interaction maps based on patterns, preferences and embedded context summarized in a rich (VCML) user interaction profile.

[Figures: Meta, AR Fingertip Motion Tracking and Feedback | Omnitouch, Wearable Multitouch Interaction Everywhere | Ikea, Kitchen Concierge and Augmented Table]

A very important part of creating a convenient workflow for robust user-configurable interfaces is the strategic use of persistent visual feedback. Users can clearly observe when interactions fail and see subtle cues as to why an interaction may not perform as expected, whether the issue requires refinement of object recognition, finger registration, input sensitivity or behavior categorization. This will be a valuable part of expanding MR interfaces, as it directly accommodates user-directed corrections in primary interaction workflows and “learning modes”.

Interactive Object Networks

Beyond relatively simple one-to-one object interactions and controls, an immersive MR control scheme can provide the means to create direct control-links between 3D objects, surfaces, IoT devices or digital content. For example, media sharing can be enabled between a smart watch and a predefined virtual display by performing a “virtual hyperlink” gesture: selecting the image on the watch, performing a hand motion pinch over the content, and dragging it onto a dynamic virtual screen. Alternatively, a smart light bulb can be controlled by an object by pointing to the light with the left hand and the object with the right hand to create an interaction network. The context for these types of interactions can be provided by environmentally aware MR devices. The same devices can deliver rich visual feedback representing simple state updates (such as recognition events, connection events and download updates), either as simple overlay animations or as detailed visual representations of connection and data transfer methods.

Shared Mixed Reality

Creating rich HMD-driven interactive environments builds strong user-centric experiences that can provide breakthrough levels of immersion. However, a very important element to consider is the exploration of collaborative (VR/AR/MR) environments that enable multiple users to be immersed in a shared virtual space. This provides a means for more than one user to view and interact with the same object in the same world-space. Unlike other modes of display and interaction (such as multi-user multitouch displays), users can directly interact with each other or with virtual representations of other users (avatars), which creates a more complete collaborative space with natural human-human interaction.


Research has shown that eye tracking and gaze awareness information (collected from integrated HMD sensors) can present valuable cues for users working or playing in collaborative VR/AR/MR spaces. It can not only increase the social presence of the user but also be a very effective way to reduce cognitive load and leverage hands-free, non-verbal communication methods when performing cooperative tasks.

One of the main barriers to immersion in collaborative mixed reality spaces is the number of functional and cognitive “seams” that prevent users from easily moving from one interaction sub-space to another. For example, any control scheme that forces the user to frequently change modes of operation or learn new operation techniques to complete common tasks imposes a greater cognitive load that directly competes with the task itself. This risks making simple tasks laborious and more complex tasks difficult to achieve in the MR medium, even with practice. To combat this, the number of seams can be minimized, and those that remain can be placed naturally so that they mimic real-world indicators and controls for changing modes of operation. One way this can be achieved is by leveraging real-world tangibles directly as strongly typed interaction tools, along with providing auto-switching high-precision bare-hand manipulation methods that can be triggered by a combination of user proximity, eye contact and bi-manual interaction.

Simulated (Real-World) Object Physics

Overlaying digital images onto real, moving objects in a 3D scene, or inserting digital objects into a 3D scene, requires an effective and lightweight real-time use of physically simulated behaviors. While still present in virtual reality, the property known as the “uncanny valley” is far more apparent to users in Augmented Reality and Mixed Reality applications, as object interactions and characters are always directly compared to a real-world scene. Creating convincing MR elements requires a unique approach to balancing “real-world physics” and artistic license in mixed world interactives.

[Figures: Fuzzy Logic Simplified Shadows | AR Vector Pool Table Collisions | Interactive AR Sand Table Terrain]

One approach is to carve out an interactive style that complements key elements of physical simulation, in the same way that select custom shaders create dynamic shadows, lighting and rich textures that allow digital objects to visually fit in a real-world scene without attempting to be a “complete” or “full” simulation. An analogous approach to Newtonian mechanics, material deformation and fluid simulation may see physical attributes simulated with more precision the closer the interaction/reaction is to the user. Avoiding the uncanny valley of MR interaction will likely also require careful attention to the dynamic modes in which the 3D surfaces associated with tools, hands and fingertips directly interact with world elements, from direct deformation collisions and sticky pinch methods to dynamic inertia (virtual mass) and virtual surface friction that complements material smoothness. Strategically simplifying physical models will be as important as the management and choreography of transitions between physically simulated behaviors and modes of operation.
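The idea of simulating with more precision closer to the user could be expressed as a simple distance-based tier lookup. The sketch below is only illustrative: the tier names, distance bands and substep counts in `TIERS` are hypothetical values chosen for the example, not figures from the text.

```python
import math
from typing import Sequence, Tuple

# (maximum distance from the user in metres, tier name, physics substeps per frame)
# All values are hypothetical and would need tuning per application.
TIERS = [
    (0.75, "full", 8),             # within arm's reach: deformation, friction, inertia
    (3.0, "simplified", 2),        # nearby: rigid bodies with simple collisions
    (math.inf, "kinematic", 0),    # far away: scripted or animated motion only
]


def simulation_tier(user_pos: Sequence[float],
                    object_pos: Sequence[float]) -> Tuple[str, int]:
    """Choose how detailed the physical simulation of an object should be."""
    distance = math.dist(user_pos, object_pos)
    for max_distance, name, substeps in TIERS:
        if distance <= max_distance:
            return name, substeps
    raise ValueError("distance is not a number")


print(simulation_tier((0.0, 0.0, 0.0), (0.3, 0.4, 0.0)))
print(simulation_tier((0.0, 0.0, 0.0), (0.0, 2.0, 0.0)))
```

In practice the switch between tiers would need hysteresis or blending, since an abrupt change in behavior at a band boundary is exactly the kind of transition the text says must be choreographed.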

Using Physics to Reduce Apparent Latency

Physics can also be used to directly simulate real-time dynamic motion and reduce apparent input lag. It is widely known that input lag in direct manipulation of real-world objects can be visibly detected when it exceeds 5 ms. This is made more apparent when motion needs to be 1:1 or is fast moving (as in AR and MR). For example, any motion tracking lag in bare-hand features (fingertips) can be easily seen from (overlaid) display feedback on the hand. Also, when fast-moving tangible objects are tracked, the typical “electron to photon” latency of the tracking and display system is 5-30 ms.

[Figures: Predicting dynamic 2D surface paths | Predicting 3D paths of bouncing objects | Using hover 3D motion to predict 2D surface touch]

By tracking the 3D trajectory of an object, its path can be modeled so that the future position of the object can be confidently predicted. The display feedback image can then be projected (preemptively placed) where the moving object is expected to be in 3D space, to reduce and in some cases eliminate apparent input lag (bringing it below 5 ms). This approach can be directly applied to certain motions of tracked bare-hand features, and similar methods can be applied to fingertip surface intersections to create more responsive tap or drag 2D surface interactions for mixed reality applications.
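A minimal sketch of this kind of prediction, assuming a constant-acceleration motion model (reasonable for a thrown or falling object over a few tens of milliseconds), is shown below. It estimates velocity and acceleration from the last three timestamped position samples and extrapolates forward by the system latency. The `predict_position` function and the sample format are hypothetical, and a production tracker would more likely use a filter (for example a Kalman filter) to cope with measurement noise.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
Sample = Tuple[float, Vec3]  # (timestamp in seconds, position in metres)


def predict_position(samples: Sequence[Sample], latency: float) -> Vec3:
    """Extrapolate the latest position `latency` seconds ahead.

    Uses finite differences over the last three samples and assumes the
    acceleration stays constant over the prediction interval.
    """
    if len(samples) < 3:
        raise ValueError("need at least three samples")
    (t0, p0), (t1, p1), (t2, p2) = samples[-3:]

    predicted = []
    for a0, a1, a2 in zip(p0, p1, p2):
        v01 = (a1 - a0) / (t1 - t0)                # mean velocity over [t0, t1]
        v12 = (a2 - a1) / (t2 - t1)                # mean velocity over [t1, t2]
        accel = (v12 - v01) / ((t2 - t0) / 2.0)    # change between interval midpoints
        v_now = v12 + accel * (t2 - t1) / 2.0      # velocity at the latest sample
        predicted.append(a2 + v_now * latency + 0.5 * accel * latency ** 2)
    return (predicted[0], predicted[1], predicted[2])


def ball(t: float) -> Vec3:
    """Synthetic tracked object: drifting sideways while rising and falling under gravity."""
    return (0.5 * t, 0.0, 1.0 + 2.0 * t - 4.905 * t * t)


history = [(t, ball(t)) for t in (0.0, 0.01, 0.02)]
print(predict_position(history, latency=0.02))  # where to draw the overlay
print(ball(0.04))                               # where the object will actually be
```

For noise-free samples of constant-acceleration motion the extrapolation matches the true path up to floating-point error. With real sensor data the acceleration estimate is noisy, so the usable prediction horizon is limited.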

Personalized Immersive Environments

Over time, as MR users operate wearable input and display devices, the various types of interactions and visual feedback will need to create a personalized model of common environments, user preferences and associated gesture affordances. This can be summarized into a condensed interaction profile (using an extended VCML format) that evolves over time. Personalization is an important part of providing human-scale interactive environments with a strong sense of VR presence and seamless MR integration. Rich customization methods that incorporate environmental changes and user-driven refinements will likely emerge as a foundational expectation for high fidelity natural user experiences.

virtualcontrolml/mixed_reality_control_schemes.txt · Last modified: 2016/04/29 12:20 by paul