Real-time Gesture Recognition for Natural Multimodal Interaction
Ying Yin
Gesture Taxonomy
Temporal Model of Gestures
Pre-stroke -> Nucleus -> Post-stroke
Contributions
Human-computer interaction
- Smoothly handles path and pose gestures
- Responds to discrete and continuous flow gestures appropriately and promptly
Machine learning
- Probabilistic model that unifies two forms of gestures
- Hidden state -> gesture phase -> response timing
- Achieves balance between accuracy and responsiveness
System Overview
Horizontal Orientation
Vertical Orientation
Salience Detection for Gesture [ICMI'13]
Features
- Motion features
- Relative position, velocity, acceleration
- Hand pose features
- Histogram of oriented gradients (HOG) from depth data
- Principal component analysis (PCA) dimension reduction
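As a rough illustration of the motion features listed above, the sketch below computes relative position, velocity, and acceleration per frame with backward finite differences. The reference point (for example a shoulder or body-center joint), the function name `motion_features`, and the zero padding of the first frames are assumptions made for this example; the slides do not specify them. The HOG and PCA steps for hand pose are not shown.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def _scale(a: Vec3, s: float) -> Vec3:
    return tuple(x * s for x in a)


def motion_features(hand: Sequence[Vec3], reference: Sequence[Vec3],
                    dt: float) -> List[List[float]]:
    """Return one 9-D vector per frame: relative position, velocity, acceleration.

    hand[t] and reference[t] are 3-D positions at frame t; dt is the frame
    period in seconds. The reference joint is an assumption of this sketch.
    """
    zero = (0.0, 0.0, 0.0)
    # Position of the hand relative to the reference point.
    rel = [_sub(h, r) for h, r in zip(hand, reference)]
    # Backward differences; frames without enough history are padded with zeros.
    vel = [zero] + [_scale(_sub(rel[t], rel[t - 1]), 1.0 / dt)
                    for t in range(1, len(rel))]
    acc = [zero] + [_scale(_sub(vel[t], vel[t - 1]), 1.0 / dt)
                    for t in range(1, len(vel))]
    return [list(rel[t] + vel[t] + acc[t]) for t in range(len(rel))]
```

Backward differences are used here because they only need past frames, which suits online recognition; a real system would likely smooth the signal first.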
Real-time Gesture Recognition
- Model: unified probabilistic model
- Training: learn parameters from examples
- Inference: online simultaneous segmentation and recognition
Why Unified Model
- Previous work
- Path gestures: HMM or HCRF
- Pose gestures: Nearest Neighbor and SVM
- Possible approaches to combine both
- Classify form first and use different methods
- Make early decisions that are hard to correct later
- Apply different methods concurrently and compare probabilities
- How to compare probabilities from two different models?
- Unified probabilistic model based on hierarchical HMM
- Comparable probabilities
- Different topologies for different forms
- Make soft decisions and propagate probabilities
- Until a response is required according to gesture flow
Temporal Modeling of Gestures
State Transition Diagram
Hierarchical HMM
State Transition Diagram
Path and Pose Gestures
- Different topologies
- Different training strategies
Path Gestures
Path Gesture
- Training strategies
- Embedded training
- Two-pass training
- Viterbi alignment
- Baum-Welch expectation maximization
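The Viterbi alignment step named above can be sketched as a log-domain Viterbi decoder that assigns each training frame to its most likely hidden state. How the slides combine this alignment with Baum-Welch is not spelled out; one common reading is that the alignment initializes or re-segments the data before EM refinement. The function names and the toy two-state model in the test are illustrative assumptions.

```python
import math
from typing import List

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """log(p) that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(log_pi: List[float], log_A: List[List[float]],
            log_B: List[List[float]]) -> List[int]:
    """Return the most likely state sequence.

    log_pi[i]   : log initial probability of state i
    log_A[i][j] : log transition probability from state i to state j
    log_B[t][j] : log likelihood of the observation at frame t under state j
    """
    n_frames, n_states = len(log_B), len(log_pi)
    delta = [log_pi[j] + log_B[0][j] for j in range(n_states)]
    backpointers = []
    for t in range(1, n_frames):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j at frame t.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_A[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_A[best_i][j] + log_B[t][j])
        delta = new_delta
        backpointers.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

The returned path gives a hard frame-to-state assignment, from which per-state emission parameters can be re-estimated.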
Pose Gesture
- Training strategies
- Expectation-maximization for GMM
- Use Bayesian Information Criterion (BIC) to choose the number of mixtures
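The BIC-based choice of mixture count can be sketched as follows, using the standard definition BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of training samples, and L the maximized likelihood. The sketch assumes the GMMs were already fitted by EM elsewhere and that covariances are diagonal; both the parameter count and the function names are assumptions for illustration, not details from the slides.

```python
import math
from typing import Dict, Tuple


def gmm_param_count(k: int, dim: int) -> int:
    """Free parameters of a k-component GMM with diagonal covariances (assumed):
    (k - 1) mixture weights + k * dim means + k * dim variances."""
    return (k - 1) + 2 * k * dim


def bic(log_likelihood: float, n_params: int, n_samples: int) -> float:
    """Bayesian Information Criterion; lower is better."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def choose_num_mixtures(log_likelihoods: Dict[int, float], dim: int,
                        n_samples: int) -> Tuple[int, Dict[int, float]]:
    """Pick the mixture count with the lowest BIC.

    log_likelihoods maps k to the total training log-likelihood of a GMM
    with k components that was fitted by EM beforehand.
    """
    scores = {k: bic(ll, gmm_param_count(k, dim), n_samples)
              for k, ll in log_likelihoods.items()}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

For instance, with made-up log-likelihoods `{1: -1000.0, 2: -900.0, 3: -895.0}` for 2-D features and 500 samples, the small gain from the third component does not pay for its extra parameters, so two mixtures are selected.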
Combining and Flattening HMM
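One way to picture the combining and flattening step is to place each gesture's HMM as a block in a single flat transition matrix and to route each gesture's exit probability into the entry states of all gestures. The wiring below (per-state exit probabilities, an entry distribution per gesture, and a prior over gestures) is an assumption made for this sketch; the slides do not give the exact construction.

```python
from typing import Dict, List, Tuple


def flatten_hmms(models: List[Dict[str, list]],
                 prior: List[float]) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Combine per-gesture HMMs into one flat transition matrix.

    Each model is a dict with (all names are assumptions of this sketch):
      "A"     : n x n within-gesture transitions; row i sums to 1 - exit[i]
      "exit"  : length-n probability of leaving the gesture from each state
      "entry" : length-n distribution over the gesture's starting states
    prior[g] is the probability that gesture g follows a finished gesture.
    Returns the flat matrix and, per flat state, its (gesture, local state).
    """
    offsets, total = [], 0
    for m in models:
        offsets.append(total)
        total += len(m["A"])
    flat = [[0.0] * total for _ in range(total)]
    labels = []
    for g, m in enumerate(models):
        n = len(m["A"])
        for i in range(n):
            labels.append((g, i))
            row = flat[offsets[g] + i]
            # Within-gesture transitions stay inside the gesture's block.
            for j in range(n):
                row[offsets[g] + j] += m["A"][i][j]
            # Exit mass is redistributed to the entry states of every gesture.
            for h, target in enumerate(models):
                for j, e in enumerate(target["entry"]):
                    row[offsets[h] + j] += m["exit"][i] * prior[h] * e
    return flat, labels
```

The `labels` list keeps the link from each flat state back to its gesture, which is what later allows a decoded hidden state to be read as a gesture and phase.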
Online Inference
Fixed-Lag Smoothing
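Fixed-lag smoothing estimates the hidden state at frame t - L using observations up to frame t, trading L frames of delay for a more reliable estimate than pure filtering. The following is a minimal sketch of the textbook forward-backward version over a sliding window, not necessarily the implementation behind these slides; the class name and interface are assumptions.

```python
from collections import deque
from typing import List, Optional


def _normalize(v: List[float]) -> List[float]:
    s = sum(v)  # assumed nonzero, i.e. some state explains the observation
    return [x / s for x in v]


class FixedLagSmoother:
    """Online estimate of p(state at t - lag | observations up to t)."""

    def __init__(self, pi: List[float], A: List[List[float]], lag: int):
        self.pi, self.A, self.lag = pi, A, lag
        self.n = len(pi)
        # Filtered posteriors and observation likelihoods for the last lag + 1 frames.
        self.alphas = deque(maxlen=lag + 1)
        self.likes = deque(maxlen=lag + 1)

    def step(self, like: List[float]) -> Optional[List[float]]:
        """like[j] = p(obs_t | state j). Returns None until lag + 1 frames are seen."""
        n, A = self.n, self.A
        if not self.alphas:
            pred = self.pi
        else:
            prev = self.alphas[-1]
            pred = [sum(prev[i] * A[i][j] for i in range(n)) for j in range(n)]
        # Forward (filtering) update for the newest frame.
        self.alphas.append(_normalize([pred[j] * like[j] for j in range(n)]))
        self.likes.append(like)
        if len(self.alphas) <= self.lag:
            return None
        # Backward pass from the newest frame down to the oldest frame in the window.
        beta = [1.0] * n
        for k in range(len(self.likes) - 1, 0, -1):
            lk = self.likes[k]
            beta = _normalize([sum(A[i][j] * lk[j] * beta[j] for j in range(n))
                               for i in range(n)])
        return _normalize([a * b for a, b in zip(self.alphas[0], beta)])
```

With `lag=0` this reduces to ordinary filtering; each call costs on the order of lag times the squared number of states, so the lag directly trades responsiveness and compute against accuracy.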
Simultaneous Segmentation and Recognition
hidden state -> gesture phase -> gesture
Simultaneous Segmentation and Recognition - Discrete Flow
Simultaneous Segmentation and Recognition - Continuous Flow
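The chain hidden state -> gesture phase -> gesture suggests a simple response policy once the most likely hidden state per frame is known. The sketch below assumes that a discrete flow gesture triggers a single response when it enters its post-stroke phase, and that a continuous flow gesture produces an update on every nucleus frame. That reading, the state table, and the gesture names are hypothetical and only serve as illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from flat hidden-state index to (gesture, phase).
STATE_INFO: Dict[int, Tuple[str, str]] = {
    0: ("swipe_left", "pre"), 1: ("swipe_left", "nucleus"), 2: ("swipe_left", "post"),
    3: ("point", "pre"), 4: ("point", "nucleus"), 5: ("point", "post"),
}
# Hypothetical flow type per gesture.
FLOW: Dict[str, str] = {"swipe_left": "discrete", "point": "continuous"}


def responses(state_path: List[int]) -> List[Tuple[int, str, str]]:
    """Turn a per-frame hidden-state sequence into (frame, gesture, action) events."""
    events = []
    prev = None
    for t, state in enumerate(state_path):
        gesture, phase = STATE_INFO[state]
        if FLOW[gesture] == "continuous" and phase == "nucleus":
            # Continuous flow: respond frame by frame while the nucleus lasts.
            events.append((t, gesture, "update"))
        elif (FLOW[gesture] == "discrete" and phase == "post"
              and prev != (gesture, "post")):
            # Discrete flow: respond once, on entering the post-stroke phase.
            events.append((t, gesture, "trigger"))
        prev = (gesture, phase)
    return events
```

For the state path `[0, 1, 1, 2, 2, 3, 4, 4, 5]` this yields one `trigger` for `swipe_left` at frame 3 and `update` events for `point` at frames 6 and 7.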
Evaluation
- Hybrid performance metrics
- Event-based metric for discrete flow gestures
- Frame-based metric for continuous flow gestures
- Average F1 for online recognition: 80.6%
- Average response time: 0.3s before coming to rest
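To make the two kinds of metric concrete, the sketch below computes a frame-based F1 from per-frame labels and an event-based F1 from detected events matched against ground-truth segments. The matching rule (an event counts as correct if its time falls inside a not-yet-matched true segment of the same label) is an assumption for this example; the slides do not state the exact criterion.

```python
from typing import List, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def frame_f1(truth: List[str], pred: List[str], label: str) -> float:
    """Frame-based F1 for one gesture label, comparing labels frame by frame."""
    tp = sum(1 for a, b in zip(truth, pred) if a == label and b == label)
    fp = sum(1 for a, b in zip(truth, pred) if a != label and b == label)
    fn = sum(1 for a, b in zip(truth, pred) if a == label and b != label)
    return f1_score(tp, fp, fn)


def event_f1(true_segments: List[Tuple[int, int, str]],
             pred_events: List[Tuple[int, str]], label: str) -> float:
    """Event-based F1 for one gesture label.

    true_segments holds (start_frame, end_frame, label); pred_events holds
    (frame, label). Each true segment can be matched by at most one event.
    """
    matched = set()
    tp = fp = 0
    for time, lab in pred_events:
        if lab != label:
            continue
        hit = next((k for k, (s, e, l) in enumerate(true_segments)
                    if l == label and s <= time <= e and k not in matched), None)
        if hit is None:
            fp += 1  # spurious or duplicate detection
        else:
            matched.add(hit)
            tp += 1
    fn = sum(1 for k, (_, _, l) in enumerate(true_segments)
             if l == label and k not in matched)
    return f1_score(tp, fp, fn)
```

The event-based version penalizes duplicate detections of the same gesture, which a frame-based score would not distinguish from a single long detection.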
Compare Different Topologies
Effect of Lag Time
Extensibility
- Fast training time
- Easy to add new gestures
Future Work
- Two-hand gestures
- Improve pose gesture recognition
- Improve user-independent base model
Contributions
- Taxonomy-aware HHMM framework
- Probabilistic model that unifies recognition of two forms of gestures
- Hidden state -> gesture phase -> response timing
- Achieves a balance between accuracy and responsiveness
- Natural gesture interaction
- Smoothly handles path and pose gestures
- Responds to discrete and continuous flow gestures appropriately
Acknowledgements
- Prof. Randall Davis
- Prof. Antonio Torralba and Prof. Bill Freeman
- Group members: Andrew, Jeremy, Yale, Kai, William, Nira, Aaron, Chih-yu, Sonya, Tom
- Friends and family
Problem with Frame-based Evaluation