6.835 Intelligent Multimodal User Interfaces

Resources for 6.835 term projects

We provide a list of toolkits/libraries that you can use to build your own multimodal user interface.

Body / hand tracking

Microsoft Kinect SDK [link] [Guidelines] [Sample Code]

Description: The Kinect for Windows software development kit (SDK) enables developers to use C++, C#, or Visual Basic to create applications that support gesture and voice recognition by using the Kinect for Windows sensor and a computer or embedded device.
Platform: Windows
Language: C++, C#, Visual Basic (Microsoft Visual Studio)
Input: Kinect Sensor

OpenNI SDK [link] [guide]

Description: The OpenNI API provides access to OpenNI-Compliant depth sensors. It allows an application to initialize a sensor and receive depth, RGB, and IR video streams from the device. It provides a single unified interface to sensors and .ONI recordings created with depth sensors.
Platform: Windows, Linux, OSX
Language: C++
Input: OpenNI-Compliant depth sensors

Leap Motion SDK [getting started] [overview] [demos]

Description: The Leap Motion SDK includes a service that runs on your computer, providing hand, finger and joint pose data via various language-specific APIs to your desktop or web application.
Platform: Windows, Mac, Web
Language: JavaScript, C#/Unity, C++, Python, Objective-C
Input: Leap Motion sensor

Head / eye / face tracking

Watson Head Tracker [link]

Description: Real-time head pose estimation and tracking, eye gaze estimation, and gesture recognition from a USB or stereo camera.
Platform: Windows, Linux, OSX
Language: C++, Java
Input: USB or stereo camera

Active Appearance Model using OpenCV (AAM-OpenCV) [link]

Description: Active Appearance Model Face Tracker using OpenCV in C++
Platform: Windows, Linux, OSX
Language: C++
Input: Webcam

EyeAPI [link]

Description: Obtain eye center location in low resolution images or videos.
Platform: Windows, Linux, OSX
Language: C++
Input: Webcam

FaceAPI [link]

Description: 6 DOF Head tracking
Platform: Windows
Language: C++
Input: Webcam

CLM-Z Face Tracker [link] [CVPR'12 paper]

Description: 3D Constrained Local Model (CLM-Z) for robust facial feature tracking under varying pose.
Platform: Windows
Language: C++ (Visual Studio project)
Input: Video data

Speech

HTML5 Web Speech API [demo] [tutorial]

Description: A JavaScript API that enables web developers to incorporate speech recognition and synthesis into their web pages. Developers can use scripting to generate text-to-speech output and to use speech recognition as input for forms, continuous dictation, and control. It also allows web pages to control activation and timing and to handle results and alternatives.
Platform: Primarily Chrome and Safari, as of February 2015
Language: JavaScript
Input/Output: Voice

CMU Sphinx [link]

Description: Open Source Toolkit For Speech Recognition
Platform: Windows, Linux, OSX, Android
Language: C, Java
Input: Voice

Microsoft Speech Platform SDK [link]

Description: Provides functionality in the Microsoft Grammar Development Tools to help you validate, debug, test, and optimize grammars for voice applications.
Platform: Windows
Language: C#
Input: Voice

The WAMI Toolkit [link]

Description: Web-based speech recognition toolkit
Platform: JavaScript-enabled web browser
Language: JavaScript
Input: Voice
Note: Developed by the Spoken Language Systems group at MIT CSAIL.

OpenEAR: Munich Open-Source Emotion and Affect Recognition Toolkit [link] [ACII'09 paper]

Description: This provides audio feature extraction algorithms implemented in C++.
Platform: Windows, Linux, OSX
Language: C++
Input: Audio

OpenEars [link]

Description: OpenEars is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you implement round-trip English-language speech recognition and text-to-speech on the iPhone and iPad. It uses the open-source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and it is free to use in an iPhone or iPad app.
Platform: iOS
Language: Objective-C
Input: iPhone mic

General Computer Vision / Machine Learning Libraries

Gesture Recognition Toolkit [link]

Description: The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source C++ machine-learning library designed specifically for real-time gesture recognition.
Platform: Windows, Linux, OSX
Language: C++

Weka: Data Mining Software in Java [link]

Description: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Platform: Windows, Linux, OSX
Language: Java

OpenCV computer vision library [link]

Description: A widely used, extensively documented library of programming functions for real-time computer vision. It contains more than 2,500 optimized algorithms.
Platform: Windows, Linux, OSX, iOS, Android
Language: C++, Python, Java

LIBSVM [link] [ACM TIST'11 paper]

Description: LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
Platform: Windows, Linux, OSX
Language: C++, Matlab wrapper, Python wrapper
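
As a rough illustration of how binary SVMs can be combined for multi-class classification, the sketch below uses one-against-one voting, the scheme described in the LIBSVM paper. It does not call LIBSVM: the function name `one_vs_one_predict` and the toy pairwise decision functions are hypothetical stand-ins for trained binary classifiers.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(x, classes, pairwise):
    """Predict a class for x by majority vote over all class pairs.

    pairwise[(a, b)] is a hypothetical binary decision function that
    returns a value >= 0 when it prefers class a, and < 0 for class b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) >= 0 else b
        votes[winner] += 1
    # Ties go to the class listed first in `classes`.
    return max(classes, key=lambda c: votes[c])


# Toy 1-D example: three classes separated along a line.
# These hand-written deciders stand in for trained binary SVMs.
classes = ["left", "middle", "right"]
pairwise = {
    ("left", "middle"): lambda x: -1.0 - x,
    ("left", "right"): lambda x: -x,
    ("middle", "right"): lambda x: 1.0 - x,
}

for x in (-2.0, 0.0, 3.0):
    print(x, one_vs_one_predict(x, classes, pairwise))
```

With k classes this scheme needs k(k-1)/2 binary classifiers, which is worth keeping in mind when a project has many gesture or speech classes.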

BudgetedSVM [link]

Description: C++ toolbox containing highly optimized implementations of three recently proposed algorithms for scalable training of Support Vector Machine (SVM) models: Adaptive Multi-hyperplane Machines (AMM), Budgeted Stochastic Gradient Descent (BSGD), and Low-rank Linearization SVM (LLSVM). The toolbox also includes Pegasos, a state-of-the-art linear SVM solver, as it is a special case of AMM.
Platform: Windows, Linux, OSX
Language: C++, Matlab wrapper
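
For a sense of what a Pegasos-style linear SVM solver does, here is a minimal pure-Python sketch of its stochastic sub-gradient update. It is not BudgetedSVM's code or API: the function names, the toy data, the default parameters, and the choice to omit both the bias term and the optional projection step are assumptions made for illustration.

```python
import random


def pegasos_train(samples, labels, lam=0.1, steps=200, seed=0):
    """Train a linear SVM (no bias term) with Pegasos-style updates.

    samples: list of feature lists; labels: list of +1 / -1.
    lam is the regularization strength (assumed value, tune as needed).
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, steps + 1):
        i = rng.randrange(len(samples))      # pick one training example
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)                # step size 1 / (lambda * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Shrink w by (1 - eta * lam), which equals 1 - 1/t.
        w = [(1.0 - 1.0 / t) * wj for wj in w]
        if margin < 1.0:
            # Hinge loss is active: move w toward y * x.
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Toy linearly separable data (hypothetical).
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = pegasos_train(X, y)
print([predict(w, x) for x in X])
```

Each step touches a single example, so the cost per update does not grow with the size of the training set, which is the property that makes this family of solvers attractive for large datasets.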

SVM^light [link]

Description: An implementation of Support Vector Machines (SVMs) in C.
Platform: Windows, Linux, OSX
Language: C, Matlab wrapper, Python wrapper
Note: Check out SVM^struct and SVM^rank as well.

Bayes Net Toolbox for Matlab [link]

Description: The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models.
Platform: Windows, Linux, OSX
Language: Matlab

hCRF Library [link]

Description: Implements three algorithms for sequence labeling tasks: conditional random fields (CRF), hidden CRF (HCRF), and latent-dynamic CRF (LDCRF). Optimized for multi-threading. Works with sparse or dense input features.
Platform: Windows, Linux
Language: C++, Matlab wrapper, Python wrapper

Robot Operating System (ROS) [link]

Description: ROS (Robot Operating System) provides libraries and tools to help software developers create robot applications. It provides hardware abstraction, device drivers, libraries, visualizers, message-passing, package management, and more.
Platform: Linux
Language: C++

openFrameworks [link]

Description: An open-source C++ toolkit designed to assist the creative process by providing a simple and intuitive framework for experimentation. The toolkit is designed to work as a general-purpose glue, and wraps together several commonly used libraries (for graphics, audio analysis, image and video analysis, etc.).
Platform: Windows, Linux, OSX, iOS, Android
Language: C++

Useful Links

Compressive Sensing Resources [link]

VRML: Visual Recognition and Machine Learning Summer School [2010] [2011] [2012]

MLSS: Machine Learning Summer Schools [link]

Survey on gesture datasets [1] [2]

Last updated February 16, 2015