Resources for 6.835 term projects
We provide a list of toolkits/libraries that you can use to build your own multimodal user interface.
Body / hand tracking
Kinect for Windows SDK
- Description: The Kinect for Windows software development kit (SDK) enables developers to use C++, C#, or Visual Basic to create applications that support gesture and voice recognition by using the Kinect for Windows sensor and a computer or embedded device.
- Platform: Windows
- Language: C++, C#, Visual Basic (Microsoft Visual Studio)
- Input: Kinect Sensor
OpenNI
- Description: The OpenNI API provides access to OpenNI-compliant depth sensors. It allows an application to initialize a sensor and receive depth, RGB, and IR video streams from the device. It provides a single unified interface to sensors and to .ONI recordings created with depth sensors.
- Platform: Windows, Linux, OSX
- Language: C++
- Input: OpenNI-compliant depth sensors
Leap Motion SDK
- Description: The Leap Motion SDK includes a service that runs on your computer and provides hand, finger, and joint pose data to your desktop or web application through language-specific APIs.
- Platform: Windows, Mac, Web
- Language: JavaScript, C#/Unity, C++, Python, Objective-C
- Input: Leap Motion sensor
Head / eye / face tracking
Watson Head Tracker
[link]
- Description: Real-time head pose estimation and tracking, eye gaze estimation, and gesture recognition from a USB or stereo camera.
- Platform: Windows, Linux, OSX
- Language: C++, Java
- Input: USB or stereo camera
Active Appearance Model using OpenCV (AAM-OpenCV)
[link]
- Description: Active Appearance Model Face Tracker using OpenCV in C++
- Platform: Windows, Linux, OSX
- Language: C++
- Input: Webcam
EyeAPI
[link]
- Description: Locates eye centers in low-resolution images or videos.
- Platform: Windows, Linux, OSX
- Language: C++
- Input: Webcam
FaceAPI
[link]
- Description: Head tracking with six degrees of freedom (6-DOF).
- Platform: Windows
- Language: C++
- Input: Webcam
CLM-Z
- Description: 3D Constrained Local Model (CLM-Z) for robust facial feature tracking under varying pose.
- Platform: Windows
- Language: C++ (Visual Studio project)
- Input: Video data
Speech
HTML5 Web Speech API
[demo]
[tutorial]
- Description: A JavaScript API that enables web developers to incorporate speech recognition and synthesis into their web pages. It enables developers to use scripting to generate text-to-speech output and to use speech recognition as an input for forms, continuous dictation, and control. It also allows web pages to control activation and timing and to handle results and alternatives.
- Platform: Primarily Chrome and Safari, as of February 2015
- Language: JavaScript
- Input/Output: Voice
CMU Sphinx
[link]
- Description: Open-source toolkit for speech recognition.
- Platform: Windows, Linux, OSX, Android
- Language: C, Java
- Input: Voice
Microsoft Speech Platform SDK
[link]
- Description: Provides functionality in the Microsoft Grammar Development Tools to help you validate, debug, test, and optimize grammars for voice applications.
- Platform: Windows
- Language: C#
- Input: Voice
The WAMI Toolkit
[link]
- Description: Web-based speech recognition toolkit
- Platform: JavaScript-enabled web browser
- Language: JavaScript
- Input: Voice
- Note: Developed by the Spoken Language Systems group at MIT CSAIL.
OpenEAR: Munich Open-Source Emotion and Affect Recognition Toolkit
[link]
[ACII'09 paper]
- Description: Provides audio feature extraction algorithms implemented in C++.
- Platform: Windows, Linux, OSX
- Language: C++
- Input: Audio
OpenEars
[link]
- Description: OpenEars is a shared-source iOS framework for iPhone voice recognition and speech synthesis (TTS). It lets you easily implement round-trip English-language speech recognition and text-to-speech on the iPhone and iPad. It uses the open-source CMU Pocketsphinx, CMU Flite, and CMUCLMTK libraries, and it is free to use in an iPhone or iPad app.
- Platform: iOS
- Language: Objective-C
- Input: iPhone mic
General Computer Vision / Machine Learning Libraries
Gesture Recognition Toolkit
[link]
- Description: The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source C++ machine-learning library that has been specifically designed for real-time gesture recognition.
- Platform: Windows, Linux, OSX
- Language: C++
Weka: Data Mining Software in Java
[link]
- Description: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
- Platform: Windows, Linux, OSX
- Language: Java
OpenCV computer vision library
[link]
- Description: A widely used, extensively documented library of programming functions for real-time computer vision. It contains more than 2500 optimized algorithms.
- Platform: Windows, Linux, OSX, iOS, Android
- Language: C++, Python, Java
LIBSVM
- Description: LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
- Platform: Windows, Linux, OSX
- Language: C++, Matlab wrapper, Python wrapper
BudgetedSVM
[link]
- Description: C++ toolbox containing highly optimized implementations of three recently proposed algorithms for scalable training of Support Vector Machine (SVM) models: Adaptive Multi-hyperplane Machines (AMM), Budgeted Stochastic Gradient Descent (BSGD), and Low-rank Linearization SVM (LLSVM). The toolbox also includes Pegasos, a state-of-the-art linear SVM solver, as it is a special case of AMM.
- Platform: Windows, Linux, OSX
- Language: C++, Matlab wrapper
SVM^light
[link]
- Description: An implementation of Support Vector Machines (SVMs) in C.
- Platform: Windows, Linux, OSX
- Language: C, Matlab wrapper, Python wrapper
- Note: See also SVM^struct and SVM^rank.
Bayes Net Toolbox for Matlab
[link]
- Description: The Bayes Net Toolbox (BNT) is an open-source Matlab package for directed graphical models. BNT supports many kinds of nodes (probability distributions), exact and approximate inference, parameter and structure learning, and static and dynamic models.
- Platform: Windows, Linux, OSX
- Language: Matlab
hCRF Library
[link]
- Description: Implements three algorithms for sequence labeling tasks: CRF, HCRF, and LDCRF. Optimized for multi-threading. Works with sparse or dense input features.
- Platform: Windows, Linux
- Language: C++, Matlab wrapper, Python wrapper
Robot Operating System (ROS)
[link]
- Description: ROS (Robot Operating System) provides libraries and tools to help software developers create robot applications. It provides hardware abstraction, device drivers, libraries, visualizers, message-passing, package management, and more.
- Platform: Linux
- Language: C++
openFrameworks
[link]
- Description: An open-source C++ toolkit designed to assist the creative process by providing a simple and intuitive framework for experimentation. The toolkit is designed to work as a general-purpose glue and wraps together several commonly used libraries (for graphics, audio analysis, image and video analysis, etc.).
- Platform: Windows, Linux, OSX, iOS, Android
- Language: C++
Useful Links
Compressive Sensing Resources
[link]
VRML: Visual Recognition and Machine Learning Summer School
[2010]
[2011]
[2012]
MLSS: Machine Learning Summer Schools
[link]
Survey on gesture datasets
[1]
[2]
Last updated February 16, 2015