Student Research Opportunities - Fall 2012, Spring 2013.

Knit: Integrating Human Based Partial Analyses of Big Data

We are developing means to automatically assist analysts and experts in identifying patterns and detecting anomalies in the big data streams that arise when heterogeneous, unstructured data sources are consulted. Our approach relies solely on analysts and their ability to group or categorize "common situations" into patterns. As one can imagine, an analyst can only process a subset of the big data. We are developing machine learning algorithms that take the partial groupings each analyst produces for a subset of the data and knit together (i.e., synthesize, integrate, or merge) these discerned similarities into a coherent global assessment.

Specifically, your goal will be to develop and analyze machine learning methods under different analyst characteristics and different situational scenarios. We emulate analyst behavior and data streams with a probabilistic model of how an analyst responds to the data streams in a given scenario. Our goal is both to analyze existing ML methods and to develop new ones. You will be co-advised by a postdoc with extensive experience in machine learning.
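The description above does not say which ML methods are used to knit the partial groupings together. As one hedged illustration, the Python sketch below pairs a toy analyst model with a standard consensus-clustering technique swapped in for this example: a co-association matrix, where each pair of items is scored by the fraction of analysts who saw both and put them in the same group. The analyst model (random coverage, uniform mislabeling at `error_rate`), the function names `simulate_analyst` and `knit`, and the 0.5 `threshold` are all hypothetical, not the project's actual model or method.

```python
import random
from collections import defaultdict
from itertools import combinations


def simulate_analyst(true_groups, coverage, error_rate, rng):
    """Hypothetical analyst model: the analyst sees each item with probability
    `coverage` and labels it with its true group, except that with
    probability `error_rate` a uniformly random group (possibly the correct
    one) is used instead."""
    labels = sorted(set(true_groups.values()))
    partial = {}
    for item, group in true_groups.items():
        if rng.random() < coverage:
            wrong = rng.random() < error_rate
            partial[item] = rng.choice(labels) if wrong else group
    return partial


def knit(partials, threshold=0.5):
    """Merge partial groupings into one global grouping (illustrative only).

    For every pair of items, co-association = (analysts who put both in the
    same group) / (analysts who saw both). Pairs above `threshold` are linked,
    and the connected components of those links are returned as clusters."""
    together = defaultdict(int)
    seen_both = defaultdict(int)
    for partial in partials:
        for a, b in combinations(sorted(partial), 2):
            seen_both[a, b] += 1
            if partial[a] == partial[b]:
                together[a, b] += 1

    parent = {}  # union-find forest over every item any analyst saw

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for partial in partials:
        for item in partial:
            find(item)
    for (a, b), n in seen_both.items():
        if together[a, b] / n > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for item in list(parent):
        clusters[find(item)].add(item)
    return sorted(clusters.values(), key=min)
```

Because each analyst's labels are only compared within that analyst's own grouping ("same group or not"), different analysts may use entirely different names for their groups.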

Requirement: The student must demonstrate excellent programming skills. Experience in machine learning and data mining is a plus.

Contact: Please send a CV to Kalyan Veeramachaneni (


Predicting blood pressure in an ICU setting

We are building a large-scale predictive system that predicts blood pressure for patients in intensive care. The project relies on cloud-scale machine learning of many diverse predictive models. Tasks on the agenda include cloud-scale empirical experimentation, cross-referencing model predictions with clinical events, time series modeling, unsupervised learning of similar blood pressure segments and, ultimately, transforming the many model outputs (probabilities and predictions) into visualizations that are succinct and informative for the doctors and intensivists who will use this system. This is an exciting project focused on making machine learning matter, and have an impact, in real-world scenarios. You will work with a team of doctors, postdocs and graduate students.

For MEng students, and for juniors and seniors looking to continue to an MEng via 6.UAT or UAP
Background: Course 6 coursework in software and machine learning (6.034 and 6.867)
Please contact: or


Visualizing the progress of a machine learning system computing on the cloud

We are currently building a large-scale, cloud-based machine learning system. This paper explains the (n-1)th version of our system. The system is a collection of distributed compute units, and its design allows it to expand or shrink elastically as compute units are added or removed. One fundamental challenge posed by such a system is monitoring both learning progress and system status over a long-running computation. Progress monitoring becomes especially challenging when the algorithm runs on 300 to 1000 cores. This project aims to develop distributed techniques that aggregate progress information from different nodes, along with efficient, elastic visual interfaces (think of zooming in or out of a model of the system and, at each zoom level, obtaining appropriately abstracted, relevant information). These interfaces would let the user follow the progress of the computation and the system configuration in order to decide which nodes could be removed or added. The project has a team of graduate students and a postdoc with a lot of experience with this system, which makes it all the more fun to help them visualize what they are building.
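The aggregation-with-zoom idea can be sketched in Python under some assumptions: each node periodically sends a status report, nodes are addressed by a hierarchical path (cluster, rack, node), and a "zoom level" is the number of path components to group by. The `NodeReport` fields, the path scheme and the chosen statistics are all hypothetical. The summaries are deliberately mergeable (counts add, minima take the minimum), so a rack could pre-aggregate its own nodes and forward one summary upward instead of every raw report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Hypothetical status message a compute node might send periodically."""
    path: tuple        # hierarchical address, e.g. ("cluster-1", "rack-3", "node-17")
    steps_done: int    # learning iterations completed on this node
    best_error: float  # best model error this node has found so far
    alive: bool        # whether the node answered its last heartbeat


def local_summary(reports):
    """Summarize a group of node reports; dead nodes' errors are ignored."""
    live = [r for r in reports if r.alive]
    return {
        "nodes": len(reports),
        "alive": len(live),
        "steps": sum(r.steps_done for r in reports),
        "best_error": min((r.best_error for r in live), default=None),
    }


def merge(a, b):
    """Combine two summaries without needing the underlying reports."""
    errors = [e for e in (a["best_error"], b["best_error"]) if e is not None]
    return {
        "nodes": a["nodes"] + b["nodes"],
        "alive": a["alive"] + b["alive"],
        "steps": a["steps"] + b["steps"],
        "best_error": min(errors, default=None),
    }


def summarize(reports, zoom):
    """Group reports by the first `zoom` path components (0 = whole system)."""
    groups = defaultdict(list)
    for r in reports:
        groups[r.path[:zoom]].append(r)
    return {key: local_summary(group) for key, group in groups.items()}
```

Zooming out is then just calling `summarize` with a smaller `zoom`, or merging the summaries already computed at the finer level.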

For MEng students, and for juniors and seniors looking to continue to an MEng via 6.UAT or UAP
Background: Course 6 coursework in software and machine learning (6.034 and 6.867)
Please contact: or


Mining data from MIT EdX

We are building a variety of machine learning algorithms for mining the data generated while delivering educational content to hundreds of thousands of students all over the world. A fundamental question that people in education are trying to answer is: "What worked?" Answering this question requires analyzing the data in novel ways, for example, building models of students while accounting for confounding factors. We are looking for a talented UROP or MEng student to work with a Research Scientist and a group of scientists and fellows on the MIT EdX team. This project could have a transformative effect on next-generation education systems. Read about EdX here and here.

For MEng students, and for juniors and seniors looking to continue to an MEng via 6.UAT or UAP
Background: Course 6 coursework in software and machine learning (6.034 and 6.867)
Please contact: or


Machine learning for Wind Energy

We are building a variety of machine learning systems for wind energy. Our current focus is a statistical inference system that predicts the long-term wind energy at a test location. Preliminary analysis has shown that these techniques perform well compared to a variety of techniques used in the wind community. You can find more information here. We are now looking to expand this work by collecting particularly challenging data sets and adapting our algorithms to them. Our final goal is to provide a web-based interface to our algorithms so that community and small wind energy projects can use them. These small-scale projects are particularly affected by poor resource assessment, since their budgets are too low to afford ML algorithm development and statistical analysis. This service would enable them to assess the wind resource at a site cheaply.

For juniors and seniors looking to continue to an MEng via 6.UAT or UAP
Background: Course 6 coursework in software and machine learning
Please contact:


Probabilistic generative models for mathematical expressions that fit the data

We are interested in exploring data in new ways. Our immediate question is whether it is possible to generate white-box (i.e., interpretable) non-linear models that could be further mined for information about the underlying generating process. Our current tack is to model the data with a large set of mathematical expressions, each of which can be evaluated in a straightforward way for its fit to the data. We then form a probability distribution over the better-fitting expressions in the set and generate new ones by sampling from this distribution.

Our initial steps towards this approach are documented here. They raise further questions, e.g.: How can variable-length mathematical expressions best be modeled as a probability density function? How many samples of a pdf are sufficient to reflect its properties? How can this distribution be learned from the subsamples that fit better? How can mathematical expressions of varying length be efficiently recovered from a pdf describing a sample?
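The sample, evaluate and refit loop described above resembles an estimation-of-distribution algorithm. The Python sketch below is one hedged illustration, not the group's method: it uses a PIPE-style prototype tree (a swapped-in technique) that holds an independent categorical distribution over symbols at each position of a complete binary tree. Choosing a terminal at an internal position ends that branch, which is one simple way to get variable-length expressions out of a fixed-size distribution. The symbol set, `DEPTH`, population size, elite fraction and smoothing constant are all assumptions.

```python
import random

# Hypothetical symbol set: binary operators plus two terminals.
FUNCS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
TERMS = ["x", "1"]
DEPTH = 3                    # assumed maximum tree depth
N_POS = 2 ** DEPTH - 1       # positions in a complete binary tree (heap indexing)


def symbols_at(i):
    """Leaves may only hold terminals; internal positions may hold anything."""
    return TERMS if 2 * i + 1 >= N_POS else list(FUNCS) + TERMS


def uniform_model():
    return [{s: 1 / len(symbols_at(i)) for s in symbols_at(i)} for i in range(N_POS)]


def sample(model, rng, i=0):
    """Draw an expression tree; a terminal at an internal position stops growth."""
    probs = model[i]
    sym = rng.choices(list(probs), weights=list(probs.values()))[0]
    if sym in FUNCS:
        return (sym, sample(model, rng, 2 * i + 1), sample(model, rng, 2 * i + 2))
    return sym


def evaluate(expr, x):
    if expr == "x":
        return x
    if expr == "1":
        return 1.0
    op, left, right = expr
    return FUNCS[op](evaluate(left, x), evaluate(right, x))


def mse(expr, data):
    return sum((evaluate(expr, x) - y) ** 2 for x, y in data) / len(data)


def positions(expr, i=0):
    """Yield (position, symbol) pairs so trees can be counted per position."""
    if isinstance(expr, tuple):
        yield i, expr[0]
        yield from positions(expr[1], 2 * i + 1)
        yield from positions(expr[2], 2 * i + 2)
    else:
        yield i, expr


def refit(model, elite, smoothing=0.1):
    """Re-estimate each position's distribution from the best expressions."""
    counts = [{s: smoothing for s in p} for p in model]
    for expr in elite:
        for i, sym in positions(expr):
            counts[i][sym] += 1
    return [{s: c / sum(cnt.values()) for s, c in cnt.items()} for cnt in counts]


def search(data, generations=30, pop=200, elite_frac=0.1, seed=0):
    rng = random.Random(seed)
    model, best = uniform_model(), None
    for _ in range(generations):
        population = sorted((sample(model, rng) for _ in range(pop)),
                            key=lambda e: mse(e, data))
        if best is None or mse(population[0], data) < mse(best, data):
            best = population[0]
        model = refit(model, population[: max(1, int(elite_frac * pop))])
    return best
```

For example, `search([(x, x * x + 1) for x in range(-3, 4)])` searches for an expression fitting y = x*x + 1; the target is representable here as `("+", ("*", "x", "x"), "1")`. The independence between positions is the simplifying assumption; the questions above about modeling variable-length expressions are exactly where a richer distribution would be needed.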

You will work closely with a postdoc and will learn probabilistic analysis techniques, machine learning and data mining. Experience in Java and MATLAB is required. A background in machine learning and data mining is preferred but not necessary.

For juniors and seniors looking to continue to an MEng via 6.UAT or UAP
Background: Course 6 coursework in software and machine learning
Please contact:
