Student Research and SUPER-UROP Opportunities - Summer/Fall 2013

Deadline for Summer UROP appointments with ALFA Group via Hackers Heaven: Friday, April 26th

The Summer 2013 UROP deadline for Direct Funding (funded by the MIT UROP Office) is this Thursday, April 18th, at 5:00 PM. Students must submit their online proposals to supervisors before 5:00 PM to be considered for Direct Funding. All students who wish to participate in the Summer session must submit an online application.

Deadline for Summer UROP appointments with ALFA Group, funded by ALFA Group: Friday, April 26th

ALFA Group has 8 SuperUROP projects listed on the EECS department SuperUROP site; their titles are listed at the end of this page. If a regular UROP project below interests you more, please contact us about it.

Deadline for students to submit SuperUROP applications (with a short proposal) to EECS: Tuesday, April 30th

Projects listed here are only for students enrolled at MIT. Please do not contact us if you are not an MIT student. See our Visitor and Membership Information if you want to join our group from outside MIT.




FlexGP: Evaluating a Million models on a Billion cases

Our FlexGP system currently generates thousands of non-linear models of the form y=f(x), where f(.) can be any mathematical function composed from a set of operators such as log, sin, and sqrt. For example, an expression could be y=log(x1)+sin(x2). For big data problems we must make multiple passes through the data, each time applying the models to the data and measuring their accuracy, to identify the set of non-linear expressions that best explains the data. We are therefore investigating and developing methods to evaluate a million models on a billion data points in a fraction of a second. How can we reach these speeds? Is it even possible? Such speeds will be paramount to making learning from big data successful. We are seeking an undergraduate, UAP, or senior who is interested in a challenging, very tangible, measurable problem like this and enjoys producing unprecedented speedups.
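One common way to cut the cost of repeated evaluation passes is to exploit the fact that many generated models share subexpressions. The sketch below is a minimal, pure-Python illustration of that idea, not FlexGP's implementation: the tuple encoding of expressions and the names `evaluate` and `rank_models` are hypothetical. Each subtree is evaluated once per pass over all rows, cached, and reused by every model that contains it; models are then ranked by mean squared error.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical expression encoding (not FlexGP's actual format):
#   ("x", i)                          -> input column i
#   ("log", e), ("sin", e), ("sqrt", e)
#   ("add", a, b), ("mul", a, b)
Expr = Tuple

UNARY = {"log": math.log, "sin": math.sin, "sqrt": math.sqrt}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def evaluate(expr: Expr, columns: List[List[float]],
             cache: Dict[Expr, List[float]]) -> List[float]:
    """Evaluate expr on every row, reusing cached results for shared subtrees."""
    if expr in cache:
        return cache[expr]
    op = expr[0]
    if op == "x":
        out = columns[expr[1]]
    elif op in UNARY:
        f = UNARY[op]
        out = [f(v) for v in evaluate(expr[1], columns, cache)]
    else:
        f = BINARY[op]
        left = evaluate(expr[1], columns, cache)
        right = evaluate(expr[2], columns, cache)
        out = [f(a, b) for a, b in zip(left, right)]
    cache[expr] = out
    return out


def rank_models(models, columns, y):
    """Return (mse, model) pairs sorted from best to worst fit."""
    cache: Dict[Expr, List[float]] = {}  # shared by all models in this pass
    scored = []
    for m in models:
        pred = evaluate(m, columns, cache)
        mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
        scored.append((mse, m))
    scored.sort(key=lambda s: s[0])
    return scored


# Usage: the first two models share the subtree log(x1).
x1, x2 = [1.0, 2.0, 3.0], [0.0, 0.5, 1.0]
y = [math.log(a) + math.sin(b) for a, b in zip(x1, x2)]
models = [("log", ("x", 0)),
          ("add", ("log", ("x", 0)), ("sin", ("x", 1))),
          ("mul", ("x", 0), ("x", 1))]
print(rank_models(models, [x1, x2], y)[0])
```

At the scale described above, the per-row Python loops would be replaced by vectorized or compiled kernels; the subexpression-caching idea carries over unchanged.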


Re-opened for Summer, Fall 2013
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Big Data + Machine Learning + Medicine + Volunteer Computing: Could It Get Any More Exciting?

Come join us and learn how we are building a large-scale machine learning system with which we are attempting to solve some of the most challenging problems facing our society. The most fascinating part is that we want to do this using the leftover CPU cycles of machines all across the world. Technically, this is challenging because we cannot centrally coordinate and plan data distribution and algorithmic steps, or centrally collect and process results. We have made a lot of progress during the first two years and are now seeking students to work with us on deploying multiple data problems and understanding the challenges these problems present to our framework. You will work with a team of post-docs and graduate students.

Re-opened for Summer, Fall 2013
For Juniors and Seniors seeking a path to an MEng via 6.UAT, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Machine Learning and Big Data: Performance Analytics

Machine learning systems depend on parameters and sometimes, buried deep inside, some randomized initial state. So, when we run them on big data with different parameterizations, how can we unify the results? Also, how can we interpret an ML system's rules, classifiers, or models so that we can pose an updated question to the system and get better accuracy? Is the algorithm having trouble predicting a certain class? Why? Is it because of class imbalance or inadequate discriminatory power of a feature? Should we adjust the objective function to address these issues? Are the results consistent? Robust?

This project will introduce you to a machine learning system that executes in a large-scale distributed setting and learns from large datasets. It will familiarize you with the process of engineering a problem for the ML algorithm to solve, making sense of the algorithm's results and behavior, and then iterating with new ideas for addressing the problem. The specific setting is a prediction problem: will an ICU patient's blood pressure be high, medium, or low after a given lead time? We have collected a large archive of data, generated multiple predictive models, and are now analyzing the results. We are asking questions such as: Which cohorts of patients are hard to predict, and why? Which class labels are hard to predict, and why? Your project will be closely integrated into a team effort with a post-doc and graduate students.
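Questions such as "which class labels are hard to predict?" usually start from per-class metrics rather than overall accuracy. A minimal sketch, assuming true and predicted labels are available as lists of "low"/"medium"/"high" strings (the toy values below are illustrative, not project data; `class_report` and `confusions` are hypothetical names): the first function gives each class's support and recall, and the second counts which true classes are mistaken for which.

```python
from collections import Counter
from typing import Dict, List, Tuple


def class_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[int, float]]:
    """Map each true class to (support, recall)."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: (support[c], correct[c] / support[c]) for c in sorted(support)}


def confusions(y_true: List[str], y_pred: List[str]) -> Counter:
    """Count (true, predicted) pairs for every misclassified example."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# Toy example: "low" is rare and is mistaken for the majority class.
y_true = ["medium", "medium", "low", "medium", "high", "medium"]
y_pred = ["medium", "medium", "medium", "medium", "high", "medium"]
print(class_report(y_true, y_pred))
# {'high': (1, 1.0), 'low': (1, 0.0), 'medium': (4, 1.0)}
print(confusions(y_true, y_pred))
# Counter({('low', 'medium'): 1})
```

Low recall on a class with small support, as for `low` here, is the kind of signal that points toward class imbalance rather than weak features.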

Re-opened for Summer, Fall 2013
Juniors and Seniors for 6.UAT, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Feature Decision Boundaries and Quantization for Big Data Classification with ML

When building a rule-based classifier (aka a decision list) that remains readable, the choice of decision boundaries has a significant effect on the accuracy of the solutions. The goal of this project is to develop efficient methods and algorithms to identify decision boundaries for large feature sets. We are working on a large-scale classification problem in the medical domain with possibly hundreds to thousands of variables, some of which are tightly correlated. At this scale, exhaustively searching for the best decision-boundary thresholds is intractable, so efficient methods are needed. You will work with a team of researchers with strong experience in this area. This is an exciting project in which you will learn about interpretable machine learning and the implications of big data for some traditional algorithms, and develop methods that could have a significant impact in the medical domain. Your project will be closely integrated into a team effort with a post-doc and graduate students.
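For a single numeric feature, the best threshold for a one-cut rule can be found without re-scoring every candidate from scratch. The sketch below shows one standard approach (sort once, then sweep with running label counts) for a binary label and the rule "predict 1 if x > threshold". It is a building block only: it does not handle correlated features or multi-way splits, and `best_threshold` is an illustrative name, not part of the group's system.

```python
from typing import List, Tuple


def best_threshold(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) for the rule 'predict 1 if x > threshold'.

    Sorts once, then sweeps left to right with running label counts, so each
    candidate cut is scored in O(1): O(n log n) overall instead of O(n^2).
    Assumes non-empty input with labels in {0, 1}.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_pos = sum(y)
    neg_left = 0  # label-0 points at or below the cut (classified correctly)
    pos_left = 0  # label-1 points at or below the cut (misclassified)
    # Cut below the smallest value: everything is predicted 1.
    best_t = pairs[0][0] - 1.0
    best_acc = total_pos / n
    for i, (value, label) in enumerate(pairs):
        if label == 1:
            pos_left += 1
        else:
            neg_left += 1
        # Only cut between distinct values so tied values stay on one side.
        if i + 1 < n and pairs[i + 1][0] == value:
            continue
        acc = (neg_left + (total_pos - pos_left)) / n
        if acc > best_acc:
            nxt = pairs[i + 1][0] if i + 1 < n else value + 1.0
            best_t = (value + nxt) / 2  # midpoint between adjacent values
            best_acc = acc
    return best_t, best_acc


print(best_threshold([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1]))  # (3.5, 1.0)
```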

Re-opened for Summer, Fall 2013
Juniors and Seniors
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Scalable methods for fusing multiple models generated for big data

When dealing with big data we generate thousands of models, each specializing on a subset of the data. We are developing techniques that combine these models by learning weights for fusing their predictions. The techniques range from a simple average, to a weighted sum, to probabilistic approaches. These methods, known as ensemble learning, have allowed users to reach higher prediction accuracy than any single model. We are, however, interested in ensemble learning for large datasets. We are seeking a student to work closely with a graduate student and a post-doc to scale our weight-learning algorithms and test our methodology on multiple big datasets, possibly including the Heritage Health Prize.
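To make the simple-average versus weighted-sum distinction concrete, here is a minimal sketch in which each model's weight is the inverse of its mean squared error on held-out data, normalized to sum to 1. This inverse-error heuristic is a stand-in chosen for brevity, not necessarily the weight-learning method the group uses, and the function names are hypothetical.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error of one prediction vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)


def inverse_error_weights(preds: List[List[float]], y: List[float],
                          eps: float = 1e-12) -> List[float]:
    """Weight each model by 1/MSE on held-out data, normalized to sum to 1."""
    raw = [1.0 / (mse(p, y) + eps) for p in preds]  # eps avoids division by zero
    total = sum(raw)
    return [r / total for r in raw]


def fuse(preds: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted sum of the models' predictions, row by row."""
    return [sum(w * p[i] for w, p in zip(weights, preds))
            for i in range(len(preds[0]))]


# Usage: one accurate model and one poor model on three held-out rows.
y = [1.0, 2.0, 3.0]
preds = [[1.0, 2.0, 3.5], [2.0, 3.0, 4.0]]
w = inverse_error_weights(preds, y)
average = fuse(preds, [0.5, 0.5])
weighted = fuse(preds, w)
print(w, mse(average, y), mse(weighted, y))
```

With one accurate and one poor model, the weighted fusion lands much closer to the accurate model than the plain average does.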

Juniors and Seniors, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Predicting "Rare" events in an ICU

We are developing a system that predicts rare events, such as hypotensive episodes, in an ICU setting. We have assembled a large feature-level arterial blood pressure dataset from a publicly available waveform dataset. One challenge is that the classes in the data are extremely imbalanced because the events we are interested in are rare. This imbalance can significantly reduce the accuracy of the forecast, and it especially affects the dynamics of our iterative learning engine. The goal of the project is to identify or develop efficient methods for balancing the data, along with techniques that address this problem within our framework.
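Random undersampling of the majority class is one of the simplest balancing techniques and a common baseline. A minimal sketch, assuming label 1 marks the rare event; `undersample` and its `ratio` parameter are hypothetical names, and this is not presented as the group's method. It should be applied to the training split only, so that evaluation still reflects the true class balance.

```python
import random
from typing import List, Sequence, Tuple


def undersample(X: Sequence, y: Sequence[int], ratio: float = 1.0,
                seed: int = 0) -> Tuple[List, List[int]]:
    """Keep every rare (label 1) example and a random subset of label-0 ones.

    ratio is the number of majority examples kept per minority example.
    """
    rng = random.Random(seed)  # seeded for reproducible resampling
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    k = min(len(majority), int(round(ratio * len(minority))))
    keep = minority + rng.sample(majority, k)
    rng.shuffle(keep)  # avoid all rare events sitting at the front
    return [X[i] for i in keep], [y[i] for i in keep]


# Usage: 5 rare events among 100 examples, kept at a 1:2 ratio.
X = list(range(100))
y = [1 if i % 20 == 0 else 0 for i in range(100)]
Xb, yb = undersample(X, y, ratio=2.0)
print(len(yb), sum(yb))  # 15 5
```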

Re-opened for Summer, Fall 2013
MEng, and Juniors and Seniors seeking a path to an MEng via 6.UAT, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


BP-Watch: Predicting blood pressure in an ICU setting

We are building a large-scale predictive system that predicts the blood pressure of a patient under intensive care. The project relies on cloud-scale machine learning of many diverse predictive models. A variety of tasks are on the agenda, including cloud-scale empirical experimentation, cross-referencing model predictions to clinical events, time series modeling, unsupervised learning of similar blood pressure segments, and ultimately transforming the many model outputs (probabilities and predictions) into visualizations that are succinct and informative for the doctors and intensivists who will use this system. This is an exciting project that focuses on making machine learning matter in real-world scenarios and creating impact. You will work with a team of doctors, post-docs and graduate students.


Re-opened for Summer, Fall 2013
Juniors and Seniors seeking a path to an MEng via 6.UAT, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Bringing a Large-Scale Machine Learning Service to the Desktop

Wouldn't it be exciting to call classify(dataFileLoc) or regress(dataFileLoc) from the command line and spin off hundreds or more nodes in the cloud that access the data at dataFileLoc, run machine learning, and return results? We have built a large-scale, cloud-based machine learning system and have described its latest version in a paper. The system is a collection of distributed compute units, and its design allows it to elastically shrink and expand as compute units are removed or added. In this project we are looking at two issues: first, developing interfaces that monitor learning progress and system status as the computation proceeds; and second, providing simple desktop-based access to the system.

Progress monitoring becomes especially challenging when the algorithm executes on 300-1000 cores. This project aims to develop a variety of distributed techniques to aggregate progress information from different nodes and to create efficient, elastic visual interfaces (think of zooming in or out of a model of the system and obtaining appropriately abstracted, relevant information at each zoom level). These interfaces would let the user see the progress of the computation and the system configuration in order to decide which nodes could be eliminated or added. Desktop-based access would let a user issue machine learning commands such as classify(data) or regress(data) and spin off a large-scale system like this one. The project team consists of graduate students and a post-doc. You will work with a team that has a lot of experience with this system, which makes it all the more fun to help them visualize what they are building.
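A minimal sketch of the zoom-level idea, assuming each node reports a hierarchical location path plus a couple of progress numbers (the `Report` fields and the path format are hypothetical). Rolling reports up to the first `depth` path components gives a coarse system-wide view at small depths (depth 0 is the whole system) and per-node detail at large ones. In a real deployment the aggregation would itself be distributed; here it runs in one process for clarity.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Report(NamedTuple):
    path: str          # hypothetical location, e.g. "east/rack3/node17"
    best_error: float  # best model error found so far on that node
    evaluations: int   # model evaluations completed so far


def summarize(reports: List[Report], depth: int) -> Dict[str, Dict[str, float]]:
    """Roll node reports up to the first `depth` path components."""
    groups: Dict[str, List[Report]] = defaultdict(list)
    for r in reports:
        key = "/".join(r.path.split("/")[:depth])
        groups[key].append(r)
    return {
        key: {
            "nodes": len(rs),
            "best_error": min(r.best_error for r in rs),
            "evaluations": sum(r.evaluations for r in rs),
        }
        for key, rs in sorted(groups.items())
    }


reports = [Report("east/r1/n1", 0.3, 100),
           Report("east/r1/n2", 0.2, 50),
           Report("west/r1/n1", 0.4, 70)]
print(summarize(reports, 1))
# {'east': {'nodes': 2, 'best_error': 0.2, 'evaluations': 150}, 'west': {'nodes': 1, 'best_error': 0.4, 'evaluations': 70}}
```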

Re-opened for Summer, Fall 2013
Juniors and Seniors, UAP
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Mining a MOOC's activity data: 6.002X explored

We are building a variety of machine learning algorithms for mining the data generated while delivering educational content to hundreds of thousands of students all over the world. A fundamental question that people in education are trying to answer is: "What worked?" Answering this question requires analyzing the data in novel ways, for example by building models of students and balancing for confounding factors. We are looking for a talented UROP or MEng student to work with a Research Scientist and a group of scientists and fellows on the MIT edX team. This project could have transformative effects on next-generation education systems.

Re-opened for Summer, Fall 2013
Juniors and Seniors
Background: Course 6 coursework in software, plus machine learning knowledge (6.034 and 6.867)
Please contact: alfa-apply@csail.mit.edu


Here are the titles of ALFA Group's SuperUROP projects. More information about them is on the EECS SuperUROP faculty projects website.

"Importing Machine Learning Techniques to the Database"

"Educational Data Mining -- Informing Online Education"

"Temporal Bayesian Modeling for Online Learning Data"

"Clinical Healthcare and Machine Learning"

"Building Large scale Matrix Decomposition Methods"

"Probabilistic Latent Variable Modeling via Volunteer Computing"

"Investigating Differential Privacy: Big Data and its Access Issues"

"Heterogeneous Computing for Machine Learning: GPU.. plus CPU plus Cloud"
