Scalable Technology For Online Education

Our broad goal is to contribute computer science technology that helps teachers teach better and students learn better in online or blended learning settings. The project motivates and guides our technical research agenda around big data and scalable machine learning.

[Figure: ALFA MOOC agenda]

Our aims channel the efforts and passions of many group members, including Dr. O'Reilly, Dr. Veeramachaneni, Dr. Erik Hemberg, six graduate students, and multiple superUROPs and UROPs. They also allow us to collaborate within MIT (Teaching and Learning Lab) and, externally, with edX, Coursera and Stanford University.

Publications and Working Reports

  1. MOOCdb Working Report, Dec 2, 2013
  2. MOOCEnImages: Examples of analytics based on MOOCdb for 6.002x: Circuits and Electronics (Spring 2012) (PDF) (MOOCEnImages is a predecessor to MoocVis, alternately spelled MoocViz)
  3. MOOC Visualizations: Examples of analytics based on MOOCdb for 6.002x: Circuits and Electronics (Spring 2012) (Online version)
  4. MoocViz: A Large Scale, Open Access, Collaborative Data Analytics Framework for MOOCs, Franck Dernoncourt, Choung Do, Sherif Halawa, Una-May O'Reilly, Colin Taylor, Kalyan Veeramachaneni and Sherwin Wu, DDE@NIPS 2013: Data Directed Education.
  5. Analyzing Millions of Submissions to Help MOOC instructors Understand Problem Solving, Fang Han, Kalyan Veeramachaneni and Una-May O'Reilly, DDE@NIPS 2013: Data Directed Education.


Don't be fooled: MOOCdb is more than a database or even a database schema. MOOCdb is an effort, a solution, a project and a community initiative.

[Figure: MOOCdb overview]

To address heterogeneous data formats, bloated raw data, fragmented data sources, and unidentified equivalences across MOOCs, MOOCdb reorganizes originally recorded MOOC data by means of a general-purpose schema for analytics. The schema reduces the raw data size without losing relevant content and allows the population of a database for one or more courses. It:

  • makes the extraction of specific data for different research studies relatively lightweight.
  • facilitates extractions into formats that can be handled by commodity data tools such as Microsoft Excel.
  • facilitates and promotes shared software scripting including the extraction of variables (i.e. their definition), data visualization and even modeling.
    • minimizes redundant effort spent developing a software tool chain for data science research.
  • imposes no requirement to share the data. Its shared schema facilitates research collaboration at the level of shared variable definitions or higher.
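As a minimal sketch of the kind of lightweight extraction listed above, the following Python builds an in-memory SQLite database and writes one derived variable per student to CSV text, a format that tools such as Microsoft Excel can open. The `submissions` table, its columns and the sample rows are illustrative assumptions, not the actual MOOCdb schema.

```python
import csv
import io
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical submissions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE submissions "
        "(student_id INTEGER, problem_id INTEGER, is_correct INTEGER)"
    )
    # Invented sample rows, for illustration only.
    rows = [(1, 10, 1), (1, 11, 0), (2, 10, 1), (2, 11, 1), (2, 12, 0)]
    conn.executemany("INSERT INTO submissions VALUES (?, ?, ?)", rows)
    return conn


def extract_submission_counts(conn):
    """Return (student_id, n_submissions, n_correct) rows, one per student."""
    query = """
        SELECT student_id, COUNT(*) AS n_submissions, SUM(is_correct) AS n_correct
        FROM submissions
        GROUP BY student_id
        ORDER BY student_id
    """
    return conn.execute(query).fetchall()


def to_csv(rows, header):
    """Serialize a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


conn = build_demo_db()
counts = extract_submission_counts(conn)
csv_text = to_csv(counts, ["student_id", "n_submissions", "n_correct"])
print(csv_text)
```

Because the variable is defined by the query rather than by the data, two groups that share a schema could exchange a script like this without exchanging any student records.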

MOOCdb report and other documentation

  1. MOOCdb Working Report, Dec 2, 2013

News and Updates


MoocViz, formerly MOOC En Images, is a generalizable analytics framework that ALFA is designing, underpinned by the data organization of MOOCdb. The framework provides multiple views of the data, which allow diverse investigators to quantitatively measure facets of a course either retrospectively or as it is taught. It efficiently executes queries, such as the total number of submissions per week from the students who got an A grade in the course, and visualizes the results along selected pairs or triplets of four fundamental data axes: the student, time, space and course.
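The example query mentioned here, weekly submission counts for students who earned an A, might be sketched as follows. This is not MoocViz code: the `final_grades` and `submissions` tables, the sample rows and the course start date are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter
from datetime import date

COURSE_START = date(2012, 3, 5)  # assumed start date, for illustration only


def build_demo_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE final_grades (student_id INTEGER, grade TEXT)")
    conn.execute("CREATE TABLE submissions (student_id INTEGER, submitted_on TEXT)")
    conn.executemany(
        "INSERT INTO final_grades VALUES (?, ?)",
        [(1, "A"), (2, "B"), (3, "A")],
    )
    conn.executemany(
        "INSERT INTO submissions VALUES (?, ?)",
        [
            (1, "2012-03-05"),
            (1, "2012-03-06"),
            (2, "2012-03-07"),
            (3, "2012-03-12"),
            (1, "2012-03-13"),
            (3, "2012-03-20"),
        ],
    )
    return conn


def weekly_submissions(conn, grade, course_start):
    """Count submissions per course week for students with the given final grade."""
    query = """
        SELECT s.submitted_on
        FROM submissions AS s
        JOIN final_grades AS g ON g.student_id = s.student_id
        WHERE g.grade = ?
    """
    counts = Counter()
    for (submitted_on,) in conn.execute(query, (grade,)):
        days = (date.fromisoformat(submitted_on) - course_start).days
        counts[days // 7 + 1] += 1  # week 1 begins on course_start
    return dict(sorted(counts.items()))


weekly_a = weekly_submissions(build_demo_db(), "A", COURSE_START)
print(weekly_a)
```

The resulting week-to-count mapping is one slice along the student and time axes; grouping by an additional column such as country would add the space axis.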

Figure 1. Final grade distribution of students who received a certificate in 6.002x. Students who earned at least 60% received a C, at least 70% a B, and at least 87% an A. Over 59% of students who received a certificate got an A in the class.
Figure 2. Cumulative time spent on resources, averaged by grade in the class. This shows a correlation between time spent and grade, although A and B grades had relatively similar cumulative times.
Figure 3. An overlay of the number of forum posts on wiki edits. A delay of 1 to 3 days between the spikes of activity in the two types of collaboration is discernible.
Figure 4. Observed events per submission per country of students who received a certificate. This value measures how long a student will spend on course material before submitting an assignment.
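The grade cutoffs stated in the Figure 1 caption can be written as a small function. The sketch below uses those cutoffs (60%, 70%, 87%) and computes the share of certificate earners per letter grade; the function names and the sample scores are invented for illustration and are not course data.

```python
from collections import Counter


def letter_grade(score_percent):
    """Map a final score in percent to a letter grade using the Figure 1 cutoffs."""
    if score_percent >= 87:
        return "A"
    if score_percent >= 70:
        return "B"
    if score_percent >= 60:
        return "C"
    return None  # below the certificate threshold


def grade_shares(scores_percent):
    """Fraction of certificate earners who received each letter grade."""
    letters = [letter_grade(s) for s in scores_percent]
    certified = [g for g in letters if g is not None]
    counts = Counter(certified)
    return {g: counts[g] / len(certified) for g in sorted(counts)}


# Invented scores, for illustration only.
sample_scores = [95, 88, 72, 61, 40, 90]
shares = grade_shares(sample_scores)
print(shares)
```

Note that `grade_shares` assumes at least one score reaches the certificate threshold; an empty or all-failing input would need separate handling.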

MOOC En Images is demonstrated with the data recorded during 6.002x in Spring 2012, the first-ever course offered by MITx. View the PDF document entitled MOOC En Images: Examples of analytics based on MOOCdb for 6.002x: Circuits and Electronics (Spring 2012) or examine it online.

Feature Marketplace

We are experimenting with engaging the crowd to solicit opinions on predictors of various course outcomes, such as disengagement. More details will be provided soon.

MOOC Analytics

With the advent of massive online educational courses, it is possible to capture the records of many students' interactions with lecture videos, lab material, practice problems, online texts, and formal assessments, plus student peer-to-peer interaction on online forums. This wealth of behavioral data, when complemented by demographic information and an understanding of course pedagogy, is prompting an exciting new wave of educational research approaches and findings based upon large-scale, software-enabled data analysis and machine learning. Our goal is to develop scalable machine learning algorithms to mine MOOC behavioral data to answer questions such as:

  1. What resources are most helpful in gaining knowledge?
  2. Do students learn in different ways or styles?
  3. What is the optimum teaching rate?
  4. Why do students not persist in the course?

MOOC data comes in many forms (forums, quizzes, video, etc.), and different models (dynamic Bayesian networks, hidden Markov models, etc.) are needed to answer questions about online learning.
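To make one of the model families named above concrete, here is a minimal sketch of the forward algorithm for a hidden Markov model, applied to a hypothetical two-state picture of student persistence. The states, observation coding and every probability are assumptions invented for illustration; they are not fitted to any course data and do not describe ALFA's models.

```python
# Hypothetical two-state model: state 0 = engaged, state 1 = disengaged.
# Observation 0 = no activity in a week, observation 1 = some activity.
# All probabilities below are invented for illustration.
START = [0.8, 0.2]
TRANSITION = [[0.9, 0.1], [0.3, 0.7]]  # TRANSITION[i][j] = P(next state j | state i)
EMISSION = [[0.2, 0.8], [0.9, 0.1]]    # EMISSION[i][o] = P(observation o | state i)


def sequence_likelihood(observations, start, transition, emission):
    """Forward algorithm: probability of a non-empty observation sequence."""
    n_states = len(start)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [start[i] * emission[i][observations[0]] for i in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emission[j][obs]
            * sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)


# Two active weeks followed by two quiet weeks.
active_then_quiet = sequence_likelihood([1, 1, 0, 0], START, TRANSITION, EMISSION)
print(active_then_quiet)
```

Comparing such likelihoods under differently parameterized models is one way a question like "why do students not persist?" could be framed quantitatively.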

Blended Learning: Evolutionary Processes and Computation

STU 2013: Evolutionary Processes and Systems

ALFA leader Dr. Una-May O'Reilly and post-doc Dr. Erik Hemberg taught a week-long course on Evolutionary Processes and Systems at Shantou University, China, in May 2013. A report is available and more photos from 2013 are here.

2014: Evolutionary Systems and Computation

The 2014 course will be ALFA's first venture into blended learning. (Yes, eventually we will analyze our own course's data, truly eating our own dog food.) Week 1 will be taught in person at STU with an intensive 40 units of work, similar to 2013. The next 8 weeks will be taught online with the edX platform. Students will be systematically introduced to Python and the distributed genetic algorithm. In Week 10, the instructors will again be in the classroom and lab, face to face with students. They will wrap up the course thematically and, on the technical side, integrate the software modules that students wrote as homework into a class project in which students, by playing a game on a mobile device, help an algorithm computationally evolve successful game-playing strategies. For a longer description, see the STU 2014 Evolutionary Systems and Computation description.

Evidence-Based Education Research for All

We are using ALFA's MOOCdb project as a framework to explore specific technical aspects of global research collaboration around online education data. We are exploring software sharing, common data organization, data privacy and data access. The piloting effort uses data from several MITx courses. Our activities recognize the importance of a transparent, fair access policy on online education data which respects privacy, confidentiality and the legal obligations of data controllers. We recognize that computer technology will play a key role in supporting general access so that respect for privacy and confidentiality is ideally balanced with the public good of investigating the digital evidence gathered from online education offerings. We are interested in joining forces with all who want to participate in these activities. Please contact us for more information.

Lori Breslow, Ph.D. Director, MIT Teaching & Learning Laboratory
Hal Abelson, CSAIL, MIT
Danny Weitzner, Digital Information Group (DIG), CSAIL, MIT

Research Support

ALFA is very grateful for research support from:
[Logos: LKSF, Quanta Research]

Some Humor

The elephant represents MOOCs....

[Image: elephant]
...ALFA found the dog
[Image: black dog]
and this is what it looks like...ALFA's MOOC view

Putting aside the humor, our ALFA animal is so rich in complexity that we are eager to interact with it from nose to tail. Hence we are gaining experience with the teaching medium, considering data access policies, designing open solutions to data organization, research collaboration, shared analytics and software (see MOOCdb), conducting our own machine-learning-based data analytics, and using crowdsourcing to facilitate community hypothesis definition and testing.
