Scalable Technology For Online Education

Our broad goal is to contribute Computer Science technology that helps teachers teach better and students learn better in online or blended learning settings. The project motivates and guides our technical research agenda around big data, data science and scalable machine learning.


Our aims channel the efforts and passions of many group members including Dr. O'Reilly, Dr. Veeramachaneni, Dr. Erik Hemberg, six graduate students and multiple superUROPs and UROPs. They also allow us to collaborate within MIT (Teaching and Learning Lab) and, externally, with edX, Coursera and Stanford University.

Publications, Theses and Working Reports

  1. Technology for Mining the Big Data of MOOCs, Una-May O'Reilly, Kalyan Veeramachaneni. Winter 2014, Research and Practice in Assessment.
  2. Stopout Prediction in Massive Open Online Courses, Colin Taylor, M.Eng Thesis completed in MIT Dept of EECS, 2014. Advisors: Kalyan Veeramachaneni, Una-May O'Reilly.
  3. Modeling Problem Solving in Massive Open Online Courses, Fang Han, M.Eng Thesis completed in MIT Dept of EECS, 2014. Advisors: Kalyan Veeramachaneni, Una-May O'Reilly.
  4. Developing data standards and technology enablers for MOOC data science, Kalyan Veeramachaneni, Una-May O'Reilly, Report for MOOC Research Initiative Grant Progress Report: October 2013 - April 2014.
  5. arXiv:1408.3382, Likely to stop? Predicting Stopout in Massive Open Online Courses, Colin Taylor, Kalyan Veeramachaneni, Una-May O'Reilly.
  6. arXiv:1406.2015, MOOCdb: Developing Standards and Systems to Support MOOC Data Science, Kalyan Veeramachaneni, Sherif Halawa, Franck Dernoncourt, Una-May O'Reilly, Colin Taylor, Chuong Do.
  7. arXiv:1407.5238, Towards Feature Engineering at Scale for Data from Massive Open Online Courses, Kalyan Veeramachaneni, Una-May O'Reilly and Colin Taylor.
  8. MoocViz: A Large Scale, Open Access, Collaborative Data Analytics Framework for MOOCs, Franck Dernoncourt, Chuong Do, Sherif Halawa, Una-May O'Reilly, Colin Taylor, Kalyan Veeramachaneni and Sherwin Wu, DDE@NIPS 2013: Data Directed Education.
  9. Analyzing Millions of Submissions to Help MOOC instructors Understand Problem Solving, Fang Han, Kalyan Veeramachaneni and Una-May O'Reilly, DDE@NIPS 2013: Data Directed Education.


Don't be fooled: MOOCdb is more than a database or even a database schema. MOOCdb is an effort, a solution, a project and a community initiative.


To address heterogeneous data formats, bloated raw data, fragmented data sources, and unidentified equivalences across MOOCs, MOOCdb reorganizes the originally recorded MOOC data into a general-purpose schema for analytics. The schema loses no relevant content, yet it reduces the raw data size and allows a database to be populated for one or more courses. It:

  • makes the extraction of specific data for different research studies relatively lightweight.
  • facilitates extractions into formats that can be handled by commodity data tools such as Microsoft Excel.
  • facilitates and promotes shared software scripting, including the extraction of variables (i.e., their definitions), data visualization and even modeling.
    • minimizes redundant effort spent developing a software tool chain for data science research.
  • imposes no requirement to share the data. Its shared schema facilitates research collaboration at the level of shared variable definitions or higher.
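
To make the idea of a shared schema concrete, here is a minimal Python sketch of mapping two differently shaped raw logs onto one common record and exporting it as CSV, a format that spreadsheet tools can open. Everything in it is an assumption for illustration: the `Submission` record, its field names, and the two "platform" log layouts are invented and are not the actual MOOCdb schema or any real platform's log format.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


# Hypothetical common record (NOT the actual MOOCdb schema).
@dataclass(frozen=True)
class Submission:
    user_id: str
    problem_id: str
    timestamp: str      # ISO 8601 string
    is_correct: bool


def from_platform_a(raw: dict) -> Submission:
    """Map an invented 'platform A' log entry onto the common record."""
    return Submission(
        user_id=raw["username"],
        problem_id=raw["event"]["problem"],
        timestamp=raw["time"],
        is_correct=raw["event"]["success"] == "correct",
    )


def from_platform_b(raw: dict) -> Submission:
    """Map an invented 'platform B' log entry onto the common record."""
    return Submission(
        user_id=str(raw["learner"]),
        problem_id=raw["item"],
        timestamp=raw["submitted_at"],
        is_correct=bool(raw["score"]),
    )


def to_csv(rows) -> str:
    """Serialize common records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(Submission)],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


# Made-up raw entries in two different shapes.
raw_a = {
    "username": "u1",
    "time": "2012-03-05T10:00:00Z",
    "event": {"problem": "hw1_p1", "success": "correct"},
}
raw_b = {
    "learner": 42,
    "item": "quiz2_q3",
    "submitted_at": "2012-03-06T08:30:00Z",
    "score": 0,
}

unified = [from_platform_a(raw_a), from_platform_b(raw_b)]
csv_text = to_csv(unified)
```

Once both sources share one record type, any downstream script (variable extraction, visualization, modeling) only has to be written once, which is the kind of reuse the list above describes.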

News and Updates



MoocViz, formerly MOOC En Images, is a generalizable analytics framework that ALFA is designing, underpinned by the data organization of MOOCdb. The framework provides multiple views of the data, allowing diverse investigators to quantitatively measure facets of a course either retrospectively or as it is taught. It efficiently executes queries, such as the total number of submissions per week from the students who earned an A in the course, and visualizes the results along selected pairs or triplets of four fundamental data axes: the student, time, space and course.
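
As a rough illustration of the example query (weekly submission counts from students who earned an A), the following Python sketch runs it against an in-memory SQLite database. The table names `students` and `submissions`, their columns, and the sample rows are hypothetical stand-ins chosen for this sketch; they are not the MOOCdb schema and not MoocViz code.

```python
import sqlite3

# In-memory database with two hypothetical tables (not the MOOCdb schema).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (user_id TEXT PRIMARY KEY, final_grade TEXT);
    CREATE TABLE submissions (user_id TEXT, submitted_at TEXT);
    """
)

# Made-up sample data.
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("u1", "A"), ("u2", "B"), ("u3", "A")],
)
conn.executemany(
    "INSERT INTO submissions VALUES (?, ?)",
    [
        ("u1", "2012-03-05"),
        ("u1", "2012-03-07"),
        ("u3", "2012-03-06"),
        ("u2", "2012-03-06"),  # B student: excluded by the grade filter
        ("u3", "2012-03-13"),  # falls in the following calendar week
    ],
)

# Count submissions per calendar week, restricted to A students.
# strftime('%Y-%W', ...) labels each row with its year and week of year.
QUERY = """
SELECT strftime('%Y-%W', s.submitted_at) AS week, COUNT(*) AS n_submissions
FROM submissions AS s
JOIN students AS st ON st.user_id = s.user_id
WHERE st.final_grade = 'A'
GROUP BY week
ORDER BY week
"""

weekly = conn.execute(QUERY).fetchall()
for week, n_submissions in weekly:
    print(week, n_submissions)
```

The result is one row per week, which maps directly onto a plot with time on one axis and submission count on the other; swapping the grade filter or the grouping expression gives other views along the student, time, space and course axes.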

Figure 1. Final grade distribution of students who received a certificate in 6.002x. Students who earned at least 60% received a C, at least 70% a B, and at least 87% an A. Over 59% of students who received a certificate got an A in the class.
Figure 2. Cumulative time spent on resources, averaged by grade in the class. This shows a correlation between time spent and grade, although A and B grades had relatively similar cumulative times.
Figure 3. An overlay of the number of forum posts on wiki edits. A 1 to 3 day delay of the spikes of activity between the two types of collaboration is discernible.
Figure 4. Observed events per submission per country of students who received a certificate. This value measures how long a student will spend on course material before submitting an assignment.

In 2013, MOOC En Images was demonstrated with the data recorded during 6.002x in Spring 2012 (the first-ever course offering by MITx). View a PDF document entitled MOOC En Images: Examples of analytics based on MOOCdb for 6.002x: Circuits and Electronics (Spring 2012) or examine it online.

Feature Marketplace

We are experimenting with engaging the crowd to solicit their opinions regarding the predictors of various course outcomes such as disengagement. See Towards Feature Engineering at Scale for Data from Massive Open Online Courses, Kalyan Veeramachaneni, Una-May O'Reilly and Colin Taylor, arXiv:1407.5238.


MOOC Analytics

With the advent of massive online educational courses, it is possible to capture the records of many students' interactions with lecture videos, lab material, practice problems, online texts, and formal assessments, plus student peer-to-peer interaction on online forums. This wealth of behavioral data, when complemented by demographic information and an understanding of course pedagogy, is prompting an exciting new wave of educational research approaches and findings based upon large-scale, software-enabled data analysis and machine learning. Our goal is to develop scalable machine learning algorithms to mine MOOC behavioral data to answer questions such as:

  1. Why do students not persist in the course? For our current status, see Stopout Prediction in Massive Open Online Courses.
  2. How do students solve problems? For our current status, see Modeling Problem Solving in Massive Open Online Courses.
  3. What resources are most helpful in gaining knowledge?
  4. Do students learn in different ways or styles?
  5. What is the optimum teaching rate?

MOOC data comes in many forms (forums, quizzes, video, etc.) and different models (dynamic Bayesian networks, hidden Markov models, etc.) are needed to answer questions about online learning.
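
As one illustration of the kind of model mentioned above, a hidden Markov model can treat a student's weekly activity as observations emitted by an unobserved engagement state. The sketch below is a minimal forward-algorithm computation of the likelihood of an activity sequence. The two states, the two observation symbols and every probability are invented for illustration; they are not taken from our models or data.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Return P(observations) under an HMM using the forward algorithm.

    alpha[s] holds the joint probability of the observations seen so far
    and of being in hidden state s at the current step.
    """
    # Initialization: start distribution times emission of first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}

    # Recursion: propagate through transitions, then weight by emission.
    for obs in observations[1:]:
        alpha = {
            s: emit_p[s][obs]
            * sum(alpha[prev] * trans_p[prev][s] for prev in states)
            for s in states
        }

    # Termination: marginalize over the final hidden state.
    return sum(alpha.values())


# Hypothetical two-state engagement model (all numbers are made up).
states = ("engaged", "disengaged")
start_p = {"engaged": 0.8, "disengaged": 0.2}
trans_p = {
    "engaged": {"engaged": 0.9, "disengaged": 0.1},
    "disengaged": {"engaged": 0.2, "disengaged": 0.8},
}
emit_p = {
    "engaged": {"submit": 0.7, "idle": 0.3},
    "disengaged": {"submit": 0.1, "idle": 0.9},
}

# Likelihood of one student's (made-up) weekly activity sequence.
weeks = ["submit", "submit", "idle", "idle"]
likelihood = forward_likelihood(weeks, states, start_p, trans_p, emit_p)
print(likelihood)
```

In practice the parameters would be learned from data rather than set by hand, and the observation alphabet would be far richer than a single submit/idle flag.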


Blended Learning: Evolutionary Processes and Computation

STU 2013: Evolutionary Processes and Systems

ALFA leader Dr. Una-May O'Reilly and postdoc Dr. Erik Hemberg taught a week-long course on Evolutionary Processes and Systems at Shantou University (STU), China, in May 2013. A report is available and more photos from 2013 are here.

2014: Evolutionary Systems and Computation

STU 2014: Evolutionary Processes and Systems

The 2014 course was ALFA's first venture into blended learning. Students were exposed to different ways of learning and were systematically introduced to Python and to the use of evolution for computational intelligence in a Tron-like game. Week 1 was taught in person at STU with an intensive 20 units of work. Over the next 10 weeks, material was taught online with the edX platform. In the final week of the course (Climax week), the instructors were back in the classroom and lab, face to face with students. The students wrapped up the course by pursuing projects on extending the Tron game, extending the evolutionary algorithm, and teaching others with customized versions of the course material. See STU 2014 Evolutionary Systems and Computation description.


Evidence-Based Education Research for All

We are using ALFA's MOOCdb project as a framework to explore specific technical aspects of global research collaboration around online education data. We are exploring software sharing, common data organization, data privacy and data access. The piloting effort uses data from several MITx courses. Our activities recognize the importance of a transparent, fair access policy for online education data which respects privacy, confidentiality and the legal obligations of data controllers. We recognize that computer technology will play a key role in supporting general access so that respect for privacy and confidentiality is ideally balanced with the public good of investigating the digital evidence gathered from online education offerings. We are interested in joining forces with all who want to participate in these activities. Please contact us for more information.

In September 2013, I drafted, with Lori Breslow, a white paper entitled "Education Data for All: Forging Policies, Developing Structures" in which we propose organizing a week-long conference that will bring together educational researchers and computer scientists with system design, machine learning, database and/or data mining expertise to begin an effort to address the challenges posed by MOOC data as well as the opportunities they present. We believe it is imperative this effort be undertaken as soon as possible, for data is amassing at an ever-expanding rate. Processes and policies need to take into account the current technology and methods of online learning while anticipating what will happen in the future.

Lori Breslow, Ph.D. Director, MIT Teaching & Learning Laboratory
Hal Abelson, CSAIL, MIT
Danny Weitzner, Digital Information Group (DIG), CSAIL, MIT

Research Support

ALFA is very grateful for research support from:
[Sponsor logos: LKSF, Quanta Research]

Some Humor

The elephant represents MOOCs....

[Image: an elephant]
...ALFA found the dog
[Image: a black dog]
and this is what it looks like...ALFA's MOOC view

Putting aside the humor, our ALFA animal is so rich in complexity that we are eager to interact with it from nose to tail. Hence we are gaining experience with the teaching medium; considering data access policies; designing open solutions for data organization, research collaboration, shared analytics and software (see MOOCdb); conducting our own machine-learning-based data analytics; and using crowdsourcing to facilitate community hypothesis definition and testing.
