Project iDiary

From DRLWiki


File:iDiary.png

Google your life


Motivation/Challenges

"What did you do today?" When we hear this question, we try to think back over the day's activities and locations. When we draw a blank on the details, we reply with a simple "not much." Remembering our daily activities is a difficult task. For some, a manual diary works. The rest of us, however, don't have the time (or simply don't want) to enter diary entries by hand. The goal of this project is to create a system that automatically generates answers to questions about a user's history of activities and locations.

This system uses a user's GPS data to identify the locations they have visited. Activities and terms associated with these locations are found using latent semantic analysis and then presented as a searchable diary. One of the big challenges of working with GPS data is its sheer volume, which makes it difficult to store and analyze. This project addresses the challenge by first reducing the amount of data with compression algorithms. It is important that this compression neither reduces the fidelity of the information nor significantly alters the results of any analyses that may be performed on the data. After compression, the system analyzes the reduced dataset to answer queries about the user's history.

Why is it hard?

This challenge is hard for two reasons. First, the system has to keep track of a user's visited locations. This is difficult because GPS data comes in very large quantities, as a few estimation calculations show. One GPS packet (which includes latitude, longitude, and a timestamp) is on the order of 100 bytes. If a single phone collects one GPS packet every second, it would collect about 10 megabytes of data per day. In 2010, approximately 300 million smartphones were sold [3]. If even a third of these phones were to continuously collect GPS data, about 1 petabyte of data would be generated each day. That's enough data to fill one thousand external hard drives, each with a terabyte capacity, every day.
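
The arithmetic behind these estimates is easy to check. The sketch below uses the figures from the text (100-byte packets, one packet per second, one third of 300 million phones):

```python
# Back-of-envelope check of the data-volume estimates above.
PACKET_BYTES = 100                          # latitude, longitude, timestamp
SECONDS_PER_DAY = 24 * 60 * 60

per_phone = PACKET_BYTES * SECONDS_PER_DAY  # bytes per phone per day
print(per_phone / 1e6, "MB/phone/day")      # ~8.6 MB, i.e. "about 10 MB"

phones = 300_000_000 // 3                   # a third of 2010 smartphone sales
total = per_phone * phones
print(total / 1e15, "PB/day")               # ~0.86 PB, i.e. "about 1 PB"
```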

Large quantities of data are difficult to store and even harder to analyze. Generally, having more data also means having more noise in the data. This noise makes it difficult to distill the important information. We need to find ways to analyze this large amount of noisy data.

The second reason this challenge is difficult is the need for activity recognition: the system must convert GPS data into the activities performed by the user. Translating raw data into human-readable text requires the system to associate external information about locations with the locations' coordinates, and to parse that information to determine the activities performed at each location.

Why is it interesting?

Automatically answering queries about a user's history is interesting not only because the problem comes up daily, but also because a solution can be applied to many other problems. A solution to this challenge would be able to manage large quantities of GPS data. Looking beyond GPS data, the ability to manage large quantities of data would be valuable in commercial businesses, scientific research, government analyses, and many other applications. As such, this challenge matters not only to forgetful users who would like to remember their previous activities and visited locations, but to many other fields as well.

Our approach

Our solution is to compress this large amount of data into a smaller, less noisy sketch, and then run analysis algorithms on the compressed data. Compressing first is the key insight: it is what allows the system to manage large quantities of data. The solution uses novel coreset creation and trajectory clustering algorithms to compress the data, and then applies latent semantic analysis to the compressed data to answer search queries.
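
To make the search step concrete, here is a minimal latent semantic analysis sketch. The term-location matrix, the vocabulary, and the `search` helper are hypothetical illustrations, not the project's actual data or code:

```python
import numpy as np

# Hypothetical term-location co-occurrence matrix:
# rows are terms, columns are locations the user visited.
terms = ["coffee", "espresso", "treadmill", "weights"]
docs = ["cafe", "gym"]
A = np.array([[3.0, 0.0],
              [2.0, 0.0],
              [0.0, 4.0],
              [0.0, 1.0]])

# Truncated SVD embeds terms and locations in a shared latent space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_latent = (np.diag(s[:k]) @ Vt[:k]).T   # one row per location

def search(query_terms):
    """Return the location ranked most relevant to the query terms."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    q_latent = q @ U[:, :k]                 # fold the query into latent space
    scores = docs_latent @ q_latent
    return docs[int(np.argmax(scores))]

print(search({"coffee"}))     # → cafe
print(search({"treadmill"}))  # → gym
```

In a real deployment the "documents" would be the compressed location clusters, and the vocabulary would come from external information about each location.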

What is a coreset?

A coreset is a small subset of the data elements, possibly augmented with additional information, that represents the original dataset with respect to a family of queries or cost-function computations. The representation is approximate, and it allows us to control the tradeoff between approximation accuracy and coreset size/complexity.
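
As a toy illustration of the idea (deliberately much cruder than the constructions used in the project), a uniform random sample with per-point weights already acts as a small representation for sum-style queries:

```python
import random

random.seed(0)
points = [random.gauss(0.0, 1.0) for _ in range(100_000)]  # "original dataset"

m = 1_000                         # representation size << dataset size
sample = random.sample(points, m)
weight = len(points) / m          # each kept point stands for n/m originals

def weighted_sum(f, data, w=1.0):
    """Approximate the sum of f over the full dataset using weighted data."""
    return w * sum(f(x) for x in data)

exact = weighted_sum(lambda x: x * x, points)
approx = weighted_sum(lambda x: x * x, sample, weight)
print(abs(exact - approx) / exact)  # small relative error
```

A true coreset chooses the points (and weights) so that this error is provably bounded for every query in the family, not just for one query in expectation.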

iDiary system

During the project, several generations of the system have been developed. An overall schematic of the system is given below.

The core of the system is the coreset creation, which generates a compact representation of the data streams obtained from the user (GPS, images).
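
One standard way to build coresets over a stream is the merge-and-reduce pattern. The sketch below illustrates only the pattern; `reduce_chunk` is a placeholder thinning step, not one of the actual coreset constructions used in the system:

```python
def reduce_chunk(points, size):
    """Placeholder compression: thin a weighted point list to `size` points,
    rescaling weights so the total weight is preserved."""
    if len(points) <= size:
        return list(points)
    step = len(points) / size
    kept = [points[int(i * step)] for i in range(size)]
    scale = sum(w for _, w in points) / sum(w for _, w in kept)
    return [(x, w * scale) for x, w in kept]

def stream_coreset(stream, chunk=8, size=4):
    """Merge-and-reduce: keep one coreset per level, like a binary counter,
    so memory stays logarithmic in the stream length."""
    levels, buf = {}, []
    for x in stream:
        buf.append((x, 1.0))
        if len(buf) == chunk:
            c, lvl = reduce_chunk(buf, size), 0
            buf = []
            while lvl in levels:          # "carry": merge equal-level coresets
                c = reduce_chunk(levels.pop(lvl) + c, size)
                lvl += 1
            levels[lvl] = c
    merged = reduce_chunk(buf, size) if buf else []
    for c in levels.values():
        merged = merged + c
    return merged

cs = stream_coreset(range(1000))
print(len(cs), sum(w for _, w in cs))     # a few dozen points, total weight 1000
```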

Experiments

References

[1]

Soliman Nasser, Andrew Barry, Marek Doniec, Guy Peled, Guy Rosman, Daniela Rus, Mikhail Volkov, Dan Feldman. Fleye on the Car: Big Data Meets the Internet of Things. The 14th International Conference on Information Processing in Sensor Networks (IPSN), Seattle, Washington, April 2015.

[2]

Mikhail Volkov, Guy Rosman, Dan Feldman, John W. Fisher III, Daniela Rus. Coresets for Visual Summarization with Applications to Loop Closure. IEEE International Conference on Robotics and Automation (ICRA), Seattle, Washington, USA, May 2015.

[3]

Guy Rosman, Mikhail Volkov, Dan Feldman, John W. Fisher III, Daniela Rus. Coresets for k-Segmentation of Streaming Data. Advances in Neural Information Processing Systems 27 (NIPS 2014), 2014.

[4]

Rohan Paul, Dan Feldman, Daniela Rus, Paul Newman. Visual Precis Generation using Coresets. International Conference on Robotics and Automation, 2014.
http://people.csail.mit.edu/dannyf/icra2014.pdf

[5]

Dan Feldman, Andrew Sugaya, Cynthia Sung, Daniela Rus. iDiary: from GPS signals to a text-searchable diary. SenSys, pp. 6, 2013.
http://people.csail.mit.edu/dannyf/idiary.pdf

[6]

D. Feldman, A. Sugaya, D. Rus. An Effective Coreset Compression Algorithm for Large Scale Sensor Networks. Proc. 11th ACM/IEEE Conf. on Information Processing in Sensor Networks (IPSN), 2012.
http://people.csail.mit.edu/dannyf/ipsn12.pdf

People

  • Daniela Rus (PI, MIT)
  • Danny Feldman, Postdoctoral Associate.
  • Guy Rosman, Postdoctoral Associate.
  • Mikhail Volkov, Graduate Student.
  • Cindy Sung, Graduate Student.
  • Cathy Wu, Graduate Student.
  • Minh-Tue Vo, Undergraduate Student.