<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Haystack Blog &#187; Social Computing</title>
	<atom:link href="http://groups.csail.mit.edu/haystack/blog/category/social-computing/feed/" rel="self" type="application/rss+xml" />
	<link>http://groups.csail.mit.edu/haystack/blog</link>
	<description>MIT CSAIL Research</description>
	<lastBuildDate>Tue, 24 Nov 2009 04:05:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Building a Social Data Commons</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/11/23/building-a-social-data-commons/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/11/23/building-a-social-data-commons/#comments</comments>
		<pubDate>Tue, 24 Nov 2009 03:28:04 +0000</pubDate>
		<dc:creator>Adam Marcus</dc:creator>
				<category><![CDATA[Databases]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[Thought Piece]]></category>
		<category><![CDATA[eGovernment]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=743</guid>
		<description><![CDATA[Inspired by Ted’s vision of what he’d like to see happen to data.gov, I decided to have a try at my hopes for it. Ted’s desires for data.gov are all ones that I agree would make the data more accessible. I would now like to discuss what else I might want in a world where [...]]]></description>
			<content:encoded><![CDATA[<p>Inspired by <a href="http://groups.csail.mit.edu/haystack/blog/2009/11/18/plotting-a-course-for-data-gov/">Ted’s vision</a> of what he’d like to see happen to <a href="http://www.data.gov/">data.gov</a>, I decided to have a try at my hopes for it. Ted’s desires for data.gov are all ones that I agree would make the data more accessible. I would now like to discuss what else I might want in a world where such steps were taken: a world in which government data was centralized, versioned, searchable, and accessible.</p>
<p>Now what? Given the large and growing pile of data we will optimistically uncover, we will run into new frustrations. People will claim that the published data formats are not the ones that their analysis tool requires. People will be overwhelmed by dataset size, not knowing where to start. People will unknowingly recreate someone else’s data-munging workflows on the way to repeating analyses of the same data. People will become the next bottleneck if data ever ceases to be one.</p>
<p>There’s no one answer to the concerns listed above because everyone has a different goal for the data. To handle these issues, we will need more than a place to find up-to-date datasets; we will also need a place where it is easy for people to share ideas and strategies for tackling data. We will need a <em>social data commons</em>.</p>
<p>Whereas blogs and wikis help report findings, steps, and missteps, a social data commons can be the place to go to “talk shop” about the available data. Even if people post their solutions using decentralized means, there will be benefit to pooling all of these resources in one place on the web. Here are some tools that will help the data-tinkerers get things done:</p>
<ul>
<li><strong>Data-munging war stories</strong>. The first stage in data analysis is often long and frustrating. One must digest the dataset in the form they received it, and transform, clean, and filter out the subset that they wish to analyze, visualize, or otherwise present. The workflow differs for each dataset and application, but to the extent that people can share tools and instructions for processing each dataset, these should be written up in the form of recipes for baking the data.</li>
<li><strong>Crowdsourced analysis</strong>. Datasets can be overwhelming. While many exploration tasks are easily automated, it is often easiest to leave certain tasks (e.g., “Find the interesting pictures”) to humans. <a href="https://www.mturk.com/mturk/">Mechanical Turk</a> gives us a hint at what this might look like, and the Guardian provides a wonderful <a href="http://mps-expenses.guardian.co.uk/">example</a> of crowdsourced public data analysis in action.</li>
<li><strong>Current uses showcases</strong>. To spark competition, avoid duplicating work, and inspire follow-on projects, visitors should see a showcase of the current uses of each dataset. Aside from links to sites built around a dataset, the list can include <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/">embedded visualizations</a> of finished work.</li>
<li><strong>Analysis wishlists</strong>. Given that data released by a government reaches more than just programmers, there will be more people with ideas than people who can implement the ideas. People with ideas should be given an outlet, and passers-by should be asked to vote on these ideas to help data geeks with some free cycles discover the most interesting unimplemented projects.</li>
<li><strong>Data wishlists</strong>.  If an agency were to dedicate resources to releasing another dataset, which one is in highest demand?  As Ted <a href="http://groups.csail.mit.edu/haystack/blog/2009/11/18/plotting-a-course-for-data-gov/">mentioned</a>, governments should let demand drive delivery.</li>
<li><strong>Forums</strong>. No set of tools will encompass all use cases for social data analysis. A discussion forum can lead to the formation of interest groups while serving as a catch-all for needs not served by the list above.</li>
</ul>
<p>The US government might hit a few bumps trying to implement some of these social features. For example, a conflict of interest might arise if the showcase of uses of a dataset includes a site critical of the current administration. Having the executive branch ban spam or abusive comments on a forum raises concerns about limitations on <a href="http://www.wired.com/techbiz/people/magazine/17-04/st_thompson">free speech</a>.  These details are not roadblocks, but they do signal that we can’t expect a social overlay to spring out of data.gov <em>per se</em>. If we want these features, we may have to build and manage them on a third-party site.</p>
<p>I’m sure there’s more to the social data commons than I listed here. What did I miss, and where can we seek further inspiration?</p>
<p><em>Thanks to Ted for reading the first version of this entry.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/11/23/building-a-social-data-commons/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Plotting a Course for Data.gov</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/11/18/plotting-a-course-for-data-gov/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/11/18/plotting-a-course-for-data-gov/#comments</comments>
		<pubDate>Wed, 18 Nov 2009 16:15:59 +0000</pubDate>
		<dc:creator>Edward Benson</dc:creator>
				<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[Thought Piece]]></category>
		<category><![CDATA[eGovernment]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=734</guid>
	<description><![CDATA[The US Government’s efforts to create a culture of open government data are a big deal. Hopefully they signal a shift from the “pull” model of FOIA to a “push” mindset in which data is proactively returned to the public without first having to ask (and pay). Still, data.gov has a lot of room for [...]]]></description>
			<content:encoded><![CDATA[<p>The US Government’s efforts to create a culture of open government data are a big deal. Hopefully they signal a shift from the “pull” model of FOIA to a “push” mindset in which data is proactively returned to the public without first having to ask (and pay). Still, data.gov has a lot of room for improvement, as Clay Johnson of Sunlight Labs mentions <a href="http://www.sunlightlabs.com/blog/2009/get-your-act-together-datagov/">here</a> and <a href="http://www.sunlightlabs.com/blog/2009/what-id-change-about-datagov/">here</a>. </p>
<p>Clay’s criticisms are well founded, but what I’d like to see more of is some brainstorming about what our ideal data.gov would look like. A <a href="http://thedextrousweb.com/2009/10/the-wraps-come-off-data-gov-uk/">recent post</a> about the coming  data.gov.uk site provides a nice foil for us, for one, as the UK seems to be taking a very different approach. But more importantly, what would you want to see in a government data site, and how would you use it? </p>
<p>I heard once that it is a good exercise to try to compress an idea into three sentences or less — it forces you to understand what you really want to say. So here is my three sentence suggestion:</p>
<ul>
<li><b>Bring it all under one roof</b>. The current data.gov site is like Yahoo! from the mid-90s: it is just a directory of links to other sites. This is a noble start, but we really need to get a single point of access if we want to revolutionize eGovernment. The government is an immense, heterogeneous organization, so this is as much an organizational challenge as a technical one. But there are plenty of precedents of systems which allow individual data publishers (the government agencies) to retain control over the publishing and updating of their own data, while allowing data consumers (the public) to access it all from a single location.</li>
<li><b>.. But don’t forget to give credit.</b> When offering a single access point for all the data, it is essential to keep metadata that tracks which data came from where. This is as important for book-keeping and data integration reasons as it is for simply giving credit where credit is due. Agencies that publish data sets of great use should be recognized for their work.</li>
<li><b>Build it as you go</b>. We don’t need the perfect system overnight. No single ontology, schema, or data format will be able to encompass all the government’s data. That’s OK — it doesn’t have to. Don’t let fear of not getting it perfect slow down incremental progress toward our goal. Just bringing the data under one roof is a fantastic start; you can always try to begin standardizing formats and “linking” it later. My next blog post will specifically address this topic.</li>
<li><b>…And version data sets</b>. The benefits of offering “versions” of datasets are threefold. First, it allows you to maintain a system in which data providers feel comfortable updating their data at will. Second, it allows the implementors of the system to feel comfortable experimenting with data integration techniques and knowing that, if it doesn’t work out, users still have access to the same system they did last week. Third, it is the ultimate expression of openness: like a subversion repository for the government, everyone will be able to see the evolution of data over time.</li>
<li><b>Help users discover data</b>. With the sheer volume of data available, publishing it isn’t enough — you have to help people find what they want. The current data.gov site already does a decent job of offering search functionality. We can go further, providing data “footnotes” for bloggers to link back into the data.gov site (see the <a href="http://projects.csail.mit.edu/datapress/">DataPress</a> project for an idea of how this might work), suggestions of “hot” data sets for particular areas of interest, or a government data blog that highlights new and important data that has been recently published.</li>
<li><b>.. And let them tell you what they want.</b> Your users — the citizens — are your best assets. Let them prioritize your tasks for you by allowing them to suggest and vote on features and data sets they would like to see added. This type of decentralized management strategy is making waves among the business community, and the same mindset can be applied to government.</li>
</ul>
<p>So there are my three sentences: Bring it all under one roof, but don’t forget to give credit. Build it as you go, and version data sets along the way. Help users discover data, and let them tell you what they want.</p>
<p>What are your three?</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/11/18/plotting-a-course-for-data-gov/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Information Glut, or Information Gluttons?</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/11/17/information-glut-or-information-gluttons/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/11/17/information-glut-or-information-gluttons/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 22:00:48 +0000</pubDate>
		<dc:creator>David Karger</dc:creator>
				<category><![CDATA[PIM]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[Thought Piece]]></category>
		<category><![CDATA[CSAIL]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=729</guid>
		<description><![CDATA[I had an interesting discussion with my student Katrina Panovich today.  I&#8217;m intrigued by the way people use twitter for &#8220;ambient awareness&#8221;&#8212;watching what goes by, but not worrying about what they miss.   I find this paradoxical&#8212;if you don&#8217;t care about missing stuff, why watch at all?  Especially given that each arriving tweet provides some degree [...]]]></description>
			<content:encoded><![CDATA[<p>I had an interesting discussion with my student <a href="http://people.csail.mit.edu/kp/">Katrina Panovich</a> today.  I&#8217;m intrigued by the way people use twitter for &#8220;ambient awareness&#8221;&#8212;watching what goes by, but not worrying about what they miss.   I find this paradoxical&#8212;if you don&#8217;t care about missing stuff, why watch at all?  Especially given that each arriving tweet provides some degree of distraction from whatever you&#8217;re doing?   KP actually remarked that she liked twitter better when fewer people were on it, so there was less information to follow.   Again, the paradox&#8212;you can always arrange to follow less on twitter.  The problem is the &#8220;insurmountable opportunity&#8221;&#8212;some of that new content might be really important.   But right now we are trusting to luck to see that content.</p>
<p>I proposed researching some tools that, rather than relying on luck to determine which tweets you see, figure out the most valuable ones to show you.  I don&#8217;t believe that filtering by person (a big mix of different interests) and hashtag (unreliable, often nonexistent) is the best way to locate the tweets that are most useful to me.  But KP poured some cold water on this idea, arguing that tools that improved your ability to filter tweets would just lead to people following more users, such that they got swamped with tweets again.  More generally, she argued that regardless of what information filtering tools we get, we will always push them to the limit of delivering too much information.</p>
<p>I realized I&#8217;ve experienced this myself&#8212;I used to visit various web sites to gather information. As I began to find it burdensome to keep up with all these web sites, I ultimately switched to an RSS reader to make it easier for me.  But that has simply allowed me to subscribe to more sources than I was following manually, such that I am again feeling swamped by my information feeds.</p>
<p>Does this mean that any assault on the cliched &#8220;information overload&#8221; problem is doomed, since whenever we fix it people will load up more?  It seems the only hope is to convince people that they don&#8217;t actually need the information they are gathering.</p>
<p>This idea actually relates to another line of our research, on note-taking.   People like to write down all sorts of little scraps of information.  But according to data we&#8217;ve logged from our <a href="http://listit.csail.mit.edu/">list.it notetaking plugin</a> for firefox (13,000 users&#8212;you should give it a try), a lot of those notes are never retrieved.  So why are they written down?  Perhaps it&#8217;s because people worry they might need them later, even though they never do.   Something similar seems to be going on with information streams&#8212;once they exist, people start to worry they might miss something important, even though they never worried about it before.   If we could somehow convince people that they could find anything that really mattered, they might become less gluttonous followers of information.</p>
<p>And this leads to another project of ours, FeedMe, described in a <a href="http://groups.csail.mit.edu/haystack/blog/2009/11/16/introducing-feedme-a-new-sharing-tool-for-google-reader/">recent post</a> on this blog.   I&#8217;d love to stop following a bunch of my newsfeeds, if only I could be confident that the really good bits would be brought to my attention.  There are collaborative filtering tools like Digg, but I don&#8217;t trust them to know what I like.  FeedMe is instead based on having friends forward interesting content to me.   I trust my friends more than any algorithm; if enough of them read a given blog, I can stop on the assumption that they&#8217;ll forward interesting content to me.   But right now I don&#8217;t have good feedback on how many of my friends are reading what blogs.   That might be an interesting feature to add to FeedMe.  An alternative might be for a group of friends to &#8220;divvy up&#8221; a blog, each reading a subset of the content and deciding which to forward to which friends.  This would need some supporting interfaces, of course.</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/11/17/information-glut-or-information-gluttons/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing FeedMe: A New Sharing Tool for Google Reader</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/11/16/introducing-feedme-a-new-sharing-tool-for-google-reader/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/11/16/introducing-feedme-a-new-sharing-tool-for-google-reader/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 22:12:03 +0000</pubDate>
		<dc:creator>Michael Bernstein</dc:creator>
				<category><![CDATA[News]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[User Interfaces]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=721</guid>
		<description><![CDATA[A few weeks ago Adam and I blogged about some of our recent work investigating how link-sharing happens on the web. In contrast to most sharing tools out there, which broadcast your shares to anyone who will listen, we found that lots of sharing happens point-to-point, from friend to friend. An interesting outcome of this [...]]]></description>
			<content:encoded><![CDATA[<p>A few weeks ago <a href="http://groups.csail.mit.edu/haystack/blog/2009/10/13/feedme-understanding-and-supporting-social-link-sharing-on-the-web/">Adam and I blogged</a> about some of our recent work investigating how link-sharing happens on the web. In contrast to most sharing tools out there, which broadcast your shares to anyone who will listen, we found that lots of sharing happens point-to-point, from friend to friend. An interesting outcome of this friendsourced link discovery is that it is highly personalized: rather than getting what the Internet or your social network finds interesting, as you would on Digg or Facebook, you get what people think <em>you</em> would find interesting.</p>
<p>We want to empower this kind of sharing to happen more. So, we&#8217;ve built <a href="http://feedme.csail.mit.edu">FeedMe</a>, a Greasemonkey plug-in for Google Reader on Firefox.  Today, we&#8217;re releasing it publicly!</p>
<p><img class="alignnone" title="FeedMe Screenshot" src="http://groups.csail.mit.edu/haystack/feedme/demo.png" alt="" width="611" height="92" /></p>
<p>FeedMe makes it easier to send brief e-mails to your friends to share links. Once you&#8217;ve started sharing, it starts recommending friends who might be interested in seeing the post you&#8217;re looking at. Ideally this makes it even faster to share with more people who would find content relevant, without requiring you to type or switch windows.</p>
<p>FeedMe has loads of other bells and whistles: optional digesting of shares, indications of whether the recipient already received the URL, how many URLs the recipient has gotten recently, and One-Click Thanks for recipients to tell you what they like (similar to the Like feature on Facebook).  If you don&#8217;t use Google Reader, you can still use FeedMe&#8212;we&#8217;ve created a bookmarklet that allows you to share any webpage!</p>
<p>Give it a whirl, and let us know what you think at <a href="mailto:feedme@csail.mit.edu">feedme@csail.mit.edu</a>! For you academicy types, you can check out more about FeedMe <a href="http://hdl.handle.net/1721.1/49426">in our tech report</a>.</p>
<p>To try FeedMe out, head over <a href="http://feedme.csail.mit.edu">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/11/16/introducing-feedme-a-new-sharing-tool-for-google-reader/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Blogs and the Dissemination of Scientific Research</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/11/04/blogs-and-the-dissemination-of-scientific-research/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/11/04/blogs-and-the-dissemination-of-scientific-research/#comments</comments>
		<pubDate>Wed, 04 Nov 2009 14:50:24 +0000</pubDate>
		<dc:creator>Michael Bernstein</dc:creator>
				<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[Thought Piece]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=687</guid>
		<description><![CDATA[HCI research needs to get better at spreading the word, sooner, in the Web 2.0 era.  Typically, by the time that CHI rolls around, the research being presented is at least 7 months old.  When (or if) a group decides to post PDFs early, the papers are so distributed that interested readers can&#8217;t find them. [...]]]></description>
			<content:encoded><![CDATA[<p>HCI research needs to get better at spreading the word, sooner, in the Web 2.0 era.  Typically, by the time that CHI rolls around, the research being presented is at least 7 months old.  When (or <em>if</em>) a group decides to post PDFs early, the papers are so distributed that interested readers can&#8217;t find them. What&#8217;s more, the research that is posted isn&#8217;t presented in a web-friendly way: how many web pages do you read in PDF form?  But, when HCI research is made available in an interesting and accessible form, it <a href="http://tech.slashdot.org/tech/08/10/12/0228206.shtml" target="_blank">often</a> <a href="http://www.newscientist.com/article/dn17554-after-the-boom-is-wikipedia-heading-for-bust.html" target="_blank">gets</a> <a href="http://hardware.slashdot.org/story/09/10/05/185226/Microsoft-Research-Shows-Off-Multi-Touch-Mouse-Prototypes?from=rss" target="_blank">great</a> <a href="http://www.forbes.com/2009/01/21/postits-digital-tools-tech-intel-cz_lg_0122postits.html" target="_blank">press</a>.</p>
<p>What I&#8217;m thinking might remedy the situation is a CHI early results blog.  It would work like this: when a paper is accepted to CHI, the SIGCHI blog administrators e-mail the authors and invite them to write a short blog post describing the results that will be appearing at the conference. It should be written for a general web audience and other HCI researchers and practitioners. This would not just be the abstract and intro; the blog would highly encourage pictures, videos, and any other media. The drafts would be vetted for readability, and then posted as soon as they are ready (with some flow control to make sure we don&#8217;t post too many items at once). This means results could be posted as early as December.</p>
<p>If successful, this could be a great way to accelerate the dissemination of important research results, generate lots of positive buzz for the conference and the papers in it, keep research conversations going year-round, and increase the number of HCI posts on Slashdot. And I mean, Slashdot is our currency, really.</p>
<p>What do you think?  Would you post in a centralized SIGCHI blog when your papers get accepted?</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/11/04/blogs-and-the-dissemination-of-scientific-research/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>FeedMe: Understanding and Supporting Social Link Sharing on the Web</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/10/13/feedme-understanding-and-supporting-social-link-sharing-on-the-web/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/10/13/feedme-understanding-and-supporting-social-link-sharing-on-the-web/#comments</comments>
		<pubDate>Wed, 14 Oct 2009 03:20:36 +0000</pubDate>
		<dc:creator>Adam Marcus</dc:creator>
				<category><![CDATA[Social Computing]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=589</guid>
		<description><![CDATA[Which approach do you take to managing information overload on the web?  Do you unleash the firehose on yourself, subscribing to RSS feeds or relying on content aggregators to keep up with the news?  Or do you take small sips from the stream of content, regularly checking a small set of websites to look for [...]]]></description>
			<content:encoded><![CDATA[<p>Which approach do you take to managing information overload on the web?  Do you unleash the firehose on yourself, subscribing to RSS feeds or relying on content aggregators to keep up with the news?  Or do you take small sips from the stream of content, regularly checking a small set of websites to look for updates?  It&#8217;s a common problem: firehosers dedicate much of their time to finding the golden nugget in the stream, whereas the sippers have given up on hearing everything&#8212;they will settle for a subset of the news. In both cases, highly personalized information often misses the recipient, or arrives late.</p>
<p>We&#8217;ve been looking at ways to empower another source of highly personalized content: our friends, family, and coworkers. They already share web pages with us by e-mail, in person, and on social networks.  Social link sharing is often high-quality and personalized: quality is vetted by people you trust, and personalization is implicit when your social network uses its notion of your interests and tastes to forward you links.  Social link sharing is not perfect either&#8212;we all have that friend who&#8217;s filling up our mailbox with e-mails that contain the subject line &#8220;Fwd: Fwd: Fwd: puppies,&#8221; and our considerate friends sometimes avoid sending us content to avoid being perceived as that person.</p>
<p><a href="http://people.csail.mit.edu/msbernst">Michael</a> and <a href="http://people.csail.mit.edu/marcua">I</a> have been working on a multi-stage project to <em>understand</em> the social processes behind web content sharing and to <em>support</em> those processes by introducing a novel tool called FeedMe to facilitate such sharing.  We&#8217;ve published our findings in <a href="http://hdl.handle.net/1721.1/49426">this technical report</a>, and have summarized the results below.  Today we&#8217;ll be sharing part 1, where we will discuss our initial exploration to understand social link sharing; in the next post, you&#8217;ll be hearing about the tool we built based on these findings, and you&#8217;ll get a chance to sign up for a public release of FeedMe!</p>
<p><strong>Link-Sharing Surveys</strong><br />
We conducted two surveys encompassing 140 users of Amazon Mechanical Turk, one focusing on what it&#8217;s like to receive posts, and the other focusing on what people think about when sharing.</p>
<p>In our receiver surveys, we learned several things:</p>
<ul>
<li><strong>E-mail is the dominant link-sharing medium</strong>.  Receivers cited a lack of time as a reason why they do not visit content aggregators to find the top web content.  Sharers share content through e-mail over all other mechanisms, because it is ubiquitous on the internet, and is a consistent protocol for sending content to anyone.  Another interesting tidbit: in addition to being the dominant link-sharing mechanism, e-mail tied with regularly visiting one&#8217;s favorite websites as the dominant information-finding mechanism.  Few users utilized feed readers, social aggregators, or social networks for links.  It turns out that e-mail is <em>not</em> dying in favor of Facebook and Twitter, especially not for the average user.<img class="aligncenter size-full wp-image-590" title="Table 1" src="http://groups.csail.mit.edu/haystack/blog/wordpress/wp-content/uploads/2009/09/table1.png" alt="Table 1" width="399" height="244" /><img class="aligncenter size-full wp-image-591" title="Table 2" src="http://groups.csail.mit.edu/haystack/blog/wordpress/wp-content/uploads/2009/09/table2.png" alt="Table 2" width="409" height="249" /></li>
<li><strong>Topic Interest Drives Enjoyment</strong>.  The biggest reasons receivers cited for liking shared content were the relevance and entertainment value of the content.  Off-topic shares were off-putting for them.  Sharers were conscious of this; relevance and timeliness were their biggest concerns.<img class="aligncenter size-full wp-image-595" title="Table 3" src="http://groups.csail.mit.edu/haystack/blog/wordpress/wp-content/uploads/2009/09/table3.png" alt="Table 3" width="399" height="141" /><img class="aligncenter size-full wp-image-596" title="Table 4" src="http://groups.csail.mit.edu/haystack/blog/wordpress/wp-content/uploads/2009/09/table4.png" alt="Table 4" width="402" height="198" /></li>
<li><strong>Link Sharing is Burdensome when it is a Repetitive Firehose</strong>.  Receivers disliked it most when sharers could not rate-limit themselves.  One user complained about a sharer who blindly forwards 10-20 e-mails per day.</li>
<li><strong>Small Audiences are Best</strong>.  A small recipient list is a good predictor of whether recipients will appreciate the content.</li>
<li><strong>Friends are the Most Common Target</strong>.  Sharers share more content with friends than with family or co-workers, and their set of receiving friends is a small group that they regularly communicate with.</li>
<li><strong>Receivers Want Even More</strong>.  If guaranteed high-quality content, receivers claimed they would like to have more links shared with them.</li>
</ul>
<p>From active sharers, we learned:</p>
<ul>
<li><strong>Sharing Correlates with Seeking</strong>.  Individuals that identify with spending a large amount of time seeking out web content are also those that identify with sharing a large amount of content, and having their contacts in mind as they read web content.<img class="aligncenter size-full wp-image-592" title="Figure 2" src="http://groups.csail.mit.edu/haystack/blog/wordpress/wp-content/uploads/2009/09/figure2.png" alt="Figure 2" width="408" height="311" /></li>
<li><strong>Sharing does not Imply Sociality</strong>.  You might think that sharing activity is guided by how much of a social butterfly you are. Not so. We measured two types of social capital, and neither was able to explain sharing practice.</li>
</ul>
<p><strong>Next Up&#8230;</strong><br />
With this information in mind, we set out to build a tool to help heavy information seekers share more content.  Next week we&#8217;ll be sharing FeedMe, the tool we built to address this issue.  Until then, feel free to look through our <a href="http://hdl.handle.net/1721.1/49426">technical report</a> for the detailed results.</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/10/13/feedme-understanding-and-supporting-social-link-sharing-on-the-web/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Introducing &#8220;Eyebrowse&#8221; &#8211; Track and share your web browsing in real time</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/08/28/introducing-eyebrowse-track-and-share-your-web-browsing-in-real-time/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/08/28/introducing-eyebrowse-track-and-share-your-web-browsing-in-real-time/#comments</comments>
		<pubDate>Fri, 28 Aug 2009 06:55:05 +0000</pubDate>
		<dc:creator>Max Van Kleek</dc:creator>
				<category><![CDATA[Collective Intelligence]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[Web Architectures]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=450</guid>
		<description><![CDATA[We&#8217;ve launched a service that lets people share, in real time, what pages they&#8217;re looking at on the web.  Our system, eyebrowse, lets you choose exactly which sites you want to share your viewing patterns for, and eyebrowse does the rest &#8212; producing statistical visualisations of your web browsing habits over time, compared to [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve launched a service that lets people share, in real time, what pages they&#8217;re looking at on the web.  Our system, eyebrowse, lets you choose exactly which sites you want to share your viewing patterns for, and eyebrowse does the rest &#8212; producing statistical visualisations of your web browsing habits over time, compared to your friends and the world.  It is available here:</p>
<p><strong><a href="http://eyebrowse.csail.mit.edu">http://eyebrowse.csail.mit.edu</a></strong></p>
<p>It currently requires Firefox/Iceweasel and works on all major platforms.  All data that is collected is <strong>public</strong> and available to <strong>anyone</strong> who wants it (we do not hoard or claim to own any of your data. We like Twitter&#8217;s model.)  We will soon provide a nice interface with daily tarballs of the database in RDF, XML and CSV.</p>
<p><strong>Why would you want to share your web trails?</strong></p>
<p>1. For Science!  It&#8217;s not fair that certain Search Engine Companies can do web trail research because they have access to massive repositories of data.  There should be public corpora for IR researchers around the world.  And these should be OPEN.</p>
<p>2. For your friends!  You look at lots of cool stuff on the web every day.  You might not think of explicitly sharing every single thing you read.  Eyebrowse is lightweight enough that you just have to tell it once per site you want to share.  I&#8217;ve already discovered tons of weird things that my friends are looking at that they would not have bothered to share explicitly.</p>
<p>3. To understand your own browsing habits.  How many times do you read ACM/IEEE every day? I bet you don&#8217;t know. Now you can get quantitative statistics and visualise long-term journal revisitation patterns &#8211; and other things.</p>
<p><strong>Will it violate my privacy?</strong></p>
<p>1. We give you control.  You have to tell eyebrowse explicitly what you want to share on a site-by-site (host) basis. You can take things off the whitelist at any time.  You can also go back and delete things it has logged in the past, all through our web interface.  It also respects Private Browsing (aka pornmode) and will not log any data while that mode is active.</p>
<p>2. It fosters contemplation/awareness: We are also trying to raise awareness of what OTHERS (e.g. Google Analytics) are collecting about you as you surf the web, by showing you what you can learn about yourself by selectively publishing your own data feeds.</p>
<p>By letting people selectively publish web trails in an open, non-invasive way, we are hoping to foster a discussion of how we can use our web browsing behavior to build more adaptive and effective interfaces that <strong>respect people&#8217;s privacy</strong>.</p>
<p>Feedback is appreciated.  Please email us directly at eyebrowse@csail.mit.edu</p>
<p>Oh, and eyebrowse is free and open source software, licensed under the MIT License.  The source is available as part of the list-it codebase here: <a href="http://code.google.com/p/list-it">http://code.google.com/p/list-it</a></p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/08/28/introducing-eyebrowse-track-and-share-your-web-browsing-in-real-time/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SIGIR09: Telling Experts from Spammers: Expertise Ranking in Folksonomies</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/07/22/sigir09-telling-experts-from-spammers-expertise-ranking-in-folksonomies/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/07/22/sigir09-telling-experts-from-spammers-expertise-ranking-in-folksonomies/#comments</comments>
		<pubDate>Wed, 22 Jul 2009 20:58:30 +0000</pubDate>
		<dc:creator>David Karger</dc:creator>
				<category><![CDATA[Collective Intelligence]]></category>
		<category><![CDATA[Publication]]></category>
		<category><![CDATA[SIGIR]]></category>
		<category><![CDATA[Semantic Web]]></category>
		<category><![CDATA[Social Computing]]></category>
		<category><![CDATA[CSAIL]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=386</guid>
		<description><![CDATA[From our friends in Southampton (correction: and Hasso-Plattner), a study of how to differentiate experts (who really know how to tag stuff) from spammers (who want to tag their own stuff, but try to acquire credibility by copying tags others have used).   They exploit the observation that the people who tag first are [...]]]></description>
			<content:encoded><![CDATA[<p>From our friends in Southampton (correction: and Hasso-Plattner), a study of how to differentiate experts (who really know how to tag stuff) from spammers (who want to tag their own stuff, but try to acquire credibility by copying tags others have used).   They exploit the observation that the people who tag first are obviously not copying.  They compared their classifier to some obvious baselines, such as assigning expertise to those with the most tags.  Evaluating their classifier was tricky because there isn&#8217;t a ground-truth data set.   So they used a simulation, inserting a variety of different simulated experts and spammers into the tag stream of delicious, and checking how their classifier deals with them. Their classifier won.</p>
<p>Of course, you can only draw limited confidence from this kind of simulation.  Their simulated users fit their model of the world (spammers labeled late), so of course a tool designed around their model will do well on their simulated users.  I wonder, would it have been that hard to just do manual labeling of expertise on some real delicious users?  This would obviously give more trustworthy results than simulations.   Indeed, they found by manual examination that the top 50 users of the tag &#8220;mortgage&#8221; were spammers.  However, they say that the problem was finding a good ground truth for experts.   But that suggests it would still be possible to evaluate differentiation of spammers from non-spammers, even if you can&#8217;t evaluate differentiation of experts.</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/07/22/sigir09-telling-experts-from-spammers-expertise-ranking-in-folksonomies/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SIGIR09: simultaneously modeling semantics and structure of threaded discussions</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/07/20/sigir09-simultaneously-medeling-semantics-and-structure-of-threaded-discussions/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/07/20/sigir09-simultaneously-medeling-semantics-and-structure-of-threaded-discussions/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 19:26:53 +0000</pubDate>
		<dc:creator>David Karger</dc:creator>
				<category><![CDATA[Publication]]></category>
		<category><![CDATA[SIGIR]]></category>
		<category><![CDATA[Social Computing]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=357</guid>
		<description><![CDATA[A group from MSR Asia is working on modeling threaded discussions.  Threaded discussions pervade IMs, chat rooms, web forums, and mailing lists.  They&#8217;re hierarchical.  This group wants to mine the semantics (discover the topics) and the structure (author-reply relationships). The applications include spam blocking, reply reconstruction (figuring out which specific posts other posts are replying [...]]]></description>
			<content:encoded><![CDATA[<p>A group from MSR Asia is working on modeling threaded discussions.  Threaded discussions pervade IMs, chat rooms, web forums, and mailing lists.  They&#8217;re hierarchical.  This group wants to mine the semantics (discover the topics) and the structure (author-reply relationships). The applications include spam blocking, reply reconstruction (figuring out which specific posts other posts are replying to, which may not be clear if the system is linear like chat) and expert identification.  Obviously later posts often reply to earlier ones, but which one? They also hope to identify and remove chitchat and spam.</p>
<p>As for the model, they posit that each thread has several topics (which kind of contradicts the &#8220;pure&#8221; notion of thread, but is certainly true in practice).  Conversely, they assume each post in the thread covers just a couple of topics.  They try to approximate each post as a linear combination of the (topics of the) previous posts, but a sparse one (only a few nonzero coefficients, matching the idea that each post is narrow).   For training, they used forums like Slashdot, which do track replies to a specific comment.</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/07/20/sigir09-simultaneously-medeling-semantics-and-structure-of-threaded-discussions/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Phishing Prevention as Social Computing</title>
		<link>http://groups.csail.mit.edu/haystack/blog/2009/05/25/phishing-prevention-as-social-computing/</link>
		<comments>http://groups.csail.mit.edu/haystack/blog/2009/05/25/phishing-prevention-as-social-computing/#comments</comments>
		<pubDate>Tue, 26 May 2009 00:11:45 +0000</pubDate>
		<dc:creator>Katrina Panovich</dc:creator>
				<category><![CDATA[Social Computing]]></category>

		<guid isPermaLink="false">http://groups.csail.mit.edu/haystack/blog/?p=335</guid>
		<description><![CDATA[Phishing has been around for a long time (by internet standards), but a new batch of phishing attempts on Facebook seems to be spreading like wildfire.  Facebook is attempting to prevent some phishing scams, but many URLs, often with spaces in the middle, sneak through.
You can recognize these attempts because they often appear as [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://en.wikipedia.org/wiki/Phishing">Phishing</a> has been around for a long time (by internet standards), but a new batch of phishing attempts on Facebook seems to be spreading like wildfire.  Facebook is attempting to prevent some phishing scams, but many URLs, often with spaces in the middle, sneak through.</p>
<p>You can recognize these attempts because they often appear as Facebook messages to you and 19 (or so) others, from someone you probably haven&#8217;t talked to in years.  They direct you to look at some link that requires a bit of extrapolation on your part &#8211; at least eliminating a space in the middle of the URL, or changing (dot) to an actual period.</p>
<p>Clicking through is exactly what you&#8217;d expect &#8211; a website looking like Facebook in all easily discernible ways, asking for your account name and password.  Since these URLs (seem to) come from friends, we&#8217;re likely to not spend an extra moment thinking about them.  One friend even got such a message from his advisor.  <a href="http://www.indiana.edu/~phishing/">Statistically speaking</a>, you&#8217;re much more likely to click these links if they come from friends.</p>
<p>Research suggests that the most effective teaching moment for phishing is right after someone has fallen victim to a scam.  On Facebook, with instant notifications of these messages <em>and</em> the ability to possibly prevent new victims, there is a new social component to teaching.  I have a copy-and-paste-able message that I respond with by default:</p>
<blockquote><p>You got phished!  Don&#8217;t click the link.  Just to re-articulate &#8211; whenever you&#8217;re asked for a name and password to log in to something, check the domain name very carefully, even if the website looks right. Something like facebook.com.com is not actually Facebook, for instance! People are incredibly likely (statistically speaking) to fall victim to phishing scams like this one when they&#8217;re sent through friends, so treat these as carefully as you&#8217;d treat emails from mysterious Nigerian princes.</p></blockquote>
<p>Acting alone, however, none of us can make as much of an educational dent as would be ideal.  There have <em>got</em> to be social computing ways to handle this.  Perhaps routing suspicious URLs to a few trusted &#8211; or even random &#8211; friends first, with the express skepticism required to catch these phishing attempts?  What do you think?</p>
]]></content:encoded>
			<wfw:commentRss>http://groups.csail.mit.edu/haystack/blog/2009/05/25/phishing-prevention-as-social-computing/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
