\chapter{Related Work}
\label{related}

\section{License Detection Tools}

Popular search engines such as Google and Yahoo, and even sites such as Flickr, blip.tv, OWL Music Search and SpinXpress, have advanced search options for finding CC-licensed content on the Web \cite{google_cc_search, yahoo_cc_search, flickrapi, bliptv, owlmusicsearch, spinxpress}. In addition to these search engines, there are many tools that extract RDF from Web pages marked up with RDFa; RDFa Distiller \cite{dist} is one such RDFa parser. Even the CC License \emph{\textbf{Syntax}} Validation Service \cite{ccval} can be used to parse documents for licenses embedded in RDFa. After parsing a document, this service lists the licensed objects and reports the following for each of them: the license's authorship, version and jurisdiction, whether the license has been superseded or deprecated, and whether the work qualifies as a free cultural work. However, unlike the attribution license violations validator discussed in Chapter \ref{validator}, it does not report to whom attribution should be given when these licensed objects are reused.

CC has put considerable effort into enabling tool builders to use the CC licenses effectively. For example, the \emph{``live box''} on the \emph{License Deed Page} suggests how to attribute a particular work. This \emph{``live box''} is created when a user follows a CC license hyperlink to the \emph{License Deed Page} from a page that marks up the work with the \emph{attributionName} and \emph{attributionURL} properties. JavaScript code in the deed page then scrapes the RDFa metadata from the referring page and constructs the attribution XHTML, as shown in Figure \ref{fig-cc-deed-box}. CC has also developed several license-aware Mozilla Firefox extensions, one of which is MozCC \cite{mozcc}. MozCC provides a specialized interface for displaying CC licenses, in which the user receives visual cues when a page with RDFa metadata is encountered. For example, CC-branded icons are displayed in the browser status bar when the metadata indicates the presence of a CC license. This extension is not supported in the latest versions of Firefox, and it does not offer the capability to copy the license attribution XHTML, as the Semantic Clipboard that we have developed does.
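The attribution step described above can be illustrated with a short Python sketch. This is not the JavaScript used by the CC deed page; it is a minimal sketch under stated assumptions. It assumes that the referring page links the work to its license with \texttt{rel="license"} and marks the author's name and URL with the \texttt{cc:attributionName} property and the \texttt{cc:attributionURL} relation. It also matches the \texttt{cc:} prefix literally instead of resolving CURIEs, and it ignores RDFa subjects (\texttt{about}). The names \texttt{CCAttributionParser} and \texttt{build\_attribution}, and the format of the generated snippet, are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class CCAttributionParser(HTMLParser):
    """Hypothetical, simplified collector of CC RDFa attribution metadata.

    Assumptions: 'about' subjects are ignored, the 'cc:' prefix is matched
    literally (not resolved as a CURIE), and only the first value of each
    property is kept.
    """

    def __init__(self):
        super().__init__()
        self.license_url = None
        self.attribution_url = None
        self.attribution_name = None
        self._capturing = False  # currently inside the cc:attributionName element
        self._name_tag = None    # tag name of that element
        self._depth = 0          # nesting depth of elements with that tag name
        self._parts = []         # text fragments of the attribution name

    def handle_starttag(self, tag, attrs):
        if self._capturing and tag == self._name_tag:
            self._depth += 1  # nested element with the same tag name
        a = dict(attrs)
        rels = (a.get("rel") or "").split()
        props = (a.get("property") or "").split()
        href = a.get("href")
        # rel="license" links the work to its license deed.
        if href and self.license_url is None and ("license" in rels or "cc:license" in rels):
            self.license_url = href
        # rel="cc:attributionURL" gives the URL to credit.
        if href and self.attribution_url is None and "cc:attributionURL" in rels:
            self.attribution_url = href
        # property="cc:attributionName" gives the name to credit, either in
        # @content or as the element's text content.
        if "cc:attributionName" in props and self.attribution_name is None and not self._capturing:
            if a.get("content") is not None:
                self.attribution_name = a["content"]
            else:
                self._capturing, self._name_tag, self._depth, self._parts = True, tag, 1, []

    def handle_endtag(self, tag):
        if self._capturing and tag == self._name_tag:
            self._depth -= 1
            if self._depth == 0:  # closed the cc:attributionName element
                self._capturing = False
                self.attribution_name = "".join(self._parts).strip()

    def handle_data(self, data):
        if self._capturing:
            self._parts.append(data)


def build_attribution(page_html):
    """Return an attribution snippet (hypothetical format), or None if no license link is found."""
    parser = CCAttributionParser()
    parser.feed(page_html)
    parser.close()
    if parser.license_url is None:
        return None
    lic = escape(parser.license_url)
    credit = ""
    if parser.attribution_name:
        name = escape(parser.attribution_name)
        if parser.attribution_url:
            name = f'<a href="{escape(parser.attribution_url)}">{name}</a>'
        credit = f"{name} / "
    return f'<div>{credit}<a rel="license" href="{lic}">{lic}</a></div>'
```

A tool intended for real use would rely on a conforming RDFa processor rather than literal attribute matching, so that prefixes and subjects are resolved correctly.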

\begin{figure}[!h]
  \centerline{\epsfig{file=images/deed_referrer.jpg, width=1\linewidth}}
  \caption{CC Deed Page Displaying the Attribution XHTML}
  \label{fig-cc-deed-box}
\end{figure}

Operator \cite{operator} is another Firefox extension; it detects microformats and RDFa in the Web pages that the user visits. Using Operator, it is possible to write an `action script' that finds all CC-licensed content in a Web page by examining its RDFa markup. However, like the MozCC extension, Operator cannot be used to copy license information along with the content, as the Semantic Clipboard does.

\section{License Embedding Tools}

Several tools can automatically embed the license metadata of Flickr images into documents. ThinkFree, a Web-based commercial office suite \cite{thinkfree}, and its open source counterpart, ``Flickr image reuse for OpenOffice.org'' \cite{flickroo}, are examples of such applications. They allow the user to pick an image directly from the Flickr Web site and insert it, together with its license metadata, into a document in the corresponding \emph{office suite}. A major limitation of this approach is that these applications support only Flickr images. The Semantic Clipboard, in contrast, can copy any image into any target document together with its license, as long as the license metadata is expressed in RDFa.

The Python library `liblicense' provides a straightforward way for developers to build license-aware applications by using a pluggable module system to read and write metadata in specific file types \cite{liblicense}. This allows GTK\footnote[1]{GTK is a toolkit for building graphical user interfaces.} applications to extract and write license information for files.
License Tagger \cite{licensetagger} is built on this library and allows a user to add license metadata to audio, video, text and image files. However, it is designed for desktop use, and another application is required to interpret the embedded license metadata. With the Semantic Clipboard, by contrast, content can be reused and embedded in another file in a license-aware manner as long as its license metadata is expressed in RDFa.

\section{Transferring Metadata with Content}

`News Credit' \cite{newscredit}, a tool developed by the Media Standards Trust, aims to make online news transparent. It embeds microformats, with some specific enhancements, that allow journalists to attach basic information to their online news articles. This helps establish an article's authorship and provenance.

As mentioned in Chapter \ref{clipboard}, there has been some work on annotating XHTML documents with provenance metadata using RDFa \cite{harvey-thesis}. This work presents a method for performing copy and paste operations on XHTML documents in a way that preserves the metadata. The resulting tool also incorporates a Creative Commons reasoning engine that reads document metadata and makes licensing decisions for the annotated documents.

We also find inspiration in digital photos and their Exchangeable Image File Format (EXIF) \cite{exif} and Extensible Metadata Platform (XMP) \cite{xmp} metadata. This metadata describes the photo, is embedded in the photo file itself, and can be read with simple tools. It would therefore be possible to embed license information in the photo alongside the other metadata. However, a malicious reuser could easily overwrite this information.

\section{Commercial Applications for Detecting Violations}
\label{police}

Attributor \cite{attributor}, a commercial application, claims to continuously monitor the Web for its customers' photos, videos and documents and to notify them when these are used elsewhere on the Web. It then offers to send notices to the offending Web sites requesting a link, offering a license, requesting removal, or requesting a share of the page's advertising revenue. Another commercial application, PicScout \cite{picscout}, claims to be currently responsible for over 90\% of all detections of online image infringement. It also claims to give its subscribers a view of where and how their images are being used online.

The problem with these services is that they penalize infringers after the fact rather than encouraging them to do the right thing up front \cite{attributor-prob}. Since they are implemented with bots that crawl the Web in search of infringements, these services also consume valuable Internet bandwidth \cite{picscout-prob}. Moreover, these services are not free, which prevents many content creators from using them to find license violations of their content.