\chapter{Introduction}
\label{intro}

\section{Problem Description}

The World Wide Web (WWW) is a platform on which users can share their work very effectively. Due to the nature of the medium, content on the Web, including text, images, and videos, can be reused and remixed rapidly. Scientific research data, social networks, blogs, photo sharing sites and other such applications, known collectively as the Social Web, and even general-purpose Web sites, hold large amounts of increasingly complex information. Such information from several Web pages can easily be aggregated, mashed up and presented in other Web pages. Content generation of this nature inevitably leads to many violations of copyright and license terms, motivating research into effective methods to detect and prevent such violations.

\section{Motivation}

Creative Commons (CC) provides a clear and widely accepted rights expression language \cite{hal08cc}, built on Semantic Web technologies, that is used to compose a set of well-defined licenses. These licenses are both machine readable and human readable, and clearly indicate to a person who wishes to reuse content exactly how it should be used, by expressing the accepted use, permissions, and restrictions of the content.

However, even with these human-friendly licenses, we can expect license violations to occur due to many factors: 
\begin{inparaenum}[\itshape a\upshape)]
\item Users may be ignorant as to what each of the licenses means. 
\item Users may forget or be too lazy to check and include the proper license terms. 
\item Users may give an incorrect license which violates the original content creator's intention. 
And last but not least,  
\item malicious users might intentionally ignore the CC-license given to an original work in their own interests.  
\end{inparaenum}


Whatever the case may be, the original content creator would be interested in knowing when her licenses have been violated, on which Web pages, and by whom. But given the scale and the nature of the Web, she is highly unlikely to learn of such violations unless she comes across them by chance. An assessment of Creative Commons attribution license violations on Flickr images on the Web, discussed in Chapter~\ref{assesment}, revealed violation rates ranging from 70\% to 90\%.

On the other hand, people who want to reuse content may be interested in knowing whether a particular content item can be reused and, if so, under what conditions. Therefore, license-aware tools for easy content reuse, and validators that check works for license violations, offer a path of least resistance for generating creative works on the Web.

\section{The Need for Policy Awareness in Content Reuse}

Policies in general are pervasive in Web applications. They play a crucial role in enhancing security, privacy and usability of the services offered on the Web \cite{DBLP:conf/esws/BonattiDFNOPS06}. Information accountability provides another motivation to apply policies for data usage practices  \cite{info-account}. 
%On the semantic web, policy based systems may be implemented with a reasoner on rule-based systems, where the rules represent laws, licenses, or policies that relate to the system. For example, the AIR rule language is designed to express and enforce policies to provide reliable assessments of compliance with rules and policies governing the use of information \cite{DBLP:conf/policy/KagalHW08}. 
In this thesis we will limit the `policy awareness' aspect to licenses that can be expressed semantically, that are widely deployed on a range of media, and that have a large community base. CC licenses fit this description perfectly. Therefore, we will be focussing our attention on CC licenses, keeping in mind the possibility of extending the system we develop to support other types of licensing mechanisms.

%\subsubsection{How are License Metadata Expressed?}
\subsection{Exposing Licenses}

Typically there are two ways in which metadata about licenses can be exposed:

\begin{enumerate}
 \item Through APIs which expose the Licenses:
 
For example, Flickr allows users to specify the license associated with their images. This license information can then be queried through the Flickr API. This method is not very interoperable, as API-specific data wrappers have to be written for each service.
 
\item Through  Resource Description Framework in Attributes (RDFa)~\cite{rdfa}:

CC licenses can be expressed in a machine-readable form such as RDF \cite{rdf} using RDFa. The content creator and the content consumer can use RDFa for rights expression and compliance checking, respectively. RDFa allows machine-understandable semantics to be embedded in XHTML.
%The CC License Validation Service \cite{ccval} can be used to validate if a CC license given by an original content creator is syntactically correct. 
\end{enumerate}
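
As an illustration of the second approach, the following is a minimal sketch of how an attribution license might be embedded in a Web page with RDFa, following the ccREL conventions \cite{hal08cc}. The image URL, title, author name and author page are hypothetical placeholders, and the choice of the Attribution 3.0 license is an assumption made only for this example.

\begin{verbatim}
<!-- Hypothetical example: all example.org URLs, the title
     and the author name are placeholders. -->
<div xmlns:cc="http://creativecommons.org/ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     about="http://example.org/photos/sunset.jpg">
  <span property="dc:title">Sunset</span> by
  <a rel="cc:attributionURL"
     property="cc:attributionName"
     href="http://example.org/alice">Alice</a>,
  licensed under
  <a rel="license"
     href="http://creativecommons.org/licenses/by/3.0/">
    Creative Commons Attribution 3.0</a>.
</div>
\end{verbatim}

A consumer that parses this markup obtains statements giving the license of the image (through the \texttt{license} relation) and the name and URL to use when attributing it (through \texttt{cc:attributionName} and \texttt{cc:attributionURL}), without needing a service-specific API wrapper.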

\section{Tools Developed}

Research undertaken in this thesis focuses on methods for detecting and helping users avoid license violations. We have developed several tools to achieve this.

\begin{enumerate}
  \item \textbf{License Violations Validator:} \emph{to verify your own work for attribution license violations.}
  
People who create works that may use several hundred or so other sources would be interested in knowing whether they have violated anybody else's CC license terms, by misattributing or by not attributing the original content creator. In such cases, a \emph{validator} which checks for CC license violations of content would be very useful. This will be analogous to the validators Web developers use to check whether their XHTML is valid by using the \emph{W3C Markup Validation Service}, or semantic data producers checking to see if their data is in proper RDF syntax by using the \emph{W3C RDF Validation Service}. Using such a tool, content reusers can rectify the instances where they have inadvertently violated the CC licenses before they publish their work.

  \item \textbf{Semantic Clipboard:} \emph{for users to seamlessly reuse content on the Web while integrating the license metadata in a policy aware manner.}
  
This tool is designed to address the problem of users being too lazy to check the license, or inadvertently giving wrong license or attribution information. Semantic Clipboard allows users to copy images, and adds the license metadata to the copied image so that the secondary work is license compliant automatically.
\end{enumerate}

Both of these tools are built upon the Creative Commons Rights Expression Language (ccREL) \cite{hal08cc}. There is no attempt to enforce the rights associated with content, as in Digital Rights Management (DRM): a user who violates the license terms is not automatically prevented from doing so. Instead, the tools guide the user as to how best the content should be reused, so that the policies governing content usage are properly adhered to.


\section{Thesis Overview}

Chapter \ref{intro} has given an introduction to the problem that is addressed in this thesis. The rest of the thesis is structured as follows:

Chapter \ref{background} gives the background and an overview of the technologies used for policy aware content reuse.

Chapter \ref{assesment} outlines the experiment conducted to assess the level of Creative Commons attribution license violations on the Web, and the results of that experiment.

Chapter \ref{validator} gives the implementation details of the ``Attribution License Violations Detector and Validator'' for Flickr images.

Chapter \ref{clipboard} gives the implementation details of the ``Semantic Clipboard''.

Chapter \ref{related} discusses related work in this area.

Chapter \ref{summary} gives a summary of the contributions, challenges and the future work.

Appendix \ref{appa} gives the scenario encoded in the AIR rule language and the output from the AIR reasoner.

Appendix \ref{appb} gives links to the source code, documentation and demos of the tools described in this thesis.
