PICS Label Distribution

Label Syntax and Communication Protocols

by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com), and Win Treese (treese@OpenMarket.com)

Revision 1, DRAFT 2 Last modified on Sat. Oct 28 1995 by Resnick

Overview

This document has been prepared for the technical subcommittee of PICS (Platform for Internet Content Selection). It defines a general format for labels that permits them to be embedded in RFC-822 style headers. It defines four methods by which PICS labels may be transmitted:
IN A DOCUMENT
One or more labels may be embedded in a document. We specify the format and note in particular how to use a META tag to embed labels in html documents.
WITH A DOCUMENT
An http client can request that labels be sent along with a document. A server can satisfy the request by sending the labels in http headers.
SEPARATELY
A client can request labels from a label server that runs the http protocol. The labels may refer to items available through protocols other than http, such as ftp, gopher, or netnews.
POSTING
A client may post labels to a label server that runs the http protocol. The labels may refer to items available through protocols other than http.

General Format

A label consists of a service identifier, label options, and a rating. The service identifier tells who issued the rating. Label options give additional properties of the document being rated and of the rating itself, such as the time at which the document was rated. The rating itself is a set of attribute-value pairs that describe a document along several dimensions. The general format of a label is
         service=(options...; rating=(...);)
Label options are as follows:
on = HTML-date
The date on which this rating was issued.
until = HTML-date
The date on which this rating expires.
for = [*]quotedURL
The URL of the item to which this rating applies. * indicates that the label applies to all items matching the URL, which is useful for assigning a rating to a site or directory.
by = quotedURL
The URL of the entity rating the item.
at = HTML-date
The last modification date of the item to which this rating applies at the time the rating was assigned. This can serve as a less expensive, but less reliable, alternative to the message integrity check (MIC) options.
mic-md5 = Base64-string
A message integrity check (MIC) of the item being rated. The MD5 Message Digest Algorithm is used to compute the MIC.
complete-label = quotedURL
Dereferencing this URL returns a complete label that can be used in place of the current one. The complete label has values for as many attributes as possible. This is used when a short label is transmitted for performance purposes but additional information is also available.
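
As a rough illustration of the mic-md5 option, the following Python sketch computes an MD5 digest of an item and encodes it in Base64. The function name mic_md5 is hypothetical. This draft does not say exactly which bytes of the item are digested, so hashing the raw bytes as retrieved is an assumption.

```python
import base64
import hashlib


def mic_md5(item_bytes: bytes) -> str:
    """Return the Base64-encoded MD5 digest of item_bytes.

    Assumption: the MIC covers the item's raw bytes exactly as retrieved.
    """
    digest = hashlib.md5(item_bytes).digest()  # 16 raw bytes
    return base64.b64encode(digest).decode("ascii")


# Format the result as a label option; the grammar gives the value unquoted.
option = "mic-md5=" + mic_md5(b"<html>...contents of the item...</html>")
```

A client that has both the item and its label could recompute the digest and compare it with the value carried in the label; a mismatch would suggest that the item has changed since it was rated.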

Example

For example, a label that uses the example rating system from the document, "Rating Services and Rating Systems", might be as follows:
     "http://www.gcf.org"=(
          on="05 Nov 1994 08:08:08 -0500";
          until="31 Dec 1995 23:59:59 -0000";
          for="http://www.gcf.org/index.html";
          by="mailto:rating-authority@gcf.org";
          rating=(suds=0.5; density=none; color/hue=red);
      )
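
The general format above can be sketched in Python as follows. The function name format_label and its argument shapes are hypothetical. The sketch assumes every option value is emitted as a quoted string, which covers the date and URL options but not mic-md5, and it emits rating values exactly as given.

```python
def format_label(service, options, rating):
    """Build 'service=(options...; rating=(...);)' as a single string.

    options: list of (name, value) pairs, each emitted as name="value".
    rating:  list of (attribute, value) pairs, emitted as attribute=value.
    """
    parts = ['%s="%s"' % (name, value) for name, value in options]
    pairs = "; ".join("%s=%s" % (attr, val) for attr, val in rating)
    parts.append("rating=(%s)" % pairs)
    return '"%s"=(%s;)' % (service, "; ".join(parts))


label = format_label(
    "http://www.gcf.org",
    [("on", "05 Nov 1994 08:08:08 -0500"),
     ("for", "http://www.gcf.org/index.html")],
    [("suds", "0.5"), ("color/hue", "red")],
)
```

Because whitespace is ignored outside quoted strings, the single-line result is equivalent to the multi-line layout used in the example.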

Detailed Syntax

The following grammar, in modified BNF, describes the syntax of labels. The methods by which labels are embedded in specific protocols are detailed below.

Notes:

  1. Whitespace is ignored except in quoted strings.
  2. The strings in transmit-name-lists are case insensitive. All other strings are case sensitive.
  3. Additional options may be added over time. For experimental purposes, options with names beginning "x-" may be added at any time without prior arrangement. Extending the options that are formally part of this specification requires an additional consensus process before adoption.
  4. This specification requires the use of US-ASCII. Note that the document, "Rating Services and Rating Systems," describes how a service can map the US-ASCII transmit-as names to descriptive strings using other character sets.
labellist :: label [';' labellist]
label :: service '=' '(' [optionlist ';'] 'rating = (' rating ');' ')'
service :: quotedURL
quotedURL :: '"' URL '"' 
optionlist :: option [ ';' optionlist ]
option :: 'on=' quoted-date
        | 'until=' quoted-date
        | 'at=' quoted-date
        | 'for=' ['*'] quotedURL
        | 'by=' quotedURL
        | 'complete-label=' quotedURL
        | 'mic-md5=' base64-string
quoted-date :: 
rating :: avlist
avlist :: avitem [';' avlist]
avitem :: attribute '=' value
attribute :: transmit-name-list
transmit-name-list :: quotedshortname ['/'transmit-name-list] 
value :: number
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [1*n][0-9]
quotedshortname :: ' " ' [1*n]extendedalphanum ' " '
extendedalphanum :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '+' | '-'

base64-string is as defined in RFC-1421.
URL is as defined in RFC-1738 for URLs.
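
To make the rating productions concrete, here is a minimal Python sketch that recognizes the number production and parses an avlist into a dictionary. The names NUMBER, ATTRIBUTE, is_number, and parse_avlist are hypothetical. The sketch is a simplification: it trims whitespace only around whole tokens, tolerates a trailing ';', and does not handle the rest of the label grammar.

```python
import re

# number :: [sign]unsignedint['.' [unsignedint]]
NUMBER = re.compile(r"[+-]?[0-9]+(?:\.[0-9]*)?")
# transmit-name-list :: quotedshortname ['/' transmit-name-list]
ATTRIBUTE = re.compile(r'"[A-Za-z+-]+"(?:/"[A-Za-z+-]+")*')


def is_number(text):
    """True if text, after trimming whitespace, matches the number production."""
    return NUMBER.fullmatch(text.strip()) is not None


def parse_avlist(text):
    """Parse 'attribute=value; ...' into {attribute: float}."""
    result = {}
    for item in text.split(";"):
        if not item.strip():
            continue  # tolerate a trailing ';' (an assumption of this sketch)
        attribute, sep, value = item.partition("=")
        attribute, value = attribute.strip(), value.strip()
        if not sep or not ATTRIBUTE.fullmatch(attribute) \
                or not NUMBER.fullmatch(value):
            raise ValueError("bad rating item: %r" % item)
        # Note 2: transmit names are case insensitive, so fold to lower case.
        result[attribute.replace('"', "").lower()] = float(value)
    return result
```

Under this sketch, values such as "none" or "red" are rejected, because the value production admits only numbers.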

RFC-822 Headers

Many protocols, such as Internet electronic mail, the HyperText Transfer Protocol, and USENET News, use ASCII headers as described in RFC-822. For use in such protocols, we define a new header, PICS-Label, used to contain the labels described in this document. The syntax is:
PICS-Label: 
where label is described according to the syntax above. Continuation lines beginning with whitespace may be used as defined in RFC-822.
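
Since labels can be long, a sender will often need continuation lines. The following Python sketch folds a header onto lines that begin with whitespace and unfolds it again. The function names, the four-space indent, and the 72-column width are assumptions of the sketch and are not values taken from this document. Unfolding joins lines with a single space, so a value round-trips exactly only when it contains no runs of multiple spaces.

```python
import textwrap


def fold_header(name, value, width=72):
    """Fold 'name: value' so that continuation lines begin with whitespace."""
    lines = textwrap.wrap(
        "%s: %s" % (name, value),
        width=width,
        subsequent_indent="    ",   # continuation lines start with spaces
        break_long_words=False,     # never split a token such as a URL
        break_on_hyphens=False,
    )
    return "\r\n".join(lines)


def unfold_header(folded):
    """Join continuation lines and return (name, value)."""
    joined = " ".join(line.strip() for line in folded.splitlines())
    name, _, value = joined.partition(":")
    return name.strip(), value.strip()
```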

Embedding Labels in HyperText Markup Language (HTML)

Labels may be embedded in HTML files as meta-information, using the META element defined in the HTML specification. This embedding usually takes one of two forms:
  1. Using the HTTP header equivalency mechanism, which may be used by an HTTP server to generate a header:
            <META http-equiv="PICS-Label" content='...'>
  2. Using the name mechanism, which may be parsed directly by the receiver:
            <META name="PICS-Label" content='...'>
    (Note that the content attribute uses single quotes, because the label syntax uses double quotes.)
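
A receiver that parses embedded labels directly might proceed roughly as in the Python sketch below, which uses the standard html.parser module to collect the content of every META element whose http-equiv or name attribute is PICS-Label. The class and function names are hypothetical, and matching the attribute value without regard to case is an assumption.

```python
from html.parser import HTMLParser


class PicsMetaParser(HTMLParser):
    """Collect the content of META elements that carry PICS labels."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("http-equiv") or attributes.get("name") or ""
        if key.lower() == "pics-label" and attributes.get("content"):
            self.labels.append(attributes["content"])


def extract_labels(html_text):
    """Return the label strings embedded in html_text, in document order."""
    parser = PicsMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.labels
```

The strings returned are the raw label text; they still have to be parsed according to the label grammar.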

Sending Labels With A Document

When an http server sends a document to a client, it sends additional headers as well. We specify how the client can request that one or more labels be included in one of those headers.

The following example illustrates a typical exchange:

Client sends to http server www.greatdocs.com:

GET /foo.html HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
      negotiable={services={http://www.gcf.org/ratings}}

Server responds to client:

HTTP/1.0 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: PICS/1.0 scope=any str=opt id=pics headers={PICS-Label PICS-Status}
Content-Encoding: pics
PICS-Label: "http://www.gcf.org/ratings"=(
          on="05 Nov 1994 08:08:08 -0500";
          until="31 Dec 1995 23:59:59 -0000";
          for="http://www.greatdocs.com/foo.html";
          by="mailto:rating-authority@gcf.org";
          rating=(suds=0.5; density=none; color/hue=red);
      )
PICS-Status: OK
Content-type: text/html

...contents of foo.html...

Explanation of example

The client requests that document foo.html be sent back. In addition, the client requests the rating of that document from the service "http://www.gcf.org/ratings". The request follows the PEP (Protocol Extension Protocol) syntax for extensions to the http protocol. The PEP syntax is currently under development at W3C. It allows for more organized extension of http than the current method of merely adding extra header fields.

The server responds by sending back the label, in the PICS-Label: header, as well as the document. The PICS-Status: header says that the label request has been fulfilled. If the document is available but the label request cannot be fulfilled, it would not be appropriate to signal an http error; the PICS-Status: header carries that information instead. The Protocol: header is included to conform to PEP, as is the addition of "pics " to the beginning of the Content-Encoding: field. In this case, there is no other content-encoding, and so "pics " is the only thing that appears in that field.
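
On the client side, the exchange described above might be handled roughly as in the following Python sketch, which separates the header block from the body and reads the PICS-Status and PICS-Label headers with the standard email parser. The function name and the shortened sample response are hypothetical. Collapsing whitespace while unfolding the label is a simplification that would also alter runs of spaces inside quoted strings.

```python
from email import message_from_string

# Hypothetical response, shortened from the example exchange.
SAMPLE = (
    "HTTP/1.0 200 OK\r\n"
    "Protocol: PICS/1.0 scope=any str=opt id=pics "
    "headers={PICS-Label PICS-Status}\r\n"
    'PICS-Label: "http://www.gcf.org/ratings"=(\r\n'
    '          for="http://www.greatdocs.com/foo.html";\r\n'
    "          rating=(suds=0.5);\r\n"
    "      )\r\n"
    "PICS-Status: OK\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "...contents of foo.html..."
)


def read_pics_headers(response_text):
    """Return (pics_status, label) from a raw HTTP response given as text."""
    head, _, _body = response_text.replace("\r\n", "\n").partition("\n\n")
    _status_line, _, header_block = head.partition("\n")
    headers = message_from_string(header_block + "\n\n")
    label = headers.get("PICS-Label")
    if label is not None:
        label = " ".join(label.split())  # unfold continuation lines
    return headers.get("PICS-Status"), label


status, label = read_pics_headers(SAMPLE)
```

A client would normally check the PICS-Status value before trusting the label, since the http status line alone does not say whether the label request was fulfilled.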

Detailed Syntax and Semantics of HTTP Requests for Labels With Document

The following grammar, in modified BNF, describes the syntax of the additional header line to be included in an HTTP request for a document and associated labels.
accept-header :: 'Accept-Protocol: PICS/1.0 scope=any rx-str=opt negotiable={' services ['embedded'] '}'
services :: 'services={' 1*(quotedURL ';' ) '}'

The services specify one or more labeling services for which the client is requesting a label for the document.

When the 'embedded' flag is set, the server is requested to prefetch labels for all URLs that are referenced in the current document. That is, the server is requested to look inside the document requested, extract the URLs, and send the labels associated with those URLs. If fulfilled, the request may permit the client to display links differently depending on how they are labeled. Servers may, however, choose to ignore this part of the request.
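
The request header can be generated mechanically from a list of services. The Python sketch below follows the grammar in this section, so each service URL is quoted and followed by ';', which differs slightly from the unquoted form shown in the example exchange. The function name is hypothetical, and the single space placed before the embedded flag is an assumption.

```python
def accept_protocol_header(service_urls, embedded=False):
    """Build the Accept-Protocol header line for a PICS label request."""
    if not service_urls:
        raise ValueError("at least one service URL is required")
    # services :: 'services={' 1*(quotedURL ';') '}'
    services = "services={%s}" % "".join('"%s";' % url for url in service_urls)
    flag = " embedded" if embedded else ""
    return ("Accept-Protocol: PICS/1.0 scope=any rx-str=opt "
            "negotiable={%s%s}" % (services, flag))


header = accept_protocol_header(["http://www.gcf.org/ratings"])
```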

Detailed Syntax and Semantics For HTTP Response Headers

Three additional headers are specified, as used in the example above: Protocol, PICS-Label, and PICS-Status. One header is modified: Content-Encoding.

Sending Labels Separately

PICS labels can also be retrieved separately from the documents they refer to. The protocol is an extension of HTTP, meaning that a label server will use HTTP to respond to requests, even if the labels themselves describe ftp, gopher, or other non-http sites and documents.

Requests to stand-alone label servers share some features with the requests specified above for delivery of labels along with documents. In particular, the method of specifying which rating services' labels are desired is identical in the two methods.

Specifying which URL(s) to get labels for

The GET command normally includes a requested-URL. When requesting labels separately from a document, this URL actually specifies the label server rather than a document. In addition, it is necessary to specify the URL(s) for which the client desires labels. This is done through the usual "?" syntax.

It is acceptable to ask for labels describing a site, or a subdirectory at a site, rather than just an individual document. It is also acceptable to ask for labels of all the documents at a site or in a subdirectory. The syntax must distinguish between these two types of requests.

Sample Queries

The general format of the query is
         ?url=[*]"..."url=[*]"..."...
The following sample request, made to the http server http://www.labels.org, is illustrative:

GET /ratings?url=*"http://www.questionable.org/images" HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
      negotiable={services={http://www.gcf.org/ratings}}

The query asks the label server www.labels.org/ratings to send a single label that applies to everything in the images directory at site www.questionable.org. The desired label should come from the service http://www.gcf.org/ratings.

The label server responds by sending back headers only, with no document. The headers follow the syntax and semantics described above for label transmission with a document.

Detailed Syntax and Semantics of HTTP Query for Labels Separate From Documents

The following grammar, in modified BNF, describes the syntax of the query that follows the URL of the label server.
query :: '?' 1*(request)
request :: ['*']quotedURL['*']
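
A client might assemble the query as in the Python sketch below. The function name and the shape of its arguments are hypothetical. The sketch follows the sample query, in which each request is written as url= followed by an optional leading '*' and the quoted URL; the grammar above omits the url= prefix and also permits a trailing '*', which the sketch does not produce.

```python
def label_query(server_path, targets):
    """Build the path and query for a separate label request.

    targets: list of (url, starred) pairs; starred adds the leading '*'.
    """
    if not targets:
        raise ValueError("at least one URL is required")
    requests = "".join(
        'url=%s"%s"' % ("*" if starred else "", url)
        for url, starred in targets
    )
    return "%s?%s" % (server_path, requests)


request_line = "GET %s HTTP/1.0" % label_query(
    "/ratings", [("http://www.questionable.org/images", True)]
)
```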

Posting Labels to a Label Server

To aid organizations in collecting ratings, we specify how a client can submit new ratings to a rating server.

To be filled in. Basically, do a POST to the label server. Include the Accept-Protocol: line. Send the ratings as a header. We could send the ratings in the body, but we might as well be consistent with how we send them in the other distribution methods. We will need to define response codes to go in a response header.

Why HTTP For Label Servers

Instead of extending HTTP, we considered proposals for special-purpose label transport protocols. Before making a final decision, we constructed the following lists of pros and cons.

Advantages of Using HTTP

Advantages of Creating a New Protocol Instead of Using HTTP

FAQ - Frequently Asked Questions

Why is there no ftp, gopher, or netnews protocol for requesting labels along with a document?

Labels can be sent as additional headers in any protocol that employs RFC-822 style headers. We have not yet determined, however, convenient extensions to protocols other than http to permit requests that ask for labels from specific services. We may specify such extensions in the future.

How do you get labels for items on FTP, Gopher, or netnews servers? Are we forcing all FTP implementations to implement all of HTTP as well?

FTP, Gopher, and netnews servers need not distribute PICS labels. Labels for items on such servers can be retrieved from an HTTP-based label server.

The PICS premise is that all compliant clients will have to implement some new protocol. The subset of HTTP which would be required for obtaining a PICS label can be minimal (see section above on minimal implementation). HTTP will be no more difficult to implement in an FTP (or other) client than a brand-new protocol which accomplishes similar features.

Can existing HTTP servers be used as PICS label servers?

Using CGI scripts, or with a small amount of added code in the HTTP server, an existing HTTP server can be configured to access a database of labels and return that information coded as additional HTTP Headers. Most of the work is in the lookup and formatting of the labels themselves, not the modifications to HTTP.
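
As a rough sketch of the "small amount of added code" case, the following Python uses the standard http.server module to answer label queries with headers only. The in-memory LABELS table, the function name lookup, the handler class, and the PICS-Status value "no-label" are all hypothetical; this draft does not define status values other than OK. The sketch handles a single requested URL, ignores the '*' prefix and the requested services, and omits the Protocol: and Content-Encoding: headers.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import unquote, urlsplit

# Hypothetical label database: rated URL -> label text.
LABELS = {
    "http://www.questionable.org/images":
        '"http://www.gcf.org/ratings"=('
        'for=*"http://www.questionable.org/images"; rating=(suds=0.5);)',
}


def lookup(path, labels=LABELS):
    """Map a request path such as /ratings?url=*"..." to response headers."""
    query = unquote(urlsplit(path).query)
    # Take the text between the first and last double quote as the URL.
    start, end = query.find('"'), query.rfind('"')
    url = query[start + 1:end] if 0 <= start < end else None
    label = labels.get(url)
    if label is None:
        return [("PICS-Status", "no-label")]  # hypothetical status value
    return [("PICS-Label", label), ("PICS-Status", "OK")]


class LabelHandler(BaseHTTPRequestHandler):
    """Respond to GET with label headers and no document body."""

    def do_GET(self):
        self.send_response(200)
        for name, value in lookup(self.path):
            self.send_header(name, value)
        self.end_headers()
```

To try the handler, it could be passed to http.server.HTTPServer together with an address and port of one's choosing. As the answer above notes, most of the real work lies in the lookup and formatting of labels, which this sketch reduces to a dictionary access.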

How do I design a really fast PICS server? Won't the overhead be too much?

HTTP already explicitly defines the minimum fields required and then what rules must be followed when additional information is useful to the transaction. For example, HTTP does not require that clients provide "Accept:" headers to indicate preferred MIME types for the content, but if they are provided, servers can match up available formats with the client's request. A server may be designed to optimize throughput, to optimize the appearance of the result, or to adjust to the client software's preference.

If you minimize the server's response to one line, plus the label information, you are already dealing with the minimum amount of data transfer possible to obtain a label. In addition, most performance issues for PICS will probably be addressed with caching, not by reducing lookup time for a single label. Caching optimization requires meta-data which can be easily encoded within HTTP headers.

How can we keep the PICS extensions from getting tied up in HTTP standardization?

The management of header extensions for HTTP has been an issue of discussion and work by the HTTP group for some time. The HTTP specification lays down specific rules for the handling of extensions which guarantee that those extensions will not be made invalid by any revisions of HTTP itself. In addition, the W3C is working on a system for managing and negotiating HTTP extensions even more intelligently.

The worst risk seems to be that HTTP could be upgraded to a new revision level forcing some HTTP implementations to support multiple versions (1.0 and 2.0, for example) or forcing some PICS servers to update their protocol as well. Hopefully a major update in HTTP would bring enough benefits for PICS to make any update worthwhile.

What is PEP and Why is PICS Using It?

What if PEP Does Not Catch On?

If the general extension mechanism specified by PEP does not catch on, PICS servers will need to look for the specific header line beginning Accept-Protocol: PICS/1.0 and parse it to determine the rating request. PICS clients will need to look for and parse the specific header lines PICS-Label and PICS-Status. We will have to hope that no other group tries to extend HTTP in a way that uses headers named PICS-Label or PICS-Status.
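
The fallback described above, in which a server looks for the specific header line itself, could be sketched in Python as follows. The function name is hypothetical. The sketch accepts both the quoted, semicolon-terminated form given in the grammar and the unquoted form shown in the example exchange, and it assumes that service URLs contain neither ';' nor '}'.

```python
import re


def requested_services(header_line):
    """Return the service URLs named in an Accept-Protocol: PICS/1.0 line.

    Returns None when the line is not a PICS request at all.
    """
    name, _, value = header_line.partition(":")
    value = " ".join(value.split())  # unfold continuation lines
    if name.strip().lower() != "accept-protocol" \
            or not value.startswith("PICS/1.0"):
        return None
    match = re.search(r"services=\{([^}]*)\}", value)
    if match is None:
        return []
    return [url.strip().strip('"')
            for url in match.group(1).split(";") if url.strip()]
```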

References

To be added.