PICS Label Distribution

Label Syntax and Communication Protocols

by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org), Paul Resnick (presnick@research.att.com), and Win Treese (treese@OpenMarket.com)

Revision 1, DRAFT 4 Last modified on Sun. Oct 29 1995 by JMiller and PResnick

Overview

This document has been prepared for the technical subcommittee of PICS (Platform for Internet Content Selection). It defines a general format for labels that permits them to be embedded in RFC-822-style headers. It defines four methods by which PICS labels may be transmitted:
In a document
One or more labels may be embedded in a document. We specify the format and note in particular how to use a META tag to embed labels in html documents.
With a document
An http client can request that labels be sent along with a document. A server can satisfy the request by sending the labels in http headers.
Separately
A client can request labels from a label server that runs the http protocol. The labels may refer to items available through protocols other than http, such as ftp, gopher, or netnews.
Posting
A client may post labels to a label server that runs the http protocol. This protocol will help rating services to collect new ratings from volunteers or paid staff.

General Format

A label consists of a service identifier, label options, and a rating. The service identifier tells who issued the rating. Label options give additional properties of the document being rated as well as the rating itself, such as the time the document was rated. The rating itself is a set of attribute-value pairs that describe a document along several dimensions. The general format of a label is
         (PICS-1.0 
          (rating-service "<URL>")
          [option...]
          (ratings (<category> <value>) ...))
Label options are as follows:
(on ISO-date)
The date on which this rating was issued.
(until ISO-date)
The date on which this rating expires.
(for ['*'] quotedURL)
The URL of the item to which this rating applies. * indicates that the label applies to all items matching the URL (i.e., items whose URLs contain this URL as a prefix), which is useful for assigning a rating to a site or directory.
(by quotedURL)
The URL of the entity rating the item.
(at ISO-date)
The last modification date of the item to which this rating applies, at the time the rating was assigned. This can serve as a less expensive, but less reliable, alternative to the message integrity check (MIC) options.
(mic-md5 "Base64-string")
A message integrity check (MIC) of the item being rated. The MD5 Message Digest Algorithm is used to compute the MIC.
(complete-label quotedURL)
Dereferencing this URL returns a complete label that can be used in place of the current one. The complete label has values for as many attributes as possible. This is used when a short label is transmitted for performance purposes but additional information is also available.
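
As a rough illustration of two of these options, the following sketch computes a value suitable for mic-md5 and tests whether a (for ...) URL covers a given item. The names mic_md5 and label_covers are hypothetical. The sketch assumes that the MIC is the Base64 encoding of the raw 16-byte MD5 digest of the item's bytes, a detail this document does not spell out.

```python
import base64
import hashlib


def mic_md5(item_bytes):
    """Return the Base64-encoded MD5 digest of an item's bytes.

    Assumption: the MIC is the Base64 form of the raw 16-byte digest.
    """
    digest = hashlib.md5(item_bytes).digest()
    return base64.b64encode(digest).decode("ascii")


def label_covers(for_url, item_url, wildcard=False):
    """Check whether a (for ...) option covers item_url.

    With the '*' form, the label covers every item whose URL has
    for_url as a prefix; otherwise the two URLs must be identical.
    """
    if wildcard:
        return item_url.startswith(for_url)
    return item_url == for_url
```

A client that recomputes mic_md5 over the bytes it received and gets a different string than the one in the label can conclude that the item has changed since it was rated.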

Example

For example, a label that uses the example rating system from the document PICS Ratings Services and Ratings Systems might be as follows:
     (PICS-1.0
      (rating-service "http://www.gcf.org")
      (on "1994-11-05T08:15:23-0500")
      (until "1995-12-31T23:59:59-0000")
      (for "http://www.gcf.org/index.html")
      (by "mailto:rating-authority@gcf.org")
      (ratings ((suds 0.5) (density 0) (color/hue 1))))
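
A label in this form is a parenthesised list, so a client can read it with a very small reader. The following is a minimal sketch in Python (the names TOKEN and parse_label are hypothetical). It only recovers the nesting of the label as Python lists of strings. It does not check the label against the grammar, and it assumes the parentheses are balanced.

```python
import re

# Tokens: a parenthesis, a double-quoted string, or a run of other
# non-space characters.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_label(text):
    """Parse a parenthesised label into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing parenthesis
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        return tok.strip('"')  # drop the quotes around quoted strings

    tree = read()
    if pos != len(tokens):
        raise ValueError("trailing text after label")
    return tree
```

Applied to the example above, the first element of the result is the string PICS-1.0, and each option becomes a list whose first element is the option name.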

Detailed Syntax

The following grammar, in modified BNF, describes the syntax of labels. The methods by which labels are embedded in specific protocols are detailed below.

Notes:

  1. Whitespace is ignored except in quoted strings.
  2. The string in a transmit-name is case insensitive. All other strings are case sensitive.
  3. Option names ("on," "until," "at," etc.) are case insensitive.
  4. Additional options may be added over time. For experimental purposes, options with names beginning "x-" may be added at any time without prior arrangement. Extending the options that are formally part of this specification requires an additional consensus process before adoption.
  5. This specification requires the use of US-ASCII. Note that the document PICS Ratings Services and Ratings Systems describes how a service can map the US-ASCII transmit-names to descriptive strings using other character sets.
labellist :: label [labellist]
label :: '(PICS-1.0' service option* '(' 'ratings' rating+ ')' ')'
service :: '(' 'rating-service' quotedURL ')'
quotedURL :: '"' URL '"' as described and extended in
     PICS Ratings Services and Ratings Systems.
option ::  '(' 'on' quoted-ISO-date ')'
        |  '(' 'until' quoted-ISO-date ')'
        |  '(' 'at' quoted-ISO-date ')'
        |  '(' 'for' ['*']quotedURL ')'
        |  '(' 'by' quotedURL ')'
        |  '(' 'complete-label' quotedURL ')'
        |  '(' 'mic-md5' base64-string ')'
quoted-ISO-date :: '"'YYYY'-'MM'-'DD'T'hh':'mm':'ssStz'"'
     based on the ISO 8601:1988 date and time standard, restricted
     to the specific form described here:
     YYYY :: four-digit year
     MM :: two-digit month (01=January, etc.)
     DD :: two-digit day of month (01 through 31)
     hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
     mm :: two digits of minute (00 through 59)
     ss :: two digits of second (00 through 59)
     S  :: sign of time zone offset from UTC ('+' or '-')
     tz :: four digit amount of offset from UTC
               (e.g., 1512 means 15 hours and 12 minutes)
     For example, "1994-11-05T08:15:23-0500" is a valid quoted-ISO-date
     denoting November 5, 1994, 8:15:23 am, Eastern Standard Time
rating :: '(' transmit-name value ')'
transmit-name :: '"' [1*n]extendedalphanum ['/'transmit-name] '"'
value :: number
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [1*n][0-9]
quotedshortname :: '"' [1*n]extendedalphanum '"'
extendedalphanum :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '+' | '-'

base64-string is as defined in RFC-1421.
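
The quoted-ISO-date production is strict enough to be checked with a single regular expression. The sketch below is one possible reading of it in Python; the names ISO_DATE and parse_quoted_iso_date are hypothetical, and the range checks on the individual fields are delegated to Python's datetime type rather than taken from this document.

```python
import re
from datetime import datetime, timedelta, timezone

# "YYYY-MM-DDThh:mm:ssStz" including the surrounding double quotes.
ISO_DATE = re.compile(
    r'"([0-9]{4})-([0-9]{2})-([0-9]{2})'
    r'T([0-9]{2}):([0-9]{2}):([0-9]{2})'
    r'([+-])([0-9]{2})([0-9]{2})"'
)


def parse_quoted_iso_date(text):
    """Convert a quoted-ISO-date into a timezone-aware datetime.

    Raises ValueError if the text does not have the required shape or
    if a field is out of range.
    """
    match = ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a quoted-ISO-date: %r" % text)
    year, month, day, hour, minute, second = (
        int(g) for g in match.group(1, 2, 3, 4, 5, 6)
    )
    sign = 1 if match.group(7) == "+" else -1
    offset = sign * timedelta(hours=int(match.group(8)),
                              minutes=int(match.group(9)))
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone(offset))
```

Because the result carries its UTC offset, two dates such as an (until ...) option and the current time can be compared directly, whatever time zones they were written in.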

RFC-822 Headers

Many protocols, such as Internet electronic mail, the HyperText Transfer Protocol, and USENET News, use ASCII headers as described in RFC-822. For use in such protocols, we define a new header, PICS-Label, used to contain the labels described in this document. The syntax is:
PICS-Label: <labellist>
where labellist is described according to the syntax above. Continuation lines beginning with whitespace may be used following the specification given in RFC-822.
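
Since whitespace outside quoted strings is ignored, a sender is free to break a long label at any such whitespace. The following Python sketch folds a labellist into a PICS-Label header with continuation lines and undoes the folding on receipt. The names CHUNK, fold_pics_label, and unfold are hypothetical, and the line width of 72 is an arbitrary choice, not a requirement of this document.

```python
import re

# A chunk is a run of non-space characters; a quoted string is kept
# whole even if it contains spaces.
CHUNK = re.compile(r'(?:"[^"]*"|[^\s"])+')


def fold_pics_label(labellist, width=72):
    """Format a PICS-Label header, folding it onto continuation lines."""
    lines = []
    current = "PICS-Label:"
    for chunk in CHUNK.findall(labellist):
        if len(current) + 1 + len(chunk) > width:
            lines.append(current)
            current = " " + chunk  # continuation lines begin with whitespace
        else:
            current += " " + chunk
    lines.append(current)
    return "\r\n".join(lines)


def unfold(header_text):
    """Undo folding: a line break plus leading whitespace becomes one space."""
    return re.sub(r"\r?\n[ \t]+", " ", header_text)
```

A chunk longer than the chosen width is placed on a line of its own rather than split, so a quoted URL is never broken across lines.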

Embedding Labels in HyperText Markup Language (HTML)

Labels may be embedded in HTML files as meta-information, using the META element defined in the HTML specification. This embedding takes one of two forms:
  1. Using the HTTP header equivalency mechanism, which may be used by an HTTP server to generate a header:
           <META http-equiv="PICS-Label" content='labellist'>
           
  2. Using the name mechanism, which may be parsed directly by the receiver:
           <META name="PICS-Label" content='labellist'>
           
(Note that the content attribute uses single quotes, because the label syntax uses double quotes.)
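
A receiver that parses the document itself can collect embedded labels with an ordinary HTML parser. The sketch below uses Python's standard html.parser module; the class name PicsMetaParser is hypothetical. It assumes that http-equiv, name, and content appear as attributes of the META element, and that the PICS-Label name is matched without regard to case, as RFC-822 header names are.

```python
from html.parser import HTMLParser


class PicsMetaParser(HTMLParser):
    """Collect the content of every META element that carries a PICS label."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("http-equiv") or attributes.get("name") or ""
        if key.lower() == "pics-label" and attributes.get("content"):
            self.labels.append(attributes["content"])
```

After calling feed() with the text of a document, the labels attribute holds one labellist string per matching META element, in document order.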

Sending Labels With A Document

When an http server sends a document to a client, it sends additional headers as well. We specify how the client can request that one or more labels be included in one of those headers.

Example

Client sends to http server www.greatdocs.com:

GET /foo.html HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
      negotiable={services={"http://www.gcf.org/ratings"}}

Server responds to client:

HTTP/1.0 200 OK
Date: Friday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: PICS/1.0 scope=any str=opt id=pics headers={PICS-Label PICS-Status}
Content-Encoding: pics
PICS-Label:
 (PICS-1.0
  (rating-service "http://www.gcf.org")
  (on "1994-11-05T08:15:23-0500")
  (until "1995-12-31T23:59:59-0000")
  (for "http://www.gcf.org/index.html")
  (by "mailto:rating-authority@gcf.org")
  (ratings ((suds 0.5) (density 0) (color/hue 1))))
PICS-Status: OK
Content-type: text/html

...contents of foo.html...

Explanation of example

The client requests that document foo.html be sent back. In addition, the client requests the rating of that document from service "http://www.gcf.org/ratings". The request follows the PEP (Protocol Extension Protocol) syntax for extensions to the http protocol. The PEP syntax is currently under development at W3C. It allows for more organized extension to http than the current method of merely adding extra header-fields.

The server responds by sending back the label, in the PICS-Label header, as well as the document. The PICS-Status header indicates the status of the label request. It would not be appropriate to signal an http error if the document is available but the ratings request has not been fulfilled. In this case, the PICS-Status header confirms that the request has been fulfilled. The Protocol header is included to conform to PEP, as is the addition of "pics " to the beginning of the Content-Encoding field. In this case, there is no other content-encoding, and so "pics" is the only thing that appears in the Content-Encoding field.
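
On the client side, the two PICS headers can be pulled out of such a response with any RFC-822 header parser. The following sketch uses the HeaderParser class from Python's standard email package; the function name extract_pics_headers is hypothetical. It collapses folded continuation lines into single spaces, which is safe only on the assumption that no quoted string in the label contains significant whitespace.

```python
from email.parser import HeaderParser


def extract_pics_headers(raw_response):
    """Return (status, label) from the header block of a raw HTTP response.

    Either value is None if the corresponding header is absent.
    """
    text = raw_response.replace("\r\n", "\n")
    head = text.split("\n\n", 1)[0]           # headers end at the blank line
    _status_line, _, header_block = head.partition("\n")
    headers = HeaderParser().parsestr(header_block)
    label = headers.get("PICS-Label")
    if label is not None:
        label = " ".join(label.split())       # collapse continuation lines
    return headers.get("PICS-Status"), label
```

A client would typically check the returned status first and only then hand the label text to its label parser.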

Detailed Syntax and Semantics of HTTP Requests for Labels With Document

The following grammar, in modified BNF, describes the syntax of the additional header line to be included in an HTTP request for a document and associated labels.
accept-header :: 
 'Accept-Protocol: PICS/1.0 scope=any rx-str=opt
  negotiable={' services ['embedded'] '}'
services :: 'services={' quotedURL *(';' quotedURL) '}'
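
As an illustration of this grammar, a client might assemble the header line as follows. The function name accept_protocol_header is hypothetical. The sketch writes the header on a single line rather than using a continuation line, and it leaves out the optional 'embedded' keyword, whose meaning this document does not describe.

```python
def accept_protocol_header(services):
    """Build an Accept-Protocol header line naming the desired rating services.

    services is a sequence of rating-service URLs; at least one is required.
    """
    if not services:
        raise ValueError("at least one rating service URL is required")
    # Each URL is double-quoted; multiple URLs are separated by ';'.
    service_list = ";".join('"%s"' % url for url in services)
    return ("Accept-Protocol: PICS/1.0 scope=any rx-str=opt "
            "negotiable={services={%s}}" % service_list)
```

Passing the single URL from the example above reproduces that example's header, apart from the line break.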
Notes on the syntax:

Detailed Syntax and Semantics For HTTP Response Headers

Three additional headers are specified: Protocol, PICS-Label, and PICS-Status. One header is modified: Content-Encoding.

Notes on the response syntax:

Requesting Labels Separately

PICS labels can also be retrieved separately from the documents to which they refer. The protocol is an extension of HTTP, meaning that a label server will use HTTP to respond to requests, even if the labels themselves describe ftp, gopher, or other non-http sites and documents.

Requests to stand-alone label servers share some features with the requests specified above for delivery of labels along with documents. In particular, the method of specifying which rating services' labels are desired is identical in the two methods.

A GET command includes a requested-URL. When requesting labels separately from a document, this URL specifies the label server rather than the document for which a label is desired.

As with requests for labels along with documents, separate requests for labels rely on the Accept-Protocol header. In addition to specifying the desired rating services, it is also necessary to specify the URL or URLs for which labels are desired.

Sample Query

The following sample request, made to the http server http://www.labels.org, is illustrative:

GET /ratings HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
      negotiable={services={"http://www.gcf.org/ratings"};
                  urls={*"http://www.questionable.org/images"}
                 }

The query asks the label server www.labels.org/ratings to send a single label that applies to everything in the images directory at site www.questionable.org. The desired label should come from the service http://www.gcf.org/ratings.

The label server responds by sending back headers only, with no document. The headers follow the syntax and semantics described above for label transmission with a document.

Detailed Syntax and Semantics of HTTP Query for Labels Separate From Documents

The following grammar, in modified BNF, describes the syntax of the Accept-Protocol header that the client sends to the server.
accept-header :: 
 'Accept-Protocol: PICS/1.0 scope=any rx-str=opt
  negotiable={' services  ['embedded'] ';' urls '}'
services :: 'services={' quotedURL *(';' quotedURL) '}'
urls :: 'urls={' labelURL *(';' labelURL) '}'
labelURL :: ['*'] quotedURL ['*']
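
A label server has to pick the requested services and URLs out of this header. The following Python sketch shows one way to do so with regular expressions; the function name parse_negotiable is hypothetical. It assumes that the braces in the header are balanced and that no URL contains a brace or a double quote, and it ignores the optional trailing '*' in labelURL.

```python
import re


def parse_negotiable(header_value):
    """Extract rating services and label URLs from an Accept-Protocol value.

    Returns (services, urls), where urls is a list of (url, wildcard)
    pairs and wildcard is True when the URL was preceded by '*'.
    """
    def group(name):
        # Contents of name={...}, or "" if the group is absent.
        match = re.search(name + r"=\{([^{}]*)\}", header_value)
        return match.group(1) if match else ""

    services = re.findall(r'"([^"]*)"', group("services"))
    urls = [(url, star == "*")
            for star, url in re.findall(r'(\*?)\s*"([^"]*)"', group("urls"))]
    return services, urls
```

For a request that carries only a services group, as in a request for labels with a document, the returned list of URLs is empty.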

Notes on the syntax:

Posting Labels to a Label Server

To aid organizations in collecting ratings, we specify how a client can submit new ratings to a rating server.

This section has not been written yet. The basic idea is to do a POST to the label server. Include the Accept-Protocol: line. Send the ratings as a header. We could send the ratings in the body, but we might as well be consistent with how we send them in the other distribution methods. We will need to define response codes to go in a response header.

Why HTTP For Label Servers

Instead of extending HTTP, we considered proposals for special-purpose label transport protocols. Before making a final decision, we constructed the following lists of pros and cons.

Advantages of Using HTTP

Advantages of Creating a New Protocol Instead of Using HTTP

FAQ - Frequently Asked Questions

Why is there no ftp, gopher, or netnews protocol for requesting labels along with a document?

Labels can be sent as additional headers in any protocol that employs RFC-822 style headers. We have not yet determined, however, convenient extensions to protocols other than http to permit requests that ask for labels from specific services. We may specify such extensions in the future.

How do you get labels for items on FTP, Gopher, or netnews servers? Are we forcing all FTP implementations to implement all of HTTP as well?

FTP, Gopher, and netnews servers need not distribute PICS labels. Labels for items on such servers can be retrieved from an HTTP-based label server.

The PICS premise is that all compliant clients will have to implement some new protocol. The subset of HTTP which would be required for obtaining a PICS label can be minimal. HTTP will be no more difficult to implement in an FTP (or other) client than a brand-new protocol that provides similar features.

Can existing HTTP servers be used as PICS label servers?

Using CGI scripts, or with a small amount of added code in the HTTP server, an existing HTTP server can be configured to access a database of labels and return that information coded as additional HTTP Headers. Most of the work is in the lookup and formatting of the labels themselves, not the modifications to HTTP.

How do I design a really fast PICS server? Won't the overhead be too much?

HTTP already defines the minimum set of required fields, together with the rules that apply when additional information is useful to the transaction. For example, HTTP does not require that clients provide "Accept:" headers to indicate preferred MIME types for the content, but if they are provided, servers can match up available formats with the client's request. An HTTP server may be designed to optimize throughput or to optimize the appearance of the result, or to adjust to the client software's preference.

If you minimize the server's response to one line, plus the label information, you are already dealing with the minimum amount of data transfer possible to obtain a label. In addition, most performance issues for PICS will probably be addressed with caching, not by reducing lookup time for a single label. Caching optimization requires meta-data which can be easily encoded within HTTP headers.

How can we keep the PICS extensions from getting tied up in HTTP standardization?

The management of header extensions for HTTP has been an issue of discussion and work by the HTTP group for some time. The HTTP specification lays down specific rules for the handling of extensions which guarantee that those extensions will not be made invalid by any revisions of HTTP itself. In addition, the W3C is working on a system (PEP) for managing and negotiating HTTP extensions even more intelligently.

The worst risk seems to be that HTTP could be upgraded to a new revision level forcing some HTTP implementations to support multiple versions (1.0 and 2.0, for example) or forcing some PICS servers to update their protocol as well. Hopefully a major update in HTTP would bring enough benefits for PICS to make any update worthwhile.

What is PEP and Why is PICS Using It?

The Protocol Extension Protocol from the World Wide Web Consortium uses a trio of header fields (Protocol, Accept-Protocol, and Content-Encoding) to allow an HTTP client and server to do sophisticated negotiation about the set of header fields and their meanings. It is being proposed for use in http 1.2 and http-ng, and is currently under careful scrutiny by the W3C Security Editorial Board to make sure that it contains the features necessary to provide security for general document transmission as well as electronic payments.

PICS faces many of the same problems that face the security and electronic payment community. In PICS the issue revolves around the ability for the client to tell the server from which labeling services it would like to have labels. This is a simple negotiation problem of the kind PEP was designed to solve. Rather than invent an orthogonal mechanism it seemed best to use one that is already being proposed and investigated.

What if PEP Does Not Catch On?

If the general extension mechanism specified by PEP does not become a generic feature of HTTP servers, PICS servers will need to look for the specific header line beginning Accept-Protocol: PICS/1.0 and process it to determine the rating request. PICS clients will need to look for and process the specific header lines PICS-Label and PICS-Status. We will also have to hope that no other group tries to extend HTTP in a way that uses headers named PICS-Label or PICS-Status.

References

To be added.