Revision 1, DRAFT 2
Last modified on Sat. Oct 28 1995 by Resnick
Overview
This document has been prepared for the technical subcommittee of
PICS (Platform for Internet Content Selection). It defines a general
format for labels that permits them to be embedded in RFC-822 style
headers. It defines four methods by which PICS labels may be
transmitted:
IN A DOCUMENT
One or more labels may be embedded in a document. We specify the
format and note in particular how to use a META tag to embed labels
in html documents.
WITH A DOCUMENT
An http client can request that labels be
sent along with a document. A server can satisfy the request by
sending the labels in http headers.
SEPARATELY
A client can request labels from a label server
that runs the http protocol. The labels may refer to items available
through protocols other than http, such as ftp, gopher, or netnews.
POSTING
A client may post labels to a label server that
runs the http protocol. The labels may refer to items available
through protocols other than http.
General Format
A label consists of a service identifier, label options,
and a rating. The service identifier tells who issued the
rating. Label options give additional properties of the document
being rated and of the rating itself, such as the time the
document was rated. The rating itself is a set of attribute-value
pairs that describe a document along several dimensions.
The general format of a label is
service=(options...; rating=(...);)
Label options are as follows:
on = HTML-date
The date on which this rating was issued.
until = HTML-date
The date on which this rating expires.
for = [*]quotedURL
The URL of the item to which this rating applies. * indicates
that the label applies to all items matching the URL, which is useful
for assigning a rating to a site or directory.
by = quotedURL
The URL of the entity rating the item.
at = HTML-date
The last modification date of the item to which this rating
applies at the time the rating was assigned. This can serve as a
less expensive, but less reliable, alternative to the message
integrity check (MIC) options.
mic-md5 = Base64-string
A message integrity check (MIC) of the item being rated.
The MD5 Message Digest Algorithm is used to compute the MIC.
complete-label = quotedURL
Dereferencing this URL returns a complete label that can be used
in place of the current one. The complete label has values for as many
attributes as possible. This is used when a short label is transmitted
for performance purposes but additional information is also available.
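As an illustration of the mic-md5 option, the following Python sketch computes a Base64-encoded MD5 digest. It assumes the MIC is taken over the raw bytes of the item and that standard Base64 with padding is intended; this draft does not state either point. The function name mic_md5 is hypothetical.

```python
import base64
import hashlib


def mic_md5(item_bytes: bytes) -> str:
    """Return a Base64 string of the MD5 digest of the item's bytes.

    Assumption: the digest covers the raw bytes of the rated item and
    is encoded with standard, padded Base64.
    """
    digest = hashlib.md5(item_bytes).digest()  # 16 raw digest bytes
    return base64.b64encode(digest).decode("ascii")
```

A client holding both the item and its label could recompute this value and compare it with the mic-md5 option to detect that the item changed after it was rated.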
Example
For example, a label that uses the example rating system from
the document, "Rating Services and Rating Systems", might be as follows:
"http://www.gcf.org"=(
on="05 Nov 1994 08:08:08 -0500";
until="31 Dec 1995 23:59:59 -0000";
for="http://www.gcf.org/index.html";
by="mailto:rating-authority@gcf.org";
rating=(suds=0.5; density=none; color/hue=red);
)
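The general format can be sketched in Python as follows. This is a minimal illustration that assumes every option value is written as a quoted string, as in the example above, and that the caller supplies values that need no escaping. The name format_label is hypothetical and not part of this specification.

```python
def format_label(service: str, options: dict, rating: dict) -> str:
    """Build a label of the form service=(options...; rating=(...);).

    Assumption: option values are emitted as quoted strings and rating
    values are emitted unquoted, mirroring the example in this document.
    """
    parts = [f'{name}="{value}"' for name, value in options.items()]
    rating_text = "; ".join(f"{attr}={val}" for attr, val in rating.items())
    parts.append(f"rating=({rating_text})")
    return f'"{service}"=(' + "; ".join(parts) + ";)"
```

Because whitespace is ignored outside quoted strings, the single-line output is equivalent to the multi-line layout shown in the example.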
Detailed Syntax
The following grammar, in modified BNF, describes the syntax of
labels. The methods by which labels are embedded in specific
protocols are detailed below.
Notes:
Whitespace is ignored except in quoted strings.
The strings in transmit-name-lists are case insensitive. All other
strings are case sensitive.
Additional options may be added over time. For experimental
purposes, options with names beginning "x-" may be
added at any time without prior arrangement. Extending the options
that are formally part of this specification requires an additional
consensus process before adoption.
This specification requires the use of US-ASCII. Note that the
document, "Rating Services and Rating Systems," describes how a
service can map the US-ASCII transmit-as names to descriptive strings
using other character sets.
Many protocols, such as Internet electronic mail, the HyperText
Transfer Protocol, and USENET News, use ASCII headers as described in
RFC-822. For use in such protocols, we define a new header,
PICS-Label, used to contain the labels described in this document.
The syntax is:
PICS-Label:
where label is described according to the syntax
above. Continuation lines beginning with whitespace may be used as
defined in RFC-822.
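A receiver has to join continuation lines before it can read a multi-line PICS-Label header. The following Python sketch shows one way to do that. It assumes the header section is available as text and ends at the first blank line; the name unfold_headers is hypothetical.

```python
from typing import List, Tuple


def unfold_headers(raw: str) -> List[Tuple[str, str]]:
    """Split a header section into (name, value) pairs.

    Lines that begin with a space or tab are treated as continuations
    of the previous header and are joined to it with a single space.
    """
    headers: List[Tuple[str, str]] = []
    for line in raw.splitlines():
        if not line:
            break  # a blank line ends the header section
        if line[0] in " \t" and headers:
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers
```

Joining with a space is safe for labels because whitespace is ignored outside quoted strings.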
Embedding Labels in HyperText Markup Language (HTML)
Labels may be embedded in HTML files as meta-information, using the
META element defined in the HTML specification. This embedding
usually takes one of two forms:
Using the HTTP header equivalency mechanism, which may be used
by an HTTP server to generate a header:
<META http-equiv="PICS-Label" content='...'>
Using the name mechanism, which may be parsed directly by the receiver:
<META name="PICS-Label" content='...'>
(Note that the content attribute uses single quotes, because the
label syntax uses double quotes.)
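The quoting rule can be sketched in Python. This example assumes the label itself contains no single quote, since this draft does not define an escaping rule for that case. The name pics_meta_tag is hypothetical.

```python
def pics_meta_tag(label: str, use_http_equiv: bool = True) -> str:
    """Wrap a label in a META element, single-quoting the content.

    Assumption: a label containing a single quote cannot be embedded
    this way, so it is rejected rather than escaped.
    """
    if "'" in label:
        raise ValueError("label must not contain a single quote")
    attribute = "http-equiv" if use_http_equiv else "name"
    return "<META " + attribute + "=\"PICS-Label\" content='" + label + "'>"
```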
Sending Labels With A Document
When an http server sends a document to a client, it sends additional
headers as well. We specify how the client can request that one or
more labels be included in one of those headers.
The following example illustrates a typical exchange:
Client sends to http server www.greatdocs.com:
GET /foo.html HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={services={http://www.gcf.org/ratings}}
Server responds to client:
HTTP/1.0 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: PICS/1.0 scope=any str=opt id=pics headers={PICS-Label PICS-Status}
Content-Encoding: pics
PICS-Label: "http://www.gcf.org/ratings"=(
on="05 Nov 1994 08:08:08 -0500";
until="31 Dec 1995 23:59:59 -0000";
for="http://www.greatdocs.com/foo.html";
by="mailto:rating-authority@gcf.org";
rating=(suds=0.5; density=none; color/hue=red);
)
PICS-Status: OK
Content-type: text/html
...contents of foo.html...
Explanation of example
The client requests that document foo.html be sent back. In addition,
the client requests the rating of that document from the service
"http://www.gcf.org/ratings". The request follows the PEP (Protocol
Extension Protocol) syntax for extensions to the http protocol. The
PEP syntax is currently under development at W3C. It allows for more
organized extension to http than the current method of merely adding
extra header-fields.
The server responds by sending back the label, in the
PICS-Label: header, as well as the document. The PICS-Status:
header says that the label request has been fulfilled. It would not
be appropriate to signal an http error if the document is available
but the ratings request has not been fulfilled. The PICS-Status
header contains that information. The Protocol: header is included
to conform to PEP, as is the addition of "pics " to the beginning of
the Content-Encoding: field. In this case, there is no other
content-encoding, and so "pics " is the only thing that appears in
that field.
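The client side of this exchange can be sketched in Python. The sketch reproduces the Accept-Protocol parameters from the example verbatim and writes them on one header line. It handles a single service only, because this draft does not show how several services are separated. The name build_label_request is hypothetical.

```python
def build_label_request(path: str, service: str) -> str:
    """Return the text of an HTTP/1.0 GET that also asks for a label.

    Assumption: scope=any and rx-str=opt are copied from the example
    above; their exact meaning is defined by PEP, not by this sketch.
    """
    accept = (
        "Accept-Protocol: PICS/1.0 scope=any rx-str=opt "
        "negotiable={services={" + service + "}}"
    )
    lines = ["GET " + path + " HTTP/1.0", accept, "", ""]
    return "\r\n".join(lines)  # the empty items produce the final blank line
```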
Detailed Syntax and Semantics of HTTP Requests for Labels With
Document
The following grammar, in modified BNF, describes the syntax of the
additional header line to be included in an HTTP request for a
document and associated labels.
The services list names one or more labeling services from which the client is requesting a label for the document.
When the 'embedded' flag is set, the server is requested to
prefetch labels for all URLs that are referenced in the current
document. That is, the server is requested to look inside the
document requested, extract the URLs, and send the labels associated
with those URLs. If fulfilled, the request may permit the client to
display links differently depending on how they are labeled. Servers
may, however, choose to ignore this part of the request.
Detailed Syntax and Semantics For HTTP Response Headers
"pics " is inserted just after "Content-Encoding:" if that header would already be sent. If that header would not normally be included, the following complete header line should be included:
Content-Encoding: pics
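The Content-Encoding rule can be sketched in Python as follows. The sketch assumes the response headers are held in a dictionary keyed by the exact name "Content-Encoding"; a real server would need to match header names case-insensitively. The name add_pics_encoding and the "x-gzip" value in the usage note are hypothetical.

```python
def add_pics_encoding(headers: dict) -> dict:
    """Return a copy of the headers with "pics" added to Content-Encoding.

    If the header is already present, "pics " is placed in front of the
    existing value; otherwise the header is created with the value "pics".
    """
    updated = dict(headers)
    existing = updated.get("Content-Encoding")
    if existing:
        updated["Content-Encoding"] = "pics " + existing
    else:
        updated["Content-Encoding"] = "pics"
    return updated
```

For example, a response that would have carried "Content-Encoding: x-gzip" would instead carry "Content-Encoding: pics x-gzip".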
The response-codes header is meant to provide diagnostic information.
Explanation is an optional, unformatted, human-readable string.
'no rating service' means that the server does not provide rating service.
'request denied' indicates that the request for ratings was denied. The server may or may not provide rating service.
'service unavailable' specifies which rating services' labels have been omitted.
'ratings unavailable' specifies the URLs for which labels have been omitted.
'no embedded ratings' means the server has ignored the request for embedded ratings.
Sending Labels Separately
PICS labels can also be retrieved separately from the documents they
refer to. The protocol is an extension of HTTP, meaning that a label
server will use HTTP to respond to requests, even if the labels
themselves describe ftp, gopher, or other non-http sites and
documents.
Requests to stand-alone label servers share some features with
the requests specified above for delivery of labels along with
documents. In particular, the method of specifying which rating
services' labels are desired is identical in the two methods.
Specifying which URL(s) to get labels for
The GET command normally includes a requested-URL. When requesting
labels separately from a document, this URL actually specifies the
label server rather than a document. In addition, it is necessary to
specify the URL(s) for which the client desires labels. This is done
through the usual "?" syntax.
It is acceptable to ask for labels describing a site, or a
subdirectory at a site, rather than just an individual document. It
is also acceptable to ask for labels of all the documents at a site
or in a subdirectory. The syntax must distinguish between these two
types of requests.
Sample Queries
The general format of the query is
?url=[*]"..."url=[*]"..."...
The following sample request, made to the http server
http://www.labels.org, is illustrative:
GET /ratings?url=*"http://www.questionable.org/images" HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={services={http://www.gcf.org/ratings}}
The query asks the label server www.labels.org/ratings to send a
single label that applies to everything in the images directory at
site www.questionable.org. The desired label should come from the service
http://www.gcf.org/ratings.
The label server responds by sending back headers only, with no
document. The headers follow the syntax and semantics described above
for label transmission with a document.
Detailed Syntax and Semantics of HTTP Query for Labels Separate
From Documents
The following grammar, in modified BNF, describes the syntax of the
query that follows the URL of the label server.
A '*' before the URL indicates a request for a single label that
applies to all items that have the URL as a prefix. This is useful
for requesting a rating of a site.
A '*' at the end of the URL requests all labels for URLs that
have the URL as a prefix. This is useful for requesting the ratings
of all items at a site.
Note that it is permitted to include more than one URL in the
request. Each such request is handled independently, but all labels
are sent back in a single header.
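A query for a single URL can be sketched in Python. The sketch covers only the leading '*' form, which is the one shown in the sample request. It omits the trailing '*' form and multiple URLs, because this draft does not show their exact placement or separator. It also performs no escaping of the URL, since the sample sends the quoted URL as is. The name label_query is hypothetical.

```python
def label_query(url: str, prefix_label: bool = False) -> str:
    """Return the query part that asks a label server about one URL.

    With prefix_label set, a '*' is written before the quoted URL to
    request a single label covering every item that has the URL as a
    prefix. Assumption: the URL contains no double quote.
    """
    if '"' in url:
        raise ValueError("URL must not contain a double quote")
    star = "*" if prefix_label else ""
    return "?url=" + star + '"' + url + '"'
```

Appending the result to the label server path, for example "/ratings", gives the request target used in the sample query.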
Posting Labels to a Label Server
To aid organizations in collecting ratings, we specify how a client
can submit new ratings to a rating server.
To be filled in. Basically, do a POST to the label server. Include
the Accept-Protocol: line. Send the ratings as a header. We could send
the ratings in the body, but we might as well be consistent with how
we send them in the other distribution methods. We will need to
define response codes to go in a response header.
Why HTTP For Label Servers
Instead of extending HTTP, we considered proposals for
special-purpose label transport protocols. Before making a final
decision, we constructed the following lists of pros and cons.
Advantages of Using HTTP
An existing HTTP server can be used as a PICS label server.
This is particularly useful in the short term. CGI scripts at the
HTTP server can handle the special query syntax and entity header
field of a request for labels.
A label returned from a label server and a label returned along
with a document from an HTTP server can use identical label formats.
Client programs that already support HTTP will have much less
new code to implement.
Client programs that do not support HTTP will have to support a
new protocol in any case. It may be easier to support HTTP, because
of available software libraries.
Several protocol elements are already fully specified by HTTP
that would be required in any PICS protocol.
Date and time formats.
Content encoding types.
Character set and Internationalization issues.
Error/result conditions. Both result categories (extensible),
as well as a sample set of messages are specified.
Handling of expiration dates for each URL queried.
HTTP is quite stable, has not diverged, and is well accepted.
Security and Payment systems either exist or are being developed
for HTTP. A binary format may also be developed for speed. PICS need
not reinvent such systems.
Firewalls tend to allow HTTP headers to be transmitted already.
A new protocol would take much longer to be accepted.
Current extensibility already defines how extensions to PICS
itself should be accomplished.
Advantages of Creating a New Protocol Instead of Using HTTP
A new protocol would avoid any HTTP protocol wars.
Label servers and clients would not need to be updated to
accommodate HTTP changes.
RFC822 and other precedents could still be used in the design of
a new protocol.
A binary format could be considered initially for speed.
UDP or other datagram lookups could be considered.
FAQ - Frequently Asked Questions
Why is there no ftp, gopher, or netnews protocol for requesting labels
along with a document?
Labels can be sent as additional headers in any protocol that
employs RFC-822 style headers. We have not yet determined, however,
convenient extensions to protocols other than http to permit requests
that ask for labels from specific services. We may specify such
extensions in the future.
How do you get labels for items on FTP, Gopher, or netnews
servers? Are we forcing all FTP implementations to implement all of
HTTP as well?
FTP, Gopher, and netnews servers need not distribute PICS labels.
Labels for items on such servers can be retrieved from an HTTP-based
label server.
The PICS premise is that all compliant clients will have to implement
some new protocol. The subset of HTTP which would be required for
obtaining a PICS label can be minimal (see section above on minimal
implementation). HTTP will be no more difficult to implement in an FTP
(or other) client than a brand-new protocol which accomplishes
similar features.
Can existing HTTP servers be used as PICS label servers?
Using CGI scripts, or with a small amount of added code in the
HTTP server, an existing HTTP server can be configured to access
a database of labels and return that information coded as additional
HTTP Headers. Most of the work is in the lookup and formatting
of the labels themselves, not the modifications to HTTP.
How do I design a really fast PICS server? Won't the overhead
be too much?
HTTP already explicitly defines the minimum fields required and
then what rules must be followed when additional information is
useful to the transaction. For example, HTTP does not require
that clients provide "Accept:" headers to indicate preferred
MIME types for the content, but if they are provided, servers
can match up available formats with the client's request. A server may
be designed to optimize throughput, to optimize the appearance
of the result, or to adjust to the client software's preference.
If you minimize the server's response to one line, plus the label
information, you are already dealing with the minimum amount of
data transfer possible to obtain a label. In addition, most performance
issues for PICS will probably be addressed with caching, not by
reducing lookup time for a single label. Caching optimization
requires meta-data which can be easily encoded within HTTP headers.
How can we keep the PICS extensions from getting tied up
in HTTP standardization?
The management of header extensions for HTTP has been an issue
of discussion and work by the HTTP group for some time. The HTTP
specification lays down specific rules for the handling of extensions
which guarantee that those extensions will not be made invalid
by any revisions of HTTP itself. In addition, the W3C is working
on a system for managing and negotiating HTTP extensions even
more intelligently.
The worst risk seems to be that HTTP could be upgraded to a new
revision level forcing some HTTP implementations to support multiple
versions (1.0 and 2.0, for example) or forcing some PICS servers
to update their protocol as well. Hopefully a major update in
HTTP would bring enough benefits for PICS to make any update worthwhile.
What is PEP and Why is PICS Using It?
What if PEP Does Not Catch On?
If the general extension mechanism specified by PEP does not
catch on, PICS servers will need to look for the specific header line
beginning Accept-Protocol: PICS/1.0 and parse it to determine the
rating request. PICS clients will need to look for and parse the
specific header lines PICS-Label and PICS-Status. We will have to
hope that no other group tries to extend HTTP in a way that uses
headers named PICS-Label or PICS-Status.
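The fallback described above can be sketched in Python for the server side. The sketch takes the value of an Accept-Protocol header and extracts the requested services. It assumes the value has the shape used in the examples in this document and that multiple services, if present, are separated by whitespace; the draft does not confirm that separator. The name requested_services is hypothetical.

```python
import re


def requested_services(header_value: str) -> list:
    """Return the service URLs named in an Accept-Protocol value.

    Returns an empty list when the value is not a PICS/1.0 request or
    carries no services={...} group.
    """
    if not header_value.strip().startswith("PICS/1.0"):
        return []
    match = re.search(r"services=\{([^{}]*)\}", header_value)
    if match is None:
        return []
    # Assumption: whitespace separates multiple service URLs.
    return match.group(1).split()
```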