PICS Label Distribution
Label Syntax and Communication Protocols
by Tim Krauskopf (timk@spyglass.com), Jim Miller (jmiller@w3.org),
Paul Resnick (presnick@research.att.com), and Win Treese
(treese@OpenMarket.com)
Revision 1, DRAFT 4
Last modified on Sun. Oct 29 1995 by JMiller and PResnick
Overview
This document has been prepared for the technical subcommittee of
PICS (Platform for Internet Content Selection). It defines a general
format for labels that permits them to be embedded in RFC-822-style
headers. It defines four methods by which PICS labels may be
transmitted:
- In a document
- One or more labels may be embedded in a document. We specify the
format and note in particular how to use a META tag to embed labels
in html documents.
- With a document
- An http client can request that labels be sent along with a
document. A server can satisfy the request by sending the labels in
http headers.
- Separately
- A client can request labels from a label server that runs the
http protocol. The labels may refer to items available through
protocols other than http, such as ftp, gopher, or netnews.
- Posting
- A client may post labels to a label server that runs the http
protocol. This protocol will help rating services to collect new
ratings from volunteers or paid staff.
General Format
A label consists of a service identifier, label options,
and a rating. The service identifier tells who issued the
rating. Label options give additional properties of the document
being rated and of the rating itself, such as the time the
document was rated. The rating itself is a set of attribute-value
pairs that describe a document along several dimensions.
The general format of a label is
(PICS-1.0
(rating-service "<URL>")
[option...]
(ratings (<category> <value>) ...))
Label options are as follows:
- (on ISO-date)
- The date on which this rating was issued.
- (until ISO-date)
- The date on which this rating expires.
- (for ['*'] quotedURL)
- The URL of the item to which this rating applies. * indicates
that the label applies to all items matching the URL (i.e., items
whose URLs contain this URL as a prefix), which is useful for
assigning a rating to a site or directory.
- (by quotedURL)
- The URL of the entity rating the item.
- (at ISO-date)
- The last modification date of the item to which this rating
applies, at the time the rating was assigned. This can serve as a
less expensive, but less reliable, alternative to the message
integrity check (MIC) options.
- (mic-md5 "Base64-string")
- A message integrity check (MIC) of the item being rated.
The MD5 Message Digest Algorithm is used to compute the MIC.
- (complete-label quotedURL)
- Dereferencing this URL returns a complete label that can be used
in place of the current one. The complete label has values for as many
attributes as possible. This is used when a short label is transmitted
for performance purposes but additional information is also available.
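As a rough illustration of the mic-md5 and for options, the following Python sketch computes a Base64-encoded MD5 digest and checks whether a for option covers a given URL. The function names mic_md5 and label_applies are hypothetical and not part of this specification, and the sketch assumes the digest is taken over the raw bytes of the item.

```python
import base64
import hashlib


def mic_md5(item_bytes):
    """Return the Base64 encoding of the MD5 digest of the item's bytes."""
    digest = hashlib.md5(item_bytes).digest()
    return base64.b64encode(digest).decode("ascii")


def label_applies(for_url, wildcard, item_url):
    """Check a (for ...) option against the URL of an item.

    With the '*' flag the label covers every URL that has for_url as a
    prefix; without it the two URLs must be identical.
    """
    if wildcard:
        return item_url.startswith(for_url)
    return item_url == for_url


print(mic_md5(b"contents of the rated item"))
print(label_applies("http://www.gcf.org/", True, "http://www.gcf.org/index.html"))
```

A client could recompute the digest of a retrieved item and compare it with the value carried in the label to detect that the item has changed since it was rated.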
Example
For example, a label that uses the example rating system from
the document PICS Ratings Services and Ratings Systems might be as follows:
(PICS-1.0
(rating-service "http://www.gcf.org")
(on "1994-11-05T08:15:23-0500")
(until "1995-12-31T23:59:59-0000")
(for "http://www.gcf.org/index.html")
(by "mailto:rating-authority@gcf.org")
(ratings ((suds 0.5) (density 0) (color/hue 1))))
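To show how a receiver might read such a label, here is a minimal Python sketch that turns the parenthesized text into nested lists. It is only an illustration: the names TOKEN and parse_labels are hypothetical, the sketch accepts any balanced nesting rather than enforcing the grammar, and it silently skips characters it cannot tokenize (such as an unterminated quote).

```python
import re

# One token is "(", ")", a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def parse_labels(text):
    """Parse label text into nested lists of strings.

    Quoted strings lose their quotes; this sketch does not tell them
    apart from bare atoms and does not validate option names.
    """
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif tok.startswith('"'):
            stack[-1].append(tok[1:-1])
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("missing ')'")
    return stack[0]


EXAMPLE = '''
(PICS-1.0
  (rating-service "http://www.gcf.org")
  (on "1994-11-05T08:15:23-0500")
  (until "1995-12-31T23:59:59-0000")
  (for "http://www.gcf.org/index.html")
  (by "mailto:rating-authority@gcf.org")
  (ratings ((suds 0.5) (density 0) (color/hue 1))))
'''

label = parse_labels(EXAMPLE)[0]
# Option names are case insensitive, so normalize them to lower case.
options = {item[0].lower(): item[1:] for item in label[1:]}
print(options["rating-service"])
```

Keeping the parser generic means the same routine can read a labellist containing several labels; checking each option against the grammar would be a separate step.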
Detailed Syntax
The following grammar, in modified BNF, describes the syntax of
labels. The methods by which labels are embedded in specific
protocols are detailed below.
Notes:
- Whitespace is ignored except in quoted strings.
- The string in a transmit-name is case insensitive. All other
strings are case sensitive.
- Option names ("on", "until", "at", etc.) are case insensitive.
- Additional options may be added over time. For experimental
purposes, options with names beginning "x-" may be
added at any time without prior arrangement. Extending the options
that are formally part of this specification requires an additional
consensus process before adoption.
- This specification requires the use of US-ASCII. Note that the
document PICS Ratings Services and Ratings Systems describes how a
service can map the US-ASCII transmit-names to descriptive strings
using other character sets.
labellist :: label [labellist]
label :: '(PICS-1.0' service option* '(' 'ratings' rating+ ')' ')'
service :: '(' 'rating-service' quotedURL ')'
quotedURL :: '"' URL '"' as described and extended in
PICS Ratings Services and Ratings Systems.
option :: '(' 'on' quoted-ISO-date ')'
| '(' 'until' quoted-ISO-date ')'
| '(' 'at' quoted-ISO-date ')'
| '(' 'for' ['*']quotedURL ')'
| '(' 'by' quotedURL ')'
| '(' 'complete-label' quotedURL ')'
| '(' 'mic-md5' base64-string ')'
quoted-ISO-date :: '"'YYYY'-'MM'-'DD'T'hh':'mm':'ssStz'"'
based on the ISO 8601:1988 date and time standard, restricted
to the specific form described here:
YYYY :: four-digit year
MM :: two-digit month (01=January, etc.)
DD :: two-digit day of month (01 through 31)
hh :: two digits of hour (00 through 23) (am/pm NOT allowed)
mm :: two digits of minute (00 through 59)
ss :: two digits of second (00 through 59)
S :: sign of time zone offset from UTC ('+' or '-')
tz :: four digit amount of offset from UTC
(e.g., 1512 means 15 hours and 12 minutes)
For example, "1994-11-05T08:15:23-0500" is a valid quoted-ISO-date
denoting November 5, 1994, 8:15:23 am, Eastern Standard Time
rating :: '(' transmit-name number ')'
transmit-name :: ' " ' [1*n]extendedalphanum ['/'transmit-name] ' " '
value :: number
number :: [sign]unsignedint['.' [unsignedint]]
sign :: '+' | '-'
unsignedint :: [1*n][0-9]
quotedshortname :: ' " ' [1*n]extendedalphanum ' " '
extendedalphanum :: 'A' | ... | 'Z' | 'a' | ... | 'z' | '+' | '-'
base64-string is as defined in RFC-1421.
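The restricted date form in the grammar is easy to check mechanically. The Python sketch below, with the hypothetical names QUOTED_ISO_DATE and parse_quoted_iso_date, accepts only the exact quoted form described above and converts it to a timezone-aware datetime. It is one possible reading of the grammar, not a reference implementation.

```python
import re
from datetime import datetime, timedelta, timezone

# "YYYY-MM-DDThh:mm:ssStz" including the surrounding double quotes.
QUOTED_ISO_DATE = re.compile(
    r'"([0-9]{4})-([0-9]{2})-([0-9]{2})'
    r'T([0-9]{2}):([0-9]{2}):([0-9]{2})'
    r'([+-])([0-9]{2})([0-9]{2})"'
)


def parse_quoted_iso_date(text):
    """Convert a quoted-ISO-date into a timezone-aware datetime.

    Raises ValueError if the text does not have the restricted form or
    if a field is out of range (for example month 13).
    """
    match = QUOTED_ISO_DATE.fullmatch(text)
    if match is None:
        raise ValueError("not a quoted-ISO-date: %r" % text)
    year, month, day, hour, minute, second = map(
        int, match.group(1, 2, 3, 4, 5, 6)
    )
    sign = 1 if match.group(7) == "+" else -1
    offset = sign * timedelta(
        hours=int(match.group(8)), minutes=int(match.group(9))
    )
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone(offset))


stamp = parse_quoted_iso_date('"1994-11-05T08:15:23-0500"')
print(stamp.isoformat())
```

Because the result carries its UTC offset, an (until ...) date can be compared directly with the current time in any zone.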
RFC-822 Headers
Many protocols, such as Internet electronic mail, the HyperText
Transfer Protocol, and USENET News, use ASCII headers as described in
RFC-822. For use in such protocols, we define a new header,
PICS-Label, used to contain the labels described in this document.
The syntax is:
PICS-Label: <labellist>
where labellist is described according to the syntax above.
Continuation lines beginning with whitespace may be used following the
specification given in RFC-822.
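The continuation-line rule can be sketched in Python as follows. The helper names fold_label and unfold_header are hypothetical; the sketch assumes each non-empty line of a multi-line labellist becomes one continuation line beginning with a single space, and that a receiver replaces each line break plus leading whitespace with one space.

```python
import re


def fold_label(labellist):
    """Build a PICS-Label header with one continuation line per input line."""
    parts = [line.strip() for line in labellist.splitlines() if line.strip()]
    return "PICS-Label:" + "".join("\r\n " + part for part in parts)


def unfold_header(raw):
    """Join RFC-822 continuation lines back into one logical line."""
    return re.sub(r"\r?\n[ \t]+", " ", raw)


folded = fold_label(
    '(PICS-1.0\n'
    '  (rating-service "http://www.gcf.org")\n'
    '  (ratings (suds 0.5)))'
)
print(unfold_header(folded))
```

Since whitespace is ignored outside quoted strings, folding and unfolding in this way does not change the meaning of the label.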
Embedding Labels in HyperText Markup Language (HTML)
Labels may be embedded in HTML files as meta-information, using the
META element defined in the HTML specification. This embedding
takes one of two forms:
- Using the HTTP header equivalency mechanism, which may be used
by an HTTP server to generate a header:
<META http-equiv="PICS-Label" content='labellist'>
- Using the name mechanism, which may be parsed directly by the receiver:
<META name="PICS-Label" content='labellist'>
(Note that the content attribute uses single quotes, because the label
syntax uses double quotes.)
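A receiver could pull embedded labels out of an HTML document roughly as follows. This Python sketch uses the standard html.parser module; the class name PicsMetaParser is hypothetical, and the sketch assumes the label text is carried in the content attribute of a META element whose http-equiv or name attribute is "PICS-Label" (compared without regard to case).

```python
from html.parser import HTMLParser


class PicsMetaParser(HTMLParser):
    """Collect the content of every META element that carries a PICS label."""

    def __init__(self):
        super().__init__()
        self.labels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("http-equiv") or attributes.get("name") or ""
        content = attributes.get("content")
        if key.lower() == "pics-label" and content:
            self.labels.append(content)


DOCUMENT = (
    "<html><head>"
    "<META http-equiv=\"PICS-Label\" content='(PICS-1.0 "
    "(rating-service \"http://www.gcf.org\") (ratings (suds 0.5)))'>"
    "<title>Example</title></head><body></body></html>"
)

parser = PicsMetaParser()
parser.feed(DOCUMENT)
print(parser.labels)
```

The collected strings are labellists and can then be handed to whatever label parser the client uses.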
Sending Labels With A Document
When an http server sends a document to a client, it sends additional
headers as well. We specify how the client can request that one or
more labels be included in one of those headers.
Example
Client sends to http server www.greatdocs.com:
GET /foo.html HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={services={"http://www.gcf.org/ratings"}}
Server responds to client:
HTTP/1.0 200 OK
Date: Thursday, 30-Jun-95 17:51:47 GMT
MIME-version: 1.0
Last-modified: Thursday, 29-Jun-95 17:51:47 GMT
Protocol: PICS/1.0 scope=any str=opt id=pics headers={PICS-Label PICS-Status}
Content-Encoding: pics
PICS-Label:
(PICS-1.0
(rating-service "http://www.gcf.org")
(on "1994-11-05T08:15:23-0500")
(until "1995-12-31T23:59:59-0000")
(for "http://www.gcf.org/index.html")
(by "mailto:rating-authority@gcf.org")
(ratings ((suds 0.5) (density 0) (color/hue 1))))
PICS-Status: OK
Content-type: text/html
...contents of foo.html...
Explanation of example
The client requests that document foo.html be sent back. In addition,
the client requests the rating of that document from service
"http://www.gcf.org/ratings". The request follows the PEP (Protocol
Extension Protocol) syntax for extensions to the http protocol. The
PEP syntax is currently under development at W3C. It allows for more
organized extension to http than the current method of merely adding
extra header-fields.
The server responds by sending back the label, in the PICS-Label
header, as well as the document. The PICS-Status header indicates the
status of the label request. It would not be appropriate to signal an
http error if the document is available but the ratings request has
not been fulfilled. In this case, the PICS-Status header confirms
that the request has been fulfilled. The Protocol header is included
to conform to PEP, as is the addition of "pics " to the beginning of
the Content-Encoding field. In this case, there is no other
content-encoding, and so "pics " is the only thing that appears in
the Content-Encoding field.
Detailed Syntax and Semantics of HTTP Requests for Labels With a Document
The following grammar, in modified BNF, describes the syntax of the
additional header line to be included in an HTTP request for a
document and associated labels.
accept-header ::
'Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={' services ['embedded'] '}'
services :: 'services={' quotedURL *(';' quotedURL) '}'
Notes on the syntax:
- The services specify one or more rating services from which the
client is requesting a label for the document.
- When the 'embedded' flag is set, the server is requested to
prefetch labels for all URLs that are referenced in the current
document. That is, the server is requested to look inside the
document requested, extract the URLs, and send the labels associated
with those URLs. If fulfilled, the request may permit the client to
display links differently depending on how they are labeled. Servers
may, however, choose to ignore this part of the request.
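A client might assemble the request header along the following lines. This Python sketch is an assumption-laden illustration: the function name accept_protocol_header is hypothetical, the header is produced on a single line rather than with continuation lines, and a single space is assumed to separate the services list from the embedded flag.

```python
def accept_protocol_header(services, embedded=False):
    """Build the Accept-Protocol header for a label-with-document request.

    services is a sequence of rating service URLs; at least one is needed.
    """
    if not services:
        raise ValueError("at least one rating service URL is required")
    quoted = ";".join('"%s"' % url for url in services)
    negotiable = "services={%s}" % quoted
    if embedded:
        # Ask the server to prefetch labels for URLs inside the document.
        negotiable += " embedded"
    return ("Accept-Protocol: PICS/1.0 scope=any rx-str=opt "
            "negotiable={%s}" % negotiable)


print(accept_protocol_header(["http://www.gcf.org/ratings"]))
```

Because servers may ignore the embedded flag, a client that sets it should still be prepared to receive a label for the requested document only.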
Detailed Syntax and Semantics For HTTP Response Headers
Three additional headers are specified:
protocol-header :: 'Protocol: PICS/1.0 scope=any str=opt id=pics
headers={PICS-Label PICS-Status}'
label-header :: 'PICS-Label:' labellist
status-header :: 'PICS-Status:' status-code
status-code :: code [explanation]
code :: 'OK' | 'no-rating-service' | 'request-denied' |
'service-unavailable' '(' 1*quotedURL ')' |
'ratings-not-available' '(' 1*quotedURL ')' |
'no-embedded-ratings'
explanation :: quotedshortname
One header is modified:
- "pics " is inserted just after "Content-Encoding:" if that header
would already be sent. If that header would not normally be included,
the following complete header line should be included,
Content-Encoding: pics
Notes on the response syntax:
- The status-header is meant to provide diagnostic information.
- Explanation is an optional, unformatted, human-readable string.
- 'no-rating-service' means that the server never provides labels.
- 'request-denied' indicates that the request for ratings was
denied. The server may or may not provide any rating service
information, but it is unable to do so in this case. For example, the
server might require user authentication to allow for billing before
it will provide labels.
- 'service-unavailable' specifies which rating services' labels
have been omitted. This indicates that the server understands the
request but does not provide labels from the requested rating service.
- 'ratings-not-available' specifies the URLs for which labels have
been omitted. This indicates that the server is able to provide
labels from the requested rating service, but it cannot find labels
for the URLs that have been requested; that is, the rating service has
not rated the requested URLs.
- 'no-embedded-ratings' means the server has ignored the request
for embedded ratings.
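A client could split the value of a PICS-Status header into its parts as in the Python sketch below. The names STATUS and parse_pics_status are hypothetical. The sketch only separates the keyword, the optional parenthesized URL list, and the optional quoted explanation; it does not check the keyword against the list of codes, and it assumes quoted URLs contain no ")" character.

```python
import re

STATUS = re.compile(
    r'\s*(?P<code>[A-Za-z-]+)'            # status keyword, e.g. OK
    r'\s*(?:\((?P<urls>[^)]*)\))?'        # optional "(" quoted URLs ")"
    r'\s*(?:"(?P<explanation>[^"]*)")?'   # optional quoted explanation
    r'\s*'
)


def parse_pics_status(value):
    """Return (code, urls, explanation) from a PICS-Status header value."""
    match = STATUS.fullmatch(value)
    if match is None:
        raise ValueError("malformed PICS-Status value: %r" % value)
    urls = re.findall(r'"([^"]*)"', match.group("urls") or "")
    return match.group("code"), urls, match.group("explanation")


print(parse_pics_status('service-unavailable ("http://www.gcf.org/ratings")'))
```

The returned URL list tells the client which services or items were skipped, so it can decide whether to query a separate label server for them.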
Requesting Labels Separately
PICS labels can also be retrieved separately from the documents to
which they refer. The protocol is an extension of HTTP, meaning that a
label server will use HTTP to respond to requests, even if the labels
themselves describe ftp, gopher, or other non-http sites and
documents.
Requests to stand-alone label servers share some features with
the requests specified above for delivery of labels along with
documents. In particular, the method of specifying which rating
services' labels are desired is identical in the two methods.
A GET command includes a requested-URL. When requesting labels
separately from a document, this URL specifies the label server
rather than the document for which a label is desired.
As with requests for labels along with documents, separate requests for
labels rely on the Accept-Protocol header. In addition to specifying
the desired rating services, it is also necessary to specify the URL
or URLs for which labels are desired.
Sample Query
The following sample request, made to the http server
http://www.labels.org, is illustrative:
GET /ratings HTTP/1.0
Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={services={"http://www.gcf.org/ratings"};
urls={*"http://www.questionable.org/images"}
}
The query asks the label server www.labels.org/ratings to send a
single label that applies to everything in the images directory at
site www.questionable.org. The desired label should come from the service
http://www.gcf.org/ratings.
The label server responds by sending back headers only, with no
document. The headers follow the syntax and semantics described above
for label transmission with a document.
Detailed Syntax and Semantics of HTTP Query for Labels Separate From Documents
The following grammar, in modified BNF, describes the syntax of the
Accept-Protocol header that the client sends to the server.
accept-header ::
'Accept-Protocol: PICS/1.0 scope=any rx-str=opt
negotiable={' services ['embedded'] ';' urls '}'
services :: 'services={' quotedURL *(';' quotedURL) '}'
urls :: 'urls={' labelURL *(';' labelURL) '}'
labelURL :: ['*'] quotedURL ['*']
Notes on the syntax:
- * preceding the quoted URL indicates a request for a single label that
applies to all items that have the URL as a prefix. This is useful for
requesting a rating of a site.
- * after the quoted URL requests all labels for URLs that have the
URL as a prefix. This is useful for requesting the ratings of all
items at a site.
- If * occurs both before and after a URL, it is equivalent to
having only the * before the URL. That is, it is treated as a request
for a single label that describes the entire site or directory.
- Note that it is permitted to include more than one URL in the
request. Each such request is handled independently, but all labels
are sent back in a single header.
- For consistency with requests for labels along with documents, we
have allowed the 'embedded' flag in this query. It is expected that
most label servers will ignore this flag, because they will be
unwilling to fetch the documents in order to extract embedded URLs.
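The three interpretations of a labelURL described in the notes above can be made explicit in code. In this Python sketch the names LABEL_URL and parse_label_url, and the mode names "exact", "generic", and "tree", are hypothetical labels chosen for illustration; they do not appear in the protocol.

```python
import re

# Optional '*' before and after a double-quoted URL.
LABEL_URL = re.compile(r'\s*(\*?)\s*"([^"]*)"\s*(\*?)\s*')


def parse_label_url(text):
    """Return (url, mode) for one labelURL of a separate label request.

    mode is "generic" for a single label covering everything under the
    prefix, "tree" for every label under the prefix, and "exact" for a
    label about that one URL.
    """
    match = LABEL_URL.fullmatch(text)
    if match is None:
        raise ValueError("malformed labelURL: %r" % text)
    before, url, after = match.groups()
    if before:
        # A leading '*' wins even when a trailing '*' is also present.
        mode = "generic"
    elif after:
        mode = "tree"
    else:
        mode = "exact"
    return url, mode


print(parse_label_url('*"http://www.questionable.org/images"'))
```

A label server would apply this to each entry of the urls list independently and return all resulting labels in a single PICS-Label header.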
Posting Labels to a Label Server
To aid organizations in collecting ratings, we specify how a client
can submit new ratings to a rating server.
This section has not been written yet. The basic idea is to do a
POST to the label server. Include the Accept-Protocol: line. Send the
ratings as a header. We could send the ratings in the body, but we
might as well be consistent with how we send them in the other
distribution methods. We will need to define response codes to go in
a response header.
Why HTTP For Label Servers
Instead of extending HTTP, we considered proposals for
special-purpose label transport protocols. Before making a final
decision, we constructed the following lists of pros and cons.
Advantages of Using HTTP
- An existing HTTP server can be used as a PICS label server. This
is particularly useful in the short term. CGI scripts at the HTTP
server can handle the special header fields of a request for labels.
- A label returned from a label server and a label returned along
with a document from an HTTP server can use identical label formats.
- Client programs that already support HTTP will have much less
new code to implement.
- Client programs that do not support HTTP will have to support a
new protocol in any case. It may be easier to support HTTP than a
newly defined label transport protocol, because of available software
libraries.
- Several protocol elements are already fully specified by HTTP
that would be required in any PICS protocol.
- Date and time formats.
- Content encoding types.
- Character set and Internationalization issues.
- Error/result conditions. Both extensible result categories and a
sample set of messages are specified.
- Handling of expiration dates for each URL queried.
- HTTP is quite stable, has not diverged, and is well accepted.
- Security and payment systems either exist or are being developed
for HTTP. A binary format may also be developed for speed. PICS need
not reinvent such systems.
- Firewalls tend to allow HTTP headers to be transmitted already.
A new protocol would take much longer to be accepted.
- A reliable, connection-based (initially TCP), ASCII-based protocol
seems desirable at the outset.
- HTTP's existing extensibility rules already define how extensions
to PICS itself should be accomplished.
Advantages of Creating a New Protocol Instead of Using HTTP
- A new protocol would avoid any HTTP protocol wars.
- Label servers and clients would not need to be updated to
accommodate HTTP changes.
- RFC-822 and other precedents could still be used in the design of
a new protocol.
- A binary format could be considered initially for speed.
- UDP or other datagram lookups could be considered.
FAQ - Frequently Asked Questions
Why is there no ftp, gopher, or netnews protocol for requesting labels
along with a document?
Labels can be sent as additional headers in any protocol that
employs RFC-822 style headers. We have not yet determined, however,
convenient extensions to protocols other than http to permit requests
that ask for labels from specific services. We may specify such
extensions in the future.
How do you get labels for items on FTP, Gopher, or netnews
servers? Are we forcing all FTP implementations to implement all of
HTTP as well?
FTP, Gopher, and netnews servers need not distribute PICS labels.
Labels for items on such servers can be retrieved from an HTTP-based
label server.
The PICS premise is that all compliant clients will have to
implement some new protocol. The subset of HTTP which would be
required for obtaining a PICS label can be minimal. HTTP will be no
more difficult to implement in an FTP (or other) client than a
brand-new protocol that provides similar features.
Can existing HTTP servers be used as PICS label servers?
Using CGI scripts, or with a small amount of added code in the
HTTP server, an existing HTTP server can be configured to access
a database of labels and return that information coded as additional
HTTP Headers. Most of the work is in the lookup and formatting
of the labels themselves, not the modifications to HTTP.
How do I design a really fast PICS server? Won't the overhead
be too much?
HTTP already explicitly defines the minimum fields required and
the rules that must be followed when additional information is
useful to the transaction. For example, HTTP does not require
that clients provide "Accept:" headers to indicate preferred
MIME types for the content, but if they are provided, servers
can match up available formats with the client's request. An HTTP server may
be designed to optimize throughput or to optimize the appearance
of the result, or to adjust to the client software's preference.
If you minimize the server's response to one line, plus the label
information, you are already dealing with the minimum amount of
data transfer possible to obtain a label. In addition, most performance
issues for PICS will probably be addressed with caching, not by
reducing lookup time for a single label. Caching optimization
requires meta-data which can be easily encoded within HTTP headers.
How can we keep the PICS extensions from getting tied up
in HTTP standardization?
The management of header extensions for HTTP has been an issue
of discussion and work by the HTTP group for some time. The HTTP
specification lays down specific rules for the handling of extensions
which guarantee that those extensions will not be made invalid
by any revisions of HTTP itself. In addition, the W3C is working
on a system (PEP) for managing and negotiating HTTP extensions even
more intelligently.
The worst risk seems to be that HTTP could be upgraded to a new
revision level forcing some HTTP implementations to support multiple
versions (1.0 and 2.0, for example) or forcing some PICS servers
to update their protocol as well. Hopefully a major update in
HTTP would bring enough benefits for PICS to make any update worthwhile.
What is PEP and Why is PICS Using It?
The Protocol Extension Protocol from the World Wide Web Consortium
uses a trio of header fields (Protocol, Accept-Protocol, and
Content-Encoding) to allow an HTTP client and server to do
sophisticated negotiation about the set of header fields and their
meanings. It is being proposed for use in http 1.2 and http-ng, and
is currently under careful scrutiny by the W3C Security Editorial
Board to make sure that it contains the features necessary to provide
security for general document transmission as well as electronic
payments.
PICS faces many of the same problems that face the security and
electronic payment community. In PICS the issue revolves around the
ability for the client to tell the server from which labeling
services it would like to have labels. This is a simple negotiation
problem of the kind PEP was designed to solve. Rather than invent an
orthogonal mechanism it seemed best to use one that is already being
proposed and investigated.
What if PEP Does Not Catch On?
If the general extension mechanism specified by PEP does not
become a generic feature of HTTP servers, PICS servers will need to
look for the specific header line beginning Accept-Protocol: PICS/1.0
and process it to determine the rating request. PICS clients will
need to look for and process the specific header lines PICS-Label and
PICS-Status. We will also have to hope that no other group tries to
extend HTTP in a way that uses headers named PICS-Label or
PICS-Status.
References
To be added.