Last updated August 7, 1994

The SG-Scout WWW Robot:

The SG-Scout robot was developed for the Xerox Palo Alto Research Center as part of a project to develop a new form of directed WWW browser. The SG-Scout complies with the Standard for Robot Exclusion and appears on Martijn Koster's list of active WWW robots. The SG-Scout was run in late June and again in early August 1994. In the August run, the robot discovered over 7,250 WWW servers and 250,000 HTML and text pages available over HTTP.

The SG-Scout in Action

The SG-Scout maintains a list of the servers it has discovered and cycles through this list, requesting one document from each server per cycle. The Scout also uses this list to store and retrieve the aliases of each server. The SG-Scout is designed to be run in parallel: up to thirty images of the Scout have been run simultaneously against one database, although a search of the web typically uses four or five. With five Scout images running in parallel, the robot can download 30,000 documents each day. This large number of requests should not cause problems for servers, because equal numbers of documents are requested from each of the known servers in any time period. With over 7,250 servers known, 30,000 requests a day averages out to about four per server, so the SG-Scout should request no more than 5 or 6 documents from each server each day.
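The cycle described above can be illustrated with a short Python sketch. This is not the SG-Scout's own code, which is not shown here; the function names, the in-memory dictionary standing in for the shared database, and the example host names are all assumptions made for illustration. The second function simply reproduces the per-server load arithmetic.

```python
from collections import deque


def round_robin(pending):
    """Yield (server, path) pairs, one document per server per cycle.

    `pending` is a hypothetical stand-in for the Scout's server list:
    it maps a server name to the paths still to be requested from it.
    """
    # Skip servers that have nothing queued.
    queues = {s: deque(paths) for s, paths in pending.items() if paths}
    while queues:
        # One pass over the known servers is one cycle.
        for server in list(queues):
            yield server, queues[server].popleft()
            if not queues[server]:
                del queues[server]  # nothing left to request from this server


def mean_requests_per_server(documents_per_day, known_servers):
    """Average daily requests per server when load is spread evenly."""
    return documents_per_day / known_servers
```

For example, list(round_robin({"a.example": ["/1", "/2"], "b.example": ["/x"]})) visits a.example, then b.example, then a.example again, and mean_requests_per_server(30000, 7250) comes to a little over 4 requests per server per day.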

"How will I know if the SG-Scout visits my server?"

The SG-Scout is compliant with the Standard for Robot Exclusion and sends User-Agent: and From: fields with every request. These fields contain the SG-Scout's identification ("SG-Scout/x.xx") and the operator's email address, respectively. If these headers appear in your log files, then you know that the SG-Scout has been knocking on your door!
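As a rough illustration of the two fields described above, the following Python sketch builds them and checks a log line for the Scout's identification. The version number "1.00", the address operator@example.org, and the sample log lines are hypothetical, and the check assumes your server records the User-Agent field in its log, which not every server does.

```python
import re

# Matches an identification of the form "SG-Scout/x.xx", e.g. "SG-Scout/1.00".
# The exact digit layout is an assumption based on the "x.xx" placeholder.
SCOUT_AGENT = re.compile(r"SG-Scout/\d+\.\d+")


def build_headers(version, operator_email):
    """Return the two identifying request fields as a dictionary."""
    return {
        "User-Agent": f"SG-Scout/{version}",
        "From": operator_email,
    }


def is_scout_visit(log_line):
    """Return True if a log line contains the Scout's identification."""
    return SCOUT_AGENT.search(log_line) is not None
```

Running is_scout_visit over each line of a log that includes the User-Agent field would pick out the requests made by the robot.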

Bug Reports

A bug in the SG-Scout appeared during the first parallel test of the program (late June). The bug caused connections to be falsely detected as timed out, which resulted in repeated requests for the same file. It was fixed within 30 hours of the misbehavior.


Email concerning the SG-Scout should be directed to