Next: Timing Measurements.
Up: Distributed Web-Crawling
Previous: Distributed Web-Crawling
As in the factoring application, implementing this
application involved using basic library components, and
defining application-specific extensions of the work, work GUI,
result, resultGUI, and problem objects.
The CrawlerWork object contained a target URL
string, and code adapted from the sequential
web crawler code in Chap. 8 of Core Java [12].
The result object simply contained the URL strings of the links found.
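As a rough illustration of these objects, the following is a minimal sketch of a CrawlerWork-like class. Only the class name and the idea of holding a target URL string come from the text; the link-extraction logic shown here is a generic stand-in written for this sketch, not the actual Core Java code, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a CrawlerWork object: it holds the target
// URL string and extracts the URL strings of the links found.
public class CrawlerWork {
    private final String targetUrl;

    public CrawlerWork(String targetUrl) {
        this.targetUrl = targetUrl;
    }

    public String getTargetUrl() {
        return targetUrl;
    }

    // Extract href targets from a page's HTML source. In the real
    // application the page would be fetched from targetUrl over the
    // network; here the caller supplies the HTML text directly so
    // the sketch stays self-contained.
    public List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern
            .compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE)
            .matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

The list of extracted URL strings would then be packaged into the result object.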
The code for the work and result GUIs was reused
from the factoring application with minor modifications.
In addition to these, we also defined a CrawlerWorkManager
class which extends BasicWorkManager.
CrawlerWorkManager overrides the
putResult() method such that whenever a result
URL string is placed into the result pool,
it first checks if the URL string is already in the pool.
If not, it adds the URL string to both the result pool and
the work pool.
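The putResult() override described above can be sketched as follows. The BasicWorkManager stand-in and the pool types here are assumptions made for this sketch; the library's actual interfaces are not shown in the text. Only the duplicate check, and the addition of new URLs to both pools, reflect the behavior described.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-in for the library's BasicWorkManager; the real class
// manages distributed work and result pools.
class BasicWorkManager {
    protected final Set<String> resultPool = new LinkedHashSet<>();
    protected final Deque<String> workPool = new ArrayDeque<>();

    public void putResult(String url) {
        resultPool.add(url);
    }

    public int resultCount() { return resultPool.size(); }
    public int workCount() { return workPool.size(); }
}

public class CrawlerWorkManager extends BasicWorkManager {
    // Only accept a result URL if it has not been seen before;
    // each new URL becomes both a result and a new piece of work.
    @Override
    public void putResult(String url) {
        if (!resultPool.contains(url)) {
            resultPool.add(url);
            workPool.add(url);
        }
    }
}
```

This dedup-then-requeue step is what turns a pool of independent page fetches into a crawl: each newly discovered URL is fed back in as further work, while repeated URLs are dropped.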
Unfortunately, because of Java's applet security restrictions,
the web crawler does not work with normal browsers.
It is not difficult, however, to run it with
the appletviewer from Sun's JDK, which has
an option for allowing unrestricted network access.
Microsoft Internet Explorer 4.0 and Sun's HotJava browser
also have similar options, but apparently, Netscape 4.0 does not.
Luis Sarmenta
1/2/1998