Next: Timing Measurements.
Up: Distributed Web-Crawling
Previous: Distributed Web-Crawling
As in the factoring application, implementing this
application involved using basic library components, and
defining application-specific extensions of the work, work GUI,
result, resultGUI, and problem objects.
The CrawlerWork object contained a target URL
string, and code adapted from the sequential
web crawler code in Chap. 8 of Core Java [12].
The result object simply contained the URL strings of the links found.
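As a rough illustration of these objects, the following is a minimal sketch of a CrawlerWork-like class. Only the class name and the idea of holding a target URL string come from the text; the link-extraction logic shown here is a generic stand-in written for this sketch, not the actual Core Java code, and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a CrawlerWork object: it holds the target
// URL string and extracts the URL strings of the links found.
public class CrawlerWork {
    private final String targetUrl;

    public CrawlerWork(String targetUrl) {
        this.targetUrl = targetUrl;
    }

    public String getTargetUrl() {
        return targetUrl;
    }

    // Extract href targets from a page's HTML source. In the real
    // application the page would be fetched from targetUrl over the
    // network; here the caller supplies the HTML text directly so
    // the sketch stays self-contained.
    public List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern
            .compile("href\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE)
            .matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }
}
```

The list of extracted URL strings would then be packaged into the result object.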
The code for the work and result GUIs was reused
from the factoring application with minor modifications.
In addition to these, we also defined a CrawlerWorkManager
class which extends BasicWorkManager.
CrawlerWorkManager overrides the
putResult() method such that whenever a result
URL string is placed into the result pool,
it first checks if the URL string is already in the pool.
If not, it adds the URL string to both the result pool and
the work pool.
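The putResult() override described above can be sketched as follows. The BasicWorkManager stand-in and the pool types here are assumptions made for this sketch; the library's actual interfaces are not shown in the text. Only the duplicate check, and the addition of new URLs to both pools, reflect the behavior described.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-in for the library's BasicWorkManager; the real class
// manages distributed work and result pools.
class BasicWorkManager {
    protected final Set<String> resultPool = new LinkedHashSet<>();
    protected final Deque<String> workPool = new ArrayDeque<>();

    public void putResult(String url) {
        resultPool.add(url);
    }

    public int resultCount() { return resultPool.size(); }
    public int workCount() { return workPool.size(); }
}

public class CrawlerWorkManager extends BasicWorkManager {
    // Only accept a result URL if it has not been seen before;
    // each new URL becomes both a result and a new piece of work.
    @Override
    public void putResult(String url) {
        if (!resultPool.contains(url)) {
            resultPool.add(url);
            workPool.add(url);
        }
    }
}
```

This dedup-then-requeue step is what turns a pool of independent page fetches into a crawl: each newly discovered URL is fed back in as further work, while repeated URLs are dropped.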
Unfortunately, because of Java's applet security restrictions,
the web crawler does not work with normal browsers.
It is not difficult, however, to run it with
the appletviewer from Sun's JDK, which has
an option for allowing unrestricted network access.
Microsoft Internet Explorer 4.0 and Sun's HotJava browser
also have similar options, but apparently, Netscape 4.0 does not.
Luis Sarmenta
1/2/1998