Distributed Web Crawling
Idea: use volunteer computing to increase not only computation power but also communication power
- Allow users with slow links to run communication-bound tasks (e.g., web crawling/searching) by using volunteers' faster links
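A minimal sketch of the idea, assuming a simple manager/volunteer split: the manager only keeps the crawl frontier, and each volunteer thread pulls a URL, does the communication-bound fetch over its own link, and reports the discovered links back. The class `CrawlManager`, its methods, and the in-memory "web" map standing in for real HTTP fetches are all hypothetical and are not the original system's API.

```java
import java.util.*;
import java.util.function.Function;

/** Hypothetical sketch: a manager hands out URLs; volunteers fetch them and report links. */
public class CrawlManager {
    private final Deque<String> frontier = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();     // URLs ever enqueued
    private final Set<String> crawled = new TreeSet<>();  // URLs whose results came back
    private int inFlight = 0;                             // URLs handed out, not yet reported

    public CrawlManager(Collection<String> seeds) {
        for (String s : seeds) enqueue(s);
    }

    private void enqueue(String url) {
        if (seen.add(url)) frontier.addLast(url);
    }

    /** Blocks until work is available; returns null once nothing is queued or in flight. */
    public synchronized String nextWork() throws InterruptedException {
        while (frontier.isEmpty() && inFlight > 0) wait();
        String url = frontier.pollFirst();
        if (url != null) inFlight++;
        return url;
    }

    /** A volunteer reports the links it found on `url`. */
    public synchronized void submitResult(String url, List<String> links) {
        crawled.add(url);
        for (String l : links) enqueue(l);
        inFlight--;
        notifyAll(); // wake volunteers waiting for new work or for the crawl to finish
    }

    public synchronized Set<String> crawled() { return new TreeSet<>(crawled); }

    /** Runs `volunteers` worker threads over a simulated web (page -> outgoing links). */
    public static Set<String> crawl(Collection<String> seeds, Map<String, List<String>> web,
                                    int volunteers) throws InterruptedException {
        CrawlManager m = new CrawlManager(seeds);
        // Hypothetical stand-in for the slow network fetch done on the volunteer's machine.
        Function<String, List<String>> fetch = url -> web.getOrDefault(url, List.of());
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < volunteers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String url;
                    while ((url = m.nextWork()) != null) {
                        m.submitResult(url, fetch.apply(url));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        return m.crawled();
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, List<String>> web = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "c", List.of("d", "a"),
            "d", List.of());
        System.out.println(crawl(List.of("a"), web, 3)); // prints [a, b, c, d]
    }
}
```

The `inFlight` counter is what lets a volunteer tell "no work yet" apart from "crawl finished": an idle volunteer waits while any URL is still out, since that page may yield new links.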
Experimental Results:
- web crawling over a slow link is slow even on a fast PC (with a JIT-compiling Java VM)
- web crawling using volunteers with slower Java (4x slower) but faster links produced a speedup
- using the “multi-work” Engine and Manager gave further speedup transparently (without recoding)
- ongoing work on even more speedup
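One plausible reading of the “multi-work” result, sketched under that assumption: instead of processing one work object at a time, an engine keeps several fetches in flight so that per-request network latency overlaps. The class `MultiWorkEngine`, the `slots` parameter, and the sleeping `fetch` that simulates a slow round trip are hypothetical illustrations, not the actual Engine or Manager.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch: keep up to `slots` fetches in flight so their latencies overlap. */
public class MultiWorkEngine {
    private final int slots;

    public MultiWorkEngine(int slots) { this.slots = slots; }

    /** Simulated fetch: sleeps for `latencyMs` to stand in for a slow network round trip. */
    static String fetch(String url, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return "<html>" + url + "</html>";
    }

    /** Processes all URLs with at most `slots` concurrent fetches; results keep input order. */
    public List<String> run(List<String> urls, long latencyMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(slots);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String u : urls) tasks.add(() -> fetch(u, latencyMs));
            List<String> pages = new ArrayList<>();
            // invokeAll waits for every task and returns futures in task order.
            for (Future<String> f : pool.invokeAll(tasks)) pages.add(f.get());
            return pages;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 8; i++) urls.add("http://example.org/p" + i);
        for (int slots : new int[] {1, 4}) {
            long t0 = System.nanoTime();
            new MultiWorkEngine(slots).run(urls, 100);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.println("slots=" + slots + " elapsed ~" + ms + " ms");
        }
    }
}
```

With `slots=1` the eight simulated 100 ms fetches run back to back; with `slots=4` they overlap in groups of four. Because the work is communication-bound, the gain comes from hiding latency rather than from extra CPU, which is consistent with the slides' point that volunteers with slower Java still helped.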