org.supermind.crawl
Interface FetchList

All Known Implementing Classes:
DefaultFetchList

public interface FetchList

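A data structure for managing the list of URLs to fetch, roughly equivalent to Mercator's Frontier. Implementations are NOT thread-safe: each FetcherThread is supposed to have its own fetchlist. The one exception is queue(ScheduledURL, URL), which multiple threads may call and which implementations must therefore synchronize.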
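To illustrate the frontier idea, here is a minimal single-threaded sketch that keeps one FIFO queue per host and hands host queues out in rotation. SimpleHostQueue and PerHostFrontier are hypothetical classes written for this example. They are not the supermind HostQueue or DefaultFetchList, and they skip ScheduledURL entirely.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Set;

// Hypothetical per-host FIFO of URLs (not the supermind HostQueue class).
class SimpleHostQueue {
    final String host;
    private final ArrayDeque<URL> urls = new ArrayDeque<>();

    SimpleHostQueue(String host) { this.host = host; }

    void add(URL url) { urls.addLast(url); }

    URL pop() { return urls.pollFirst(); } // null once the queue is drained

    boolean isEmpty() { return urls.isEmpty(); }
}

// Hypothetical single-threaded frontier: one queue per host, served in rotation.
// Like FetchList, it is not thread-safe; each fetcher thread would own one instance.
class PerHostFrontier {
    // Insertion order of the LinkedHashMap doubles as the rotation order.
    private final LinkedHashMap<String, SimpleHostQueue> queues = new LinkedHashMap<>();
    // Keyed by string form, because URL.equals/hashCode may resolve host names.
    private final Set<String> seen = new HashSet<>();

    // Helper that turns a string into a URL without a checked exception.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    boolean contains(URL url) { return seen.contains(url.toExternalForm()); }

    void queue(URL url) {
        if (seen.add(url.toExternalForm())) { // silently ignore URLs already queued
            queues.computeIfAbsent(url.getHost(), SimpleHostQueue::new).add(url);
        }
    }

    // Returns the first non-empty host queue and moves it to the back of the
    // rotation, or null when no host queue has any URLs.
    SimpleHostQueue next() {
        Iterator<SimpleHostQueue> it = queues.values().iterator();
        while (it.hasNext()) {
            SimpleHostQueue q = it.next();
            if (!q.isEmpty()) {
                it.remove();
                queues.put(q.host, q); // re-inserting puts it at the end
                return q;
            }
        }
        return null;
    }
}
```

Deduplication is keyed on url.toExternalForm() rather than on URL objects because java.net.URL.equals and hashCode may perform host-name resolution. In this sketch contains() means "has ever been queued", which is only one possible reading of the interface.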


Method Summary
 void close()
          Release resources.
 boolean contains(java.net.URL url)
          Does the fetchlist contain this url?
 int getCurrentSize()
          Total number of URLs this fetchlist currently contains.
 void init()
          Initialize resources.
 HostQueue next()
          Get next HostQueue.
 void queue(ScheduledURL parent, java.net.URL url)
          Add a ScheduledURL to the fetchlist.
 void release(HostQueue hostQueue, int popped, long timeTaken)
          Release HostQueue from use.
 

Method Detail

close

void close()
Release resources.


contains

boolean contains(java.net.URL url)
Does the fetchlist contain this url?

Parameters:
url - the URL to look up
Returns:
true if the fetchlist contains url, false otherwise

getCurrentSize

int getCurrentSize()
Total number of URLs this fetchlist currently contains. This value is only updated when HostQueues are released via release(HostQueue, int, long), so it can be stale between releases.

Returns:
number of URLs
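A minimal sketch of size accounting consistent with this description, assuming (the documentation does not say this) that URLs added by queue are counted as pending and folded into the reported size together with the popped count when release(...) runs. DeferredSizeCounter and onQueued are hypothetical names.

```java
// Hypothetical bookkeeping only: the reported size changes inside release() and
// nowhere else. Deferring additions until release is an assumption of this sketch.
class DeferredSizeCounter {
    private int currentSize; // value exposed through getCurrentSize()
    private int pendingAdds; // URLs queued since the last release

    // Called from queue(); must run under queue()'s lock, since queue() may be
    // called from several threads.
    void onQueued() { pendingAdds++; }

    int getCurrentSize() { return currentSize; }

    void release(int popped) {
        currentSize += pendingAdds - popped; // fold in additions and removals at once
        pendingAdds = 0;
    }
}
```

With this scheme, getCurrentSize() read between releases can be stale, so a fetcher would rely on next() returning null, not on a zero size, to decide that no URLs remain.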

init

void init()
Initialize resources.


next

HostQueue next()
Get next HostQueue.

Returns:
next HostQueue, or null if no HostQueue has any URLs

queue

void queue(ScheduledURL parent,
           java.net.URL url)
Add a ScheduledURL to the fetchlist. Multiple threads can be calling this method, and implementing classes must synchronize access accordingly.

Parameters:
parent - originating url
url - url to queue
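Since queue(...) is the one method that may be called concurrently, a minimal sketch can guard the shared per-host map with a single lock. SyncQueueSketch and demo are hypothetical; URLs are plain strings here to keep the example short, and the real DefaultFetchList may use a different locking scheme.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: queue() may be called from many threads, so every access to
// the shared per-host map goes through one lock.
class SyncQueueSketch {
    private final Object lock = new Object();
    private final Map<String, ArrayDeque<String>> byHost = new HashMap<>();

    // parent is accepted to mirror queue(ScheduledURL, URL) but unused here.
    void queue(String parent, String url) {
        String host = URI.create(url).getHost(); // parse outside the lock
        synchronized (lock) {
            byHost.computeIfAbsent(host, h -> new ArrayDeque<>()).addLast(url);
        }
    }

    int size() {
        synchronized (lock) {
            int n = 0;
            for (ArrayDeque<String> q : byHost.values()) {
                n += q.size();
            }
            return n;
        }
    }

    // Starts `threads` threads that each queue `perThread` URLs spread over four
    // hosts, waits for them, and returns the final size.
    static int demo(int threads, int perThread) {
        SyncQueueSketch list = new SyncQueueSketch();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String host = "host" + (i % 4) + ".example";
                    list.queue("http://parent.example/", "http://" + host + "/" + id + "/" + i);
                }
            });
            workers.add(w);
            w.start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return list.size();
    }
}
```

Without the synchronized block, concurrent computeIfAbsent calls on a plain HashMap could lose entries or corrupt the map. The other methods would still be called only by the owning fetcher thread, but any of them that reads byHost would need the same lock.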

release

void release(HostQueue hostQueue,
             int popped,
             long timeTaken)
Release HostQueue from use. This method must be called once the caller has finished popping URLs from the HostQueue with HostQueue.pop().

Parameters:
hostQueue - the HostQueue to release, as returned by next()
popped - number of URLs popped from the queue
timeTaken - total time taken to download the popped URLs
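Putting next(), HostQueue.pop() and release(...) together, a fetcher thread's main loop might look like the sketch below. SketchFetchList and SketchHostQueue are hypothetical mirrors of the documented methods, ToyFetchList is a throwaway in-memory implementation, the batch limit is an illustrative assumption, and reporting timeTaken in milliseconds is also assumed, since the documentation does not give a unit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical mirrors of the documented methods used by a fetcher thread.
interface SketchHostQueue {
    String pop(); // next URL for this host, or null when empty
}

interface SketchFetchList {
    SketchHostQueue next(); // null when no host queue has URLs
    void release(SketchHostQueue q, int popped, long timeTaken);
}

class FetcherLoop {
    // Drains the fetch list: for each host queue, pop up to `batch` URLs, fetch them,
    // then always call release() with the count and elapsed time (milliseconds assumed).
    static void run(SketchFetchList list, int batch, Consumer<String> fetch) {
        SketchHostQueue q;
        while ((q = list.next()) != null) {
            int popped = 0;
            long start = System.nanoTime();
            try {
                String url;
                while (popped < batch && (url = q.pop()) != null) {
                    popped++;
                    fetch.accept(url);
                }
            } finally {
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                list.release(q, popped, elapsedMs); // release even if a fetch throws
            }
        }
    }
}

// Toy in-memory implementation used only to exercise the loop.
class ToyFetchList implements SketchFetchList {
    final List<ArrayDeque<String>> hosts = new ArrayList<>();
    int totalReleased;
    int releaseCalls;

    ToyFetchList add(String... urls) {
        hosts.add(new ArrayDeque<>(List.of(urls)));
        return this;
    }

    public SketchHostQueue next() {
        for (ArrayDeque<String> h : hosts) {
            if (!h.isEmpty()) {
                return h::pollFirst;
            }
        }
        return null;
    }

    public void release(SketchHostQueue q, int popped, long timeTaken) {
        totalReleased += popped;
        releaseCalls++;
    }
}
```

Calling release(...) in a finally block keeps the contract even when a download throws, so the HostQueue is always released from use.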