org.supermind.crawl
Interface FetchedURLs

All Known Implementing Classes:
CachingFetchedURLs, DefaultFetchedURLs, InMemoryFetchedURLs, LastModifiedFetchedURLs

public interface FetchedURLs

Database of URLs that have already been fetched.


Field Summary
static java.util.logging.Logger LOG
           
 
Method Summary
 void close()
           
 boolean contains(java.net.URL url)
          Has the URL already been fetched?
 ScheduledURL get(long id)
          Get a persisted URL.
 void init()
           
 void insert(ScheduledURL url, org.apache.nutch.protocol.ProtocolOutput output)
          Insert a fetched URL.
 

Field Detail

LOG

static final java.util.logging.Logger LOG

Method Detail

close

void close()
           throws java.io.IOException
Throws:
java.io.IOException

contains

boolean contains(java.net.URL url)
Has the URL already been fetched?

Parameters:
url - the URL to check
Returns:
true if the URL has already been fetched, false otherwise

get

ScheduledURL get(long id)
Get a persisted URL (optional operation).

Parameters:
id - the id of the ScheduledURL
Returns:
the ScheduledURL, or null if it does not exist
Throws:
java.lang.UnsupportedOperationException - if the implementation doesn't save ScheduledURLs
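
Because get is an optional operation, a caller may want to treat "not supported" and "not found" the same way. The following is only a sketch of one way to do that: OptionalGetSketch and tryGet are hypothetical names that are not part of org.supermind.crawl, and the lookups in main are stand-ins for a real FetchedURLs instance.

```java
import java.util.Optional;
import java.util.function.LongFunction;

public class OptionalGetSketch {

    /**
     * Hypothetical helper: runs a lookup by id and maps both a null result
     * and an UnsupportedOperationException to Optional.empty().
     */
    static <T> Optional<T> tryGet(LongFunction<T> lookup, long id) {
        try {
            return Optional.ofNullable(lookup.apply(id));
        } catch (UnsupportedOperationException e) {
            // The implementation does not persist entries at all.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for an implementation that does not support get.
        LongFunction<String> unsupported = id -> {
            throw new UnsupportedOperationException();
        };
        // Stand-in for an implementation that persists a single entry with id 1.
        LongFunction<String> persisted = id -> id == 1L ? "http://example.com/a" : null;

        System.out.println(tryGet(unsupported, 1L).isPresent()); // false
        System.out.println(tryGet(persisted, 1L).get());         // http://example.com/a
        System.out.println(tryGet(persisted, 2L).isPresent());   // false
    }
}
```

With a real instance, the same helper could presumably be called as tryGet(fetchedURLs::get, id), since get(long) matches the shape of LongFunction.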

init

void init()
          throws java.io.IOException
Throws:
java.io.IOException

insert

void insert(ScheduledURL url,
            org.apache.nutch.protocol.ProtocolOutput output)
            throws java.io.IOException
Insert a fetched URL.

Parameters:
url - the URL that was fetched
output - the protocol output produced by the fetch
Throws:
java.io.IOException
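
The methods above suggest a lifecycle of init, insert, contains/get, and close. The sketch below illustrates that call order with simplified stand-ins so that it compiles on its own. It makes several assumptions: the nested ScheduledURL class is hypothetical (the real class's fields are not documented here), Object replaces org.apache.nutch.protocol.ProtocolOutput, and InMemorySketch is an illustration rather than the library's InMemoryFetchedURLs.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URL;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FetchedURLsSketch {

    /** Hypothetical stand-in for org.supermind.crawl.ScheduledURL. */
    static final class ScheduledURL {
        final long id;
        final URL url;
        ScheduledURL(long id, URL url) { this.id = id; this.url = url; }
    }

    /** Mirrors the documented methods; Object stands in for ProtocolOutput. */
    interface FetchedURLs {
        void init() throws IOException;
        void insert(ScheduledURL url, Object output) throws IOException;
        boolean contains(URL url);
        ScheduledURL get(long id);
        void close() throws IOException;
    }

    /** Illustrative in-memory store (an assumption, not the library's class). */
    static final class InMemorySketch implements FetchedURLs {
        private final Set<String> seen = new HashSet<>();
        private final Map<Long, ScheduledURL> byId = new HashMap<>();

        public void init() { seen.clear(); byId.clear(); }

        public void insert(ScheduledURL url, Object output) {
            // Key on the string form: URL.equals/hashCode may trigger DNS lookups.
            seen.add(url.url.toExternalForm());
            byId.put(url.id, url);
        }

        public boolean contains(URL url) {
            return seen.contains(url.toExternalForm());
        }

        public ScheduledURL get(long id) {
            return byId.get(id); // null if no entry has this id
        }

        public void close() { seen.clear(); byId.clear(); }
    }

    public static void main(String[] args) throws Exception {
        FetchedURLs db = new InMemorySketch();
        db.init();
        URL page = URI.create("http://example.com/a").toURL();
        db.insert(new ScheduledURL(1L, page), null); // no protocol output in this sketch
        System.out.println(db.contains(page));       // true
        System.out.println(db.get(2L));              // null
        db.close();
    }
}
```

A persistent implementation would likely open its storage in init and flush or release it in close, which would explain why both are declared to throw java.io.IOException.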