org.supermind.crawl
Class LastModifiedFetchedURLs

java.lang.Object
  extended by org.supermind.crawl.LastModifiedFetchedURLs
All Implemented Interfaces:
FetchedURLs, LastModifiedDB

public class LastModifiedFetchedURLs
extends java.lang.Object
implements FetchedURLs, LastModifiedDB

Records URLs and their last modified times.
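As a rough illustration of what "recording URLs and their last modified times" can look like, here is a minimal in-memory sketch. It is not this class or its implementation. The name `LastModifiedSketch`, the `HashMap` store keyed by URL string, and the use of `-1` for an unknown URL are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, for illustration only: keeps everything in memory.
final class LastModifiedSketch {
    // Assumed store: URL string -> last modified time in milliseconds.
    private final Map<String, Long> lastModifiedByUrl = new HashMap<>();

    // Record a fetched URL together with its last modified time.
    void insert(String url, long lastModifiedMillis) {
        lastModifiedByUrl.put(url, lastModifiedMillis);
    }

    // Has this URL already been recorded?
    boolean contains(String url) {
        return lastModifiedByUrl.containsKey(url);
    }

    // Last modified time in milliseconds, or -1 (an assumed sentinel) if unknown.
    long getLastModified(String url) {
        return lastModifiedByUrl.getOrDefault(url, -1L);
    }

    public static void main(String[] args) {
        LastModifiedSketch db = new LastModifiedSketch();
        db.insert("http://example.com/index.html", 1_000L);
        System.out.println(db.contains("http://example.com/index.html"));
        System.out.println(db.getLastModified("http://example.com/index.html"));
    }
}
```

A crawler could consult such a record before refetching a page, for example to skip URLs it has already fetched or to compare stored and reported modification times.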


Field Summary
 
Fields inherited from interface org.supermind.crawl.FetchedURLs
LOG
 
Fields inherited from interface org.supermind.crawl.LastModifiedDB
NO_OP
 
Constructor Summary
LastModifiedFetchedURLs()
           
 
Method Summary
 void close()
          Close.
 boolean contains(java.net.URL url)
          Has this URL already been fetched?
 ScheduledURL get(long id)
          Get a persisted URL.
protected  long getChecksum(java.net.URL url)
          Create a 64-bit checksum by merging a 32-bit host checksum with the URL's 32-bit checksum.
 long getLastModified(java.net.URL url)
          Get last modified time (in milliseconds).
 void init()
           
 void insert(ScheduledURL url, org.apache.nutch.protocol.ProtocolOutput output)
          Insert a fetched URL.
 void setChecksum(java.util.zip.Checksum checksum)
           
 void setPersister(LongLongPersister persister)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LastModifiedFetchedURLs

public LastModifiedFetchedURLs()
Method Detail

close

public void close()
           throws java.io.IOException
Description copied from interface: LastModifiedDB
Close.

Specified by:
close in interface FetchedURLs
Specified by:
close in interface LastModifiedDB
Throws:
java.io.IOException

contains

public boolean contains(java.net.URL url)
Has this URL already been fetched?

Specified by:
contains in interface FetchedURLs
Parameters:
url - the URL to check
Returns:
true if this URL has already been fetched

get

public ScheduledURL get(long id)
Description copied from interface: FetchedURLs
Get a persisted URL. (optional operation)

Specified by:
get in interface FetchedURLs
Parameters:
id - ScheduledURL's id
Returns:
the ScheduledURL, or null if it does not exist

getChecksum

protected long getChecksum(java.net.URL url)
Create a 64-bit checksum by merging a 32-bit host checksum with the URL's 32-bit checksum. Because the host checksum occupies the most significant bits, URLs can easily be sorted by host.
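
The merge described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual source: it assumes the host name and the full URL string are the inputs to the two 32-bit checksums, and it uses `java.util.zip.CRC32` as the algorithm. The names `ChecksumSketch`, `checksum32`, and `merge` are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch of packing two 32-bit checksums into one 64-bit key.
final class ChecksumSketch {
    // 32-bit checksum of a string, held in the low 32 bits of a long.
    static long checksum32(Checksum algorithm, String text) {
        algorithm.reset();
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        algorithm.update(bytes, 0, bytes.length);
        return algorithm.getValue() & 0xFFFFFFFFL;
    }

    // Host checksum goes in the high 32 bits, URL checksum in the low 32 bits.
    static long merge(Checksum algorithm, String host, String url) {
        long hostSum = checksum32(algorithm, host);
        long urlSum = checksum32(algorithm, url);
        return (hostSum << 32) | urlSum;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://a.example/x",
            "http://b.example/y",
            "http://a.example/z"
        };
        Checksum crc = new CRC32();
        long[] keys = new long[urls.length];
        for (int i = 0; i < urls.length; i++) {
            keys[i] = merge(crc, URI.create(urls[i]).getHost(), urls[i]);
        }
        // Keys that share a host share their high 32 bits, so sorting
        // places them next to each other.
        Arrays.sort(keys);
        for (long key : keys) {
            System.out.printf("%016x%n", key);
        }
    }
}
```

With this layout, all keys for one host fall into a single contiguous range of `long` values, which is what makes sorting by host cheap. Two different hosts can still collide on the same 32-bit host checksum, so the grouping is only as reliable as the checksum.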

Parameters:
url - the URL to checksum
Returns:
the 64-bit checksum

getLastModified

public long getLastModified(java.net.URL url)
Description copied from interface: LastModifiedDB
Get last modified time (in milliseconds).

Specified by:
getLastModified in interface LastModifiedDB
Parameters:
url - the URL to look up
Returns:
the last modified time, in milliseconds

init

public void init()
          throws java.io.IOException
Specified by:
init in interface FetchedURLs
Throws:
java.io.IOException

insert

public void insert(ScheduledURL url,
                   org.apache.nutch.protocol.ProtocolOutput output)
            throws java.io.IOException
Description copied from interface: FetchedURLs
Insert a fetched URL.

Specified by:
insert in interface FetchedURLs
Parameters:
url - the fetched URL
output - the protocol output of the fetch
Throws:
java.io.IOException

setChecksum

public void setChecksum(java.util.zip.Checksum checksum)

setPersister

public void setPersister(LongLongPersister persister)