org.supermind.crawl
Class DefaultFetchedURLs

java.lang.Object
  extended by org.supermind.crawl.DefaultFetchedURLs
All Implemented Interfaces:
FetchedURLs

public class DefaultFetchedURLs
extends java.lang.Object
implements FetchedURLs

Default implementation of FetchedURLs. Saves URL checksums directly to a MapFile.
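The idea of persisting only checksums, not URL text, can be sketched in memory. This is a hypothetical stand-in, not this class's implementation. A `TreeSet<Long>` plays the role of the sorted MapFile. The key is a plain CRC32 of the URL string rather than the host-prefixed checksum that `getChecksum` produces. The name `InMemoryFetchedChecksums` is illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in: a sorted set of checksums replaces the MapFile.
public class InMemoryFetchedChecksums {
    private final NavigableSet<Long> seen = new TreeSet<>();

    // Simplified key: CRC32 of the whole URL string (an assumption; the real
    // class merges a host checksum with the URL checksum).
    static long checksum(String url) {
        CRC32 crc = new CRC32();
        byte[] bytes = url.getBytes(StandardCharsets.UTF_8);
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    // Record a fetched URL; only its checksum is kept.
    public void insert(String url) {
        seen.add(checksum(url));
    }

    // Has this URL already been fetched?
    public boolean contains(String url) {
        return seen.contains(checksum(url));
    }

    public static void main(String[] args) {
        InMemoryFetchedChecksums store = new InMemoryFetchedChecksums();
        store.insert("http://example.com/a");
        System.out.println(store.contains("http://example.com/a")); // true
        System.out.println(store.contains("http://example.com/b")); // false
    }
}
```

Keeping only checksums keeps the store compact. The trade-off is that two different URLs with the same checksum cannot be told apart, so `contains` can return a false positive on a collision.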


Field Summary
 
Fields inherited from interface org.supermind.crawl.FetchedURLs
LOG
 
Constructor Summary
DefaultFetchedURLs()
           
 
Method Summary
 void close()
           
 boolean contains(java.net.URL url)
          Has this URL already been fetched?
 ScheduledURL get(long id)
          Get a persisted URL.
protected  long getChecksum(java.net.URL url)
          Create a 64-bit checksum by merging a 32-bit host checksum with the URL's 32-bit checksum.
 void init()
           
 void insert(ScheduledURL url, org.apache.nutch.protocol.ProtocolOutput output)
          Insert a fetched URL.
 void setChecksum(java.util.zip.Checksum checksum)
           
 void setPersister(LongPersister persister)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DefaultFetchedURLs

public DefaultFetchedURLs()
Method Detail

close

public void close()
           throws java.io.IOException
Specified by:
close in interface FetchedURLs
Throws:
java.io.IOException

contains

public boolean contains(java.net.URL url)
Has this URL already been fetched?

Specified by:
contains in interface FetchedURLs
Parameters:
url - the URL to check
Returns:
true if the URL has already been fetched, false otherwise

get

public ScheduledURL get(long id)
Description copied from interface: FetchedURLs
Get a persisted URL. (optional operation)

Specified by:
get in interface FetchedURLs
Parameters:
id - ScheduledURL's id
Returns:
the ScheduledURL, or null if it doesn't exist

getChecksum

protected long getChecksum(java.net.URL url)
Create a 64-bit checksum by merging a 32-bit host checksum with the URL's 32-bit checksum. Because the host checksum occupies the most significant bits, URLs can easily be sorted by host.

Parameters:
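url - the URL to checksum
Returns:
a 64-bit checksum whose high 32 bits are the host checksum and whose low 32 bits are the URL checksum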
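The checksum layout described for getChecksum can be sketched as follows. Several details are assumptions. The sketch uses `java.util.zip.CRC32` as the `Checksum`, although the class accepts any implementation through `setChecksum`. It takes the host checksum over `URL.getHost()` and the URL checksum over `URL.toString()`; the exact inputs `DefaultFetchedURLs` uses are not documented here. The class name `HostPrefixedChecksum` is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch of a host-prefixed 64-bit checksum (not thread-safe:
// the Checksum instance is reset and reused on every call).
public class HostPrefixedChecksum {
    private final Checksum checksum;

    public HostPrefixedChecksum(Checksum checksum) {
        this.checksum = checksum;
    }

    // Low 32 bits of the checksum of a string (assumed UTF-8 encoding).
    private long checksum32(String s) {
        checksum.reset();
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        checksum.update(bytes, 0, bytes.length);
        return checksum.getValue() & 0xFFFFFFFFL;
    }

    // Host checksum in the high 32 bits, URL checksum in the low 32 bits.
    public long getChecksum(URL url) {
        long host = checksum32(url.getHost());
        long full = checksum32(url.toString());
        return (host << 32) | full;
    }

    public static void main(String[] args) throws MalformedURLException {
        HostPrefixedChecksum c = new HostPrefixedChecksum(new CRC32());
        long a = c.getChecksum(URI.create("http://example.com/a").toURL());
        long b = c.getChecksum(URI.create("http://example.com/b").toURL());
        // Same host, so the high 32 bits match.
        System.out.println((a >>> 32) == (b >>> 32)); // true
    }
}
```

Because the host checksum fills the high 32 bits, sorting these values (for example, as MapFile keys) places all URLs from one host in a contiguous run. Within a host, the order follows the URL checksum, not the URL text.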

init

public void init()
          throws java.io.IOException
Specified by:
init in interface FetchedURLs
Throws:
java.io.IOException

insert

public void insert(ScheduledURL url,
                   org.apache.nutch.protocol.ProtocolOutput output)
            throws java.io.IOException
Description copied from interface: FetchedURLs
Insert a fetched URL.

Specified by:
insert in interface FetchedURLs
Parameters:
url - the fetched URL
output - the protocol output of the fetch
Throws:
java.io.IOException

setChecksum

public void setChecksum(java.util.zip.Checksum checksum)

setPersister

public void setPersister(LongPersister persister)
                  throws java.io.IOException
Throws:
java.io.IOException