org.supermind.crawl
Class NutchFetchListCrawlSeedSource

java.lang.Object
  extended by org.supermind.crawl.NutchFetchListCrawlSeedSource
All Implemented Interfaces:
CrawlSeedSource

public class NutchFetchListCrawlSeedSource
extends java.lang.Object
implements CrawlSeedSource

Uses a Nutch FetchList to seed a crawl.


Field Summary
(package private)  org.apache.nutch.io.ArrayFile.Reader fetchList
           
(package private)  int idx
           
(package private)  java.util.Iterator it
           
(package private)  boolean next
           
(package private)  org.apache.nutch.pagedb.FetchListEntry nextEntry
           
 
Constructor Summary
NutchFetchListCrawlSeedSource()
           
 
Method Summary
 void close()
          Close resources.
 SeedURL getSeedURL(int index)
          Get seed URL corresponding to an index.
 java.util.Iterator<SeedURL> getSeedURLs()
          Get iterator of seed URLs.
 void setFile(java.lang.String file)
           
 void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

fetchList

org.apache.nutch.io.ArrayFile.Reader fetchList

idx

int idx

it

java.util.Iterator it

next

boolean next

nextEntry

org.apache.nutch.pagedb.FetchListEntry nextEntry
Constructor Detail

NutchFetchListCrawlSeedSource

public NutchFetchListCrawlSeedSource()
Method Detail

close

public void close()
           throws java.io.IOException
Description copied from interface: CrawlSeedSource
Close resources.

Specified by:
close in interface CrawlSeedSource
Throws:
java.io.IOException

getSeedURL

public SeedURL getSeedURL(int index)
Description copied from interface: CrawlSeedSource
Get seed URL corresponding to an index. (optional operation)

Specified by:
getSeedURL in interface CrawlSeedSource
Returns:
SeedURL
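
The "(optional operation)" note matches the convention used in the Java Collections Framework, where an implementation that does not support a method throws UnsupportedOperationException. This page does not say whether NutchFetchListCrawlSeedSource supports indexed lookup, so the sketch below only illustrates how a caller might guard against that case. SeedSource, ListSource, StreamOnlySource and seedOrDefault are hypothetical stand-ins, and plain strings take the place of SeedURL; none of them are part of org.supermind.crawl.

```java
import java.util.Arrays;
import java.util.List;

public class OptionalLookupSketch {
    /** Hypothetical stand-in for CrawlSeedSource; seeds are plain strings here. */
    interface SeedSource {
        String getSeedURL(int index);
    }

    /** A source that supports indexed lookup. */
    static class ListSource implements SeedSource {
        private final List<String> seeds;
        ListSource(List<String> seeds) { this.seeds = seeds; }
        public String getSeedURL(int index) { return seeds.get(index); }
    }

    /** A sequential-only source that declines the optional operation (assumed behavior). */
    static class StreamOnlySource implements SeedSource {
        public String getSeedURL(int index) {
            throw new UnsupportedOperationException("indexed lookup not supported");
        }
    }

    /** Returns the seed at index, or fallback when the source does not support lookup. */
    static String seedOrDefault(SeedSource source, int index, String fallback) {
        try {
            return source.getSeedURL(index);
        } catch (UnsupportedOperationException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        SeedSource indexed = new ListSource(
                Arrays.asList("http://a.example/", "http://b.example/"));
        System.out.println(seedOrDefault(indexed, 1, "none"));
        System.out.println(seedOrDefault(new StreamOnlySource(), 1, "none"));
    }
}
```

Callers that only need sequential access can avoid the question entirely by using getSeedURLs().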

getSeedURLs

public java.util.Iterator<SeedURL> getSeedURLs()
                                        throws java.io.IOException
Description copied from interface: CrawlSeedSource
Get iterator of seed URLs.

Specified by:
getSeedURLs in interface CrawlSeedSource
Returns:
an iterator over the seed URLs
Throws:
java.io.IOException
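
Because both getSeedURLs() and close() declare java.io.IOException, a caller would typically drain the iterator inside a try block and close the source in a finally block. The following is a minimal sketch of that pattern, not code from this library: SeedSource and InMemorySource are hypothetical stand-ins that mirror only these two method signatures, and strings take the place of SeedURL.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SeedIterationSketch {
    /** Hypothetical stand-in mirroring the two CrawlSeedSource methods used here. */
    interface SeedSource {
        Iterator<String> getSeedURLs() throws IOException;
        void close() throws IOException;
    }

    /** In-memory source that records whether close() was called. */
    static class InMemorySource implements SeedSource {
        private final List<String> seeds;
        boolean closed = false;
        InMemorySource(List<String> seeds) { this.seeds = seeds; }
        public Iterator<String> getSeedURLs() { return seeds.iterator(); }
        public void close() { closed = true; }
    }

    /** Drains the iterator into a list and always closes the source. */
    static List<String> readAll(SeedSource source) throws IOException {
        List<String> out = new ArrayList<String>();
        try {
            Iterator<String> it = source.getSeedURLs();
            while (it.hasNext()) {
                out.add(it.next());
            }
        } finally {
            source.close(); // runs even if iteration fails
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        InMemorySource source = new InMemorySource(
                Arrays.asList("http://a.example/", "http://b.example/"));
        System.out.println(readAll(source));
        System.out.println("closed: " + source.closed);
    }
}
```

With the real class, the same shape would presumably apply after the source has been configured through setNfs and setFile, although this page does not document the required call order.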

setFile

public void setFile(java.lang.String file)

setNfs

public void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)