org.supermind.crawl
Class FileCrawlSeedSource

java.lang.Object
  extended by org.supermind.crawl.FileCrawlSeedSource
All Implemented Interfaces:
CrawlSeedSource

public class FileCrawlSeedSource
extends java.lang.Object
implements CrawlSeedSource

Seeds a crawl from a file. This implementation loads all seeds into memory, so it is unsuitable for seed files too large to fit in memory.
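
The eager-loading behavior described above can be illustrated with a minimal sketch. It is not the actual FileCrawlSeedSource source: the class InMemorySeedsSketch and the method loadSeeds are hypothetical names, plain strings stand in for SeedURL, and the file format (one seed per line, blank lines ignored) is an assumption, since this page does not specify it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not part of org.supermind.crawl.
public class InMemorySeedsSketch {

    // Reads every non-blank line into a list.
    // Assumed format: one seed per line (not confirmed by the documentation).
    public static List<String> loadSeeds(Path file) throws IOException {
        List<String> seeds = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    seeds.add(trimmed);
                }
            }
        }
        // The whole list is held at once, so memory use grows with the file.
        return seeds;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary seed file, then load it back.
        Path file = Files.createTempFile("seeds", ".txt");
        Files.write(file, Arrays.asList("http://example.com/", "", "http://example.org/"));
        System.out.println(loadSeeds(file));
        Files.delete(file);
    }
}
```

Because every seed is kept in the list until the source is discarded, a very large seed file would be better served by an implementation that reads lazily.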


Field Summary
protected  java.io.BufferedReader reader
           
protected  java.util.ArrayList<SeedURL> seeds
           
 
Constructor Summary
FileCrawlSeedSource(java.lang.String file)
           
 
Method Summary
 void close()
          Close resources.
 SeedURL getSeedURL(int index)
          Get seed URL corresponding to an index.
 java.util.Iterator<SeedURL> getSeedURLs()
          Get iterator of seed URLs.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

reader

protected java.io.BufferedReader reader

seeds

protected java.util.ArrayList<SeedURL> seeds
Constructor Detail

FileCrawlSeedSource

public FileCrawlSeedSource(java.lang.String file)
                    throws java.io.IOException
Throws:
java.io.IOException
Method Detail

close

public void close()
           throws java.io.IOException
Description copied from interface: CrawlSeedSource
Close resources.

Specified by:
close in interface CrawlSeedSource
Throws:
java.io.IOException

getSeedURL

public SeedURL getSeedURL(int index)
Description copied from interface: CrawlSeedSource
Get seed URL corresponding to an index. (optional operation)

Specified by:
getSeedURL in interface CrawlSeedSource
Returns:
SeedURL

getSeedURLs

public java.util.Iterator<SeedURL> getSeedURLs()
                                        throws java.io.IOException
Description copied from interface: CrawlSeedSource
Get iterator of seed URLs.

Specified by:
getSeedURLs in interface CrawlSeedSource
Returns:
Iterator of seed URLs
Throws:
java.io.IOException
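
The three methods documented above suggest a calling pattern: look up or iterate over the seeds, then call close(). The sketch below shows that pattern under stated assumptions. SeedSource and ListSeedSource are hypothetical stand-ins that only mirror the documented method names, and String takes the place of SeedURL so the example compiles without the org.supermind.crawl classes. A try/finally block is used because this page does not say whether CrawlSeedSource implements java.io.Closeable.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins that mirror the documented method names only.
public class SeedSourceUsageSketch {

    // Same method names as documented; String replaces SeedURL (assumption).
    interface SeedSource {
        Iterator<String> getSeedURLs() throws IOException;
        String getSeedURL(int index);
        void close() throws IOException;
    }

    // In-memory implementation used only to exercise the calling pattern.
    static class ListSeedSource implements SeedSource {
        private final List<String> seeds;
        boolean closed = false;

        ListSeedSource(List<String> seeds) {
            this.seeds = new ArrayList<>(seeds);
        }

        public Iterator<String> getSeedURLs() { return seeds.iterator(); }
        public String getSeedURL(int index) { return seeds.get(index); }
        public void close() { closed = true; }
    }

    // Drains the iterator, then closes the source even if iteration fails.
    static List<String> drain(SeedSource source) throws IOException {
        List<String> out = new ArrayList<>();
        try {
            Iterator<String> it = source.getSeedURLs();
            while (it.hasNext()) {
                out.add(it.next());
            }
        } finally {
            source.close();
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ListSeedSource source = new ListSeedSource(
                Arrays.asList("http://example.com/", "http://example.org/"));
        System.out.println(source.getSeedURL(1)); // indexed lookup
        System.out.println(drain(source));        // iterate, then close
    }
}
```

With the real class, the same shape would presumably apply: construct FileCrawlSeedSource with a file name, consume getSeedURLs(), and call close() in a finally block, handling the IOException each of those may throw.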