org.supermind.crawl
Interface CrawlSeedSource

All Known Implementing Classes:
FileCrawlSeedSource, NutchFetchListCrawlSeedSource

public interface CrawlSeedSource

Source of URLs to seed a crawl. Possible implementations include a file-based source and a database-backed source.
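A minimal in-memory implementation illustrates the contract. This is a sketch, not the shipped FileCrawlSeedSource. Because the real `org.supermind.crawl` classes may not be on the classpath, it re-declares the interface exactly as documented. It also uses a hypothetical `SeedURL` stand-in with assumed `getUrl()` and `getId()` accessors, since the real class's fields are not documented here. The class name `ListCrawlSeedSource` is likewise hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.supermind.crawl.SeedURL (accessors are assumed).
class SeedURL {
    private final String url;
    private final String id; // optional; may be null

    SeedURL(String url, String id) {
        this.url = url;
        this.id = id;
    }

    String getUrl() { return url; }
    String getId() { return id; }
}

// Local copy of the interface as documented, so the sketch compiles on its own.
interface CrawlSeedSource {
    void close() throws IOException;
    SeedURL getSeedURL(int index);
    Iterator<SeedURL> getSeedURLs() throws IOException;
}

// Hypothetical in-memory source: seeds live in a list, so random access is supported.
class ListCrawlSeedSource implements CrawlSeedSource {
    private final List<SeedURL> seeds;

    ListCrawlSeedSource(List<SeedURL> seeds) {
        // Read-only copy so later changes to the caller's list do not affect the source.
        this.seeds = Collections.unmodifiableList(new ArrayList<>(seeds));
    }

    @Override
    public SeedURL getSeedURL(int index) {
        return seeds.get(index); // IndexOutOfBoundsException for an invalid index
    }

    @Override
    public Iterator<SeedURL> getSeedURLs() {
        return seeds.iterator();
    }

    @Override
    public void close() {
        // Nothing to release for an in-memory list.
    }
}
```

A file- or database-backed source would typically open its reader or connection lazily and release it in `close()`. Such a source may also choose to throw `UnsupportedOperationException` from `getSeedURL` if it can only stream.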

A SeedURL may optionally have an id. The id allows FetchList filters to constrain a crawl to the host or domain of its seed.
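The following sketch shows one way a filter could use seed ids to keep a crawl on its seed's host. The filtering mechanism is an assumption: the documentation only says ids make this possible. The assumptions are as follows:

- The class `SameHostFilter` is hypothetical.
- The `SeedURL` stand-in is hypothetical.
- Each discovered URL is assumed to arrive tagged with the id of the seed it came from.

The filter consumes the `Iterator<SeedURL>` returned by `getSeedURLs()`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical stand-in for org.supermind.crawl.SeedURL (accessors are assumed).
class SeedURL {
    private final String url;
    private final String id; // optional; may be null

    SeedURL(String url, String id) {
        this.url = url;
        this.id = id;
    }

    String getUrl() { return url; }
    String getId() { return id; }
}

// Hypothetical filter: remembers each identified seed's host and accepts a
// discovered URL only if it stays on that host.
class SameHostFilter {
    private final Map<String, String> hostBySeedId = new HashMap<>();

    SameHostFilter(Iterator<SeedURL> seeds) {
        while (seeds.hasNext()) {
            SeedURL seed = seeds.next();
            // Seeds without an id cannot be traced back to, so they are not recorded.
            if (seed.getId() != null) {
                hostBySeedId.put(seed.getId(), URI.create(seed.getUrl()).getHost());
            }
        }
    }

    // True if candidateUrl, discovered from the seed with the given id, shares that seed's host.
    boolean accept(String seedId, String candidateUrl) {
        String seedHost = hostBySeedId.get(seedId);
        if (seedHost == null) {
            return true; // unknown or missing id: no constraint can be applied
        }
        String host = URI.create(candidateUrl).getHost();
        return seedHost.equalsIgnoreCase(host); // false when the candidate has no host
    }
}
```

Constraining to a domain rather than an exact host would need a suffix comparison (for example, treating `www.example.com` and `news.example.com` as the same domain), which this sketch does not attempt.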


Method Summary
 void close()
          Close resources.
 SeedURL getSeedURL(int index)
          Get the seed URL at the given index.
 java.util.Iterator<SeedURL> getSeedURLs()
          Get an iterator over the seed URLs.
 

Method Detail

close

void close()
           throws java.io.IOException
Close any resources held by this source.

Throws:
java.io.IOException

getSeedURL

SeedURL getSeedURL(int index)
Get the seed URL at the given index (optional operation).

Parameters:
index - position of the seed URL to return
Returns:
the SeedURL at index
Throws:
java.lang.UnsupportedOperationException - if this CrawlSeedSource does not support random access
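Because `getSeedURL` is optional, a caller that needs a specific seed can fall back to sequential access. This sketch tries `getSeedURL` first and, on `UnsupportedOperationException`, walks `getSeedURLs()`. It makes two assumptions that the documentation does not confirm: that `getSeedURLs()` may be called more than once on a source, and that indices count from zero in iteration order. `SeedLookup`, `IteratorOnlySource`, and the `SeedURL` stand-in are hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.supermind.crawl.SeedURL (accessors are assumed).
class SeedURL {
    private final String url;
    private final String id;

    SeedURL(String url, String id) { this.url = url; this.id = id; }

    String getUrl() { return url; }
    String getId() { return id; }
}

// Local copy of the interface as documented.
interface CrawlSeedSource {
    void close() throws IOException;
    SeedURL getSeedURL(int index);
    Iterator<SeedURL> getSeedURLs() throws IOException;
}

final class SeedLookup {
    private SeedLookup() {}

    // Returns the seed at index, using random access when supported,
    // otherwise stepping through the iterator (O(index) per call).
    static SeedURL seedAt(CrawlSeedSource source, int index) throws IOException {
        try {
            return source.getSeedURL(index);
        } catch (UnsupportedOperationException e) {
            Iterator<SeedURL> it = source.getSeedURLs();
            for (int i = 0; it.hasNext(); i++) {
                SeedURL seed = it.next();
                if (i == index) {
                    return seed;
                }
            }
            throw new IndexOutOfBoundsException("No seed at index " + index);
        }
    }
}

// Hypothetical sequential-only source used to exercise the fallback path.
class IteratorOnlySource implements CrawlSeedSource {
    private final List<SeedURL> seeds;

    IteratorOnlySource(List<SeedURL> seeds) { this.seeds = seeds; }

    @Override
    public SeedURL getSeedURL(int index) {
        throw new UnsupportedOperationException("random access not supported");
    }

    @Override
    public Iterator<SeedURL> getSeedURLs() { return seeds.iterator(); }

    @Override
    public void close() { }
}
```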

getSeedURLs

java.util.Iterator<SeedURL> getSeedURLs()
                                        throws java.io.IOException
Get an iterator over the seed URLs.

Returns:
an iterator over the SeedURLs provided by this source
Throws:
java.io.IOException
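Both `getSeedURLs()` and `close()` may throw `IOException`, so a typical consumer drains the iterator inside `try` and closes the source in `finally`. The documentation does not say `CrawlSeedSource` extends `java.io.Closeable`, so this sketch uses try/finally rather than try-with-resources. `SeedReader`, `FixedSource`, and the `SeedURL` stand-in are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.supermind.crawl.SeedURL (accessors are assumed).
class SeedURL {
    private final String url;
    private final String id;

    SeedURL(String url, String id) { this.url = url; this.id = id; }

    String getUrl() { return url; }
    String getId() { return id; }
}

// Local copy of the interface as documented.
interface CrawlSeedSource {
    void close() throws IOException;
    SeedURL getSeedURL(int index);
    Iterator<SeedURL> getSeedURLs() throws IOException;
}

final class SeedReader {
    private SeedReader() {}

    // Collects every seed URL string and always closes the source, even if iteration fails.
    static List<String> readAllUrls(CrawlSeedSource source) throws IOException {
        List<String> urls = new ArrayList<>();
        try {
            Iterator<SeedURL> it = source.getSeedURLs();
            while (it.hasNext()) {
                urls.add(it.next().getUrl());
            }
        } finally {
            source.close();
        }
        return urls;
    }
}

// Hypothetical fixed source for the usage example; records whether close() was called.
class FixedSource implements CrawlSeedSource {
    private final List<SeedURL> seeds;
    boolean closed = false;

    FixedSource(List<SeedURL> seeds) { this.seeds = seeds; }

    @Override
    public SeedURL getSeedURL(int index) { return seeds.get(index); }

    @Override
    public Iterator<SeedURL> getSeedURLs() { return seeds.iterator(); }

    @Override
    public void close() { closed = true; }
}
```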