CrawlSeedSource

org.supermind.crawl
Interface CrawlSeedSource

All Known Implementing Classes: FileCrawlSeedSource, NutchFetchListCrawlSeedSource

Source of URLs to seed a crawl. Possible implementations include a file-based source and a database-backed source.

SeedURLs may optionally have an id. This allows FetchList filters to constrain a crawl to its seed's host/domain.
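As an illustration of the contract described above, the following is a minimal sketch of an in-memory implementation. It is not one of the listed implementing classes. The nested `SeedURL` and `CrawlSeedSource` types are local stand-ins so the example compiles on its own: the `id` and `url` fields of `SeedURL` are assumptions (the real class is not shown on this page), and `ListSeedSource` is a hypothetical name.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class InMemorySeedSourceSketch {

    // Hypothetical stand-in for org.supermind.crawl.SeedURL; the real
    // class's fields and accessors are not shown on this page.
    static final class SeedURL {
        final String id;   // optional, may be null
        final String url;

        SeedURL(String id, String url) {
            this.id = id;
            this.url = url;
        }
    }

    // Local copy of the three method signatures documented for CrawlSeedSource.
    interface CrawlSeedSource {
        void close() throws IOException;
        SeedURL getSeedURL(int index);
        Iterator<SeedURL> getSeedURLs() throws IOException;
    }

    // Hypothetical list-backed source that supports both access styles.
    static final class ListSeedSource implements CrawlSeedSource {
        private final List<SeedURL> seeds;

        ListSeedSource(List<SeedURL> seeds) {
            this.seeds = List.copyOf(seeds); // immutable defensive copy
        }

        @Override
        public void close() {
            // Nothing to release for an in-memory list.
        }

        @Override
        public SeedURL getSeedURL(int index) {
            return seeds.get(index); // random access is supported here
        }

        @Override
        public Iterator<SeedURL> getSeedURLs() {
            return seeds.iterator();
        }
    }

    public static void main(String[] args) throws IOException {
        CrawlSeedSource source = new ListSeedSource(List.of(
                new SeedURL("example", "http://example.com/"),
                new SeedURL(null, "http://example.org/"))); // id omitted
        try {
            Iterator<SeedURL> it = source.getSeedURLs();
            while (it.hasNext()) {
                SeedURL seed = it.next();
                System.out.println(seed.id + " -> " + seed.url);
            }
        } finally {
            source.close();
        }
    }
}
```

A file-based or database-backed source would presumably do real work in `close()`, which is why the documented signature allows an `IOException`.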

Method Summary
`void`	`close()` Close resources.
`SeedURL`	`getSeedURL(int index)` Get seed URL corresponding to an index.
`java.util.Iterator<SeedURL>`	`getSeedURLs()` Get iterator of seed URLs.

Method Detail

void close()
           throws java.io.IOException

Close resources.

SeedURL getSeedURL(int index)

Get the seed URL corresponding to an index (optional operation).

Parameters: index - the index of the seed URL to return
Returns: the SeedURL at the given index
Throws: java.lang.UnsupportedOperationException - if this CrawlSeedSource doesn't support random access
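Because `getSeedURL(int)` is optional, a caller that needs a seed by position may want a fallback. The sketch below tries random access first and, on `UnsupportedOperationException`, walks the iterator instead. The helper `seedAt`, the class `SequentialOnlySource`, and the nested stand-in types are hypothetical. The fallback also assumes that `getSeedURLs()` returns an iterator starting from the first seed and that seeds are indexed from zero, neither of which this page states.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

public class SeedLookupSketch {

    // Hypothetical stand-in for org.supermind.crawl.SeedURL.
    static final class SeedURL {
        final String url;

        SeedURL(String url) {
            this.url = url;
        }
    }

    // Local copy of the documented CrawlSeedSource signatures.
    interface CrawlSeedSource {
        void close() throws IOException;
        SeedURL getSeedURL(int index);
        Iterator<SeedURL> getSeedURLs() throws IOException;
    }

    // Hypothetical source that only supports sequential access.
    static final class SequentialOnlySource implements CrawlSeedSource {
        private final List<SeedURL> seeds;

        SequentialOnlySource(List<SeedURL> seeds) {
            this.seeds = seeds;
        }

        @Override
        public void close() {
        }

        @Override
        public SeedURL getSeedURL(int index) {
            throw new UnsupportedOperationException("random access not supported");
        }

        @Override
        public Iterator<SeedURL> getSeedURLs() {
            return seeds.iterator();
        }
    }

    // Try random access; fall back to a linear scan of the iterator.
    static SeedURL seedAt(CrawlSeedSource source, int index) throws IOException {
        try {
            return source.getSeedURL(index);
        } catch (UnsupportedOperationException e) {
            Iterator<SeedURL> it = source.getSeedURLs();
            for (int i = 0; it.hasNext(); i++) {
                SeedURL seed = it.next();
                if (i == index) {
                    return seed;
                }
            }
            throw new IndexOutOfBoundsException("No seed at index " + index);
        }
    }

    public static void main(String[] args) throws IOException {
        CrawlSeedSource source = new SequentialOnlySource(List.of(
                new SeedURL("http://example.com/a"),
                new SeedURL("http://example.com/b")));
        System.out.println(seedAt(source, 1).url);
        source.close();
    }
}
```

The linear scan costs time proportional to the index, so callers that need many lookups against a sequential-only source may prefer to iterate once and cache the results.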

java.util.Iterator<SeedURL> getSeedURLs()
                                        throws java.io.IOException

Get iterator of seed URLs.
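A typical consumer reads every seed and then releases the source. This page does not state that CrawlSeedSource extends `java.io.Closeable` or `AutoCloseable`, so the sketch below uses an explicit `try`/`finally` rather than try-with-resources. The helper `collectUrls`, the class `TrackingSource`, and the `url` field on the stand-in `SeedURL` are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SeedDrainSketch {

    // Hypothetical stand-in for org.supermind.crawl.SeedURL.
    static final class SeedURL {
        final String url;

        SeedURL(String url) {
            this.url = url;
        }
    }

    // Local copy of the documented CrawlSeedSource signatures.
    interface CrawlSeedSource {
        void close() throws IOException;
        SeedURL getSeedURL(int index);
        Iterator<SeedURL> getSeedURLs() throws IOException;
    }

    // Hypothetical source that records whether close() was called.
    static final class TrackingSource implements CrawlSeedSource {
        private final List<SeedURL> seeds;
        boolean closed = false;

        TrackingSource(List<SeedURL> seeds) {
            this.seeds = seeds;
        }

        @Override
        public void close() {
            closed = true;
        }

        @Override
        public SeedURL getSeedURL(int index) {
            return seeds.get(index);
        }

        @Override
        public Iterator<SeedURL> getSeedURLs() {
            return seeds.iterator();
        }
    }

    // Read all seed URLs, closing the source even if iteration fails.
    static List<String> collectUrls(CrawlSeedSource source) throws IOException {
        List<String> urls = new ArrayList<>();
        try {
            Iterator<SeedURL> it = source.getSeedURLs();
            while (it.hasNext()) {
                urls.add(it.next().url);
            }
        } finally {
            source.close();
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        TrackingSource source = new TrackingSource(List.of(
                new SeedURL("http://example.com/"),
                new SeedURL("http://example.org/")));
        System.out.println(collectUrls(source));
        System.out.println("closed: " + source.closed);
    }
}
```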