public interface CrawlSeedSource
Source of URLs to seed a crawl. Possible implementations include a file-based source and a database-backed source.
SeedURLs may optionally have an id. This allows FetchList filters to constrain a crawl to its seed's host/domain.
Method Summary | |
---|---|
void | close() - Close resources. |
SeedURL | getSeedURL(int index) - Get seed URL corresponding to an index. |
java.util.Iterator<SeedURL> | getSeedURLs() - Get iterator of seed URLs. |
Method Detail |
---|
void close() throws java.io.IOException
Throws: java.io.IOException

SeedURL getSeedURL(int index)
Parameters: index
Throws: java.lang.UnsupportedOperationException - if CrawlSeedSource doesn't support random access

java.util.Iterator<SeedURL> getSeedURLs() throws java.io.IOException
Throws: java.io.IOException
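
The contract above can be illustrated with a minimal sketch. It is not taken from the library: `SeedURL` is replaced by a hypothetical stand-in (the real class and its accessors are not shown on this page), the interface is repeated so the example compiles on its own, and `ListSeedSource`, `StreamingSeedSource`, and `SeedSourceDemo` are names invented for illustration. One source supports random access; the other supports only iteration and throws `UnsupportedOperationException` from `getSeedURL`, as the method detail permits.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the real SeedURL class (assumption: a URL string plus an optional id).
class SeedURL {
    private final String url;
    private final Integer id; // null when the seed has no id

    SeedURL(String url, Integer id) {
        this.url = url;
        this.id = id;
    }

    String getUrl() { return url; }
    Integer getId() { return id; }
}

// The interface as documented on this page, repeated so the sketch is self-contained.
interface CrawlSeedSource {
    void close() throws IOException;
    SeedURL getSeedURL(int index);
    Iterator<SeedURL> getSeedURLs() throws IOException;
}

// In-memory source (illustrative): supports both iteration and random access.
class ListSeedSource implements CrawlSeedSource {
    private final List<SeedURL> seeds;

    ListSeedSource(List<SeedURL> seeds) {
        this.seeds = Collections.unmodifiableList(seeds);
    }

    @Override
    public void close() { /* nothing to release for an in-memory list */ }

    @Override
    public SeedURL getSeedURL(int index) { return seeds.get(index); }

    @Override
    public Iterator<SeedURL> getSeedURLs() { return seeds.iterator(); }
}

// Sequential-only source (illustrative): iteration works, random access is rejected.
class StreamingSeedSource implements CrawlSeedSource {
    private final Iterator<SeedURL> seeds;

    StreamingSeedSource(Iterator<SeedURL> seeds) { this.seeds = seeds; }

    @Override
    public void close() { /* a file- or database-backed source would release its handle here */ }

    @Override
    public SeedURL getSeedURL(int index) {
        throw new UnsupportedOperationException("random access not supported");
    }

    @Override
    public Iterator<SeedURL> getSeedURLs() { return seeds; }
}

class SeedSourceDemo {
    // Counts the seeds by iterating, then closes the source even if iteration fails.
    static int countSeeds(CrawlSeedSource source) throws IOException {
        try {
            int count = 0;
            Iterator<SeedURL> it = source.getSeedURLs();
            while (it.hasNext()) {
                it.next();
                count++;
            }
            return count;
        } finally {
            source.close();
        }
    }

    public static void main(String[] args) throws IOException {
        List<SeedURL> seeds = Arrays.asList(
                new SeedURL("http://example.com/", 0),
                new SeedURL("http://example.org/", null));
        System.out.println(countSeeds(new ListSeedSource(seeds)));
    }
}
```

Callers that need `getSeedURL(int)` may want to catch `UnsupportedOperationException` and fall back to `getSeedURLs()`, since the interface does not advertise which sources support random access.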