Interface Summary | |
---|---|
ScopeFilter<T> | Determines the scope of an operation. |

Class Summary | |
---|---|
AbstractScope<T> | Limits the scope of an operation through ScopeFilters. |
FetchListScope | Scope to determine which URLs are added to a FetchList. |
FetchListScope.Input | |
MapFileContentSeenFilter | Writes MD5s to a MapFile for easy comparison. |
NutchUrlFLFilter | Filters URLs using Nutch's URLFilters. |
OneExternalLinkFLFilter | Allows a URL if its parent has the same host as its seed. |
ParentPrefixPathFLFilter | Allows a URL if it has the same path or host as its parent (originating page). |
ParseScope | Scope to determine which fetched URLs are parsed. |
PostFetchScope | Scope to determine which fetched URLs are processed by PostFetchProcessors. |
PostFetchScope.Input | |
SameParentHostFLFilter | Allows a URL if it has the same host as its parent (originating page). |
SameParentPathFLFilter | Allows a URL if it has the same path as its parent (originating page). |
SameParentTLDFLFilter | Allows a URL if it has the same TLD (top-level domain) as its parent (originating page). |
SizeConstrainedFLFilter | Limits a crawl to a fixed number of pages. |
WebDBContentSeenFilter | Uses Nutch's WebDB and a page's MD5 hash to determine if a page has been seen. |
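
The summary above describes a scope as a set of filters that together decide whether an input (for example, a candidate URL) is allowed. The following is a minimal sketch of that idea, not the package's actual API: the method signatures of ScopeFilter<T> and AbstractScope<T> are not shown here, so every type below (`SketchFilter`, `SketchScope`, `Candidate`, `SameHostSketch`, `MaxPagesSketch`) is a hypothetical stand-in, and the "all filters must accept" policy is an assumption.

```java
import java.net.URI;
import java.util.List;

/** Hypothetical stand-in for a scope filter; the real ScopeFilter<T> signature may differ. */
interface SketchFilter<T> {
    boolean accepts(T input);
}

/** Hypothetical input: a candidate URL together with its parent (originating page). */
final class Candidate {
    final URI url;
    final URI parent;

    Candidate(URI url, URI parent) {
        this.url = url;
        this.parent = parent;
    }
}

/** Hypothetical scope: an input is in scope only if every filter accepts it (assumed policy). */
final class SketchScope<T> {
    private final List<SketchFilter<T>> filters;

    SketchScope(List<SketchFilter<T>> filters) {
        this.filters = List.copyOf(filters);
    }

    boolean inScope(T input) {
        for (SketchFilter<T> filter : filters) {
            if (!filter.accepts(input)) {
                return false; // stop at the first rejection
            }
        }
        return true;
    }
}

/** Accepts a URL whose host matches its parent's host, in the spirit of SameParentHostFLFilter. */
final class SameHostSketch implements SketchFilter<Candidate> {
    @Override
    public boolean accepts(Candidate c) {
        String host = c.url.getHost();
        String parentHost = c.parent.getHost();
        // A URL without a host never matches.
        return host != null && host.equalsIgnoreCase(parentHost);
    }
}

/** Accepts at most a fixed number of inputs, in the spirit of SizeConstrainedFLFilter. */
final class MaxPagesSketch implements SketchFilter<Candidate> {
    private final int limit;
    private int accepted;

    MaxPagesSketch(int limit) {
        this.limit = limit;
    }

    @Override
    public boolean accepts(Candidate c) {
        if (accepted >= limit) {
            return false;
        }
        accepted++;
        return true;
    }
}

final class ScopeSketchDemo {
    public static void main(String[] args) {
        // The counting filter goes last so it only counts URLs the host filter let through.
        List<SketchFilter<Candidate>> filters = List.of(new SameHostSketch(), new MaxPagesSketch(1));
        SketchScope<Candidate> scope = new SketchScope<>(filters);
        URI parent = URI.create("http://example.com/");
        System.out.println(scope.inScope(new Candidate(URI.create("http://example.com/a"), parent)));
        System.out.println(scope.inScope(new Candidate(URI.create("http://other.org/a"), parent)));
    }
}
```

In this sketch the order of filters matters: a stateful filter such as the page counter is placed last so that URLs rejected by an earlier filter do not consume the limit. Whether the real scopes short-circuit in this way is not stated in the summary.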