java.lang.Object
  org.supermind.crawl.scope.MapFileContentSeenFilter
public class MapFileContentSeenFilter
extends java.lang.Object
implements ScopeFilter<PostFetchScope.Input>, PostFetchProcessor
Writes the MD5 digest of each fetched page's content to a MapFile so that later pages can be checked against it for duplicate content. This is similar in concept to how FetchedURLs works.
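The content-seen idea can be illustrated with a small, self-contained sketch. It hashes page bytes with the JDK's `MessageDigest` and keeps the hex digests in an in-memory `TreeSet`. The `TreeSet` is only a stand-in for the MapFile (a sorted key file); the class name `Md5SeenSet` is hypothetical and is not part of this API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeSet;

/**
 * Hypothetical sketch: an in-memory stand-in for the MapFile of MD5s.
 * A TreeSet keeps the digests sorted, loosely mirroring a sorted key file.
 */
class Md5SeenSet {
    private final TreeSet<String> seen = new TreeSet<>();

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Records the page's digest; returns true if it had not been seen before. */
    boolean add(byte[] content) {
        return seen.add(md5Hex(content));
    }

    /** Returns true if a page with identical bytes was already recorded. */
    boolean contains(byte[] content) {
        return seen.contains(md5Hex(content));
    }
}
```

Because two pages with byte-identical content produce the same digest, a second `add` of the same bytes returns `false` even when the pages came from different URLs.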
Field Summary |
---|
Fields inherited from interface org.supermind.crawl.scope.ScopeFilter |
---|
ABSTAIN, ALLOW, REJECT |
Fields inherited from interface org.supermind.crawl.PostFetchProcessor |
---|
NO_OP |
Constructor Summary | |
---|---|
MapFileContentSeenFilter() | |
Method Summary | |
---|---|
void | close(): Cleanup. |
int | filter(PostFetchScope.Input input): Checks the webdb to see whether the page's MD5 exists. |
void | process(FetcherOutput fo, org.apache.nutch.protocol.Content content, org.apache.nutch.parse.Parse parse): Processes a fetched page. |
void | setPersister(MD5Persister persister) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public MapFileContentSeenFilter()
Method Detail |
---|
public void close() throws java.io.IOException
Description copied from interface: PostFetchProcessor
Cleanup.
Specified by: close in interface PostFetchProcessor
Throws: java.io.IOException
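The "Cleanup" step is not described further here. One plausible shape, sketched below purely as an assumption, is a writer that buffers digests and flushes them in sorted order on `close()`, since MapFile-style writers expect keys in ascending order. `SortedDigestFlusher` and its use of a plain `java.io.Writer` in place of a MapFile writer are hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.TreeSet;

/**
 * Hypothetical sketch: digests are buffered in a TreeSet and written in
 * ascending order on close(). The Writer stands in for a MapFile writer.
 */
class SortedDigestFlusher {
    private final TreeSet<String> pending = new TreeSet<>();
    private final Writer out;
    private boolean closed = false;

    SortedDigestFlusher(Writer out) {
        this.out = out;
    }

    /** Buffers a digest; duplicates collapse because TreeSet is a set. */
    void record(String md5Hex) {
        if (closed) {
            throw new IllegalStateException("already closed");
        }
        pending.add(md5Hex);
    }

    /** Cleanup: writes buffered digests in sorted order, then closes the writer. */
    void close() throws IOException {
        if (closed) {
            return; // a second close() is a no-op
        }
        closed = true;
        try {
            for (String key : pending) {
                out.write(key);
                out.write('\n');
            }
        } finally {
            pending.clear();
            out.close(); // release the underlying resource even if a write failed
        }
    }
}
```

Deferring the write until `close()` trades memory for the ability to accept digests in any order; a real implementation could instead append directly if digests already arrive sorted.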
public int filter(PostFetchScope.Input input)
Checks the webdb to see whether the page's MD5 exists.
Specified by: filter in interface ScopeFilter<PostFetchScope.Input>
Parameters: o -
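The decision `filter` makes can be sketched as a lookup that maps "digest already known" to a verdict. `ScopeFilter` does define `ABSTAIN`, `ALLOW`, and `REJECT`, but their values and the exact verdict this class returns are not documented here. The sketch below assumes a seen page is rejected and any other page gets `ABSTAIN`, and it uses placeholder constants in a hypothetical `Verdict` class. The `PostFetchScope.Input` argument is reduced to the page's MD5 hex string.

```java
import java.util.Set;

/** Hypothetical placeholders; the real constants live in ScopeFilter. */
final class Verdict {
    static final int ALLOW = 1;
    static final int ABSTAIN = 0;
    static final int REJECT = -1;

    private Verdict() {}
}

/** Sketch of a content-seen check over a set of known digests. */
class ContentSeenCheck {
    private final Set<String> knownDigests;

    ContentSeenCheck(Set<String> knownDigests) {
        this.knownDigests = knownDigests;
    }

    /** Assumed behavior: REJECT a page whose digest is known, otherwise ABSTAIN. */
    int filter(String pageMd5) {
        if (pageMd5 == null) {
            return Verdict.ABSTAIN; // no digest available, so express no opinion
        }
        return knownDigests.contains(pageMd5) ? Verdict.REJECT : Verdict.ABSTAIN;
    }
}
```

Returning `ABSTAIN` rather than `ALLOW` for unseen content leaves the final decision to other filters in the scope chain, which suits a filter whose only concern is duplicates.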
public void process(FetcherOutput fo, org.apache.nutch.protocol.Content content, org.apache.nutch.parse.Parse parse) throws java.io.IOException
Description copied from interface: PostFetchProcessor
Processes a fetched page.
Specified by: process in interface PostFetchProcessor
Parameters: parse - Parse data; can be null if parsing failed.
Throws: java.io.IOException
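Because `parse` can be null, a digest-recording `process` would need to work from the raw fetched bytes rather than the parse result. The sketch below assumes exactly that. `DigestRecorder` is hypothetical, and `byte[]` and `Object` stand in for Nutch's `Content` and `Parse` types so the example compiles without Nutch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of process(): the digest is taken over the raw fetched
 * bytes, so a null parse (parsing failed) does not prevent recording it.
 */
class DigestRecorder {
    private final Set<String> recorded = new HashSet<>();

    void process(byte[] rawContent, Object parse) {
        if (rawContent == null) {
            return; // nothing was fetched, so there is nothing to hash
        }
        // parse is deliberately unused: it may be null if parsing failed
        recorded.add(toHex(md5(rawContent)));
    }

    boolean hasRecorded(String md5Hex) {
        return recorded.contains(md5Hex);
    }

    int size() {
        return recorded.size();
    }

    private static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    /** Lowercase hex encoding, two characters per byte. */
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16));
            sb.append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }
}
```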
public void setPersister(MD5Persister persister) throws java.io.IOException
Throws: java.io.IOException
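`setPersister` is undocumented apart from declaring `java.io.IOException`, which suggests the setter may do I/O. The sketch below shows one possible reading: setter injection that eagerly loads previously persisted digests. The `DigestStore` interface and its `loadAll()` method are hypothetical stand-ins and do not reflect the real `MD5Persister` API.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical storage abstraction, not the real MD5Persister. */
interface DigestStore {
    Collection<String> loadAll() throws IOException;
}

/** Sketch of setter injection that loads persisted digests up front. */
class PersistingSeenFilter {
    private DigestStore store;
    private final Set<String> known = new HashSet<>();

    void setPersister(DigestStore store) throws IOException {
        this.store = Objects.requireNonNull(store, "store");
        known.clear();
        // performing I/O here is one reason a setter might declare IOException
        known.addAll(store.loadAll());
    }

    boolean isKnown(String md5Hex) {
        return known.contains(md5Hex);
    }
}
```

Loading in the setter keeps later `filter` calls to an in-memory lookup. The cost is that a persister must be set before filtering starts.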