org.supermind.crawl.scope
Class MapFileContentSeenFilter

java.lang.Object
  extended by org.supermind.crawl.scope.MapFileContentSeenFilter
All Implemented Interfaces:
PostFetchProcessor, ScopeFilter<PostFetchScope.Input>

public class MapFileContentSeenFilter
extends java.lang.Object
implements ScopeFilter<PostFetchScope.Input>, PostFetchProcessor

Writes the MD5 digests of fetched pages to a MapFile so that later pages can be compared against content already seen. Similar in concept to how FetchedURLs works.
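
The idea described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the class's actual implementation: `ContentSeenSketch`, `md5Hex` and `recordAndCheck` are hypothetical names, and an in-memory `TreeMap` stands in for the on-disk MapFile (a sorted key/value store) as an assumption made for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.TreeMap;

// Hypothetical illustration; not part of org.supermind.crawl.
class ContentSeenSketch {
    // Assumption: a sorted in-memory map stands in for the MapFile.
    // Key: MD5 hex digest of the content. Value: first URL seen with it.
    private final TreeMap<String, String> seen = new TreeMap<>();

    /** Returns the MD5 digest of the given bytes as 32 lowercase hex characters. */
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Records the content's digest; returns true if the same content was recorded before. */
    boolean recordAndCheck(String url, byte[] content) {
        String key = md5Hex(content);
        return seen.putIfAbsent(key, url) != null;
    }

    public static void main(String[] args) {
        ContentSeenSketch sketch = new ContentSeenSketch();
        byte[] page = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(sketch.recordAndCheck("http://example.com/a", page)); // false: first sighting
        System.out.println(sketch.recordAndCheck("http://example.com/b", page)); // true: duplicate content
    }
}
```

Keying on the digest rather than the URL means two different URLs serving byte-identical content are detected as duplicates, which is presumably the point of a content-seen filter.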


Field Summary
 
Fields inherited from interface org.supermind.crawl.scope.ScopeFilter
ABSTAIN, ALLOW, REJECT
 
Fields inherited from interface org.supermind.crawl.PostFetchProcessor
NO_OP
 
Constructor Summary
MapFileContentSeenFilter()
           
 
Method Summary
 void close()
          Cleanup.
 int filter(PostFetchScope.Input input)
          Checks the webdb to see whether the page's MD5 already exists.
 void process(FetcherOutput fo, org.apache.nutch.protocol.Content content, org.apache.nutch.parse.Parse parse)
          Processes a fetched page.
 void setPersister(MD5Persister persister)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

MapFileContentSeenFilter

public MapFileContentSeenFilter()

Method Detail

close

public void close()
           throws java.io.IOException
Description copied from interface: PostFetchProcessor
Cleanup.

Specified by:
close in interface PostFetchProcessor
Throws:
java.io.IOException

filter

public int filter(PostFetchScope.Input input)
Checks the webdb to see whether the page's MD5 already exists.
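
A lookup of this kind might map onto the filter's integer result along the following lines. This is only a sketch under stated assumptions: `SeenDecisionSketch` and `markSeen` are hypothetical, the numeric values given to `ALLOW`, `REJECT` and `ABSTAIN` are invented for illustration (the real constants are declared in ScopeFilter and may differ), and rejecting already-seen content is an assumed policy that this page does not confirm.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not part of org.supermind.crawl.
class SeenDecisionSketch {
    // Assumed values for illustration only; see ScopeFilter for the real constants.
    static final int ALLOW = 1;
    static final int REJECT = -1;
    static final int ABSTAIN = 0;

    // Assumption: an in-memory set stands in for the persisted MD5 store.
    private final Set<String> seenDigests = new HashSet<>();

    /** Remembers a digest so that later lookups report it as seen. */
    void markSeen(String md5Hex) {
        seenDigests.add(md5Hex);
    }

    /** Decides on a page given the MD5 hex digest of its content. */
    int filter(String md5Hex) {
        if (md5Hex == null) {
            return ABSTAIN; // no digest available, so express no opinion
        }
        // Assumed policy: content already seen is rejected, new content is allowed.
        return seenDigests.contains(md5Hex) ? REJECT : ALLOW;
    }
}
```

Returning `ABSTAIN` when no digest is available leaves the decision to any other filters in the scope instead of forcing an answer.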

Specified by:
filter in interface ScopeFilter<PostFetchScope.Input>
Parameters:
input - the post-fetch input for the page being checked
Returns:
one of the ScopeFilter constants ABSTAIN, ALLOW or REJECT

process

public void process(FetcherOutput fo,
                    org.apache.nutch.protocol.Content content,
                    org.apache.nutch.parse.Parse parse)
             throws java.io.IOException
Description copied from interface: PostFetchProcessor
Processes a fetched page.

Specified by:
process in interface PostFetchProcessor
Parameters:
parse - Parse data; can be null if parsing failed
Throws:
java.io.IOException

setPersister

public void setPersister(MD5Persister persister)
                  throws java.io.IOException
Throws:
java.io.IOException