org.supermind.crawl.util
Class MapFilePersister<K,V>

java.lang.Object
  extended by org.supermind.crawl.util.MapFilePersister<K,V>
Direct Known Subclasses:
LongLongPersister, LongPersister, MD5Persister

public abstract class MapFilePersister<K,V>
extends java.lang.Object

Helper class that simplifies interaction with a MapFile. Because a MapFile performs best when it is updated in batches, updates are collected in an in-memory buffer and written out as a batch. The buffer also improves performance when reads and updates exhibit locality.
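The buffering strategy described above can be modeled in plain Java. This is a simplified sketch, not the class's implementation: a `TreeMap` stands in for the on-disk MapFile, and the names `BufferedSortedStore`, `flush`, `flushCount`, and `storedSize` are hypothetical. Updates accumulate in a sorted buffer and are merged into the store in one batch when the buffer reaches its maximum size. The sketch also assumes that reads consult the buffer first, which is where locality would pay off.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical, simplified model of buffered batch updates.
class BufferedSortedStore<K, V> {
    private final TreeMap<K, V> store;   // stands in for the on-disk MapFile
    private final TreeMap<K, V> buffer;  // in-memory batch of pending updates
    private final int maxBufferSize;
    private int flushes = 0;

    BufferedSortedStore(Comparator<K> cmp, int maxBufferSize) {
        this.store = new TreeMap<>(cmp);
        this.buffer = new TreeMap<>(cmp);
        this.maxBufferSize = maxBufferSize;
    }

    void add(K k, V v) {
        buffer.put(k, v);                 // repeated keys are absorbed in memory
        if (buffer.size() >= maxBufferSize) {
            flush();
        }
    }

    V get(K k) {
        V v = buffer.get(k);              // assumption: recent updates are served from memory
        return v != null ? v : store.get(k);
    }

    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        store.putAll(buffer);             // one batch merge instead of many small writes
        buffer.clear();
        flushes++;
    }

    int flushCount() { return flushes; }

    int storedSize() { return store.size(); }
}
```

With a maximum buffer size of 3, the first two `add` calls stay in memory and the third triggers a single batch merge.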


Field Summary
(package private)  java.lang.String baseDir
           
protected  java.util.TreeMap<K,V> buffer
          In-memory buffer of pending key/value pairs.
(package private)  int bufferIdx
          Current buffer position.
(package private) static java.util.logging.Logger LOG
           
(package private)  java.lang.String mapdir
           
(package private)  java.io.File mapdirFile
           
(package private)  int maxBufferSize
          Maximum buffer size.
(package private)  org.apache.nutch.fs.NutchFileSystem nfs
           
(package private)  boolean overwrite
          Should existing files/directories be overwritten?
(package private)  org.apache.nutch.io.MapFile.Reader reader
           
 
Constructor Summary
MapFilePersister()
           
 
Method Summary
 void add(K k, V v)
          Add a key/value pair.
 void close()
          Close resources.
 void flushToDisk()
          Flush the buffer to disk.
protected abstract  org.apache.nutch.io.WritableComparator getKeyComparator()
          Get comparator for MapFile key class.
protected abstract  org.apache.nutch.io.WritableComparable getKeyInstance(K k, boolean throwaway)
          Return a new instance of the key.
protected abstract  java.util.Comparator<K> getTypeComparator()
          Get comparator for the key type K.
protected  org.apache.nutch.io.Writable getValueInstance(V v, boolean throwaway)
          Return a new instance of the value.
 void init()
          Initialize resources.
 void setBaseDir(java.lang.String dir)
          Set location of directory where the MapFile will be created.
 void setMaxBufferSize(int maxBufferSize)
          Set the maximum buffer size.
 void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)
          Set NutchFileSystem.
 void setOverwrite(boolean overwrite)
          Set whether existing files/directories should be overwritten.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

baseDir

java.lang.String baseDir

buffer

protected java.util.TreeMap<K,V> buffer
In-memory buffer of pending key/value pairs.


bufferIdx

int bufferIdx
Current buffer position.


LOG

static java.util.logging.Logger LOG

mapdir

java.lang.String mapdir

mapdirFile

java.io.File mapdirFile

maxBufferSize

int maxBufferSize
Maximum buffer size.


nfs

org.apache.nutch.fs.NutchFileSystem nfs

overwrite

boolean overwrite
Should existing files/directories be overwritten?


reader

org.apache.nutch.io.MapFile.Reader reader

Constructor Detail

MapFilePersister

public MapFilePersister()

Method Detail

add

public void add(K k,
                V v)
         throws java.io.IOException
Add a key/value pair.

Parameters:
k - the key
v - the value
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Close resources.

Throws:
java.io.IOException

flushToDisk

public void flushToDisk()
                 throws java.io.IOException
Flush the buffer to disk. Buffer entries are appended to a temporary file, which is then sorted. A MapFile is created from the sorted file so that entries can be accessed quickly.

Throws:
java.io.IOException
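The three flush steps (append to a temporary file, sort it, build a sorted file that supports fast lookups) can be illustrated with a hypothetical, self-contained sketch. It uses `java.nio.file` instead of NutchFileSystem, stores entries as tab-separated `long` keys and `String` values, and uses a binary search over the sorted lines in place of a real MapFile index; `FlushSketch` and its methods are invented for illustration. Because the temporary file is only ever appended to, a key may occur more than once; the sketch assumes the most recently appended value should win.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the flush steps; not the MapFilePersister code.
class FlushSketch {
    // Step 1: append the buffer's entries to the temporary file as "key<TAB>value" lines.
    static void appendBuffer(Path tmp, Map<Long, String> buffer) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, String> e : buffer.entrySet()) {
            lines.add(e.getKey() + "\t" + e.getValue());
        }
        Files.write(tmp, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Steps 2 and 3: sort the temporary file by key and write the sorted,
    // de-duplicated result to the "map" file.
    static List<String> sortAndDedupe(Path tmp, Path mapFile) throws IOException {
        List<String> lines = new ArrayList<>(Files.readAllLines(tmp, StandardCharsets.UTF_8));
        // Collections.sort is stable: for equal keys, later-appended lines stay later.
        Collections.sort(lines, Comparator.comparingLong(FlushSketch::keyOf));
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            int last = out.size() - 1;
            if (last >= 0 && keyOf(out.get(last)) == keyOf(line)) {
                out.set(last, line); // assumption: the newest value wins
            } else {
                out.add(line);
            }
        }
        Files.write(mapFile, out, StandardCharsets.UTF_8);
        return out;
    }

    static long keyOf(String line) {
        return Long.parseLong(line.substring(0, line.indexOf('\t')));
    }

    // Sorted order is what enables fast access: a binary search finds a key
    // in O(log n) comparisons instead of a full scan.
    static String lookup(List<String> sorted, long key) {
        int lo = 0;
        int hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = keyOf(sorted.get(mid));
            if (k < key) {
                lo = mid + 1;
            } else if (k > key) {
                hi = mid - 1;
            } else {
                String line = sorted.get(mid);
                return line.substring(line.indexOf('\t') + 1);
            }
        }
        return null; // key not present
    }
}
```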

getKeyComparator

protected abstract org.apache.nutch.io.WritableComparator getKeyComparator()
Get comparator for MapFile key class.

Returns:
the comparator for the MapFile key class

getKeyInstance

protected abstract org.apache.nutch.io.WritableComparable getKeyInstance(K k,
                                                                         boolean throwaway)
Return a new instance of the key.

Returns:
a new key instance for k

getTypeComparator

protected abstract java.util.Comparator<K> getTypeComparator()
Get comparator for the key type K.

Returns:
a comparator for keys of type K

getValueInstance

protected org.apache.nutch.io.Writable getValueInstance(V v,
                                                        boolean throwaway)
Return a new instance of the value. The default implementation returns NullWritable.get().

Returns:
a new value instance for v

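A subclass such as LongPersister supplies the abstract hooks. The sketch below models only the buffer-related part with plain Java types so that it compiles on its own; `PersisterSketch`, `LongPersisterSketch`, and the `NullValue` stand-in for `NullWritable` are hypothetical. It assumes that the buffer is a `TreeMap` ordered by `getTypeComparator()`, and it mirrors the documented default of `getValueInstance` returning one shared "null" value.

```java
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical, stripped-down model of the subclass hooks; Nutch types are
// replaced by plain Java.
abstract class PersisterSketch<K, V> {
    // Stand-in for NullWritable: one shared value that carries no data.
    enum NullValue { INSTANCE }

    protected TreeMap<K, V> buffer;

    // Subclasses decide how keys of type K are ordered.
    protected abstract Comparator<K> getTypeComparator();

    // Mirrors the documented default of returning a shared "null" value.
    protected Object getValueInstance(V v, boolean throwaway) {
        return NullValue.INSTANCE;
    }

    void init() {
        // Assumption: the buffer is sorted with the type comparator, so that
        // iterating it yields entries already in key order.
        buffer = new TreeMap<>(getTypeComparator());
    }
}

// Hypothetical subclass loosely modeled on the name LongPersister.
class LongPersisterSketch extends PersisterSketch<Long, Void> {
    @Override
    protected Comparator<Long> getTypeComparator() {
        return Comparator.naturalOrder();
    }
}
```

If this reading is right, `getTypeComparator()` and `getKeyComparator()` would need to order keys identically; otherwise a flushed buffer would not be in the order the MapFile expects.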
init

public void init()
          throws java.io.IOException
Initialize resources.

Throws:
java.io.IOException

setBaseDir

public void setBaseDir(java.lang.String dir)
Set location of directory where the MapFile will be created.

Parameters:
dir - the base directory in which the MapFile will be created

setMaxBufferSize

public void setMaxBufferSize(int maxBufferSize)
Set the maximum buffer size.

Parameters:
maxBufferSize - the maximum buffer size

setNfs

public void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)
Set NutchFileSystem.

Parameters:
nfs - the NutchFileSystem to use

setOverwrite

public void setOverwrite(boolean overwrite)
Set whether existing files/directories should be overwritten.

Parameters:
overwrite - true if existing files/directories should be overwritten
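One way an overwrite flag like this could be applied when the map directory is prepared is sketched below. This is an assumption about behavior the page does not spell out: it uses `java.nio.file` in place of NutchFileSystem, and `MapDirPreparer.prepare` is a hypothetical helper. With `overwrite` false, an existing directory is treated as an error; with `overwrite` true, it is deleted recursively and recreated.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: one way an overwrite flag could govern directory setup.
class MapDirPreparer {
    static Path prepare(Path baseDir, String mapdir, boolean overwrite) throws IOException {
        Path dir = baseDir.resolve(mapdir);
        if (Files.exists(dir)) {
            if (!overwrite) {
                // Refuse to clobber existing data unless explicitly allowed.
                throw new FileAlreadyExistsException(dir.toString());
            }
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse path order lists every child before its parent directory.
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.delete(p);
            }
        }
        return Files.createDirectories(dir);
    }
}
```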