org.supermind.crawl
Class MapFilePersister<K,V>

java.lang.Object
  extended by org.supermind.crawl.util.MapFilePersister<K,V>
Direct Known Subclasses:
LongIntPersister, LongLongPersister, LongPersister, MD5Persister

public abstract class MapFilePersister<K,V>
extends java.lang.Object

Helper class that simplifies interaction with a MapFile. Because a MapFile performs best in batch-update scenarios, updates are collected in an in-memory buffer and written out in batches. The buffer also improves performance when reads and updates have some locality.
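
The buffering idea described above can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: BufferedBatchSketch, its batches list (standing in for the on-disk file), and the reading of maxBufferSize as an entry count are all assumptions made for illustration, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical stand-in for the buffering idea; not the real MapFilePersister. */
class BufferedBatchSketch<K extends Comparable<K>, V> {
    // Assumption: the limit counts entries, not bytes.
    private final int maxBufferSize;
    // Sorted in-memory buffer of pending updates.
    private final TreeMap<K, V> buffer = new TreeMap<>();
    // Stands in for the disk: each element is one flushed batch.
    private final List<TreeMap<K, V>> batches = new ArrayList<>();

    BufferedBatchSketch(int maxBufferSize) {
        this.maxBufferSize = maxBufferSize;
    }

    /** Buffer one pair; flush a whole batch once the buffer is full. */
    void add(K k, V v) {
        buffer.put(k, v); // a repeated key is overwritten in memory, costing no extra write
        if (buffer.size() >= maxBufferSize) {
            flush();
        }
    }

    /** Write the buffered entries out as one batch and empty the buffer. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        batches.add(new TreeMap<>(buffer)); // copy, so clearing the buffer does not affect the batch
        buffer.clear();
    }

    int batchCount() {
        return batches.size();
    }

    int bufferedCount() {
        return buffer.size();
    }
}
```

In this sketch, updates to a key that is already buffered are absorbed in memory, which is one way locality can reduce the number of writes.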


Field Summary
protected  java.util.TreeMap<K,V> buffer
          In-memory buffer of pending key/value pairs, sorted by key.
protected static int maxBufferSize
          Maximum buffer size.
protected  org.apache.nutch.io.SequenceFile.Sorter sorter
           
protected  org.apache.nutch.io.SequenceFile.Writer tmpWriter
           
 
Constructor Summary
MapFilePersister()
           
 
Method Summary
 void add(K k, V v)
          Add a key/value pair.
 void close()
          Close resources.
 void flushToDisk()
          Flush the buffer to disk.
protected abstract  org.apache.nutch.io.WritableComparator getKeyComparator()
          Get comparator for MapFile key class.
protected abstract  org.apache.nutch.io.WritableComparable getKeyInstance()
          Return a new instance of the key.
protected abstract  java.lang.Class<? extends org.apache.nutch.io.WritableComparable> getMapFileKeyClass()
          Get key class.
protected  java.lang.Class<? extends org.apache.nutch.io.Writable> getMapFileValueClass()
          Get value class.
protected abstract  java.util.Comparator<K> getTypeComparator()
          Get comparator for the key type K.
protected  org.apache.nutch.io.Writable getValueInstance()
          Return a new instance of the value.
 void init()
          Initialize resources.
protected  void initTmpWriter()
          Initialize tmpWriter.
 void setMapdir(java.lang.String mapdir)
          Set location of directory where the MapFile will be created.
 void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)
          Set NutchFileSystem.
 void setOverwrite(boolean overwrite)
          Set whether existing files/directories should be overwritten.
 void setTmpfile(java.lang.String tmpfile)
          Set the location of temp file.
protected abstract  void writeBufferToTmp()
          Write contents of buffer to tmpfile.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

buffer

protected java.util.TreeMap<K,V> buffer
In-memory buffer of pending key/value pairs, sorted by key.


maxBufferSize

protected static int maxBufferSize
Maximum buffer size.


sorter

protected org.apache.nutch.io.SequenceFile.Sorter sorter

tmpWriter

protected org.apache.nutch.io.SequenceFile.Writer tmpWriter
Constructor Detail

MapFilePersister

public MapFilePersister()
Method Detail

add

public void add(K k,
                V v)
         throws java.io.IOException
Add a key/value pair.

Parameters:
k - the key to add
v - the value to add
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Close resources.

Throws:
java.io.IOException

flushToDisk

public void flushToDisk()
                 throws java.io.IOException
Flush the buffer to disk. Buffer entries are appended to the tmpfile, which is then sorted. A MapFile is created from the sorted file, so that fast access to the entries is possible.
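
The append, sort, and lookup sequence described above can be sketched in memory as follows. FlushSketch, Rec, and the tmp and sorted lists are hypothetical stand-ins for the tmpfile and the MapFile. The sketch assumes keys are distinct across flushes, since this page does not say how duplicates are resolved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory model of the flush steps; not the real implementation. */
class FlushSketch {
    /** One key/value record in the stand-in "tmpfile". */
    static final class Rec {
        final long key;
        final int value;

        Rec(long key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    final TreeMap<Long, Integer> buffer = new TreeMap<>();
    // Append-only stand-in for the tmpfile; unsorted once several flushes have run.
    final List<Rec> tmp = new ArrayList<>();
    // Stand-in for the MapFile: all records sorted by key.
    List<Rec> sorted = new ArrayList<>();

    void flushToDisk() {
        // 1. Append the buffer entries to the tmp "file" and empty the buffer.
        buffer.forEach((k, v) -> tmp.add(new Rec(k, v)));
        buffer.clear();
        // 2. Sort a copy of the tmp "file" by key.
        List<Rec> copy = new ArrayList<>(tmp);
        copy.sort(Comparator.comparingLong(r -> r.key));
        // 3. Publish the sorted copy as the lookup structure.
        sorted = copy;
    }

    /** Binary search over the sorted records; returns null if the key is absent. */
    Integer get(long key) {
        int lo = 0;
        int hi = sorted.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long k = sorted.get(mid).key;
            if (k < key) {
                lo = mid + 1;
            } else if (k > key) {
                hi = mid - 1;
            } else {
                return sorted.get(mid).value;
            }
        }
        return null;
    }
}
```

Sorting is what makes the fast lookup possible: once the records are ordered by key, a single entry can be found without scanning the whole file.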

Throws:
java.io.IOException

getKeyComparator

protected abstract org.apache.nutch.io.WritableComparator getKeyComparator()
Get comparator for MapFile key class.

Returns:
the comparator for the MapFile key class

getKeyInstance

protected abstract org.apache.nutch.io.WritableComparable getKeyInstance()
Return a new instance of the key.

Returns:
a new instance of the key

getMapFileKeyClass

protected abstract java.lang.Class<? extends org.apache.nutch.io.WritableComparable> getMapFileKeyClass()
Get key class.

Returns:
the MapFile key class

getMapFileValueClass

protected java.lang.Class<? extends org.apache.nutch.io.Writable> getMapFileValueClass()
Get value class. Defaults to NullWritable.

Returns:
the MapFile value class (NullWritable by default)

getTypeComparator

protected abstract java.util.Comparator<K> getTypeComparator()
Get comparator for the key type K.

Returns:
a comparator for the key type K

getValueInstance

protected org.apache.nutch.io.Writable getValueInstance()
Return a new instance of the value. Defaults to NullWritable.get().

Returns:
a new instance of the value (NullWritable.get() by default)

init

public void init()
          throws java.io.IOException
Initialize resources.

Throws:
java.io.IOException

initTmpWriter

protected void initTmpWriter()
                      throws java.io.IOException
Initialize tmpWriter.

Throws:
java.io.IOException

setMapdir

public void setMapdir(java.lang.String mapdir)
Set location of directory where the MapFile will be created.

Parameters:
mapdir - directory where the MapFile will be created

setNfs

public void setNfs(org.apache.nutch.fs.NutchFileSystem nfs)
Set NutchFileSystem.

Parameters:
nfs - the NutchFileSystem to use

setOverwrite

public void setOverwrite(boolean overwrite)
Set whether existing files/directories should be overwritten.

Parameters:
overwrite - true if existing files/directories should be overwritten

setTmpfile

public void setTmpfile(java.lang.String tmpfile)
Set the location of temp file.

Parameters:
tmpfile - location of the temp file

writeBufferToTmp

protected abstract void writeBufferToTmp()
                                  throws java.io.IOException
Write contents of buffer to tmpfile. Subclasses should use tmpWriter to do this.
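
As an illustration of how a subclass might implement this hook, here is a minimal sketch that avoids the Nutch classes. RecordWriter is a hypothetical stand-in for SequenceFile.Writer, and PersisterSketch and LongIntSketch are illustrative names. Clearing the buffer in flush() is also an assumption about the base class, not documented behavior.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a sequential record writer such as tmpWriter. */
interface RecordWriter {
    void append(long key, int value);
}

/** Stand-in base class: owns the buffer and writer, leaves serialization to subclasses. */
abstract class PersisterSketch<K, V> {
    protected final TreeMap<K, V> buffer = new TreeMap<>();
    protected RecordWriter tmpWriter;

    /** Subclass hook: write every buffered entry through tmpWriter. */
    protected abstract void writeBufferToTmp();

    /** Assumption: the base class empties the buffer once the hook has run. */
    void flush() {
        writeBufferToTmp();
        buffer.clear();
    }
}

/** Example subclass for Long keys and Integer values. */
class LongIntSketch extends PersisterSketch<Long, Integer> {
    @Override
    protected void writeBufferToTmp() {
        // TreeMap iteration is in ascending key order, so records are appended sorted.
        for (Map.Entry<Long, Integer> e : buffer.entrySet()) {
            tmpWriter.append(e.getKey(), e.getValue());
        }
    }
}
```

The base class only knows the generic types K and V, so converting each entry to the concrete record format is left to the subclass.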

Throws:
java.io.IOException