org.supermind.crawl
Class FetcherThread
java.lang.Object
java.lang.Thread
org.supermind.crawl.FetcherThread
- All Implemented Interfaces:
- java.lang.Runnable
public class FetcherThread
- extends java.lang.Thread
Thread that performs the actual fetching.
Each FetcherThread has its own FetchList and FetchedURLs.
URLs are assigned to FetcherThreads by host, so all URLs from a given host are
guaranteed to always be assigned to the same FetcherThread.
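A minimal sketch of host-based assignment follows. It assumes a hash-modulo scheme; the source does not say how Fetcher actually maps hosts to threads, and `HostPartitioner` and `threadFor` are hypothetical names, not part of org.supermind.crawl.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical sketch (not the org.supermind.crawl API): route each URL to a
 * worker index derived only from its host, so every URL from one host lands
 * on the same worker.
 */
class HostPartitioner {
    private final int numThreads;

    public HostPartitioner(int numThreads) {
        if (numThreads <= 0) {
            throw new IllegalArgumentException("numThreads must be positive");
        }
        this.numThreads = numThreads;
    }

    /** Returns the index (0..numThreads-1) of the worker responsible for the URL's host. */
    public int threadFor(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // Host names are case-insensitive, so normalize before hashing.
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(host.toLowerCase(Locale.ROOT).hashCode(), numThreads);
    }

    public static void main(String[] args) {
        HostPartitioner p = new HostPartitioner(4);
        int a = p.threadFor("http://example.com/a.html");
        int b = p.threadFor("http://EXAMPLE.com/dir/b.html");
        System.out.println(a == b); // prints "true": same host, same worker
    }
}
```

Because the index depends only on the host, the path and query of a URL never influence which thread receives it.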
URLs queued for fetching are not added directly to the respective fetchlists
but to a holding area, which is checked at regular intervals. This allows
each FetchList and FetchedURLs to operate in a single-threaded model.
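The holding-area pattern can be sketched as below. This is a hypothetical illustration: in FetcherThread the holding area is presumably the `linkQueue` field and the batch limit presumably `linkQueueBatchSize`, but the source documents neither, and `HoldingAreaWorker`, `enqueue` and `drainHoldingArea` are invented names. URLs are plain strings rather than `java.net.URL`, whose `equals` and `hashCode` may perform DNS lookups.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a holding area: any thread may enqueue URLs, but only
 * the owning worker moves them into its fetch list, so the fetch list itself
 * is only ever touched by one thread.
 */
class HoldingAreaWorker {
    // Shared with producer threads; every access is guarded by its monitor.
    private final Set<String> holdingArea = new LinkedHashSet<>();
    // Owned by the worker thread only, so it is accessed without locking.
    private final List<String> fetchList = new ArrayList<>();

    /** Safe to call from any thread: records the URL without touching the fetch list. */
    public void enqueue(String url) {
        synchronized (holdingArea) {
            holdingArea.add(url); // the set ignores duplicates and keeps insertion order
        }
    }

    /** Called by the owning worker at regular intervals; moves at most maxBatch URLs. */
    public int drainHoldingArea(int maxBatch) {
        List<String> batch = new ArrayList<>();
        synchronized (holdingArea) {
            Iterator<String> it = holdingArea.iterator();
            while (it.hasNext() && batch.size() < maxBatch) {
                batch.add(it.next());
                it.remove();
            }
        }
        fetchList.addAll(batch); // outside the lock: single-threaded access
        return batch.size();
    }

    public List<String> fetchList() {
        return fetchList;
    }

    public static void main(String[] args) throws InterruptedException {
        HoldingAreaWorker w = new HoldingAreaWorker();
        Thread producer = new Thread(() -> {
            w.enqueue("http://example.com/a");
            w.enqueue("http://example.com/b");
            w.enqueue("http://example.com/a"); // duplicate, ignored
        });
        producer.start();
        producer.join();
        System.out.println(w.drainHoldingArea(10)); // prints 2
        System.out.println(w.fetchList());
    }
}
```

Holding the lock only while copying a batch out keeps producer threads from blocking on the worker's slower fetch-list updates.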
Nested classes/interfaces inherited from class java.lang.Thread
java.lang.Thread.State, java.lang.Thread.UncaughtExceptionHandler
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getAllStackTraces, getContextClassLoader, getDefaultUncaughtExceptionHandler, getId, getName, getPriority, getStackTrace, getState, getThreadGroup, getUncaughtExceptionHandler, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setDefaultUncaughtExceptionHandler, setName, setPriority, setUncaughtExceptionHandler, sleep, sleep, start, stop, stop, suspend, toString, yield
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
continueWaiting
protected boolean continueWaiting
fetchedUrls
protected FetchedURLs fetchedUrls
fetcher
protected Fetcher fetcher
fetchList
protected FetchList fetchList
fetchListScope
protected FetchListScope fetchListScope
flScopeIn
protected final FetchListScope.Input flScopeIn
linkQueue
protected java.util.LinkedHashMap<java.net.URL,ScheduledURL> linkQueue
linkQueueBatchSize
protected int linkQueueBatchSize
LOG
protected final java.util.logging.Logger LOG
parseScope
protected ParseScope parseScope
postFetchInput
protected final PostFetchScope.Input postFetchInput
postFetchScope
protected PostFetchScope postFetchScope
waiting
protected boolean waiting
FetcherThread
public FetcherThread(Fetcher fetcher)
addOutlinksToFetchlist
protected void addOutlinksToFetchlist(ScheduledURL parent,
org.apache.nutch.parse.Parse parse)
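- Adds the outlinks of a parsed page to the fetchlist.
- Parameters:
parent - the ScheduledURL of the page from which the outlinks were extracted
parse - the parse result containing the outlinks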
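One step that adding outlinks plausibly involves is resolving each (possibly relative) link against the parent page's URL before queueing it. The sketch below assumes that; the source does not describe the internals of addOutlinksToFetchlist. `OutlinkResolver` and `resolveOutlinks` are hypothetical, and outlinks are plain strings rather than Nutch `Parse` objects so the example compiles without Nutch on the classpath.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: turns raw outlinks into absolute, de-duplicated HTTP(S) URLs. */
class OutlinkResolver {

    /** Resolves each raw link against parentUrl, dropping malformed and non-HTTP(S) links. */
    public static Set<String> resolveOutlinks(String parentUrl, List<String> rawOutlinks) {
        URI base = URI.create(parentUrl);
        Set<String> result = new LinkedHashSet<>(); // de-duplicates, keeps document order
        for (String raw : rawOutlinks) {
            URI resolved;
            try {
                resolved = base.resolve(new URI(raw.trim())); // also normalizes "../" segments
            } catch (URISyntaxException e) {
                continue; // skip malformed links
            }
            String scheme = resolved.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                continue; // skip mailto:, javascript:, etc.
            }
            // Drop the fragment: "page#a" and "page#b" name the same document.
            String s = resolved.toString();
            int hash = s.indexOf('#');
            result.add(hash >= 0 ? s.substring(0, hash) : s);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> links = resolveOutlinks("http://example.com/docs/index.html",
                Arrays.asList("page2.html", "../img/a.png", "mailto:someone@example.com"));
        System.out.println(links);
    }
}
```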
handleFetch
protected org.apache.nutch.parse.ParseStatus handleFetch(ScheduledURL scheduledURL,
org.apache.nutch.protocol.ProtocolOutput output)
throws java.io.IOException
- Throws:
java.io.IOException
run
public void run()
- Specified by:
run
in interface java.lang.Runnable
- Overrides:
run
in class java.lang.Thread
setFetchedURLs
public void setFetchedURLs(FetchedURLs fetchedUrls)
setFetcher
public void setFetcher(Fetcher fetcher)
setFetchList
public void setFetchList(FetchList fetchList)
setFetchListScope
public void setFetchListScope(FetchListScope fetchListScope)
setParseScope
public void setParseScope(ParseScope parseScope)
setPostFetchScope
public void setPostFetchScope(PostFetchScope postFetchScope)