java.lang.Object
  org.supermind.crawl.DefaultFetchList
public class DefaultFetchList extends java.lang.Object implements FetchList
Fetchlist that assigns higher priority to hosts with many pending pages.
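The prioritization described above could be sketched roughly as follows. This is only an illustration of "most pending pages first" using a standard priority queue, not the actual DefaultFetchList implementation; PendingHost and BusiestHostFirst are hypothetical names that do not exist in org.supermind.crawl.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a per-host queue; not the library's HostQueue. */
final class PendingHost {
    final String host;
    final int pending; // number of pages still waiting for this host

    PendingHost(String host, int pending) {
        this.host = host;
        this.pending = pending;
    }
}

/** Hypothetical helper: orders hosts so the one with the most pending pages is polled first. */
final class BusiestHostFirst {
    static PriorityQueue<PendingHost> newQueue() {
        // Reverse the natural ascending order so larger pending counts come out first.
        return new PriorityQueue<>(
                Comparator.comparingInt((PendingHost h) -> h.pending).reversed());
    }

    public static void main(String[] args) {
        PriorityQueue<PendingHost> hosts = newQueue();
        hosts.add(new PendingHost("a.example", 3));
        hosts.add(new PendingHost("b.example", 40));
        hosts.add(new PendingHost("c.example", 12));
        while (!hosts.isEmpty()) {
            PendingHost h = hosts.poll();
            System.out.println(h.host + " pending=" + h.pending);
        }
    }
}
```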
Field Summary | |
---|---|
protected LastModifiedDB | lastModifiedDb - LastModifiedDB. |
protected static java.util.logging.Logger | LOG |
Constructor Summary | |
---|---|
DefaultFetchList() | |
Method Summary | |
---|---|
void | close() - Release resources. |
boolean | contains(java.net.URL url) - Does the fetchlist contain this URL? |
int | getCurrentSize() - Total number of URLs this fetchlist currently contains. |
protected long | getNextAvailable(long timeTaken) - When a HostQueue is next available. |
void | init() - Initialize resources. |
HostQueue | next() - Get next HostQueue. |
void | queue(ScheduledURL parent, java.net.URL url) - Add a ScheduledURL to the fetchlist. |
void | release(HostQueue hostQueue, int popped, long timeTaken) - Release HostQueue from use. |
void | setLastModifiedDb(LastModifiedDB lastModifiedDb) |
void | setWaitFactor(int waitFactor) - The time taken to download a chunk of pages from a host is multiplied by this waitFactor to determine how soon a host can be accessed again. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected LastModifiedDB lastModifiedDb
LastModifiedDB.
protected static java.util.logging.Logger LOG
Constructor Detail |
---|
public DefaultFetchList()
Method Detail |
---|
public void close()
Description copied from interface: FetchList
Release resources.
Specified by: close in interface FetchList
public boolean contains(java.net.URL url)
Description copied from interface: FetchList
Does the fetchlist contain this URL?
Specified by: contains in interface FetchList
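public int getCurrentSize()
Description copied from interface: FetchList
Total number of URLs this fetchlist currently contains. See also FetchList.release(org.supermind.crawl.HostQueue, int, long).
Specified by: getCurrentSize in interface FetchList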
protected long getNextAvailable(long timeTaken)
When a HostQueue is next available.
Parameters:
timeTaken - time taken to download a page
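A minimal sketch of how such a value might be derived, going only by the setWaitFactor summary (download time multiplied by the wait factor). The class NextAvailableSketch, the use of milliseconds, and the choice to return an absolute timestamp rather than a delay are all assumptions made for illustration; the real method body is not shown in this documentation.

```java
/** Hypothetical illustration of a wait-factor back-off; not the library's code. */
final class NextAvailableSketch {
    private final int waitFactor;

    NextAvailableSketch(int waitFactor) {
        this.waitFactor = waitFactor;
    }

    /**
     * Assumed rule: the host may be accessed again at
     * now + (time taken for the last download * waitFactor).
     * All values are assumed to be in milliseconds.
     */
    long nextAvailable(long nowMillis, long timeTakenMillis) {
        return nowMillis + timeTakenMillis * waitFactor;
    }

    public static void main(String[] args) {
        NextAvailableSketch sketch = new NextAvailableSketch(4);
        long now = System.currentTimeMillis();
        // A 250 ms download with waitFactor 4 delays the next access by 1000 ms.
        System.out.println(sketch.nextAvailable(now, 250L) - now);
    }
}
```

Under this assumed rule, slower hosts are automatically revisited less often, since a longer download time stretches the wait proportionally.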
public void init()
Description copied from interface: FetchList
Initialize resources.
Specified by: init in interface FetchList
public HostQueue next()
Description copied from interface: FetchList
Get next HostQueue.
Specified by: next in interface FetchList
public void queue(ScheduledURL parent, java.net.URL url)
Description copied from interface: FetchList
Add a ScheduledURL to the fetchlist. Multiple threads can call this method concurrently, so implementing classes must synchronize access accordingly.
Specified by: queue in interface FetchList
Parameters:
parent - originating URL
url - URL to queue
public void release(HostQueue hostQueue, int popped, long timeTaken)
Description copied from interface: FetchList
Release HostQueue from use once HostQueue#pop() completes.
Specified by: release in interface FetchList
Parameters:
popped - number of URLs popped from the queue
timeTaken - total time taken to download the popped URLs
public void setLastModifiedDb(LastModifiedDB lastModifiedDb)
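public void setWaitFactor(int waitFactor)
The time taken to download a chunk of pages from a host is multiplied by this waitFactor to determine how soon a host can be accessed again.
Parameters:
waitFactor - multiplier applied to the download time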
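The queue(ScheduledURL, java.net.URL) contract notes that several threads may call it at once and that implementations must synchronize. One simple way to meet that kind of requirement is to guard the shared state with synchronized methods, as in the sketch below. SynchronizedUrlStore is a hypothetical, much-simplified class (no parent URL, no per-host queues) and is not how DefaultFetchList is known to be implemented.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical, simplified URL store; not the library's DefaultFetchList. */
final class SynchronizedUrlStore {
    // Keys are URL strings: java.net.URL.equals may trigger DNS lookups.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Adds the URL if absent; synchronized so concurrent callers cannot corrupt the set. */
    synchronized boolean queue(URL url) {
        return pending.add(url.toExternalForm());
    }

    synchronized boolean contains(URL url) {
        return pending.contains(url.toExternalForm());
    }

    synchronized int getCurrentSize() {
        return pending.size();
    }

    /** Convenience helper that converts the checked exception into an unchecked one. */
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedUrlStore store = new SynchronizedUrlStore();
        Runnable worker = () -> {
            for (int i = 0; i < 1000; i++) {
                store.queue(url("http://example.com/page" + i));
            }
        };
        Thread t1 = new Thread(worker);
        Thread t2 = new Thread(worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Both threads queue the same 1000 URLs, so duplicates collapse.
        System.out.println(store.getCurrentSize());
    }
}
```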