java.lang.Object
  |
  +--edu.unika.aifb.rdf.crawler.URIList
URIList is the only class in the "uriproc" package intended to be called from outside that package, namely when the URIList is initialized. (In our implementation the caller is the Console class.) In turn, URIList calls the Channel_Pool constructor when it is time to start processing the URIs with a download thread pool. URIList uses URLStruct to store each URI together with its depth, processing status code and any error message. The policies that decide which URIs to follow are defined in this class.
| Field Summary | |
HostFilter |
filter
|
| Constructor Summary | |
URIList()
Initialize an empty URIList |
|
| Method Summary | |
void |
addURI(java.lang.String uri,
java.lang.String p_uri,
boolean decrement)
Adds a URI to the URIList when its depth is not known in advance. |
boolean |
addURI(java.lang.String uri,
java.lang.String p_uri,
int depth)
Adds a single URI with the given crawling depth and the given parent. This public method is called from Console in a synchronized manner to add new URIs discovered by any of the crawling threads/channels. |
boolean |
allNonWhite()
Are all the URIs in the list non-WHITE, i.e. must all threads currently wait for a new job to appear? |
boolean |
allRedOrBlack()
Are all the URIs in the list either RED or BLACK? |
void |
assert(java.lang.String message)
|
void |
checkInBlack(java.lang.String uri)
Crawling of this "uri" finished successfully: it was added to the RDF model, and its descendants (if any) were added to the URIList. |
void |
checkInRed(java.lang.String uri,
java.lang.Exception e)
Crawling found a problem with the "uri"; mark it and insert it back into the map. |
java.lang.String |
checkOutWhite()
Find a WHITE URI in the list, return its string to the processing Channel instance and paint it GRAY. |
static java.lang.String |
cutRef(java.lang.String url)
Cuts the reference part off a URL, so that URLs differing only in their reference part are not crawled twice. |
java.lang.String |
getDescriptions(java.lang.String parent)
Get back a nice representation of the URLs that are crawled into from the given parent URL (parent=null for the top-level URLs). |
java.lang.String |
getParent(java.lang.String uri)
|
static void |
main(java.lang.String[] args)
For debugging - create and print a list of URIs. |
void |
printColors()
|
void |
printMap()
Print all the associations in the map (for debugging) |
void |
setFilter(java.util.Vector hosts)
Set host filter for this URIList |
java.lang.String |
toString()
Get back a nice RDF representation of the URLs that are placed in the list for crawling. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
public HostFilter filter
| Constructor Detail |
public URIList()
| Method Detail |
public boolean addURI(java.lang.String uri,
java.lang.String p_uri,
int depth)
public void addURI(java.lang.String uri,
java.lang.String p_uri,
boolean decrement)
public void setFilter(java.util.Vector hosts)
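The page does not document how HostFilter decides which URIs to follow, so the following is only a minimal sketch of one plausible policy: accept a URI when its host appears in the list passed to setFilter. The class name HostFilterSketch and the method accepts are hypothetical and are not part of this package.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a host-based filter; NOT the real HostFilter class.
final class HostFilterSketch {
    private final Set<String> allowedHosts = new HashSet<>();

    // A java.util.Vector can be passed here, since Vector implements List.
    HostFilterSketch(List<String> hosts) {
        for (String host : hosts) {
            allowedHosts.add(host.toLowerCase(Locale.ROOT));
        }
    }

    // Assumption: a URI is followed only if its host is in the allowed set.
    boolean accepts(String uri) {
        try {
            String host = new URI(uri).getHost();
            return host != null
                    && allowedHosts.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // malformed URIs are never followed
        }
    }
}
```

Host names are compared case-insensitively in this sketch, since DNS host names are not case-sensitive.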
public boolean allRedOrBlack()
public boolean allNonWhite()
public java.lang.String toString()
public java.lang.String getDescriptions(java.lang.String parent)
public void printMap()
public static void main(java.lang.String[] args)
public java.lang.String checkOutWhite()
public void checkInRed(java.lang.String uri,
java.lang.Exception e)
public void checkInBlack(java.lang.String uri)
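The summaries of checkOutWhite, checkInBlack, checkInRed, allNonWhite and allRedOrBlack suggest a four-colour lifecycle: WHITE (waiting), GRAY (checked out by a channel), BLACK (finished) and RED (failed). The sketch below illustrates that lifecycle under stated assumptions; it is not the actual URIList code. The class ColorLifecycleSketch, its add method, the null return when no WHITE URI exists, and the use of a plain colour map instead of URLStruct are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the colour states; NOT the real URIList.
final class ColorLifecycleSketch {
    enum Color { WHITE, GRAY, RED, BLACK }

    private final Map<String, Color> colors = new LinkedHashMap<>();

    // New URIs start WHITE; returns false if the URI is already known.
    synchronized boolean add(String uri) {
        return colors.putIfAbsent(uri, Color.WHITE) == null;
    }

    // Hand the first WHITE URI to a worker and paint it GRAY.
    // Assumption: null signals that no WHITE URI is available.
    synchronized String checkOutWhite() {
        for (Map.Entry<String, Color> entry : colors.entrySet()) {
            if (entry.getValue() == Color.WHITE) {
                entry.setValue(Color.GRAY);
                return entry.getKey();
            }
        }
        return null;
    }

    // Crawling succeeded.
    synchronized void checkInBlack(String uri) {
        colors.put(uri, Color.BLACK);
    }

    // Crawling failed; the real class also stores the error message.
    synchronized void checkInRed(String uri, Exception e) {
        colors.put(uri, Color.RED);
    }

    // True when there is nothing left to hand out right now.
    synchronized boolean allNonWhite() {
        return !colors.containsValue(Color.WHITE);
    }

    // True when no URI is waiting or in progress, i.e. crawling is done.
    synchronized boolean allRedOrBlack() {
        return !colors.containsValue(Color.WHITE)
                && !colors.containsValue(Color.GRAY);
    }

    public static void main(String[] args) {
        ColorLifecycleSketch list = new ColorLifecycleSketch();
        list.add("http://example.org/a.rdf");
        list.add("http://example.org/b.rdf");

        String uri;
        while ((uri = list.checkOutWhite()) != null) {
            try {
                // A real channel would download and parse the URI here.
                if (uri.endsWith("b.rdf")) {
                    throw new Exception("simulated download failure");
                }
                list.checkInBlack(uri);
            } catch (Exception e) {
                list.checkInRed(uri, e);
            }
        }
        System.out.println("finished: " + list.allRedOrBlack());
    }
}
```

In this model the two predicates differ while some URI is GRAY: allNonWhite is already true, so idle threads have to wait, but allRedOrBlack stays false because a running channel may still add new WHITE URIs.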
public java.lang.String getParent(java.lang.String uri)
public static java.lang.String cutRef(java.lang.String url)
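As an illustration of what cutRef is described as doing, here is a minimal sketch that assumes the "reference part" is the fragment introduced by '#'. The class name CutRefSketch is hypothetical, and the actual implementation may handle edge cases differently.

```java
// Hypothetical re-creation of the described behaviour; NOT the original code.
final class CutRefSketch {
    // Returns the URL without its "#fragment" part, or the URL unchanged
    // if it has no fragment. A null input is passed through as null.
    static String cutRef(String url) {
        if (url == null) {
            return null;
        }
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    public static void main(String[] args) {
        // Both calls yield the same string, so the page is listed only once.
        System.out.println(cutRef("http://example.org/page.rdf#section2"));
        System.out.println(cutRef("http://example.org/page.rdf"));
    }
}
```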
public void printColors()
public void assert(java.lang.String message)