A B C D E F G H I L M N O P R S T U W

A

addURI(String, String, boolean) - Method in class edu.unika.aifb.rdf.crawler.URIList
This method is used to add to the URIList if depth is not known in advance
addURI(String, String, int) - Method in class edu.unika.aifb.rdf.crawler.URIList
Add a single URI with the given crawling depth and given parent (this public method is called from Console in a synchronized manner, to add new URIs discovered by all the crawling threads/channels).
allNonWhite() - Method in class edu.unika.aifb.rdf.crawler.URIList
Are all the URIs in the list non-WHITE, i.e., must all threads currently wait for a new job to appear?
allRedOrBlack() - Method in class edu.unika.aifb.rdf.crawler.URIList
Are all the URIs in the list either RED or BLACK?
analyze() - Method in class edu.unika.aifb.rdf.crawler.RDFInstance
 
analyzeHTML() - Method in class edu.unika.aifb.rdf.crawler.DocInstance
Analyze the HTML text and find out the outgoing URLs
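A minimal sketch of the kind of link extraction analyzeHTML() describes, assuming that outgoing links come only from quoted href attributes and that relative links are resolved against the document's own URI. HrefSketch and outgoingUrls are hypothetical names, not part of edu.unika.aifb.rdf.crawler, and the regex is a simplification; a production crawler would normally use an HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of what analyzeHTML() is described as doing;
// it is NOT the DocInstance source.
final class HrefSketch {
    // Matches href="..." or href='...' case-insensitively; unquoted values are ignored.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the outgoing links of `html`, resolved against `baseUri`.
    static List<String> outgoingUrls(String baseUri, String html) {
        URI base = URI.create(baseUri);
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                urls.add(base.resolve(m.group(2)).toString());
            } catch (IllegalArgumentException e) {
                // Skip hrefs that are not syntactically valid URIs.
            }
        }
        return urls;
    }
}
```

For example, on the page http://example.org/dir/page.html, the link href="other.html" resolves to http://example.org/dir/other.html, while an absolute href is returned unchanged.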
analyzeRDF() - Method in class edu.unika.aifb.rdf.crawler.DocInstance
Analyze the RDF text and find out uris, namespaces and rdflists
assert(String) - Method in class edu.unika.aifb.rdf.crawler.URIList
 

B

bigvect - Static variable in class edu.unika.aifb.rdf.crawler.ChannelPool
 
BLACK - Static variable in class edu.unika.aifb.rdf.crawler.URLStruct
 

C

cache - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
Cache of mappings: URL-filepaths.
cache - Variable in class edu.unika.aifb.rdf.crawler.ChannelPool
 
Cache - class edu.unika.aifb.rdf.crawler.Cache.
This is a top-level class responsible for the mapping of URIs to filepaths, streams or symbolic strings.
Cache() - Constructor for class edu.unika.aifb.rdf.crawler.Cache
Make either an empty cache or a cache with 5 default items
CachePath - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
CachePath - absolute path where to store the cache map
capacity - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
How many threads in the ThreadPool. Feel free to change this for optimum performance.
capacity - Static variable in class edu.unika.aifb.rdf.crawler.ChannelPool
 
Channel - class edu.unika.aifb.rdf.crawler.Channel.
An individual channel that waits for web-page retrieval work. It tries the Cache, then NetRetrieve, and finally passes the filepath to DocProcessor, getting back 1) a piece of the RDF model, 2) all the URIs that appear in the given URI and have to be tested/crawled, and 3) exceptions (if any).
Channel(ChannelPool, int) - Constructor for class edu.unika.aifb.rdf.crawler.Channel
 
ChannelPool - class edu.unika.aifb.rdf.crawler.ChannelPool.
This class gets URIs one by one and decides when to start new threads.
ChannelPool(URIList, Cache, int, int) - Constructor for class edu.unika.aifb.rdf.crawler.ChannelPool
This constructor initializes ChannelPool - normally there is just one instance of it per program.
check() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Used for url, p_url initialization and exception throwing
check(String) - Static method in class edu.unika.aifb.rdf.crawler.RobotCheck
So far we ignore ROBOTS.TXT files
checkFilter(String) - Method in class edu.unika.aifb.rdf.crawler.HostFilter
Check against the hostname filter
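A minimal sketch of a hostname filter in the spirit of HostFilter/checkFilter, assuming that a null host collection means "crawl everywhere" (as the CrawlConsole constructor describes for hostfilter) and that host names are compared case-insensitively. HostFilterSketch is a hypothetical name; the real class takes a Vector, which this sketch accepts through the more general Collection interface.

```java
import java.net.URI;
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch; NOT the edu.unika.aifb.rdf.crawler.HostFilter source.
final class HostFilterSketch {
    private final Set<String> hosts; // null means "no restriction"

    HostFilterSketch(Collection<String> allowedHosts) {
        if (allowedHosts == null) {
            hosts = null;
        } else {
            Set<String> s = new HashSet<>();
            for (String h : allowedHosts) {
                s.add(h.toLowerCase(Locale.ROOT));
            }
            hosts = s;
        }
    }

    // True if the URL's host is one of the eligible hosts (or if there is no filter).
    boolean checkFilter(String url) {
        if (hosts == null) {
            return true;
        }
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // unparsable URL: reject it
        }
        return host != null && hosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```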
checkInBlack(String) - Method in class edu.unika.aifb.rdf.crawler.URIList
Crawling to this "uri" was successfully finished. It was added to the RDF model, and its descendants (if any) were added to the URIList.
checkInRed(String, Exception) - Method in class edu.unika.aifb.rdf.crawler.URIList
Crawling found a problem with the "uri"; mark it and insert it back into the map.
checkOutWhite() - Method in class edu.unika.aifb.rdf.crawler.URIList
Find a white URI in the list, return its string to the processing Channel instance and paint it gray
closeString() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
 
CrawlConsole - class edu.unika.aifb.rdf.crawler.CrawlConsole.
CrawlConsole is intended as the only public class to be used by every application which needs to embed RDF Crawler functionality.
CrawlConsole(Vector, Vector, int, int) - Constructor for class edu.unika.aifb.rdf.crawler.CrawlConsole
Initialize the crawler parameters: uris - String Vector of initial URIs to crawl; hostfilter - String Vector of hosts we want to crawl (null if we crawl everywhere); depth - how deep we want to crawl (0 if we want just the given URIs); time - how many seconds we wait until we break connections to nonresponding hosts.
cutRef(String) - Static method in class edu.unika.aifb.rdf.crawler.URIList
This function cuts away the reference part from a URL to avoid duplication of URLs when crawling, in case they differ only in their reference part.
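A minimal sketch of what cutRef is described as doing, assuming the "reference part" is the fragment after the first '#' (a '#' cannot appear unescaped anywhere else in a valid URL). CutRefSketch is a hypothetical class name; only the method name mirrors the index entry.

```java
// Hypothetical sketch; NOT the URIList.cutRef source.
final class CutRefSketch {
    // Drop everything from the first '#' onward, so that
    // page.html#intro and page.html#sec2 map to the same key.
    static String cutRef(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }
}
```

Normalizing every URL this way before it is inserted into the list means that a page linked several times with different anchors is fetched only once.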

D

DocInstance - class edu.unika.aifb.rdf.crawler.DocInstance.
DocInstance - call different document processing routines.
DocInstance(Cache, String, String) - Constructor for class edu.unika.aifb.rdf.crawler.DocInstance
Constructor: initialize data structures, but don't assign the current and parent URI, as they cause exceptions.
download(String, String) - Static method in class edu.unika.aifb.rdf.crawler.NetRetrieve
 
dumpModel() - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Get the crawling results as a string

E

edu.unika.aifb.rdf.crawler - package edu.unika.aifb.rdf.crawler
 
encode(String) - Static method in class edu.unika.aifb.rdf.crawler.DocInstance
This method is adapted from org.gjt.vinny.html.HTMLEncoder. It is a utility method for converting a string into a format suitable for placing inside HTML, so that the special symbols <, >, &, " and ' are properly escaped.
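A minimal sketch of such an escaping routine, assuming the usual HTML entities for the five listed characters; the numeric reference &#39; for the apostrophe is an assumption (HTMLEncoder may use a different form). HtmlEncodeSketch is a hypothetical name, not the DocInstance source.

```java
// Hypothetical sketch of HTML escaping; NOT the DocInstance.encode source.
final class HtmlEncodeSketch {
    static String encode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        // Escape character by character, so '&' in an inserted entity is never re-escaped.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```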

F

filter - Variable in class edu.unika.aifb.rdf.crawler.URIList
 
FilterException - exception edu.unika.aifb.rdf.crawler.FilterException.
 
finishedthreads - Static variable in class edu.unika.aifb.rdf.crawler.ChannelPool
 

G

getDepth() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Return the depth of this URI
getDescriptions(String) - Method in class edu.unika.aifb.rdf.crawler.URIList
Get back a nice representation of those URLs which are being crawled into from the given URL parent (parent=null for the top-level URLs)
getEndSignal() - Static method in class edu.unika.aifb.rdf.crawler.ChannelPool
 
getExtension() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
 
getHost() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Return host from the uri
getNs() - Method in class edu.unika.aifb.rdf.crawler.DocInstance
 
getNs() - Method in class edu.unika.aifb.rdf.crawler.HTMLInstance
 
getNs() - Method in class edu.unika.aifb.rdf.crawler.RDFInstance
 
getParent(String) - Method in class edu.unika.aifb.rdf.crawler.URIList
 
getParentURL() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Return normalized parent URI
getRdf() - Method in class edu.unika.aifb.rdf.crawler.DocInstance
 
getRdf() - Method in class edu.unika.aifb.rdf.crawler.HTMLInstance
 
getRdf() - Method in class edu.unika.aifb.rdf.crawler.RDFInstance
 
getStatus() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Return status as a number
getUri() - Method in class edu.unika.aifb.rdf.crawler.DocInstance
 
getUri() - Method in class edu.unika.aifb.rdf.crawler.HTMLInstance
 
getUri() - Method in class edu.unika.aifb.rdf.crawler.RDFInstance
Could not get RE.substituteAll(Object input, String replace) to work.
getURL() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Return (typically normalized) URI
GRAY - Static variable in class edu.unika.aifb.rdf.crawler.URLStruct
 

H

HostFilter - class edu.unika.aifb.rdf.crawler.HostFilter.
The class HostFilter checks whether the URL string belongs to the given set of hosts.
HostFilter(Vector) - Constructor for class edu.unika.aifb.rdf.crawler.HostFilter
Initializes the set of eligible hosts
HTMLInstance - class edu.unika.aifb.rdf.crawler.HTMLInstance.
HTMLInstance - process the metainfo extracted from the HTML document.
HTMLInstance(String, StringBuffer) - Constructor for class edu.unika.aifb.rdf.crawler.HTMLInstance
Initialize from StringBuffer

I

insert(String, String) - Method in class edu.unika.aifb.rdf.crawler.Cache
Insert a URI-filename pair in the cache

L

LogPath - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
LogPath - absolute path where to store the LOG file of the crawling process
lookup(String) - Method in class edu.unika.aifb.rdf.crawler.Cache
Look up a URI in the cache

M

main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.URIList
For debugging - create and print a list of URIs.
main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Used to call CrawlConsole from DOS command line.
main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.DocInstance
for debugging
main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.NetRetrieve
 
main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.HTMLInstance
For debugging.
main(String[]) - Static method in class edu.unika.aifb.rdf.crawler.Cache
Test whether the data structure used for cache functions properly
model - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
RDF model - we are building it from small pieces
ModelPath - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
ModelPath - absolute path where to store the model of all the RDF facts

N

NetRetrieve - class edu.unika.aifb.rdf.crawler.NetRetrieve.
NetRetrieve - fetch URLs and write them to files
NetRetrieve() - Constructor for class edu.unika.aifb.rdf.crawler.NetRetrieve
 

O

openString() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Writeout of this data structure in XML format

P

pool - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
Thread pool - branches off 10 different threads
printColors() - Method in class edu.unika.aifb.rdf.crawler.URIList
 
printMap() - Method in class edu.unika.aifb.rdf.crawler.URIList
Print all the associations in the map (for debugging)
printStatus() - Method in class edu.unika.aifb.rdf.crawler.URLStruct
Print status code mnemonic

R

RDFInstance - class edu.unika.aifb.rdf.crawler.RDFInstance.
RDFInstance - process the metainfo extracted from the RDF document.
RDFInstance(String, StringBuffer) - Constructor for class edu.unika.aifb.rdf.crawler.RDFInstance
Initialize from StringBuffer
readAsString(String, String) - Method in class edu.unika.aifb.rdf.crawler.Cache
This utility function reads a file's contents into a String buffer and returns it
RED - Static variable in class edu.unika.aifb.rdf.crawler.URLStruct
 
ResourceClass - Variable in class edu.unika.aifb.rdf.crawler.RDFInstance
Used to distinguish Resource from Literal in RDF triples
RobotCheck - class edu.unika.aifb.rdf.crawler.RobotCheck.
Finds out the host's robot policy
RobotCheck() - Constructor for class edu.unika.aifb.rdf.crawler.RobotCheck
 
run() - Method in class edu.unika.aifb.rdf.crawler.Channel
 

S

saveModel(String) - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Save the crawling results to a file. RDFUtil.saveModel(...) does not work.
setCachePath(String) - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Indicate the file where you want to store the cache
setException(Exception) - Method in class edu.unika.aifb.rdf.crawler.URLStruct
 
setFilter(Vector) - Method in class edu.unika.aifb.rdf.crawler.URIList
Set host filter for this URIList
setLocalNamespace(String, String) - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Set a mapping of "url" - some RDF Namespace given by a Web address to a local file "path".
setLogPath(String) - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Indicate the file where you want to store the LOG file
setModelPath(String) - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Indicate the file where you want to store the RDF model
setStatus(int) - Method in class edu.unika.aifb.rdf.crawler.URLStruct
 
start() - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Start Crawling.

T

time - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
How many seconds to crawl.
toString() - Method in class edu.unika.aifb.rdf.crawler.URIList
Get back a nice RDF representation of what URLs are placed in the list for crawling.
toString() - Method in class edu.unika.aifb.rdf.crawler.HostFilter
 
toString() - Method in class edu.unika.aifb.rdf.crawler.Cache
Prints a table of cache mappings
trace(int) - Method in class edu.unika.aifb.rdf.crawler.Channel
Tracing function

U

urilist - Variable in class edu.unika.aifb.rdf.crawler.CrawlConsole
"TODO-list" - all the URLs we have to crawl.
urilist - Static variable in class edu.unika.aifb.rdf.crawler.ChannelPool
 
URIList - class edu.unika.aifb.rdf.crawler.URIList.
The class URIList is the only class in the package intended to be called from outside the "uriproc" package - when initializing the URIList.
URIList() - Constructor for class edu.unika.aifb.rdf.crawler.URIList
Initialize an empty URIList
URLStruct - class edu.unika.aifb.rdf.crawler.URLStruct.
This class represents a data structure to store a single URL with full status information - crawling depth, referrer=parent URL, processing status (see below) and exceptions encountered while crawling to the given URI.
URLStruct(String, String, int) - Constructor for class edu.unika.aifb.rdf.crawler.URLStruct
Constructor to make URL records with all the crawling information.

W

WHITE - Static variable in class edu.unika.aifb.rdf.crawler.URLStruct
Status codes: WHITE - discovered node, not being processed by a thread; GRAY - discovered node, currently being processed; BLACK - discovered node, already processed, with its descendants inserted into the list; RED - some exception was detected while crawling.
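The four status codes form a small state machine driven by the URIList methods checkOutWhite, checkInBlack, checkInRed, allNonWhite and allRedOrBlack. The sketch below models that bookkeeping under stated assumptions: the numeric values of the codes, the LinkedHashMap storage, and the method bodies are all hypothetical, and ColorListSketch is not part of the package. Every method is synchronized because several channels share one list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the URLStruct/URIList colour bookkeeping;
// codes and method bodies are assumptions, NOT the package source.
final class ColorListSketch {
    static final int WHITE = 0, GRAY = 1, BLACK = 2, RED = 3;

    private final Map<String, Integer> status = new LinkedHashMap<>();
    private final Map<String, Exception> errors = new LinkedHashMap<>();

    // A newly discovered URI starts WHITE; already-known URIs keep their colour.
    synchronized void add(String uri) {
        status.putIfAbsent(uri, WHITE);
    }

    // Hand out some WHITE URI and paint it GRAY; null if none is WHITE.
    synchronized String checkOutWhite() {
        for (Map.Entry<String, Integer> e : status.entrySet()) {
            if (e.getValue() == WHITE) {
                e.setValue(GRAY);
                return e.getKey();
            }
        }
        return null;
    }

    // Processing succeeded.
    synchronized void checkInBlack(String uri) {
        status.put(uri, BLACK);
    }

    // Processing failed; remember why.
    synchronized void checkInRed(String uri, Exception problem) {
        status.put(uri, RED);
        errors.put(uri, problem);
    }

    // True when no URI is waiting, so idle threads have nothing to pick up.
    synchronized boolean allNonWhite() {
        return !status.containsValue(WHITE);
    }

    // True when every URI is finished, successfully or not.
    synchronized boolean allRedOrBlack() {
        for (int s : status.values()) {
            if (s == WHITE || s == GRAY) {
                return false;
            }
        }
        return true;
    }
}
```

In this model, allNonWhite() being true while allRedOrBlack() is false means some URIs are still GRAY: threads must wait, because a channel that is still working may discover new WHITE URIs. Crawling can stop only once allRedOrBlack() holds.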
writeResults() - Method in class edu.unika.aifb.rdf.crawler.CrawlConsole
Write out the results
