edu.unika.aifb.rdf.crawler
Class URLStruct

java.lang.Object
  |
  +--edu.unika.aifb.rdf.crawler.URLStruct

public class URLStruct
extends java.lang.Object

This class is a data structure that stores a single URL together with its full status information: crawling depth, referrer (the parent URL), processing status (see below), and any exceptions encountered while crawling the given URI.


Field Summary
static int BLACK
           
static int GRAY
           
static int RED
           
static int WHITE
          Status codes: WHITE - discovered node, not yet being processed by a thread; GRAY - discovered node, currently being processed; BLACK - discovered node, already processed, with its descendants inserted into the list; RED - an exception was detected while crawling.
 
Constructor Summary
URLStruct(java.lang.String uri, java.lang.String p_uri, int depth)
          Constructor to make URL records with all the crawling information.
 
Method Summary
 void check()
          Initializes url and p_url and throws an exception on failure
 java.lang.String closeString()
           
 int getDepth()
          Return the depth of this URI
 java.lang.String getExtension()
           
 java.lang.String getHost()
          Return the host of the URI
 java.lang.String getParentURL()
          Return normalized parent URI
 int getStatus()
          Return status as a number
 java.lang.String getURL()
          Return (typically normalized) URI
 java.lang.String openString()
          Writes out this data structure in XML format
 java.lang.String printStatus()
          Print status code mnemonic
 void setException(java.lang.Exception e)
           
 void setStatus(int status)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WHITE

public static final int WHITE
Status codes: WHITE - discovered node, not yet being processed by a thread; GRAY - discovered node, currently being processed; BLACK - discovered node, already processed, with its descendants inserted into the list; RED - an exception was detected while crawling. Allowed status transitions: WHITE->GRAY, WHITE->RED, GRAY->RED, GRAY->BLACK.
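
The allowed transitions listed above can be read as a small state machine. The following is a minimal sketch of a validity check, not the class's actual code: the helper class StatusTransitions, the method isAllowed, and the numeric values 0 to 3 are all assumptions made for illustration, since this documentation does not state the real constant values or whether setStatus enforces the transitions.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of URLStruct: checks a status change
// against the transitions listed in the WHITE field description.
final class StatusTransitions {
    // Assumed numeric values; the real constants are not given in this page.
    static final int WHITE = 0;
    static final int GRAY = 1;
    static final int BLACK = 2;
    static final int RED = 3;

    // WHITE->GRAY, WHITE->RED, GRAY->RED, GRAY->BLACK.
    // BLACK and RED have no outgoing transitions.
    private static final Map<Integer, Set<Integer>> ALLOWED = Map.of(
            WHITE, Set.of(GRAY, RED),
            GRAY, Set.of(RED, BLACK));

    // Returns true only if moving from 'from' to 'to' is a listed transition.
    static boolean isAllowed(int from, int to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    public static void main(String[] args) {
        System.out.println("WHITE->GRAY allowed: " + isAllowed(WHITE, GRAY));
        System.out.println("BLACK->GRAY allowed: " + isAllowed(BLACK, GRAY));
    }
}
```

Under this reading, BLACK and RED are terminal states: once a node has been fully processed or has failed, it is not picked up again.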

GRAY

public static final int GRAY

BLACK

public static final int BLACK

RED

public static final int RED

Constructor Detail

URLStruct

public URLStruct(java.lang.String uri,
                 java.lang.String p_uri,
                 int depth)
Constructor to make URL records with all the crawling information.

Method Detail

printStatus

public java.lang.String printStatus()
Print status code mnemonic

check

public void check()
           throws java.net.MalformedURLException
Initializes url and p_url and throws an exception on failure

openString

public java.lang.String openString()
Writes out this data structure in XML format

closeString

public java.lang.String closeString()

getDepth

public int getDepth()
Return the depth of this URI

getURL

public java.lang.String getURL()
Return (typically normalized) URI

getParentURL

public java.lang.String getParentURL()
Return normalized parent URI

getStatus

public int getStatus()
Return status as a number

getHost

public java.lang.String getHost()
Return the host of the URI

setStatus

public void setStatus(int status)

setException

public void setException(java.lang.Exception e)

getExtension

public java.lang.String getExtension()
                              throws java.lang.Exception
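
getHost and getExtension are documented here only by their signatures, so the following is a rough sketch of how a host and a file extension could be derived from a URI string with java.net.URI. It is not the class's implementation: the helper class UriParts and its methods host and extension are hypothetical names, and returning an empty string when there is no extension is an assumption (the real getExtension is declared to throw java.lang.Exception).

```java
import java.net.URI;

// Hypothetical helper, not part of URLStruct.
final class UriParts {
    // Host component of the URI, or null if the URI has none.
    static String host(String uri) {
        return URI.create(uri).getHost();
    }

    // Text after the last '.' in the final path segment.
    // Assumption: an empty string means "no extension".
    static String extension(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            return ""; // opaque URIs such as mailto: have no path
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // Only a dot inside the last segment counts as an extension separator.
        return dot > slash ? path.substring(dot + 1) : "";
    }

    public static void main(String[] args) {
        String uri = "http://example.org/data/file.rdf";
        System.out.println(host(uri) + " " + extension(uri));
    }
}
```

URI.create throws an unchecked IllegalArgumentException for a malformed string, which keeps the sketch short; a crawler would more likely catch the failure and record it on the URL entry.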