edu.unika.aifb.rdf.crawler
Class DocInstance

java.lang.Object
  |
  +--edu.unika.aifb.rdf.crawler.DocInstance

public class DocInstance
extends java.lang.Object

DocInstance calls the various document processing routines, which are grouped in two classes: RDFInstance and HTMLInstance. The goal of these manipulations is to obtain three things:

  1. build RDF models from one or more fragments of RDF found within the document
  2. extract the list of namespaces used within RDF
  3. extract the list of URI references from both RDF and HTML parts of the document


Constructor Summary
DocInstance(Cache cache, java.lang.String uristring, java.lang.String parentstring)
          Constructor: initializes the data structures but does not assign the current and parent URIs, because those assignments can cause exceptions
 
Method Summary
 void analyzeHTML()
          Analyzes the HTML text and collects the outgoing URLs
 void analyzeRDF()
          Analyzes the RDF text and collects the URIs, namespaces and rdflists
static java.lang.String encode(java.lang.String val)
          Adapted from org.gjt.vinny.html.HTMLEncoder. A utility method that converts a string into a format suitable for placing inside HTML, so that the special symbols <, >, &, " and ' are properly escaped.
 java.util.Vector getNs()
           
 java.util.Vector getRdf()
           
 java.util.Vector getUri()
           
static void main(java.lang.String[] args)
          Entry point for debugging
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocInstance

public DocInstance(Cache cache,
                   java.lang.String uristring,
                   java.lang.String parentstring)
            throws java.io.FileNotFoundException,
                   java.io.IOException
Constructor: initializes the data structures but does not assign the current and parent URIs, because those assignments can cause exceptions
Method Detail

analyzeHTML

public void analyzeHTML()
Analyzes the HTML text and collects the outgoing URLs
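
This page does not describe how the outgoing URLs are found. As a rough illustration only, the sketch below collects href attribute values with a regular expression. The class name HrefSketch and the method outgoingUrls are hypothetical and are not part of edu.unika.aifb.rdf.crawler; the real method may well use an HTML parser and stores its results in a java.util.Vector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the crawler package.
final class HrefSketch {
    // Matches href="..." or href='...', ignoring case; unquoted values are skipped.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    private HrefSketch() {}

    // Returns the distinct, non-empty href values in document order.
    static List<String> outgoingUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Exactly one of the two capture groups is set per match.
            String url = m.group(1) != null ? m.group(1) : m.group(2);
            if (!url.isEmpty() && !urls.contains(url)) {
                urls.add(url);
            }
        }
        return urls;
    }
}
```

A regular expression is enough to show the idea, but it does not resolve relative links against the document URI and can be fooled by markup inside comments or scripts.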

analyzeRDF

public void analyzeRDF()
                throws java.lang.Exception
Analyzes the RDF text and collects the URIs, namespaces and rdflists
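
To make the namespace part of this step more concrete, here is a minimal sketch that pulls the namespace URIs out of xmlns declarations in an RDF/XML fragment. It is an assumption about what "namespaces" means here, not the class's actual code; NamespaceSketch and namespaces are hypothetical names, and a real implementation would more likely rely on an XML or RDF parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the crawler package.
final class NamespaceSketch {
    // Matches xmlns="..." and xmlns:prefix="..." with single or double quotes.
    private static final Pattern XMLNS = Pattern.compile(
            "xmlns(?::[A-Za-z_][\\w.-]*)?\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    private NamespaceSketch() {}

    // Returns the distinct namespace URIs declared in the text, in document order.
    static List<String> namespaces(String rdfXml) {
        List<String> result = new ArrayList<>();
        Matcher m = XMLNS.matcher(rdfXml);
        while (m.find()) {
            String ns = m.group(1) != null ? m.group(1) : m.group(2);
            if (!result.contains(ns)) {
                result.add(ns);
            }
        }
        return result;
    }
}
```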

getUri

public java.util.Vector getUri()

getNs

public java.util.Vector getNs()

getRdf

public java.util.Vector getRdf()

main

public static void main(java.lang.String[] args)
Entry point for debugging

encode

public static java.lang.String encode(java.lang.String val)
Adapted from org.gjt.vinny.html.HTMLEncoder. A utility method that converts a string into a format suitable for placing inside HTML, so that the special symbols <, >, &, " and ' are properly escaped.
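
The escaping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of DocInstance.encode or of HTMLEncoder: the class name HtmlEscapeSketch is hypothetical, and the choice of entities (in particular &#39; for the apostrophe) and the null handling are assumptions.

```java
// Hypothetical stand-in for the escaping behaviour described above.
final class HtmlEscapeSketch {
    private HtmlEscapeSketch() {}

    // Replaces <, >, &, " and ' with HTML entities; other characters pass through.
    static String encode(String val) {
        if (val == null) {
            return null; // assumption: null is passed through unchanged
        }
        StringBuilder out = new StringBuilder(val.length() + 16);
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // numeric entity, assumed
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, HtmlEscapeSketch.encode("Tom & 'Jerry'") yields Tom &amp; &#39;Jerry&#39;. Processing the input one character at a time avoids double-escaping the ampersands that the replacements themselves introduce.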