edu.unika.aifb.rdf.crawler
Class DocInstance
java.lang.Object
|
+--edu.unika.aifb.rdf.crawler.DocInstance
- public class DocInstance
- extends java.lang.Object
DocInstance calls the different document processing routines.
These are grouped into two classes, RDFInstance and HTMLInstance.
The goal of these manipulations is to obtain three things:
- build RDF models from one or more fragments of RDF found within
the document
- extract the list of namespaces used within RDF
- extract the list of URI references from both RDF and HTML parts of the document
|
Constructor Summary |
DocInstance(Cache cache,
java.lang.String uristring,
java.lang.String parentstring)
Constructor: initializes the data structures,
but does not assign the current and parent URIs, because assigning them can cause exceptions |
|
Method Summary |
void |
analyzeHTML()
Analyzes the HTML text and extracts the outgoing URLs |
void |
analyzeRDF()
Analyzes the RDF text and extracts the URIs, namespaces and rdflists |
static java.lang.String |
encode(java.lang.String val)
Adapted from org.gjt.vinny.html.HTMLEncoder.
A utility method that converts a string
into a form suitable for placing inside HTML,
so that the special symbols <, >, &, " and ' are properly escaped. |
java.util.Vector |
getNs()
|
java.util.Vector |
getRdf()
|
java.util.Vector |
getUri()
|
static void |
main(java.lang.String[] args)
Entry point for debugging. |
Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
DocInstance
public DocInstance(Cache cache,
java.lang.String uristring,
java.lang.String parentstring)
throws java.io.FileNotFoundException,
java.io.IOException
- Constructor: initializes the data structures,
but does not assign the current and parent URIs, because assigning them can cause exceptions
analyzeHTML
public void analyzeHTML()
- Analyzes the HTML text and extracts the outgoing URLs
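The page does not show how analyzeHTML finds outgoing URLs. As a rough illustration only, the sketch below collects href values with a regular expression and resolves them against the document URI. The class HtmlLinkSketch, the method outgoingUrls and the example URLs are hypothetical and are not part of this package. The real method may well use a proper HTML parser instead.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the DocInstance implementation.
final class HtmlLinkSketch {
    // Matches href="..." or href='...' in any letter case.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')",
            Pattern.CASE_INSENSITIVE);

    // Returns absolute URLs found in html, resolved against baseUri,
    // in document order and without duplicates.
    static List<String> outgoingUrls(String html, String baseUri) {
        URI base = URI.create(baseUri);
        List<String> result = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String raw = m.group(1) != null ? m.group(1) : m.group(2);
            try {
                String resolved = base.resolve(raw.trim()).toString();
                if (!result.contains(resolved)) {
                    result.add(resolved);
                }
            } catch (IllegalArgumentException e) {
                // Assumption: malformed references are silently skipped.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<a href=\"other.html\">x</a> <A HREF='/top'>y</A>";
        System.out.println(
                outgoingUrls(html, "http://example.org/dir/page.html"));
    }
}
```

A regular expression is enough for a sketch but misses links in other attributes (src, action) and can be fooled by markup inside comments or scripts.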
analyzeRDF
public void analyzeRDF()
throws java.lang.Exception
- Analyzes the RDF text and extracts the URIs, namespaces and rdflists
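How analyzeRDF gathers namespaces is not documented here. One possible approach, shown purely as an assumption-laden sketch, is to parse an RDF/XML fragment with the JDK's namespace-aware DOM parser and collect the namespace URI of every element and attribute. The class RdfNamespaceSketch and the method namespaces are hypothetical names, and the sample RDF is made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the DocInstance implementation.
final class RdfNamespaceSketch {
    // Namespace that DOM assigns to xmlns declarations themselves.
    private static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    // Returns the namespace URIs used by elements and attributes,
    // in order of first appearance.
    static Set<String> namespaces(String rdfXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rdfXml)));
        Set<String> found = new LinkedHashSet<>();
        collect(doc.getDocumentElement(), found);
        return found;
    }

    private static void collect(Node node, Set<String> found) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return;
        }
        if (node.getNamespaceURI() != null) {
            found.add(node.getNamespaceURI());
        }
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String ns = attrs.item(i).getNamespaceURI();
            // Skip the xmlns:* declarations; keep only namespaces in use.
            if (ns != null && !XMLNS.equals(ns)) {
                found.add(ns);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), found);
        }
    }

    public static void main(String[] args) throws Exception {
        String rdf =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
          + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
          + "<rdf:Description rdf:about=\"http://example.org/doc\">"
          + "<dc:title>T</dc:title>"
          + "</rdf:Description></rdf:RDF>";
        System.out.println(namespaces(rdf));
    }
}
```

This sketch reports only namespaces that are actually used; declared but unused prefixes are ignored, which may or may not match what getNs() returns.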
getUri
public java.util.Vector getUri()
getNs
public java.util.Vector getNs()
getRdf
public java.util.Vector getRdf()
main
public static void main(java.lang.String[] args)
- Entry point for debugging.
encode
public static java.lang.String encode(java.lang.String val)
- Adapted from org.gjt.vinny.html.HTMLEncoder.
A utility method that converts a string
into a form suitable for placing inside HTML,
so that the special symbols <, >, &, " and ' are properly escaped.
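The escaping described for encode can be sketched as follows. This is a minimal stand-in under stated assumptions, not the code adapted from org.gjt.vinny.html.HTMLEncoder: the class name HtmlEncodeSketch is hypothetical, and both the choice of &#39; for the apostrophe and the handling of null input are assumptions, since the page does not specify them.

```java
// Hypothetical sketch, not the DocInstance.encode implementation.
final class HtmlEncodeSketch {
    // Escapes the five characters named in the description: < > & " '
    static String encode(String val) {
        if (val == null) {
            return null; // Assumption: null passes through unchanged.
        }
        StringBuilder out = new StringBuilder(val.length() + 16);
        for (int i = 0; i < val.length(); i++) {
            char c = val.charAt(i);
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // assumed entity
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: &lt;a href=&quot;x?a=1&amp;b=2&quot;&gt;it&#39;s&lt;/a&gt;
        System.out.println(encode("<a href=\"x?a=1&b=2\">it's</a>"));
    }
}
```

Because each input character is examined once, an ampersand produced by the escaping itself is never escaped a second time.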