7.5.13

A very simple example of using JDOM with TagSoup

Just Read and Build the Document

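import java.io.Reader;
import java.io.StringReader;
import org.jdom.input.SAXBuilder;
import org.jdom.output.XMLOutputter;
SAXBuilder builder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser"); // TagSoup is the SAX parser, so messy HTML still parses
Reader in = new StringReader(pageContent); // pageContent holds the raw HTML as a String
org.jdom.Document doc = builder.build(in);
System.out.println(new XMLOutputter().outputString(doc)); // print the cleaned-up document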

Applying XPath

SAXBuilder builder = new org.jdom.input.SAXBuilder("org.ccil.cowan.tagsoup.Parser");
Reader in = new StringReader(pageContent);
org.jdom.Document doc = builder.build(in);
//System.out.println(new XMLOutputter().outputString(doc));
// @class is compared against the whole attribute value, so the string must match exactly.
XPath xpath = XPath.newInstance("//xhtml:div[@class='views-row views-row-3 views-row-odd']");
// TagSoup puts the elements in the XHTML namespace, so the prefix has to be registered.
xpath.addNamespace("xhtml", "http://www.w3.org/1999/xhtml");
List<Element> nodes = xpath.selectNodes(doc);
for (Element el : nodes) {
    System.out.println(new XMLOutputter().outputString(el));
}

Applying XPath to a Sub-Element

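JDOM seems to have trouble applying XPath to a sub-element. The likely cause is that an expression starting with // is always evaluated from the document root, whichever element is passed as the context, so the search is not limited to the sub-element. One workaround is to serialize each sub-element, build a new document from it, and apply the XPath to that document.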

SAXBuilder builder = new org.jdom.input.SAXBuilder("org.ccil.cowan.tagsoup.Parser");
Reader in = new StringReader(pageContent);
org.jdom.Document doc = builder.build(in);
//System.out.println(new XMLOutputter().outputString(doc));
// Select every article teaser in the page.
XPath xpath = XPath.newInstance("//xhtml:div[@class='node node-teaser node-article']");
xpath.addNamespace("xhtml", "http://www.w3.org/1999/xhtml");
List<Element> nodes = xpath.selectNodes(doc);
// This expression starts with //, so it searches from the root of the document it is applied to.
XPath xpathArticle = XPath.newInstance("//xhtml:a[@class='node-title']");
xpathArticle.addNamespace("xhtml", "http://www.w3.org/1999/xhtml");

XMLOutputter xmlOutputter = new XMLOutputter();
for (Element el : nodes) {
    // Serialize the teaser and re-parse it as a standalone document.
    String elXml = xmlOutputter.outputString(el);
    // The serialized element is already well-formed XML, so the default parser is enough.
    builder = new org.jdom.input.SAXBuilder();
    in = new StringReader(elXml);
    org.jdom.Document doc2 = builder.build(in);
    Element result = (Element) xpathArticle.selectSingleNode(doc2);
    if (result != null) { // selectSingleNode returns null when nothing matches
        System.out.println(xmlOutputter.outputString(result));
    }
}
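
The behaviour with sub-elements comes from XPath itself rather than from JDOM: a path that starts with // is absolute, while one that starts with .// is relative to the context node. The sketch below illustrates this with the XPath engine built into the JDK (javax.xml.xpath) instead of JDOM, so it runs without extra jars. The class name, the method titleIn and the sample markup are made up for the illustration, and namespaces are left out to keep it short.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RelativeXPathDemo {
    // Hypothetical sample markup: two teasers, each with its own title link.
    static final String XML =
        "<body>"
        + "<div class='node'><a class='node-title'>First</a></div>"
        + "<div class='node'><a class='node-title'>Second</a></div>"
        + "</body>";

    /** Evaluates expr with the divIndex-th div (0-based) as the context node. */
    static String titleIn(int divIndex, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList divs = (NodeList) xpath.evaluate(
            "//div[@class='node']", doc, XPathConstants.NODESET);
        Element div = (Element) divs.item(divIndex);
        // Returns the string value of the first node the expression selects.
        return xpath.evaluate(expr, div);
    }

    public static void main(String[] args) throws Exception {
        // Absolute path: searches from the document root, ignoring the context div.
        System.out.println(titleIn(1, "//a[@class='node-title']"));  // prints "First"
        // Relative path: searches only below the context div.
        System.out.println(titleIn(1, ".//a[@class='node-title']")); // prints "Second"
    }
}
```

If the same holds for JDOM's XPath class, changing the second expression to ".//xhtml:a[@class='node-title']" and calling xpathArticle.selectSingleNode(el) directly should make the rebuilt document unnecessary. I have not tested that against JDOM here.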
