java - Xerces stalls on getting xhtml1-transitional DTD
This question already has an answer here: Xerces DOM parser incredibly slow? (2 answers)
I'd like to perform XPath queries on an XML document online. I've set up InputStreams that retrieve the content and prepend a <?xml ...?> header declaring the encoding, which nowadays is given in the charset field of the HTTP reply. Although it works, it's painfully slow.
// bis is a BufferedInputStream over the content part of the HTTP reply
docBuilder = docBuilderFactory.newDocumentBuilder(); // declared to throw ParserConfigurationException
Document doc = docBuilder.parse(new PrependInputStream(bis, "<?xml version='1.0' encoding='" + charset + "' ?>\r\n"));
(Please allow me not to post the whole source this time: I'm preparing an assignment for students.)
Some strace analysis revealed that the program stalls when contacting w3.org:
send(8, "GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1\r\nUser-Agent: Java/1.6.0_17\r\nHost: www.w3.org\r\nAccept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\nConnection: keep-alive\r\n\r\n", 186, 0)
recv(8, ...
As I don't care much whether the HTML content is valid (well-formed should be enough), I tried docBuilderFactory.setValidating(false), but that doesn't seem to prevent the online retrieval of the DTD.
Trying to set the schema manually with docBuilderFactory.setSchema(), using the same DTD file retrieved by hand, results in "org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed." (so that was not a good idea).
Where am I over-complicating things?
(The XML backend seems to be com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema, as far as I can tell from the stack traces, if that's of any use.)
The HTML DTDs are huge and pull in further files through includes. And yes, fetching them takes forever. Use an XML catalog instead: there you can store the DTDs locally and map them to their system IDs.
If you use a build tool such as Maven, you will find sufficient pointers.
The advantage over intercepting the entities, as the answer linked by @sylvainulg suggests, is that you receive the right characters.
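The idea of serving the DTD from a local copy can be sketched with a plain EntityResolver, used here as a hand-rolled stand-in for a real XML catalog rather than the catalog mechanism itself. The class name LocalDtdDemo, the helper parse() and the one-line STUB_DTD are assumptions made for illustration only; in practice the resolver would return the complete xhtml1-transitional.dtd and its entity files from disk.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class LocalDtdDemo {
    // System ID that the parser would otherwise fetch from w3.org.
    static final String XHTML_SYSTEM_ID =
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    // ASSUMPTION: a one-line stub standing in for a locally stored DTD.
    // It declares only &nbsp;, which is enough for this demonstration.
    static final String STUB_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Answers requests for the XHTML DTD locally; returning null for any
    // other system ID leaves the parser's default lookup in place.
    static final EntityResolver LOCAL_RESOLVER = (publicId, systemId) -> {
        if (!XHTML_SYSTEM_ID.equals(systemId)) {
            return null;
        }
        InputSource source = new InputSource(new StringReader(STUB_DTD));
        source.setSystemId(systemId);
        return source;
    };

    // Hypothetical helper: parses a string without validation, resolving
    // the DTD through LOCAL_RESOLVER instead of the network.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(LOCAL_RESOLVER);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
            + XHTML_SYSTEM_ID + "\">"
            + "<html><body><p>a&nbsp;b</p></body></html>";
        String text = parse(xml).getDocumentElement().getTextContent();
        // Print the code point that &nbsp; was expanded to.
        System.out.println(Integer.toHexString(text.charAt(1)));
    }
}
```

Because the resolver hands back real entity declarations rather than an empty source, references such as &nbsp; are still expanded to the intended characters, which is the advantage mentioned above. An XML catalog does the same mapping declaratively, so the list of local DTDs lives in a catalog file instead of in code.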
java xml dtd