java - Xerces stalls on getting xhtml1-transitional DTD




This question already has an answer here:

Xerces DOM parser incredibly slow? (2 answers)

I'd like to perform XPath queries on XML documents retrieved online. I've set up InputStreams that retrieve the content and prepend an <?xml ...?> header declaring the encoding given in the charset field of the HTTP response. Although it works, it's painfully slow.

    // bis is a BufferedInputStream over the content part of the HTTP reply
    docBuilder = docBuilderFactory.newDocumentBuilder(); // throws exception
    Document doc = docBuilder.parse(
        new PrependInputStream(bis, "<?xml version='1.0' encoding='" + charset + "' ?>\r\n"));

(Please allow me not to post the whole source this time: I'm preparing an assignment for students.)
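Since the PrependInputStream class is not shown, here is a minimal stand-in sketch of the same idea using java.io.SequenceInputStream. The names PrependSketch, withXmlDeclaration and parseText are hypothetical and not from the question; the sketch assumes the declared charset is ASCII-compatible.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class PrependSketch {
    /** Returns a stream that yields an XML declaration naming the charset, then the body bytes. */
    static InputStream withXmlDeclaration(InputStream body, String charset) {
        String header = "<?xml version='1.0' encoding='" + charset + "' ?>\r\n";
        // The declaration only contains ASCII characters, so encoding it as US-ASCII
        // is safe for any ASCII-compatible charset (assumption: not UTF-16 or similar).
        return new SequenceInputStream(
                new ByteArrayInputStream(header.getBytes(StandardCharsets.US_ASCII)), body);
    }

    /** Parses the body with the declaration prepended and returns the root element's text. */
    static String parseText(byte[] body, String charset) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(withXmlDeclaration(new ByteArrayInputStream(body), charset));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this, the parser decodes the body according to the charset reported by the HTTP response rather than guessing.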

Some strace analysis revealed that the programme stalls when contacting w3.org:

    send(8, "GET /TR/xhtml1/DTD/xhtml1-transitional.dtd HTTP/1.1\r\nUser-Agent: Java/1.6.0_17\r\nHost: www.w3.org\r\nAccept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2\r\nConnection: keep-alive\r\n\r\n", 186, 0)
    recv(8, ...

As I don't care much whether the HTML content is valid (well-formed should be enough), I tried docBuilderFactory.setValidating(false), but that doesn't seem to prevent the online retrieval of the DTD.
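A non-validating parser may still fetch the external DTD, because the DTD can define entities and default attribute values. Xerces documents a separate feature, http://apache.org/xml/features/nonvalidating/load-external-dtd, to switch that off. The sketch below assumes the parser behind DocumentBuilderFactory is Xerces-based (as the JDK's bundled one is) and honors this feature; the class name NoExternalDtd is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NoExternalDtd {
    // Xerces-specific feature; other parser implementations may reject it.
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses the XML without validating and without loading the external DTD. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);                 // no validation against the DTD
            factory.setFeature(LOAD_EXTERNAL_DTD, false); // and do not fetch it either
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

One trade-off: with the DTD skipped, named entities that only the DTD defines (such as &nbsp;) may no longer resolve.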

Trying to set the schema manually with docBuilderFactory.setSchema(), using the same DTD file retrieved by hand, results in "org.xml.sax.SAXParseException: The markup in the document preceding the root element must be well-formed." (so that was not a good idea).

Where am I over-complicating things?

(The XML backend seems to be com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaLoader.loadSchema, as far as I can tell from the stack traces, if that's of any use.)

The HTML DTDs are huge and use includes, so, right, fetching them takes forever. Use an XML catalog: there you can store the DTDs locally and map them by system ID.

If you use a build tool such as Maven, you will find sufficient pointers.

The advantage over intercepting entities, as the answer linked by @sylvainulg suggests, is that you receive the right characters.
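As a rough sketch of the catalog approach, the example below uses the javax.xml.catalog API (available from Java 9, so it would not run on the Java 1.6 seen in the trace). For the demo it writes a one-line stand-in DTD and a catalog file to a temporary directory; in practice the real W3C DTD files would be stored locally instead. The class name CatalogSketch, the method names and the temporary file layout are all hypothetical.

```java
import java.io.StringReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CatalogSketch {
    static final String SYSTEM_ID =
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd";

    /** Parses the XML, resolving external entities through the given catalog file. */
    static Document parse(String xml, URI catalogFile) {
        try {
            CatalogResolver resolver =
                    CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogFile);
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver); // CatalogResolver is an EntityResolver
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Demo: maps the W3C system ID to a tiny local stand-in DTD and parses a document. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("catalog-sketch");
            // Stand-in DTD that only declares &nbsp;; the real DTD would go here.
            Path dtd = dir.resolve("xhtml1-transitional.dtd");
            Files.write(dtd, "<!ENTITY nbsp \"&#160;\">".getBytes(StandardCharsets.UTF_8));
            // OASIS XML catalog mapping the remote system ID to the local copy.
            Path catalog = dir.resolve("catalog.xml");
            String catalogXml = "<?xml version=\"1.0\"?>\n"
                    + "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                    + "  <system systemId=\"" + SYSTEM_ID + "\" uri=\"" + dtd.toUri() + "\"/>\n"
                    + "</catalog>\n";
            Files.write(catalog, catalogXml.getBytes(StandardCharsets.UTF_8));
            String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
                    + SYSTEM_ID + "\"><html><body>a&nbsp;b</body></html>";
            return parse(xml, catalog.toUri()).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the DTD is still read, only from local disk, entity references such as &nbsp; expand to the proper characters and no request goes to w3.org.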

java xml dtd
