Intercepting All Accesses to External Entities During XML SAX Parsing

This capability is useful when the public or system id in an XML file does not refer to an actual resource and must be redirected. To intercept the accesses, install an EntityResolver object on the parser. If the resolver maps an entity to another resource, it must return an InputSource for that resource; returning null tells the parser to open the system id itself.
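
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ResolverExample {
    public static void main(String[] args) {
        try {
            // Create an XML parser
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

            // Install the entity resolver
            builder.setEntityResolver(new MyResolver());

            // Parse the XML file
            Document doc = builder.parse(new File("infilename.xml"));
        } catch (SAXException e) {
            // A parsing error occurred; the XML input is not valid
        } catch (ParserConfigurationException e) {
            // The parser could not be created with the requested configuration
        } catch (IOException e) {
            // The input file or an external entity could not be read
        }
    }
}

class MyResolver implements EntityResolver {
    // This method is called whenever an external entity is accessed
    // for the first time.
    public InputSource resolveEntity(String publicId, String systemId) {
        try {
            // Wrap the systemId in a URI object to make it convenient
            // to extract the components of the systemId
            URI uri = new URI(systemId);

            // Check if the external source is a file
            if ("file".equals(uri.getScheme())) {
                String filename = uri.getSchemeSpecificPart();
                // A byte stream lets the parser detect the encoding itself
                InputSource source = new InputSource(new FileInputStream(filename));
                // Keep the system id so relative references still resolve
                source.setSystemId(systemId);
                return source;
            }
        } catch (URISyntaxException e) {
            // The systemId is not a valid URI; fall through to the default
        } catch (IOException e) {
            // The file could not be opened; fall through to the default
        }

        // Returning null causes the caller to try accessing the systemId
        return null;
    }
}
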
This is the sample input for the example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM "System.dtd" [
    <!ENTITY entity1 SYSTEM "External.xml">
    <!ENTITY entity2 SYSTEM "http://hostname.com/my.dtd">
    <!ENTITY % entity3 SYSTEM "More.dtd">
    %entity3;
]>
<root>
    &entity1;
    &entity2;
</root>

The resulting system ids passed into MyResolver.resolveEntity() are:

file:d:/almanac/1.4/egs/org.xml.sax/More.dtd
file:d:/almanac/1.4/egs/org.xml.sax/System.dtd
file:d:/almanac/1.4/egs/org.xml.sax/External.xml
http://hostname.com/my.dtd
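
The redirect case mentioned at the top, where a system id does not point at a real resource, can be sketched with a small lookup table. This is only an illustration, not part of the original example: the class name MappingResolverDemo, the REDIRECTS table, the example.invalid URL and the replacement text are all hypothetical, and the sketch assumes Java 9 or later for Map.of. The resolver records every system id it is asked about and, when the table has an entry, returns an in-memory InputSource so that no file or network access takes place.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MappingResolverDemo {
    // System ids the resolver was asked about, in call order.
    static final List<String> seen = new ArrayList<>();

    // Hypothetical redirect table: system id -> replacement content.
    static final Map<String, String> REDIRECTS =
        Map.of("http://example.invalid/External.xml", "hello");

    static class MappingResolver implements EntityResolver {
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            seen.add(systemId);
            String replacement = REDIRECTS.get(systemId);
            if (replacement == null) {
                // No mapping: let the parser open the system id itself.
                return null;
            }
            // Serve the replacement from memory instead of the original location.
            InputSource source = new InputSource(new StringReader(replacement));
            source.setSystemId(systemId);
            return source;
        }
    }

    // Parses the given XML text with the mapping resolver installed.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new MappingResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml =
            "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE root ["
            + "<!ENTITY entity1 SYSTEM \"http://example.invalid/External.xml\">"
            + "]>"
            + "<root>&entity1;</root>";
        Document doc = parse(xml);
        System.out.println("Resolver saw: " + seen);
        System.out.println("Root text: " + doc.getDocumentElement().getTextContent());
    }
}
```

A real application would more likely map system ids to local files or classpath resources than to inline strings; the in-memory table just keeps the sketch self-contained.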
