Chapter 63. XML document type definitions

#include <x/xml/doc.H>
#include <x/xml/newdtd.H>

auto empty_document=x::xml::doc::create();
auto lock=empty_document->writelock();
lock->create_child()->element({"html"})
    ->element({"body"})
    ->element({"p"})
    ->text("Hello world");
lock->create_internal_dtd("-//W3C//DTD XHTML 1.0 Strict//EN",
                          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");
lock->save_file("filename.html", true);

This example creates the following file:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>Hello world</p>
  </body>
</html>

A writer lock's create_internal_dtd() method adds a DOCTYPE declaration to the XML document. The XML document cannot be empty. create_internal_dtd() returns a x::xml::newdtd, which is a reference to a reference-counted object with methods that further modify the document type declaration.

A reader lock's get_internal_dtd() returns a x::xml::dtd that represents the existing document's DOCTYPE declaration. A writer lock also implements get_internal_dtd(), but the writer lock's version returns a x::xml::newdtd. x::xml::newdtd's object is a subclass of x::xml::dtd's object and inherits all of x::xml::dtd's object's methods that provide access to the DOCTYPE's definition:

auto dtd=rlock->get_internal_dtd();

if (dtd->exists())
{
    std::cout << "Name: " << dtd->name() << std::endl
              << dtd->external_id() << std::endl
              << dtd->system_id() << std::endl;
}

get_internal_dtd() returns a x::xml::dtd or a x::xml::newdtd even when the XML document does not have a DOCTYPE. Its exists() returns a bool indicating whether the DOCTYPE exists. If it does, name(), external_id(), and system_id() return the DOCTYPE's name, its public/external identifier, and its system identifier.
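To make the relationship between those three values and the DOCTYPE declaration concrete, here is a minimal standalone sketch. format_doctype() is a hypothetical helper written for this illustration; it is not part of x::xml, and it only assumes the standard XML syntax for a DOCTYPE declaration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of x::xml: formats the three values
// that name(), external_id() and system_id() report into the DOCTYPE
// declaration that they describe.
std::string format_doctype(const std::string &name,
                           const std::string &external_id,
                           const std::string &system_id)
{
    assert(!name.empty()); // A DOCTYPE always names the root element.

    std::string decl = "<!DOCTYPE " + name;

    if (!external_id.empty())
    {
        // Public identifier, followed by the system identifier.
        decl += " PUBLIC \"" + external_id + "\" \"" + system_id + "\"";
    }
    else if (!system_id.empty())
    {
        // System identifier only.
        decl += " SYSTEM \"" + system_id + "\"";
    }
    return decl + ">";
}
```

Passing "html", the XHTML 1.0 Strict public identifier, and its system identifier reproduces the DOCTYPE line from the first example in this chapter.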

Note

x::xml::dtd and x::xml::newdtd are references to reference-counted objects that get created by a reader or a writer lock. Each object holds an internal reference on the lock that created it, until all references to the object go out of scope and the object gets destroyed.

Generally, they follow the same thread-safe semantics as their corresponding locks. Different threads can retrieve and use their own respective x::xml::dtd, but only one thread can use a given x::xml::dtd at a time. Only one writer lock can exist at a time, so there's only one x::xml::newdtd in existence, and only one thread can access it. At this time, it's possible to call get_internal_dtd() a second time, which technically returns a different x::xml::newdtd; however, for all practical purposes it's the same underlying object, and only one thread can use a x::xml::newdtd at a time.

There are also analogous create_external_dtd() and get_external_dtd() methods for external DOCTYPE subsets, but they're not commonly used; their only purpose is to expose the underlying libxml object that's mainly used in DTD validation.

A writer lock also implements a remove_internal_dtd() (and remove_external_dtd()), which removes an XML document's DOCTYPE.

Defining custom document type entities

The object referenced by x::xml::newdtd implements several methods that add custom entity declarations to the DOCTYPE:

auto empty_document=x::xml::doc::create();

empty_document->writelock()->create_child()->element({"html"});

auto wlock=empty_document->writelock();
auto intdtd=wlock
    ->create_internal_dtd("-//W3C//DTD XHTML 1.0 Strict//EN",
                          "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd");

intdtd->create_general_entity("XML", "<i>XML</i>");

// ...

wlock->save_file("filename.xml");

This results in the following XML document:


<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [
<!ENTITY XML "<i>XML</i>">
]>
<html xmlns="http://www.w3.org/1999/xhtml">

 ....

Custom entities typically get declared in an XML document that gets saved into a file. When creating the rest of the document (prior to saving it), use entity() to insert an entity reference:

wlock->create_child()->entity("XML");

This inserts the &XML; entity reference into the XML document. Loading the XML document with the noent option resolves entity references: each entity reference gets replaced by the entity's contents in the parsed XML document.
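The effect of resolving entity references can be illustrated with a toy, single-pass substitution. This is only a sketch of the idea, not how the underlying XML parser implements the noent option: resolve_entities() is a hypothetical helper, it works on plain strings rather than on a parsed document, and it does not expand references nested inside replacement text.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: replaces each "&name;" reference whose name is
// in the entity table with that entity's replacement text. References
// to names that are not in the table are copied unchanged.
std::string resolve_entities(const std::string &text,
                             const std::map<std::string, std::string> &entities)
{
    std::string out;
    std::size_t pos = 0;

    while (pos < text.size())
    {
        std::size_t amp = text.find('&', pos);

        if (amp == std::string::npos)
        {
            out.append(text, pos, std::string::npos); // No more references
            break;
        }
        out.append(text, pos, amp - pos);

        std::size_t semi = text.find(';', amp);

        if (semi == std::string::npos)
        {
            // Unterminated reference: copy the rest verbatim.
            out.append(text, amp, std::string::npos);
            break;
        }
        assert(semi > amp); // text[amp] is '&', so ';' comes after it

        auto iter = entities.find(text.substr(amp + 1, semi - amp - 1));

        if (iter != entities.end())
            out += iter->second;                   // Known entity
        else
            out.append(text, amp, semi - amp + 1); // Unknown: keep as is

        pos = semi + 1;
    }
    return out;
}
```

With the table {{"XML", "<i>XML</i>"}}, the text "Learn &XML; today" becomes "Learn <i>XML</i> today", while a reference such as &amp; that is not in the table is left alone.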

newdtd->create_parsed_entity("ch1", "", "chapter1.xml");

create_parsed_entity() adds a declaration for an external parsed entity. The second parameter is a public identifier, which is normally an empty string. This example adds <!ENTITY ch1 SYSTEM 'chapter1.xml'> to the DOCTYPE declaration.

newdtd->create_unparsed_entity("table1", "", "table1.jpg", "jpg");

create_unparsed_entity() adds a declaration for an external unparsed entity. The second parameter is a public identifier, which is normally an empty string. This example adds <!ENTITY table1 SYSTEM 'table1.jpg' NDATA jpg> to the XML document.
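The two external entity declarations differ only in the trailing NDATA clause. The following standalone sketch shows how the parameters map onto the declaration text. external_entity_decl() is a hypothetical helper for illustration only; the PUBLIC form for a non-empty public identifier follows the XML specification's syntax and is an assumption about what the library would emit in that case.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: formats an external entity declaration.
// An empty notation gives a parsed entity; a non-empty notation gives
// an unparsed entity with an NDATA clause. Identifiers are assumed
// not to contain single quotes.
std::string external_entity_decl(const std::string &name,
                                 const std::string &public_id,
                                 const std::string &system_id,
                                 const std::string &notation = "")
{
    assert(!name.empty() && !system_id.empty());

    std::string decl = "<!ENTITY " + name;

    if (public_id.empty())
        decl += " SYSTEM '" + system_id + "'";
    else
        decl += " PUBLIC '" + public_id + "' '" + system_id + "'";

    if (!notation.empty())
        decl += " NDATA " + notation; // Unparsed entity

    return decl + ">";
}
```

Calling it with ("ch1", "", "chapter1.xml") and with ("table1", "", "table1.jpg", "jpg") yields the two declarations quoted in the text above.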

std::ostringstream o;

for (int i=1; i<10; ++i)
{
    o << "<!ENTITY ch" << i << " SYSTEM \"ch" << i << ".xml\">";
}

newdtd->create_internal_parameter_entity("chapters", o.str());

create_internal_parameter_entity() adds a declaration for an internal parameter entity. This example adds <!ENTITY % chapters '[...]'> (with a long, messy string instead of the ellipsis) to the document.
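To see what the "long, messy string" looks like, the loop can be run on its own with a smaller range. This is a standalone sketch: chapter_declarations() and internal_parameter_entity_decl() are hypothetical helpers, and the exact quoting that the library writes to the file may differ from what is shown here.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Builds the replacement text the same way as the loop above: one
// external entity declaration for each chapter from first to last.
std::string chapter_declarations(int first, int last)
{
    assert(first <= last);

    std::ostringstream o;

    for (int i = first; i <= last; ++i)
        o << "<!ENTITY ch" << i << " SYSTEM \"ch" << i << ".xml\">";

    return o.str();
}

// Hypothetical helper: wraps replacement text in an internal
// parameter entity declaration, using single quotes as delimiters.
std::string internal_parameter_entity_decl(const std::string &name,
                                           const std::string &value)
{
    return "<!ENTITY % " + name + " '" + value + "'>";
}
```

For two chapters, chapter_declarations(1, 2) produces two back-to-back ENTITY declarations for ch1 and ch2; the original loop produces nine of them, all of which end up inside the single-quoted value of the chapters parameter entity.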

newdtd->create_external_parameter_entity("chapters", "", "chapters.xml");

create_external_parameter_entity() adds a declaration for an external parameter entity. The second parameter is a public identifier, which is normally an empty string. This example adds <!ENTITY % chapters SYSTEM 'chapters.xml'> to the document.

newdtd->include_parameter_entity("chapters");

include_parameter_entity() adds a reference to a parameter entity. This example adds "%chapters;" to the document.
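A parameter entity reference only makes sense after the declaration of the entity it refers to. The sketch below assembles the text of an internal subset from a declaration and a reference, in that order. Both functions are hypothetical helpers for illustration, and the one-item-per-line layout is an assumption modeled on the saved file shown earlier in this chapter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: formats a parameter entity reference.
std::string parameter_entity_reference(const std::string &name)
{
    assert(!name.empty());
    return "%" + name + ";";
}

// Hypothetical helper: lays out internal subset items, one per line,
// between the square brackets of a DOCTYPE declaration.
std::string internal_subset(const std::vector<std::string> &items)
{
    std::string subset = "[\n";

    for (const auto &item : items)
        subset += item + "\n";

    return subset + "]";
}
```

Passing the external parameter entity declaration for chapters, followed by parameter_entity_reference("chapters"), gives a three-line subset: the declaration, then %chapters;, then the closing bracket.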