
📃 XML parser and stringifier


📑 Examples

Parsing XML to objects

import { parse } from "./parse.ts"

// Parse a string
console.log(parse(`
  <?xml version="1.0"?>
  <root>
    <text>hello</text>
    <array>world</array>
    <array>monde</array>
    <array>世界</array>
    <array>🌏</array>
    <complex attribute="value">content</complex>
  </root>
`))

// Parse a file
using file = await Deno.open("bench/assets/small.xml")
console.log(parse(file))

Stringifying objects to XML

import { stringify } from "./stringify.ts"

console.log(stringify({
  "@version": "1.0",
  root: {
    text: "hello",
    array: ["world", "monde", "世界", "🌏"],
    complex: {
      "@attribute": "value",
      "#text": "content",
    },
  },
}))

✨ Features

  • Based on the quick-xml Rust package (compiled to WASM).
  • Support for XML.parse and XML.stringify in the style of the JSON global.
  • Support for <!-- --> comments.
  • Support for XML entities (&amp;, &#38;, &#x26;, …).
  • Support for mixed content (both text and nodes).
  • Support for a wide range of output transformation options:
    • Auto-flatten nodes with a single child, text or attributes
    • Auto-revive booleans, numbers, etc.
    • Auto-group same-named nodes into arrays.
    • Format (indentation, line breaks, etc.)
    • Support for custom reviver and replacer functions
  • Support for metadata stored in non-enumerable properties (advanced usage).

๐Ÿ•Š๏ธ Migrating from 5.x.x to 6.x.x

Versions 6.x.x and onwards require Deno 2.x.x or later.

๐Ÿ•Š๏ธ Migrating from 4.x.x to 5.x.x

Prior to version version 5.0.0, this library was fully written in TypeScript. It now uses a WASM-compiled binding of the quick-xml Rust package, which provides better performance while allowing us to support more features.

Internal API changes

The $XML internal symbol has been replaced by a set of non-enumerable properties:

  • The parent node can now be accessed through the "~parent" property (it’ll be null for the XML document node)
  • The tag name can now be accessed through the "~name" property
  • Child nodes can now be accessed through the "~children" property
    • CDATA can now be detected by checking whether a node has "~name": "~cdata" (if flattened, you’ll need to check from the parent node using its "~children" property)

<root>
  <node><![CDATA[hello <world>]]></node>
</root>

  <ref *1> {
-   [$XML]: { cdata: [ "root", "node" ] },
+   "~parent": null,
+   "~name": "~xml",
    root: {
      node: "hello <world>",
-     [$XML]: { name: "root", parent: null },
+     "~parent": [Circular *1],
+     "~name": "root",
+     "~children": [ { "~name": "~cdata", "#text": "hello <world>" } ],
    }
  }
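
The CDATA check described above can be sketched without the library. The object below is hand-built to mimic the shape shown in the diff (it is not produced by parse()), and defineHidden and hasCdataChild are hypothetical helper names used only for illustration:

```typescript
// Hand-built stand-in for a parsed node; real objects come from parse().
type XmlNode = Record<string, unknown>

// Attach a property that stays out of Object.keys() and JSON.stringify().
function defineHidden(node: XmlNode, key: string, value: unknown): void {
  Object.defineProperty(node, key, { value, enumerable: false, writable: true })
}

// Hypothetical helper: does this node hold a CDATA child in "~children"?
function hasCdataChild(node: XmlNode): boolean {
  const children = node["~children"]
  return Array.isArray(children) &&
    children.some((child) => child?.["~name"] === "~cdata")
}

// Mimics the flattened <root><node><![CDATA[...]]></node></root> example.
const rootNode: XmlNode = { node: "hello <world>" }
defineHidden(rootNode, "~name", "root")
defineHidden(rootNode, "~children", [{ "~name": "~cdata", "#text": "hello <world>" }])

console.log(Object.keys(rootNode)) // only the enumerable "node" key is listed
console.log(hasCdataChild(rootNode))
```

Because the metadata is non-enumerable, regular iteration and serialization see only the XML content, while code that knows the property names can still reach it.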

XML document changes

XML document properties are now stored directly at the top level rather than in an xml property.

The doctype is now stored in the "#doctype" property, and attribute values are set to "" rather than true.

Processing instructions (such as XML stylesheets) are now parsed the same way as regular nodes, but are stored in the "#instructions" property.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="styles.xsl" type="text/xsl"?>
<!DOCTYPE attribute>
<root/>

  {
-   xml: {
-     "@version": "1.0",
-     "@encoding": "UTF-8",
-   },
+   "@version": "1.0",
+   "@encoding": "UTF-8",
-   "$stylesheets": [ { "@href": "styles.xsl", "@type": "text/xsl" } ]
+   "#instructions": {
+     "xml-stylesheet": { "@href": "styles.xsl", "@type": "text/xsl" }
+   },
-   doctype: { "@attribute": true },
+   "#doctype": { "@attribute": "" },
    root: null
  }

Mixed content support

Mixed content is now supported, which breaks any existing code that expected mixed content to always be a string. Mixed content nodes are now parsed like any other node, and the #text property contains the “inner text” of the node.

Note that #text is actually a getter that recursively collects the #text of child nodes (ignoring comment nodes), so it’ll also handle nested mixed content correctly.

<root>some <b>bold</b> text</root>

  {
-   root: "some <b>bold</b> text",
+   root: {
+     "#text": "some bold text",
+     b: "bold",
+   }
  }
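
The recursive "inner text" behavior can be illustrated with a small standalone sketch. The Child type and innerText function below are assumptions made for this example and are not the library's actual data model or implementation:

```typescript
// Simplified child model for illustration; the library's real nodes differ.
type Child =
  | { kind: "text"; value: string }
  | { kind: "comment"; value: string }
  | { kind: "element"; name: string; children: Child[] }

// Concatenate text recursively, skipping comment nodes.
function innerText(children: Child[]): string {
  return children
    .map((child) => {
      if (child.kind === "text") return child.value
      if (child.kind === "comment") return ""
      return innerText(child.children)
    })
    .join("")
}

// Models <root>some <b>bold</b><!--note--> text</root>
const mixed: Child[] = [
  { kind: "text", value: "some " },
  { kind: "element", name: "b", children: [{ kind: "text", value: "bold" }] },
  { kind: "comment", value: "note" },
  { kind: "text", value: " text" },
]

console.log(innerText(mixed)) // "some bold text"
```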

Comments

Comments have been moved into the "#comments" property. Note that this property is now always an array, even if there is only one comment.

Additionally, you can find comments in the ~children property by searching for nodes with "~name": "~comment". If you call the #text getter on a parent node containing comments, it will return the inner text without the comments.

<root><!--some comment--></root>

  {
    root: {
-     "#comment": "some comment",
+     "#comments": [ "some comment" ],
    }
  }

Parsing

Options

Parsing options are categorized into 4 groups:

  • clean, which can remove attributes, comments, the XML doctype and instructions from the output
  • flatten, which can flatten nodes that contain only a text node, flatten empty nodes, or transform attribute-only nodes into objects without the @ prefix
  • revive, which can trim content (unless xml:space="preserve"), unescape XML entities, and revive booleans and numbers
    • You can also provide a custom reviver function (applied after other revivals) that will be called on each attribute and node
    • Note that the signature of the reviver function has changed
  • mode, which can be either xml or html. The latter is more permissive than the former.

  const options = {
-   reviveBooleans: true,
-   reviveNumbers: true,
-   reviver:() => {},
+   revive: { booleans: true, numbers: true, custom: () => {} },
-   emptyToNull: true,
-   flatten: true,
+   flatten: { text: true, empty: true },
-   debug: false,
-   progress: () => null,
  }

Please refer to the documentation for more information.
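
If you have many call sites to update, the regrouping shown in the diff can be sketched as a small helper. migrateParseOptions is a hypothetical name and not part of the library. The mapping of emptyToNull to flatten.empty and of flatten to flatten.text is an assumption read off the diff, so check it against the documentation before relying on it. The custom reviver is deliberately left out because its signature has changed.

```typescript
// Option shapes copied from the diff above; anything else is out of scope.
interface OldParseOptions {
  reviveBooleans: boolean
  reviveNumbers: boolean
  emptyToNull: boolean
  flatten: boolean
  debug?: boolean
}

interface NewParseOptions {
  revive: { booleans: boolean; numbers: boolean }
  flatten: { text: boolean; empty: boolean }
}

// Hypothetical helper: regroup 4.x flags into the 5.x nested layout.
// Assumes emptyToNull maps to flatten.empty and flatten maps to flatten.text.
function migrateParseOptions(old: OldParseOptions): NewParseOptions {
  return {
    revive: { booleans: old.reviveBooleans, numbers: old.reviveNumbers },
    flatten: { text: old.flatten, empty: old.emptyToNull },
    // debug and progress have no 5.x counterpart in the diff and are dropped.
  }
}

console.log(migrateParseOptions({
  reviveBooleans: true,
  reviveNumbers: true,
  emptyToNull: true,
  flatten: true,
}))
```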

Parsing streams

The parse() function supports any ReaderSync, which means you can pass a file reader directly, for example.

import { parse } from "./parse.ts"
using file = await Deno.open("example.xml")
parse(file)

Async parsing is not supported yet, but might be added in the future (see #49).
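
To make the ReaderSync contract concrete, here is a minimal in-memory reader. The interface is redeclared locally with the same shape as Deno's ReaderSync (readSync(p) returns the number of bytes read, or null at the end). StringReaderSync and readAll are hypothetical names, and the sketch drains the reader itself rather than calling parse():

```typescript
// Structural copy of Deno's ReaderSync interface, so the sketch runs anywhere.
interface ReaderSync {
  readSync(p: Uint8Array): number | null
}

// Hypothetical in-memory reader that serves a string chunk by chunk.
class StringReaderSync implements ReaderSync {
  private bytes: Uint8Array
  private offset = 0

  constructor(text: string) {
    this.bytes = new TextEncoder().encode(text)
  }

  // Copy up to p.length bytes into p; return null once exhausted.
  readSync(p: Uint8Array): number | null {
    if (this.offset >= this.bytes.length) return null
    const chunk = this.bytes.subarray(this.offset, this.offset + p.length)
    p.set(chunk)
    this.offset += chunk.length
    return chunk.length
  }
}

// Drain a reader the way a streaming consumer might.
function readAll(reader: ReaderSync, chunkSize = 8): string {
  const decoder = new TextDecoder()
  const buffer = new Uint8Array(chunkSize)
  let out = ""
  for (let n = reader.readSync(buffer); n !== null; n = reader.readSync(buffer)) {
    // stream: true keeps multi-byte characters intact across chunk boundaries.
    out += decoder.decode(buffer.subarray(0, n), { stream: true })
  }
  return out + decoder.decode()
}

console.log(readAll(new StringReaderSync("<root>hello</root>")))
```

Going by the statement above, any object exposing readSync in this way should be accepted by parse(), though that call is not exercised here.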

Stringifying

Options

Stringifying options are now categorized into 2 groups:

  • format, which can configure the indent string and automatically break lines when a text node is too long
    • Since indent is now a string rather than a number, you can also use tabs instead of spaces
  • replace, which can forcefully escape XML entities
    • You can also provide a custom replacer function that will be called on each attribute and node
    • Note that the signature of the replacer function has changed

  const options = {
-   indentSize: 2,
+   format: { indent: "  " },
-   escapeAllEntities: true,
-   replacer: () => {},
+   replace: { entities: true, custom: () => {} },
-   nullToEmpty: false,
-   debug: false,
-   progress: () => null,
  }

Please refer to the documentation for more information.
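
The switch from a numeric indentSize to an indent string can be sketched as follows. migrateStringifyOptions is a hypothetical helper and not part of the library. It covers only the two options that have a direct counterpart in the diff and omits the custom replacer because its signature has changed.

```typescript
// Option shapes copied from the diff above.
interface OldStringifyOptions {
  indentSize: number
  escapeAllEntities: boolean
}

interface NewStringifyOptions {
  format: { indent: string }
  replace: { entities: boolean }
}

// Hypothetical helper: turn the numeric indent size into an indent string.
function migrateStringifyOptions(old: OldStringifyOptions): NewStringifyOptions {
  return {
    format: { indent: " ".repeat(old.indentSize) },
    replace: { entities: old.escapeAllEntities },
  }
}

const spaces = migrateStringifyOptions({ indentSize: 2, escapeAllEntities: true })
console.log(JSON.stringify(spaces.format.indent)) // "  " (two spaces)

// With a string indent, tabs need no special option.
const tabs: NewStringifyOptions = { format: { indent: "\t" }, replace: { entities: true } }
console.log(JSON.stringify(tabs.format.indent)) // "\t"
```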

Stringifying content

Please refer to the sections above about API changes. If you were handling XML document properties, using the $XML symbol or the #comment property, or dealing with mixed content nodes, you’ll most likely need to update your code.

Additionally, the library now provides comment() and cdata() helpers to respectively create comment and CDATA nodes:

import { cdata, comment, stringify } from "./stringify.ts"
stringify({
  "@version": "1.0",
  "@encoding": "UTF-8",
  root: {
    comment: comment("hello world"),
    cdata: cdata("bonjour <le monde>"),
    text: "hello world",
    node: {
      foo: true,
      bar: 42,
      baz: {
        "@attribute": "value",
      },
    },
  },
})

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <comment><!--hello world--></comment>
  <cdata><![CDATA[bonjour <le monde>]]></cdata>
  <text>hello world</text>
  <node>
    <foo>true</foo>
    <bar>42</bar>
    <baz attribute="value"/>
  </node>
</root>

Note that while you can theoretically use internal API properties, we currently strongly advise against doing so. Support for ~children might be added in the future (#57) for mixed content, but its behavior is not yet well defined. Setting ~name manually might lead to unexpected behavior, especially if it differs from the parent key.

Tip

For more type-safety, write satisfies Partial<xml_document> after whatever you pass into stringify, like so:

import { stringify, type xml_document } from "./stringify.ts"

const ast = {
  "@version": "1.0",
  "@encoding": "UTF-8",
  "root": {},
} satisfies Partial<xml_document>
const result = stringify(ast)

We expose lax typing, but Partial<xml_document> uses the stricter typing we use internally.

📜 License and credits

Copyright (c) Simon Lecoq <@lowlighter>. (MIT License)
https://github.com/lowlighter/libs/blob/main/LICENSE