📃 XML parser and stringifier
✨ Features
- Based on the quick-xml Rust package (compiled to WASM).
- Support for XML.parse and XML.stringify in the style of the JSON global.
- Support for <!-- --> comments.
- Support for XML entities (&amp;, &#38;, &#x26;, …).
- Support for mixed content (both text and nodes).
- Support for large output transformation options:
  - Auto-flatten nodes with a single child, text or attributes.
  - Auto-revive booleans, numbers, etc.
  - Auto-group same-named nodes into arrays.
  - Format (indentation, break lines, etc.).
- Support for custom reviver and replacer functions.
- Support for metadata stored into non-enumerable properties (advanced usage).
🕊️ Migrating from 4.x.x to 5.x.x
Prior to version 5.0.0, this library was fully written in TypeScript.
It now uses a WASM-compiled binding of the quick-xml Rust package, which provides better performance while allowing us to support more features.
Internal API changes
The $XML internal symbol has been replaced by a set of non-enumerable properties:
- The parent node can now be accessed through the "~parent" property (it’ll be null for the XML document node)
- The tag name can now be accessed through the "~name" property
- Child nodes can now be accessed through the "~children" property
  - CDATA can now be tested by checking whether a node has a "~name": "~cdata" (if flattened, you’ll need to check from the parent node using the ~children property)
<root>
<node><![CDATA[hello <world>]]></node>
</root>
<ref *1> {
- [$XML]: { cdata: [ "root", "node" ] },
+ "~parent": null,
+ "~name": "~xml",
root: {
node: "hello <world>",
- [$XML]: { name: "root", parent: null },
+ "~parent": [Circular *1],
+ "~name": "root",
+ "~children": [ { "~name": "~cdata", "#text": "hello <world>" } ],
}
}
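As a rough illustration of the CDATA check described above, the sketch below builds a node by hand that loosely mirrors the shape in the diff and tests its "~children" for a "~cdata" entry. The `hidden` and `hasCdataChild` helpers are hypothetical names made up for this example; they are not part of the library.

```ts
// Hand-built stand-in for a parsed node; the library normally creates these objects.
type XmlNode = { [key: string]: unknown }

// Hypothetical helper: attach a property as non-enumerable, like the "~" metadata.
function hidden(node: XmlNode, key: string, value: unknown): void {
  Object.defineProperty(node, key, { value, enumerable: false, writable: true })
}

// Hypothetical helper: true when one of the node's "~children" is a CDATA node.
function hasCdataChild(node: XmlNode): boolean {
  const children = node["~children"]
  if (!Array.isArray(children)) return false
  return children.some((child) => (child as XmlNode | null)?.["~name"] === "~cdata")
}

// Loosely mirrors <root><node><![CDATA[hello <world>]]></node></root> once flattened.
const root: XmlNode = { node: "hello <world>" }
hidden(root, "~name", "root")
hidden(root, "~children", [{ "~name": "~cdata", "#text": "hello <world>" }])

console.log(hasCdataChild(root)) // true
console.log(Object.keys(root)) // only "node", since the metadata is non-enumerable
```

Because the metadata is non-enumerable, it stays out of `Object.keys()` and `JSON.stringify()` output, which is presumably why this design was chosen over regular properties.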
XML document changes
XML document properties have been moved directly to the top level rather than being stored in the xml property.
Doctype is now stored in the "#doctype" property, and attribute values are set to "" rather than true.
Processing instructions (like XML stylesheets) are now parsed the same way as regular nodes but have been moved into the "#instructions" property.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="styles.xsl" type="text/xsl"?>
<!DOCTYPE attribute>
<root/>
{
- xml: {
- "@version": "1.0",
- "@encoding": "UTF-8",
- },
+ "@version": "1.0",
+ "@encoding": "UTF-8",
- "$stylesheets": [ { "@href": "styles.xsl", "@type": "text/xsl" } ]
+ "#instructions": {
+ "xml-stylesheet": { "@href": "styles.xsl", "@type": "text/xsl" }
+ },
- doctype: { "@attribute": true },
+ "#doctype": { "@attribute": "" },
root: null
}
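Code that has to read documents in both shapes during a migration could use a small accessor like the one sketched below. `stylesheetHrefs` is a hypothetical helper, the input objects are written by hand from the diff above, and the handling of several grouped instructions is an assumption rather than documented behavior.

```ts
// Loose document shape for this sketch; the real typings ship with the library.
type XmlDocument = Record<string, unknown>

// Hypothetical helper: collect xml-stylesheet hrefs from a 4.x or 5.x document object.
function stylesheetHrefs(doc: XmlDocument): string[] {
  // 5.x: "#instructions" holds processing instructions keyed by their name.
  const instructions = doc["#instructions"] as Record<string, unknown> | undefined
  // 4.x: "$stylesheets" was an array of attribute objects.
  const found = instructions?.["xml-stylesheet"] ?? doc["$stylesheets"] ?? []
  // Assumption: several same-named instructions may be grouped into an array.
  const list: unknown[] = Array.isArray(found) ? found : [found]
  return list
    .map((entry) => (entry as Record<string, unknown> | null)?.["@href"])
    .filter((href): href is string => typeof href === "string")
}

const v5: XmlDocument = {
  "@version": "1.0",
  "#instructions": { "xml-stylesheet": { "@href": "styles.xsl", "@type": "text/xsl" } },
  root: null,
}
const v4: XmlDocument = {
  xml: { "@version": "1.0" },
  "$stylesheets": [{ "@href": "styles.xsl", "@type": "text/xsl" }],
  root: null,
}

console.log(stylesheetHrefs(v5), stylesheetHrefs(v4))
```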
Mixed content support
This breaks any existing code that expected mixed content to always be a string.
Mixed content nodes are now parsed like any other node, and the #text property contains the “inner text” of the node.
Note that #text is actually a getter that recursively collects the #text of child nodes (ignoring comment nodes), so it also handles nested mixed content correctly.
<root>some <b>bold</b> text</root>
{
- root: "some <b>bold</b> text",
+ root: {
+ "#text": "some bold text",
+ b: "bold",
+ }
}
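To make the recursive behavior of the #text getter more concrete, here is a minimal sketch of the idea on a hand-built tree. It is not the library's implementation: the `SketchNode` type, the `value` field, the `innerText` function and the "~text" node name are all assumptions made for illustration.

```ts
// Minimal node shape for this sketch only.
interface SketchNode {
  "~name": string
  "~children"?: SketchNode[]
  value?: string // raw content of a leaf node (hypothetical field)
}

// Recursively concatenate the text of a node, skipping comment nodes.
function innerText(node: SketchNode): string {
  if (node["~name"] === "~comment") return ""
  // Assumption: text and CDATA leaves carry their content directly.
  if (node["~name"] === "~text" || node["~name"] === "~cdata") return node.value ?? ""
  return (node["~children"] ?? []).map(innerText).join("")
}

// <root>some <b>bold</b><!--ignored--> text</root>
const mixed: SketchNode = {
  "~name": "root",
  "~children": [
    { "~name": "~text", value: "some " },
    { "~name": "b", "~children": [{ "~name": "~text", value: "bold" }] },
    { "~name": "~comment", value: "ignored" },
    { "~name": "~text", value: " text" },
  ],
}

console.log(innerText(mixed)) // "some bold text"
```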
Comments
Comments have been moved into the "#comments" property.
Note that this property is now always an array, even if there is only one comment.
Additionally, you can find comments in the ~children property by searching for nodes with "~name": "~comment".
If you call the #text getter on a parent node containing comments, it will return the inner text without comments.
<root><!--some comment--></root>
{
root: {
- "#comment": "some comment",
+ "#comments": [ "some comment" ],
}
}
Parsing
Options
Parsing options are categorized into 4 groups:
- clean, which can remove attributes, comments, xml doctype and instructions from the output
- flatten, which can flatten nodes with only a text node, empty ones, or transform attributes-only nodes into objects without the @ prefix
- revive, which can trim content (unless xml:space="preserve"), unescape xml entities, revive booleans and numbers
  - You can also provide a custom reviver function (applied after other revivals) that will be called on each attribute and node
    - Note that the signature of the reviver function has changed
- You can also provide a mode, which can be either xml or html. Choosing the latter will be more permissive than the former.
const options = {
- reviveBooleans: true,
- reviveNumbers: true,
- reviver:() => {},
+ revive: { booleans: true, numbers: true, custom: () => {} },
- emptyToNull: true,
- flatten: true,
+ flatten: { text: true, empty: true },
- debug: false,
- progress: () => null,
}
Please refer to the documentation for more information.
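If you have many call sites to update, the renames in the diff above can be captured in a small conversion function. This is only a sketch: `migrateParseOptions` and both interfaces are hypothetical, cover just the options named in the diff, and assume that emptyToNull corresponds to flatten.empty as the diff suggests. The reviver is deliberately not carried over, since its signature has changed.

```ts
// Subset of the 4.x parse options named in the diff above.
interface ParseOptionsV4 {
  reviveBooleans?: boolean
  reviveNumbers?: boolean
  emptyToNull?: boolean
  flatten?: boolean
}

// Subset of the 5.x option shape shown in the diff above.
interface ParseOptionsV5 {
  revive: { booleans?: boolean; numbers?: boolean }
  flatten: { text?: boolean; empty?: boolean }
}

// Hypothetical helper: translate the old flat options into the new grouped ones.
function migrateParseOptions(old: ParseOptionsV4): ParseOptionsV5 {
  const revive: ParseOptionsV5["revive"] = {}
  if (old.reviveBooleans !== undefined) revive.booleans = old.reviveBooleans
  if (old.reviveNumbers !== undefined) revive.numbers = old.reviveNumbers

  const flatten: ParseOptionsV5["flatten"] = {}
  if (old.flatten !== undefined) flatten.text = old.flatten
  // Assumption: emptyToNull maps to flatten.empty, as the diff suggests.
  if (old.emptyToNull !== undefined) flatten.empty = old.emptyToNull

  return { revive, flatten }
}

console.log(migrateParseOptions({ reviveBooleans: true, reviveNumbers: true, emptyToNull: true, flatten: true }))
```

Unset options are left out rather than copied as undefined, so the library's own defaults still apply to them.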
Parsing streams
The parse() function supports any ReaderSync, which means you can directly pass a file reader, for example.
import { parse } from "./parse.ts"
parse(await Deno.readTextFile("example.xml"))
Async parsing is not supported yet, but might be added in the future (see #49).
Stringifying
Options
Stringifying options are now categorized into 2 groups:
- format, which can configure the indent string and automatically breakline when a text node is too long
  - Since you pass a string rather than a number for indent, you can also use tabs instead of spaces
- replace, which can forcefully escape xml entities
  - You can also provide a custom replacer function that will be called on each attribute and node
    - Note that the signature of the replacer function has changed
const options = {
- indentSize: 2,
+ format: { indent: " " },
- escapeAllEntities: true,
- replacer: () => {},
+ replace: { entities: true, custom: () => {} },
- nullToEmpty: false,
- debug: false,
- progress: () => null,
}
Please refer to the documentation for more information.
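The main conversion here is turning the numeric indentSize into an indent string. The sketch below shows one way to do that; `migrateStringifyOptions` and both interfaces are hypothetical and only cover the options named in the diff above. The replacer is not carried over because its signature has changed.

```ts
// Subset of the 4.x stringify options named in the diff above.
interface StringifyOptionsV4 {
  indentSize?: number
  escapeAllEntities?: boolean
}

// Subset of the 5.x option shape shown in the diff above.
interface StringifyOptionsV5 {
  format: { indent?: string }
  replace: { entities?: boolean }
}

// Hypothetical helper: translate the old flat options into the new grouped ones.
function migrateStringifyOptions(old: StringifyOptionsV4): StringifyOptionsV5 {
  const format: StringifyOptionsV5["format"] = {}
  // A numeric size becomes that many spaces; negative sizes are clamped to zero.
  if (old.indentSize !== undefined) format.indent = " ".repeat(Math.max(0, old.indentSize))

  const replace: StringifyOptionsV5["replace"] = {}
  if (old.escapeAllEntities !== undefined) replace.entities = old.escapeAllEntities

  return { format, replace }
}

const migrated = migrateStringifyOptions({ indentSize: 2, escapeAllEntities: true })
console.log(JSON.stringify(migrated))
```

Since indent is now a string, passing "\t" directly is also possible when tabs are preferred.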
Stringifying content
Please refer to the above section about API changes.
If you were handling XML document properties, using the $XML symbol or the #comment property, or dealing with mixed node content, you’ll most likely need to update your code.
Additionally, the library now provides comment() and cdata() helpers to create comment and CDATA nodes, respectively:
import { cdata, comment, stringify } from "./stringify.ts"
stringify({
"@version": "1.0",
"@encoding": "UTF-8",
root: {
comment: comment("hello world"),
cdata: cdata("bonjour <le monde>"),
text: "hello world",
node: {
foo: true,
bar: 42,
baz: {
"@attribute": "value",
},
},
},
})
<?xml version="1.0" encoding="UTF-8"?>
<root>
<comment><!--hello world--></comment>
<cdata><![CDATA[bonjour <le monde>]]></cdata>
<text>hello world</text>
<node>
<foo>true</foo>
<bar>42</bar>
<baz attribute="value"/>
</node>
</root>
Note that while you can theoretically use internal API properties, we currently strongly advise against doing so.
Support for ~children might be added in the future (#57) for mixed content, but its behavior is not yet well defined.
Setting ~name manually might lead to unexpected behavior, especially if it differs from the parent key.
Tip
For more type safety, write satisfies Partial<xml_document> after whatever you pass into stringify, like so:
import { stringify, type xml_document } from "./stringify.ts"
const ast = {
"@version": "1.0",
"@encoding": "UTF-8",
"root": {},
} satisfies Partial<xml_document>
const result = stringify(ast)
We expose lax typing, but Partial<xml_document> uses the stricter typing we use internally.
📜 License and credits
Copyright (c) Simon Lecoq <@lowlighter>. (MIT License)
https://github.com/lowlighter/libs/blob/main/LICENSE