
XML stringifier/parser (TypeScript, no dependencies)

Available on deno.land/x and jsr.io.

Basic usage

import { parse } from "./mod.ts"

console.log(parse(
  `
  <root>
    <!-- This is a comment -->
    <text>hello</text>
    <array>world</array>
    <array>monde</array>
    <array>世界</array>
    <array>🌏</array>
    <number>42</number>
    <boolean>true</boolean>
    <complex attribute="value">content</complex>
  </root>
`,
  { reviveNumbers: true, reviveBooleans: true },
))
/*
  Nodes sharing the same tag are grouped into arrays, and numbers and booleans are parsed
  automatically when reviveNumbers and reviveBooleans are enabled (as done above).
  Nodes with attributes are not flattened: attributes are accessible through keys prefixed with "@",
  text content through the "#text" key, and comments through the "#comment" key.
  {
    root: {
      "#comment": "This is a comment",
      text: "hello",
      array: ["world", "monde", "世界", "🌏"],
      number: 42,
      boolean: true,
      complex: {
        "@attribute": "value",
        "#text": "content",
      }
    }
  }
*/
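To make the grouping and flattening rules from the comment above concrete, here is a small standalone sketch. It is not the library's implementation: `buildNode`, `Value` and `XmlNode` are hypothetical names, and the sketch assumes children have already been tokenized and built.

```typescript
// Hypothetical value/node shapes mirroring the parsed output shown above.
type Value = string | number | boolean | null | XmlNode | Value[]
interface XmlNode { [key: string]: Value }

// Builds a node from attributes, child elements and text (illustrative only):
// - attributes are stored under "@name"
// - repeated child tags are grouped into an array
// - a node with only text content is unwrapped to that text
// - otherwise the text is kept under "#text"
function buildNode(attributes: Record<string, string>, children: Array<[string, Value]>, text?: string): Value {
  const node: XmlNode = {}
  const grouped = new Set<string>()
  for (const [name, value] of Object.entries(attributes)) {
    node[`@${name}`] = value
  }
  for (const [tag, value] of children) {
    if (!Object.prototype.hasOwnProperty.call(node, tag)) {
      node[tag] = value
    } else if (grouped.has(tag)) {
      (node[tag] as Value[]).push(value)
    } else {
      // Second occurrence of a tag: switch from a single value to an array.
      node[tag] = [node[tag], value]
      grouped.add(tag)
    }
  }
  if (text !== undefined) {
    if (Object.keys(node).length === 0) return text
    node["#text"] = text
  }
  return node
}

const example = buildNode({}, [
  ["text", "hello"],
  ["array", "world"],
  ["array", "monde"],
  ["complex", buildNode({ attribute: "value" }, [], "content")],
])
console.log(JSON.stringify(example))
// {"text":"hello","array":["world","monde"],"complex":{"@attribute":"value","#text":"content"}}
```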
import { stringify } from "./mod.ts"

console.log(stringify({
  root: {
    "#comment": "This is a comment",
    text: "hello",
    array: ["world", "monde", "世界", "🌏"],
    number: 42,
    boolean: true,
    complex: {
      "@attribute": "value",
      "#text": "content",
    },
  },
}))
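The reverse mapping (object conventions back to markup) can be illustrated with a deliberately tiny serializer. This is a hypothetical sketch, not the library's stringify: `toXml` and `escapeXml` are illustrative names, and it only handles "@" attributes, "#text", arrays and primitives, without indentation, comments or a prolog, so its output formatting likely differs from the real one.

```typescript
type XmlValue = string | number | boolean | null | XmlValue[] | { [key: string]: XmlValue }

// Minimal escaping so the generated markup stays well-formed.
function escapeXml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;")
}

// Serializes one element named `tag` whose value follows the "@" / "#text" conventions.
function toXml(tag: string, value: XmlValue): string {
  // Arrays become repeated sibling elements with the same tag.
  if (Array.isArray(value)) {
    return value.map((item) => toXml(tag, item)).join("")
  }
  // null is rendered as an empty element.
  if (value === null) return `<${tag}/>`
  if (typeof value !== "object") {
    return `<${tag}>${escapeXml(String(value))}</${tag}>`
  }
  let attributes = ""
  let content = ""
  for (const [key, child] of Object.entries(value)) {
    if (key.startsWith("@")) {
      attributes += ` ${key.slice(1)}="${escapeXml(String(child))}"`
    } else if (key === "#text") {
      content += escapeXml(String(child))
    } else {
      content += toXml(key, child)
    }
  }
  return content ? `<${tag}${attributes}>${content}</${tag}>` : `<${tag}${attributes}/>`
}

console.log(toXml("root", { text: "hello", array: ["a", "b"], complex: { "@attribute": "value", "#text": "content" } }))
// <root><text>hello</text><array>a</array><array>b</array><complex attribute="value">content</complex></root>
```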

Features

Follows the patterns described in XML.com's "Converting Between XML and JSON".

  • Supports basic XML features (tags, self-closing tags, nested tags, attributes, …)
  • Supports XML.parse and XML.stringify
  • Supports the <?xml ?> prolog declaration
  • Supports the <!DOCTYPE> declaration
  • Supports <![CDATA[ ]]> sections
  • Supports <!-- --> comments
  • Supports XML entities (&amp;, &#38;, &#x26;, …)
  • Supports auto-conversion of primitives (strings, booleans, numbers, null, …)
  • Supports string or stream inputs (ReaderSync & SeekerSync)
  • Supports custom revivers and replacers
  • Supports metadata (parent, name, …) stored in a non-enumerable property
  • Auto-groups nodes into arrays when the same tag is used
  • Auto-unwraps nodes when they only have text content

How reliable is deno.land/x/xml? Check the parse tests and stringify tests 🧪

Limitations

  • When a node mixes text and child nodes, it will be parsed as a text node
  • When sibling nodes with different tags are interleaved, XML.stringify(XML.parse()) may output them in a different order
    • Example: <a><b/><c/><b/></a> will result in <a><b/><b/><c/></a>
    • This may or may not be acceptable depending on your use case
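The reordering follows from the object model itself: once siblings are grouped by tag name into object keys, their relative interleaving is no longer recorded. A tiny sketch with hypothetical helpers (`groupByTag`, `reemit`), not the library's code, reproduces the example:

```typescript
// Groups sibling tag names the way an object-based model has to:
// one key per tag, keys ordered by first occurrence.
function groupByTag(tags: string[]): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const tag of tags) counts[tag] = (counts[tag] ?? 0) + 1
  return counts
}

// Re-emits children from the grouped form: every <b/> first, then every <c/>.
function reemit(grouped: Record<string, number>): string {
  return Object.entries(grouped)
    .map(([tag, count]) => `<${tag}/>`.repeat(count))
    .join("")
}

const children = ["b", "c", "b"] // children of <a><b/><c/><b/></a>
console.log(`<a>${reemit(groupByTag(children))}</a>`) // <a><b/><b/><c/></a>
```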

Revivers

By default, node contents will be converted to:

  • null when empty, unless emptyToNull = false
  • number when matching finite numbers, if reviveNumbers = true
  • boolean when matching true or false (case insensitive), if reviveBooleans = true
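The default conversions listed above can be approximated by a standalone function. This is a sketch under assumptions: `reviveDefault` and `ReviveOptions` are hypothetical, whitespace-only content is treated as empty, and "finite number" is taken to mean anything `Number()` converts to a finite value, which may not match the library's exact rules (for example for hexadecimal or exponent notation).

```typescript
interface ReviveOptions {
  emptyToNull: boolean
  reviveNumbers: boolean
  reviveBooleans: boolean
}

// Approximates the default content conversions described above.
function reviveDefault(raw: string, options: ReviveOptions): string | number | boolean | null {
  const text = raw.trim()
  // Assumption: whitespace-only content counts as empty.
  if (text === "") return options.emptyToNull ? null : raw
  // Finite numbers are converted only when reviveNumbers is enabled.
  if (options.reviveNumbers && Number.isFinite(Number(text))) return Number(text)
  // "true"/"false" in any letter case become booleans when reviveBooleans is enabled.
  if (options.reviveBooleans && /^(true|false)$/i.test(text)) return text.toLowerCase() === "true"
  return raw
}

const defaults = { emptyToNull: true, reviveNumbers: true, reviveBooleans: true }
console.log(reviveDefault("42", defaults), reviveDefault("TRUE", defaults), reviveDefault("", defaults))
// 42 true null
```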

XML entities (e.g. &amp;, &#38;, &#x26;, …) will be unescaped automatically.
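Unescaping covers both named entities and numeric character references. A minimal sketch, assuming only the five entities predefined by the XML specification plus decimal and hexadecimal references (the library may handle more); `unescapeEntities` is a hypothetical name:

```typescript
// The five entities predefined by the XML specification.
const NAMED = new Map([["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"]])

// Replaces &name;, &#decimal; and &#xhex; in a single pass,
// so "&amp;lt;" correctly becomes "&lt;" rather than "<".
function unescapeEntities(text: string): string {
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match: string, ref: string) => {
    if (ref[0] === "#") {
      const codePoint = ref[1] === "x" ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10)
      // Out-of-range references are left untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match
    }
    // Unknown named entities are left untouched.
    return NAMED.get(ref) ?? match
  })
}

console.log(unescapeEntities("Tom &amp; Jerry &#38; &#x26; &lt;3 &amp;lt;"))
// Tom & Jerry & & <3 &lt;
```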

It is also possible to provide a custom reviver for complex transformations:

import { parse } from "./mod.ts"
console.log(parse(
  `
  <prices>
    <product>dakimakura</product>
    <price currency="usd">10.5</price>
    <price currency="eur">10.5</price>
    <price currency="yen">10.5</price>
    <useless/>
  </prices>
`,
  {
    reviver({ value, key, tag, properties }) {
      // Apply special processing for tag, attributes and properties
      if (tag === "price") {
        if (key === "@currency") {
          return ({ usd: "$", eur: "€", yen: "¥" } as Record<string, string>)[value as string] ?? "?"
        }
        if (key === "#text") {
          delete this["@currency"]
          return `${value}${properties?.["@currency"]}`
        }
      }
      // Filter out useless elements
      if (tag === "useless") {
        return undefined
      }
      return value
    },
  },
))
/*
  Like JSON.parse's reviver, the computed value can be transformed before being returned.
  - `this` refers to the node being edited, meaning that any modification will be reflected
    in the final parsed value.
  - `properties` is only available once all of the node's other properties have been
    parsed.
  - Returning `undefined` (or nothing) will filter out the current value.
  {
    prices: {
      product: "dakimakura",
      price: [ "10.5$", "10.5€", "10.5¥" ]
    }
  }
*/
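The `this` / `properties` / `undefined` semantics above mirror JSON.parse's reviver. The following sketch shows one way such a bottom-up walk can be written over an already-parsed object. `reviveTree`, `ReviverArgs` and the visiting order are hypothetical and only approximate the library: here `properties` is a snapshot of the node taken just before "#text" is revived.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

interface ReviverArgs {
  key: string
  value: Json
  tag: string
  properties: Record<string, Json> | null
}

type Reviver = (this: Record<string, Json>, args: ReviverArgs) => Json | undefined

// Hypothetical bottom-up walk (arrays are passed through as-is for brevity).
function reviveTree(node: Record<string, Json>, tag: string, reviver: Reviver): Record<string, Json> {
  // Visit "#text" last so that `properties` can expose the other, already revived keys.
  const keys = Object.keys(node).sort((a, b) => Number(a === "#text") - Number(b === "#text"))
  for (const key of keys) {
    let value = node[key]
    // Child nodes are revived before their parent sees them.
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      value = reviveTree(value, key, reviver)
    }
    // `properties` is a snapshot, so deleting keys through `this` does not affect it.
    const properties = key === "#text" ? { ...node } : null
    const revived = reviver.call(node, { key, value, tag, properties })
    // Like JSON.parse, returning undefined removes the key.
    if (revived === undefined) {
      delete node[key]
    } else {
      node[key] = revived
    }
  }
  return node
}

const doc: Record<string, Json> = { price: { "@currency": "usd", "#text": "10.5" } }
reviveTree(doc, "", function ({ key, value, tag, properties }) {
  if (tag === "price" && key === "@currency") return value === "usd" ? "$" : "?"
  if (tag === "price" && key === "#text") {
    delete this["@currency"]
    return `${value}${properties?.["@currency"]}`
  }
  return value
})
console.log(JSON.stringify(doc)) // {"price":{"#text":"10.5$"}}
```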

Replacers

By default, node contents will be converted as follows:

  • null becomes "" (empty string), unless nullToEmpty = false
  • XML entities (e.g. &, <, >, ", ', …) are escaped when needed; set escapeAllEntities = true to always escape them
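These defaults can be sketched as a standalone function. What "when needed" means exactly is an assumption here: the hypothetical `replaceDefault` escapes `&`, `<` and `>` in text content and additionally escapes quotes only when `escapeAllEntities` is set. The library's own policy (for example for attribute values) may differ.

```typescript
const ENTITIES: Record<string, string> = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;" }

// Hypothetical default replacer for a text node:
// null becomes "" (unless nullToEmpty is false), then special characters are escaped.
function replaceDefault(value: string | null, options: { nullToEmpty: boolean; escapeAllEntities: boolean }): string | null {
  if (value === null) return options.nullToEmpty ? "" : null
  // Assumption: only &, < and > must be escaped in text; quotes are escaped on demand.
  const pattern = options.escapeAllEntities ? /[&<>"']/g : /[&<>]/g
  return value.replace(pattern, (char) => ENTITIES[char])
}

console.log(replaceDefault(`Tom & "Jerry" <3`, { nullToEmpty: true, escapeAllEntities: false }))
// Tom &amp; "Jerry" &lt;3
console.log(replaceDefault(`Tom & "Jerry" <3`, { nullToEmpty: true, escapeAllEntities: true }))
// Tom &amp; &quot;Jerry&quot; &lt;3
```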

XML metadata

Several metadata properties, such as the parent node and the tag name, can be accessed through the $XML symbol.

import { $XML, parse } from "./mod.ts"
console.log($XML)
console.log(Deno.inspect(
  parse(
    `
  <root>
    <child>hello world</child>
  </root>
`,
    { flatten: false },
  ),
  { showHidden: true, compact: false },
))
/*
  Symbol("x/xml")
  {
    root: {
      child: {
        "#text": "hello world",
        [Symbol("x/xml")]: {
          name: "child",
          parent: [Circular]
        }
      },
      [Symbol("x/xml")]: {
        name: "root",
        parent: null
      }
    }
  }
*/
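The metadata stays out of a plain console.log or JSON.stringify because it lives behind a symbol key on a non-enumerable property (hence showHidden above). A minimal sketch of that technique, using a hypothetical local `META` symbol and `attachMeta` / `getMeta` helpers rather than the library's exported $XML:

```typescript
// Hypothetical stand-in for the library's $XML symbol.
const META = Symbol("meta")

interface Meta {
  name: string
  parent: object | null
}

// Symbol keys are already ignored by Object.keys and JSON.stringify; making the
// property non-enumerable also keeps it out of object spread and Object.assign copies.
function attachMeta<T extends object>(node: T, meta: Meta): T {
  Object.defineProperty(node, META, { value: meta, enumerable: false })
  return node
}

function getMeta(node: object): Meta | undefined {
  return (node as { [META]?: Meta })[META]
}

const root = attachMeta({ child: {} as Record<string, unknown> }, { name: "root", parent: null })
attachMeta(root.child, { name: "child", parent: root })

console.log(JSON.stringify(root)) // {"child":{}}
console.log(getMeta(root.child)?.name, getMeta(root.child)?.parent === root) // child true
```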

Stringify as CDATA

The Symbol("x/xml") metadata of the root document may contain an Array<string[]>, where each value is the XML path to a node that should be wrapped in CDATA.

For more complex transformations, use a replacer instead.
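Wrapping text in CDATA has one subtlety: the sequence `]]>` cannot appear inside a CDATA section, so it has to be split across two sections. The hypothetical helpers below show that technique plus a path check in the spirit of the Array<string[]> described above. How the library itself matches paths is an assumption here, so `shouldWrap` is illustrative only.

```typescript
// Wraps text in a CDATA section, splitting any "]]>" so the section cannot end early.
function toCdata(text: string): string {
  return `<![CDATA[${text.split("]]>").join("]]]]><![CDATA[>")}]]>`
}

// Returns true when `path` (e.g. ["root", "script"]) exactly matches one of the CDATA paths.
function shouldWrap(path: string[], cdataPaths: string[][]): boolean {
  return cdataPaths.some((candidate) =>
    candidate.length === path.length && candidate.every((segment, i) => segment === path[i])
  )
}

const paths = [["root", "script"]]
const text = "if (a < b && c) x[y[0]]>0"
console.log(shouldWrap(["root", "script"], paths) ? toCdata(text) : text)
// <![CDATA[if (a < b && c) x[y[0]]]]><![CDATA[>0]]>
```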

Parsing large files

Parsing files of several megabytes can take some time. Use the progress option to pass a callback that is invoked each time a node has been parsed.

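import { parse } from "./mod.ts"
const file = await Deno.open("my.xml")
const { size } = await file.stat()
const encoder = new TextEncoder()
try {
  console.log(parse(file, {
    // Called each time a node has been parsed, with the number of bytes processed so far
    progress(bytes) {
      Deno.stdout.writeSync(
        encoder.encode(`Parsing document: ${(100 * bytes / size).toFixed(2)}%\r`),
      )
    },
  }))
} finally {
  // Release the file handle once parsing is done (or has failed)
  file.close()
}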
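Writing to stdout on every parsed node can itself become costly on very large documents. One option, sketched with a hypothetical `throttledProgress` helper that is not part of the library, is to report only when the whole-number percentage changes; the returned function could then be passed as the progress option in the example above, with size taken from file.stat().

```typescript
// Hypothetical helper: wraps a reporter so it only fires when the whole-number percentage changes.
function throttledProgress(size: number, report: (percent: number) => void): (bytes: number) => void {
  let last = -1
  return (bytes: number) => {
    // Treat an empty document as already complete to avoid dividing by zero.
    const percent = size > 0 ? Math.floor((100 * bytes) / size) : 100
    if (percent !== last) {
      last = percent
      report(percent)
    }
  }
}

const seen: number[] = []
const progress = throttledProgress(1000, (percent) => seen.push(percent))
for (const bytes of [0, 5, 9, 10, 15, 500, 1000]) progress(bytes)
console.log(seen.join(",")) // 0,1,50,100
```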

Why does this use a synchronous API?

While there is no official specification for XML.parse and XML.stringify, they are intended to look like the native JSON handler, which is why they are synchronous and support revivers and replacers.