JOSS can serialize almost every JavaScript data type and data structure, so data can be seamlessly exchanged between browsers and servers (Deno or Node.js).

JS Open Serialization Scheme

1. Introduction

JavaScript can be run not only in browsers, but also on servers through the use of JavaScript runtime environments, such as Deno and Node.js.

The de facto serialization format for exchanging data between browsers and servers is JavaScript Object Notation (JSON). However, browsers and servers that exchange data in JSON format are limited to the few data structures native to JSON, even when they can both run JavaScript.

This page documents the specification for a serialization format called the JS Open Serialization Scheme (JOSS). The format supports almost all data types and data structures intrinsic to JavaScript. The format also supports some often overlooked features of JavaScript, such as primitive wrapper objects, circular references, sparse arrays, and negative zeros.

2. Serialization

The serialization of a JavaScript data item begins with a single byte, called a marker byte. In some cases, the marker byte is standalone and forms the complete serialization. In most cases, it is followed by a sequence of bytes that completes the serialization.

2.1. Standalone

The marker bytes with values 0–31 are listed in the following table. The most significant bit is assigned the bit number 0.

Table 1. Standalone marker bytes and others.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 0 | Multipurpose |
| 3–7 | 0 | null |
| | 1 | undefined |
| | 2 | true as a Boolean value |
| | 3 | true as a Boolean object |
| | 4 | false as a Boolean value |
| | 5 | false as a Boolean object |
| | 6 | Infinity as a Number value |
| | 7 | Infinity as a Number object |
| | 8 | -Infinity as a Number value |
| | 9 | -Infinity as a Number object |
| | 10 | NaN as a Number value |
| | 11 | NaN as a Number object |
| | 12 | Hole in an Array |
| | 13 | Unsupported data |
| | 14 | Marker byte for Date |
| | 15 | Marker byte for RegExp |
| | 16–28 | Reserved for future extensions |
| | 29 | Marker byte for object reference |
| | 30 | Marker byte for custom object |
| | 31 | Reserved for future extensions |

The values 0–13 are for standalone marker bytes. The other values are either for marker bytes acting as semantic tags or reserved for future extensions.
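As a rough illustration of how Table 1 might be read in code, the sketch below splits a marker byte into its two bit fields and looks up the standalone values 0–13. The names `STANDALONE_MARKERS`, `splitMarker`, and `describeStandalone` are hypothetical and are not part of the specification.

```javascript
// Interpretations of the standalone marker bytes 0–13 (Table 1),
// indexed by marker byte value. The labels are illustrative only.
const STANDALONE_MARKERS = [
  "null", "undefined",
  "true (Boolean value)", "true (Boolean object)",
  "false (Boolean value)", "false (Boolean object)",
  "Infinity (Number value)", "Infinity (Number object)",
  "-Infinity (Number value)", "-Infinity (Number object)",
  "NaN (Number value)", "NaN (Number object)",
  "hole in an Array", "unsupported data",
];

// Hypothetical helper: split a marker byte into its two bit fields.
// Bit 0 is the most significant bit, so bits 0–2 are the top three bits
// and bits 3–7 are the low five bits.
function splitMarker(byte) {
  return { group: byte >> 5, low: byte & 0b11111 };
}

// Hypothetical helper: describe a standalone marker byte, or return
// undefined when the byte is not one of the standalone values 0–13.
function describeStandalone(byte) {
  const { group, low } = splitMarker(byte);
  return group === 0 ? STANDALONE_MARKERS[low] : undefined;
}

describeStandalone(0x02); // returns "true (Boolean value)"
describeStandalone(0x0c); // returns "hole in an Array"
describeStandalone(0x0e); // returns undefined (14 is the semantic tag for Date)
```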

2.2. Numbers

The Number type is used to represent numbers stored in double-precision format. It is serialized by concatenating

  1. The marker byte.
  2. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 2. Marker byte for numbers.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 1 | Number |
| 3 | 0 | Number value |
| | 1 | Number object |
| 4 | | If integer: |
| | 0 | Integer is not negative-valued |
| | 1 | Integer is negative-valued |
| 5–7 | 0 | Payload is 1 byte long |
| | 1 | Payload is 2 bytes long |
| | 2 | Payload is 3 bytes long |
| | 3 | Payload is 4 bytes long |
| | 4 | Payload is 5 bytes long |
| | 5 | Payload is 6 bytes long |
| | 6 | Payload is 7 bytes long |
| | 7 | Payload is 8 bytes long |

If the represented number is not an integer, the payload is the value of the number encoded in double-precision format and little-endian byte ordering. The payload is exactly 64 bits or 8 bytes long in this case.

If the represented number is an integer, the payload is the absolute value of the integer encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. The payload is at most 53 bits or 7 bytes long in this case.

Infinity, -Infinity, and NaN are special cases of the Number type serialized using standalone marker bytes.
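The rules above can be sketched as a small encoder for Number values (not Number objects). This is a minimal illustration rather than the reference implementation: the function name `encodeNumber` is hypothetical, integers outside the safe-integer range are assumed to fall back to the double-precision form, bit 4 is left at 0 for non-integers, and -0 is assumed to be encoded as a negative integer with a zero payload.

```javascript
// Hypothetical encoder for finite Number values, following Table 2.
function encodeNumber(value) {
  if (!Number.isFinite(value)) {
    throw new RangeError("Infinity, -Infinity and NaN use standalone marker bytes");
  }
  const TYPE_NUMBER = 1 << 5; // bits 0–2 = 1, bit 3 = 0 (Number value)

  // Assumption: only safe integers (at most 53 bits) take the integer form.
  if (!Number.isSafeInteger(value)) {
    const out = new Uint8Array(9);
    out[0] = TYPE_NUMBER | 7; // bits 5–7 = 7: payload is 8 bytes long
    new DataView(out.buffer).setFloat64(1, value, true); // little-endian
    return out;
  }

  // Assumption: -0 is treated as a negative integer with payload 0.
  const negative = value < 0 || Object.is(value, -0);
  let magnitude = Math.abs(value);
  const payload = [];
  do {
    payload.push(magnitude % 256); // least significant byte first
    magnitude = Math.floor(magnitude / 256);
  } while (magnitude > 0);

  const marker = TYPE_NUMBER | (negative ? 0b1000 : 0) | (payload.length - 1);
  return Uint8Array.from([marker].concat(payload));
}

encodeNumber(300); // bytes 0x21 0x2c 0x01
encodeNumber(-1);  // bytes 0x28 0x01
encodeNumber(0.5); // bytes 0x27 followed by 00 00 00 00 00 00 e0 3f
```

Because an integer payload is at most 7 bytes long, a payload length of 8 bytes is enough for a decoder to recognize the double-precision form.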

2.3. Big Integers

The BigInt type is used to represent arbitrarily big integers. It is serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 3. Marker byte for big integers.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 2 | BigInt |
| 3 | 0 | BigInt value |
| | 1 | BigInt object |
| 4 | 0 | Integer is not negative-valued |
| | 1 | Integer is negative-valued |
| 5–7 | 0 | Payload size is 1 byte long |
| | 1 | Payload size is 2 bytes long |
| | 2 | Payload size is 3 bytes long |
| | 3 | Payload size is 4 bytes long |
| | 4 | Payload size is 5 bytes long |
| | 5 | Payload size is 6 bytes long |
| | 6 | Payload size is 7 bytes long |
| | 7 | Payload size is 8 bytes long |

The payload is the absolute value of the represented integer encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.

The payload size is the byte length of the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
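A BigInt value could be encoded along the following lines. The helper names `bigIntToBytesLE` and `encodeBigInt` are hypothetical, and the sketch assumes that zero is written with a one-byte payload, which the text above does not state explicitly.

```javascript
// Hypothetical helper: minimal little-endian bytes of a non-negative BigInt.
// Assumption: zero is encoded as a single 0x00 byte rather than no bytes.
function bigIntToBytesLE(n) {
  const bytes = [];
  do {
    bytes.push(Number(n & 0xffn)); // least significant byte first
    n >>= 8n;
  } while (n > 0n);
  return bytes;
}

// Hypothetical encoder for BigInt values (not BigInt objects), per Table 3.
function encodeBigInt(value) {
  const negative = value < 0n;
  const payload = bigIntToBytesLE(negative ? -value : value);
  const size = bigIntToBytesLE(BigInt(payload.length));
  // bits 0–2 = 2 (BigInt), bit 3 = 0 (value), bit 4 = sign,
  // bits 5–7 = byte length of the payload size minus one
  const marker = (2 << 5) | (negative ? 0b1000 : 0) | (size.length - 1);
  return Uint8Array.from([marker].concat(size, payload));
}

encodeBigInt(-256n); // bytes 0x48 0x02 0x00 0x01
```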

2.4. Character and Binary Strings

The String type and the ArrayBuffer and SharedArrayBuffer objects are used to represent character strings and binary strings respectively. They are serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 4. Marker byte for character and binary strings.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 3 | String, ArrayBuffer, SharedArrayBuffer |
| 3–4 | 0 | String value |
| | 1 | String object |
| | 2 | ArrayBuffer |
| | 3 | SharedArrayBuffer |
| 5–7 | 0 | Payload size is 1 byte long |
| | 1 | Payload size is 2 bytes long |
| | 2 | Payload size is 3 bytes long |
| | 3 | Payload size is 4 bytes long |
| | 4 | Payload size is 5 bytes long |
| | 5 | Payload size is 6 bytes long |
| | 6 | Payload size is 7 bytes long |
| | 7 | Payload size is 8 bytes long |

The payload is the represented character string encoded in UTF-8 code units or the represented binary string, whichever is applicable.

The payload size is the byte length of the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
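For a String value, the three parts might be assembled as in the sketch below. It relies on the standard `TextEncoder` API for the UTF-8 payload; `sizeToBytesLE` and `encodeString` are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: minimal little-endian bytes of a non-negative integer.
function sizeToBytesLE(size) {
  const bytes = [];
  do {
    bytes.push(size % 256); // least significant byte first
    size = Math.floor(size / 256);
  } while (size > 0);
  return bytes;
}

// Hypothetical encoder for String values (not String objects), per Table 4.
function encodeString(text) {
  const payload = new TextEncoder().encode(text); // UTF-8 bytes
  const size = sizeToBytesLE(payload.length);
  // bits 0–2 = 3, bits 3–4 = 0 (String value),
  // bits 5–7 = byte length of the payload size minus one
  const marker = (3 << 5) | (0 << 3) | (size.length - 1);
  const out = new Uint8Array(1 + size.length + payload.length);
  out[0] = marker;
  out.set(size, 1);
  out.set(payload, 1 + size.length);
  return out;
}

encodeString("hé"); // bytes 0x60 0x03 0x68 0xc3 0xa9
```

Note that the payload size counts bytes, not characters: "hé" has two characters but a three-byte UTF-8 payload.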

2.5. Dense Arrays and Collections

The Array, Object, Map, and Set objects are used to represent indexed and keyed collections of data. They are serialized by concatenating

  1. The marker byte.
  2. The payload size.
  3. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 5. Marker byte for dense arrays and collections.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 4 | Dense Array, plain Object, Map, Set |
| 3–4 | 0 | Dense Array |
| | 1 | Plain Object |
| | 2 | Map |
| | 3 | Set |
| 5–7 | 0 | Payload size is 1 byte long |
| | 1 | Payload size is 2 bytes long |
| | 2 | Payload size is 3 bytes long |
| | 3 | Payload size is 4 bytes long |
| | 4 | Payload size is 5 bytes long |
| | 5 | Payload size is 6 bytes long |
| | 6 | Payload size is 7 bytes long |
| | 7 | Payload size is 8 bytes long |

The payload is the serialization of

  • Array: The elements in ascending order of index.
  • Object: The key-value pairs of own enumerable properties keyed by strings, optionally in the order returned by the [[OwnPropertyKeys]] method.
  • Map: The key-value pairs in order of insertion.
  • Set: The values in order of insertion.

The payload size is the number of items in the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. Each key-value pair is considered one item.

The aforementioned serialization is not applicable to Array objects with holes. The serialization of such objects is described in the next subsection.
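To make the item counting concrete, the following sketch computes only the marker byte and payload size for a dense collection; the items themselves would follow, each serialized by its own rules. The name `collectionHeader` is hypothetical, and the function assumes its argument is a dense Array, a plain Object, a Map, or a Set.

```javascript
// Hypothetical helper: marker byte and payload size for a dense collection.
function collectionHeader(value) {
  let kind;
  let count;
  if (Array.isArray(value)) {
    kind = 0; // assumed dense; sparse Arrays are serialized differently
    count = value.length;
  } else if (value instanceof Map) {
    kind = 2;
    count = value.size; // each key-value pair is one item
  } else if (value instanceof Set) {
    kind = 3;
    count = value.size;
  } else {
    kind = 1;
    count = Object.keys(value).length; // own enumerable string-keyed properties
  }

  // Payload size: item count in the fewest bytes, little-endian.
  const size = [];
  do {
    size.push(count % 256);
    count = Math.floor(count / 256);
  } while (count > 0);

  // bits 0–2 = 4, bits 3–4 = kind, bits 5–7 = size length minus one
  const marker = (4 << 5) | (kind << 3) | (size.length - 1);
  return Uint8Array.from([marker].concat(size));
}

collectionHeader([1, 2, 3]);                        // bytes 0x80 0x03
collectionHeader({ x: 1 });                         // bytes 0x88 0x01
collectionHeader(new Map([["a", 1], ["b", 2]]));    // bytes 0x90 0x02
collectionHeader(new Set(["a"]));                   // bytes 0x98 0x01
```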

2.6. Sparse Arrays

The previous subsection is not applicable to Array objects with holes. Such objects are serialized by concatenating

  1. The marker byte.
  2. The array size.
  3. The payload size.
  4. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 6. Marker byte for sparse arrays.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 5 | Sparse Array |
| 3 | 0 | Holes are serialized explicitly |
| | 1 | Indices are serialized explicitly |
| 4–5 | 0 | Array size is 1 byte long |
| | 1 | Array size is 2 bytes long |
| | 2 | Array size is 3 bytes long |
| | 3 | Array size is 4 bytes long |
| 6–7 | 0 | Payload size is 1 byte long |
| | 1 | Payload size is 2 bytes long |
| | 2 | Payload size is 3 bytes long |
| | 3 | Payload size is 4 bytes long |

The payload is the serialization of

  • Method A (holes are serialized explicitly): The holes and elements in ascending order of index, up to and including the last element. Holes after the last element are omitted.
  • Method B (indices are serialized explicitly): The index-element pairs in ascending order of index.

The payload under method A is analogous to the payload of a dense Array in that holes are treated like elements. Holes are serialized explicitly using a standalone marker byte.

The payload under method B is analogous to the payload of an Object in that indices are treated like property keys.

The payload size is the number of items in the payload encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering. Each index-element pair is considered one item.

The array size is the value of the length property encoded as an unsigned integer in the fewest bytes possible and little-endian byte ordering.
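The difference between the two methods is easiest to see by listing the items each one would serialize. The sketch below is illustrative only: `HOLE`, `methodAItems`, `methodBItems`, and `sparseMarker` are hypothetical names, and the simple loops over `length` would be slow for very long arrays.

```javascript
// Stand-in for the standalone marker byte that represents a hole.
const HOLE = Symbol("hole");

// Method A: holes and elements up to and including the last element.
function methodAItems(array) {
  let last = -1;
  for (let i = 0; i < array.length; i++) {
    if (Object.hasOwn(array, i)) last = i;
  }
  const items = [];
  for (let i = 0; i <= last; i++) {
    items.push(Object.hasOwn(array, i) ? array[i] : HOLE);
  }
  return items;
}

// Method B: index-element pairs, each pair counting as one item.
function methodBItems(array) {
  const items = [];
  for (let i = 0; i < array.length; i++) {
    if (Object.hasOwn(array, i)) items.push([i, array[i]]);
  }
  return items;
}

// Marker byte per Table 6, given the byte lengths (1–4) of the two sizes.
function sparseMarker(method, arraySizeLength, payloadSizeLength) {
  return (5 << 5) | (method === "B" ? 0b10000 : 0) |
    ((arraySizeLength - 1) << 2) | (payloadSizeLength - 1);
}

const sparse = ["x", , "y"];
sparse.length = 6;          // array size is 6 under both methods
methodAItems(sparse);       // ["x", HOLE, "y"], payload size 3
methodBItems(sparse);       // [[0, "x"], [2, "y"]], payload size 2
sparseMarker("A", 1, 1);    // 0xa0
sparseMarker("B", 1, 1);    // 0xb0
```

A serializer could compare the two candidate encodings and pick the shorter one, although the text above does not prescribe how to choose.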

2.7. Typed Arrays

The DataView and TypedArray objects are used to access ArrayBuffer objects. They are serialized by concatenating

  1. The marker byte.
  2. The payload.

The marker byte is defined in the following table. The most significant bit is assigned the bit number 0.

Table 7. Marker byte for typed arrays.

| Bit | Value | Interpretation |
|-----|-------|----------------|
| 0–2 | 6 | DataView, TypedArray |
| 3 | 0 | Elements are in little-endian byte ordering |
| | 1 | Elements are in big-endian byte ordering |
| 4–7 | 0 | DataView |
| | 1 | Int8Array |
| | 2 | Uint8Array |
| | 3 | Uint8ClampedArray |
| | 4 | Int16Array |
| | 5 | Uint16Array |
| | 6 | Int32Array |
| | 7 | Uint32Array |
| | 8 | Float32Array |
| | 9 | Float64Array |
| | 10 | BigInt64Array |
| | 11 | BigUint64Array |
| | 12–15 | Reserved for future extensions |

The payload is the serialization of the binary string returned by the buffer property and segmented by the byteOffset and byteLength properties.
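One way to derive the marker byte and the buffer segment for a view is sketched below. The names `VIEW_CODES`, `PLATFORM_LITTLE_ENDIAN`, and `typedArrayParts` are hypothetical, and the sketch assumes that bit 3 records the byte order of the platform that produced the buffer, which the text above does not state outright. The returned segment would then be serialized as a binary string.

```javascript
// Constructors in the order of their codes in Table 7 (0–11).
const VIEW_CODES = [
  DataView, Int8Array, Uint8Array, Uint8ClampedArray,
  Int16Array, Uint16Array, Int32Array, Uint32Array,
  Float32Array, Float64Array, BigInt64Array, BigUint64Array,
];

// Detect the platform byte order by inspecting a 16-bit value.
const PLATFORM_LITTLE_ENDIAN =
  new Uint8Array(Uint16Array.of(1).buffer)[0] === 1;

// Hypothetical helper: marker byte and buffer segment for a view.
function typedArrayParts(view, littleEndian = PLATFORM_LITTLE_ENDIAN) {
  const code = VIEW_CODES.indexOf(view.constructor);
  if (code < 0) throw new TypeError("not a DataView or known TypedArray");
  // bits 0–2 = 6, bit 3 = byte order, bits 4–7 = view type
  const marker = (6 << 5) | (littleEndian ? 0 : 0b10000) | code;
  // Only the bytes the view actually covers, not the whole buffer.
  const segment = view.buffer.slice(
    view.byteOffset,
    view.byteOffset + view.byteLength,
  );
  return { marker, segment };
}

const buffer = new ArrayBuffer(8);
const view = new Uint16Array(buffer, 2, 2); // covers bytes 2–5 of the buffer
const parts = typedArrayParts(view, true);
parts.marker;             // 0xc5
parts.segment.byteLength; // 4
```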

2.8. Dates

The Date object is used to represent dates and times. It is serialized by concatenating

  1. The marker byte for Date.
  2. The serialization of the number returned by the valueOf() method.

2.9. Regular Expressions

The RegExp object is used to represent regular expressions. It is serialized by concatenating

  1. The marker byte for RegExp.
  2. The serialization of the string returned by the toString() method.
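The text above does not describe how a deserializer turns the string back into a RegExp object. One possible approach, shown here as an assumption rather than the specified method, is to split the string at its last slash, since flags never contain a slash. The name `regExpFromString` is hypothetical.

```javascript
// Hypothetical helper: rebuild a RegExp from the output of toString(),
// which has the form "/source/flags".
function regExpFromString(text) {
  const lastSlash = text.lastIndexOf("/");
  const source = text.slice(1, lastSlash);
  const flags = text.slice(lastSlash + 1);
  return new RegExp(source, flags);
}

const original = /a\/b+/gi;
const restored = regExpFromString(original.toString());
restored.source; // same as original.source
restored.flags;  // "gi"
```

As noted under Limitations, state such as lastIndex is not carried over by this round trip.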

2.10. Object References

A reference to an object whose marker byte can be found in the serialized byte stream is serialized by concatenating

  1. The marker byte for object reference.
  2. The serialization of the position of the referenced object’s marker byte in the serialized byte stream, where the first byte is at position zero.
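A serializer needs some way to remember where each object's marker byte was written so that later occurrences, including circular ones, can be replaced by references. The sketch below shows one possible bookkeeping scheme; the class `ReferenceTracker` and its `visit` method are hypothetical and are not prescribed by the specification.

```javascript
// Hypothetical bookkeeping for object references during serialization.
class ReferenceTracker {
  constructor() {
    this.positions = new Map(); // object -> position of its marker byte
  }

  // Returns the recorded position if the object was serialized before.
  // Otherwise records the given position and returns undefined.
  visit(object, position) {
    const seen = this.positions.get(object);
    if (seen !== undefined) return seen;
    this.positions.set(object, position);
    return undefined;
  }
}

const tracker = new ReferenceTracker();
const node = {};
node.self = node; // circular reference

tracker.visit(node, 0);       // undefined: first occurrence, serialize in full
tracker.visit(node.self, 3);  // 0: emit an object reference to position 0
```

In the second case the serializer would write the marker byte for object reference followed by the serialization of the position 0.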

2.11. Custom Objects

A custom object that can be serialized using an external serialization format is serialized by concatenating

  1. The marker byte for custom object.
  2. The serialization of the custom object in accordance with the external serialization format.

2.12. Unsupported Data

Any data type or data structure not covered by the preceding subsections is serialized using a standalone marker byte.

3. Deserialization

The deserialization of a JavaScript data item is accomplished by decoding a serialized byte stream with reference to the serialization format.

The deserialization process should substitute an appropriate Error object in the following scenarios:

  • The marker byte for unsupported data is encountered.
  • The JavaScript engine cannot return the required data.

The deserialization process should stop when the serialized byte stream is malformed as in, but not limited to, the following scenarios:

  • A reserved marker byte is encountered.
  • An invalid data type is encountered, such as
    • An Object key that is not a String value.
    • An Array index that is not a Number value.
  • An invalid value is encountered, such as
    • An Array index that is out of bounds.
    • A duplicate Array index, Object key, Map key, or Set value.
  • The standalone marker byte for a hole is encountered outside the context of a sparse Array.
  • The payload of a Number type encodes an integer longer than 53 bits.
  • The payload of an object reference does not point to a prior object.
  • The serialized byte stream ends before the deserialization process is complete.
  • The deserialization process is complete before the serialized byte stream ends.
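A few of these checks can be illustrated with a decoder for the Number type. This is a minimal sketch under stated assumptions: the name `decodeNumber` is hypothetical, stopping is modelled by throwing an exception, and an 8-byte payload is taken to be the double-precision form because integer payloads are at most 7 bytes long.

```javascript
// Hypothetical decoder for a Number at the given offset of a Uint8Array.
function decodeNumber(bytes, offset = 0) {
  if (offset >= bytes.length) {
    throw new RangeError("byte stream ended before the marker byte");
  }
  const marker = bytes[offset];
  if (marker >> 5 !== 1) throw new TypeError("not a Number marker byte");

  const length = (marker & 0b111) + 1; // bits 5–7: payload length minus one
  const end = offset + 1 + length;
  if (end > bytes.length) {
    throw new RangeError("byte stream ended inside the payload");
  }

  let value;
  if (length === 8) {
    // Double-precision payload in little-endian byte ordering.
    value = new DataView(bytes.buffer, bytes.byteOffset + offset + 1, 8)
      .getFloat64(0, true);
  } else {
    // Unsigned little-endian integer: fold from the most significant byte.
    value = 0;
    for (let i = end - 1; i > offset; i--) value = value * 256 + bytes[i];
    if (!Number.isSafeInteger(value)) {
      throw new RangeError("integer payload is longer than 53 bits");
    }
    if (marker & 0b1000) value = -value; // bit 4: negative-valued
  }
  // bit 3 distinguishes a Number object from a Number value
  return { value, isObject: (marker & 0b10000) !== 0, end };
}

decodeNumber(Uint8Array.of(0x21, 0x2c, 0x01)); // { value: 300, isObject: false, end: 3 }
```

The returned `end` offset lets a caller detect the final scenario in the list: if decoding the top-level item finishes before the end of the byte stream, the stream is malformed.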

4. Limitations

The serialization format does not support certain data types and data structures intrinsic to JavaScript, such as Error, Function, Symbol, and objects that hold weak references like WeakMap, WeakSet, and WeakRef.

The serialization format also does not preserve object properties that are non-enumerable, keyed by symbols, or inherited through the prototype chain, such as the byteOffset property of TypedArray objects and the lastIndex property of RegExp objects.

5. Extensions

The serialization format reserves the marker byte values 224–255, as well as those labelled as reserved in Table 1 and Table 7, for future extensions.
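The reserved values can be collected into a single predicate, which a deserializer might use to reject malformed input. The function name `isReservedMarker` is hypothetical; the ranges are taken from the statement above together with Table 1 and Table 7.

```javascript
// Hypothetical predicate: is this marker byte reserved for future extensions?
function isReservedMarker(byte) {
  const group = byte >> 5;   // bits 0–2
  const low = byte & 0b11111; // bits 3–7
  if (group === 7) return true; // values 224–255
  if (group === 0) return (low >= 16 && low <= 28) || low === 31; // Table 1
  if (group === 6) return (byte & 0b1111) >= 12; // Table 7, bits 4–7
  return false;
}

isReservedMarker(224);  // true
isReservedMarker(16);   // true
isReservedMarker(29);   // false (marker byte for object reference)
isReservedMarker(0xcc); // true (typed array code 12)
isReservedMarker(0xc5); // false (Uint16Array)
```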

6. Copyright

Copyright © 2021 Quantitative Risk Solutions PLT (201604001668). All rights reserved.