peberminta

Simple, transparent parser combinators toolkit that supports tokens of any type.

For when you wanna do weird things with parsers.

Features

Well typed - written in TypeScript and with a lot of attention to keep types well defined.
Highly generic - no constraints on tokens, options (additional state data) and output types. Core module has not a single mention of strings as a part of normal flow. Some string-specific building blocks can be loaded from a separate module in case you need them.
Transparent. Built on a very simple base idea - just a few type aliases. Whole parser state is accessible at any time.
Lightweight. Zero dependencies. Just type aliases and functions.
Batteries included - comes with a pretty big set of building blocks.
Easy to extend - just follow the convention defined by type aliases when making your own building blocks. (And maybe let me know what you think can be universally useful to be included in the package itself.)
Easy to make configurable parsers. Rather than dynamically composing parsers based on options or manually weaving options into a dynamic parser state, this package offers a standard way to treat options as a part of static data and access them at any moment for course correction.
Well tested - comes with tests for everything including examples.
Practicality over “purity”. To be understandable and self-consistent is more important than to follow an established encoding of abstract ideas. More on this below.
No streaming - accepts a fixed array of tokens. It is simple, and the whole input can be accessed at any time if needed. More on this below.
Bring your own lexer/tokenizer - if you need it. It doesn’t matter how tokens are made - this package can consume anything you can type. I have a lexer as well, called leac, and it is used in some examples, but there is nothing special about it to make it the best match (well, maybe the fact that it is written in TypeScript, has an equal level of maintenance and is also made with arrays instead of iterators in mind).
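
To illustrate the "configurable parsers" point, here is a minimal sketch of a hand-written parser that reads its options from the static data object on every call. It does not import the package: the type aliases are local copies of the ones listed in the Convention section, and `NumberOptions`, `allowNegative` and `numberToken` are hypothetical names made up for this example.

```typescript
// Local copies of the package's type aliases, so the sketch runs on its own.
type Data<TToken, TOptions> = { tokens: TToken[]; options: TOptions };
type Match<TValue> = { matched: true; position: number; value: TValue };
type NonMatch = { matched: false };
type Result<TValue> = Match<TValue> | NonMatch;
type Parser<TToken, TOptions, TValue> =
  (data: Data<TToken, TOptions>, i: number) => Result<TValue>;

// Hypothetical options type, not part of the package.
type NumberOptions = { allowNegative: boolean };

// Consumes one numeric token.
// Negative numbers are rejected unless the options allow them.
const numberToken: Parser<number, NumberOptions, number> = (data, i) => {
  const t = data.tokens[i];
  if (t === undefined) {
    return { matched: false }; // past the end of input
  }
  if (t < 0 && !data.options.allowNegative) {
    return { matched: false }; // options veto this token
  }
  return { matched: true, position: i + 1, value: t };
};

// Same parser, same tokens, different options.
const tokens = [-5, 7];
const strict = numberToken({ tokens, options: { allowNegative: false } }, 0);
const lenient = numberToken({ tokens, options: { allowNegative: true } }, 0);
```

The parser itself is never rebuilt: only the options object changes between the two calls, so `strict` is a NonMatch and `lenient` is a Match.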

Install

Node

> npm i peberminta

import * as p from 'peberminta';
import * as pc from 'peberminta/char';

Examples

JSON;
CSV;
Hex Color;
Calc;
Brainfuck (and another implementation);
Non-decreasing sequences;
feel free to PR or request interesting compact grammar examples.

API

Detailed API documentation with navigation and search:

core module;
char module.

Convention

Whole package is built around these type aliases:

export type Data<TToken,TOptions> = {
  tokens: TToken[],
  options: TOptions
};

export type Parser<TToken,TOptions,TValue> =
  (data: Data<TToken,TOptions>, i: number) => Result<TValue>;

export type Matcher<TToken,TOptions,TValue> =
  (data: Data<TToken,TOptions>, i: number) => Match<TValue>;

export type Result<TValue> = Match<TValue> | NonMatch;

export type Match<TValue> = {
  matched: true,
  position: number,
  value: TValue
};

export type NonMatch = {
  matched: false
};

Data object holds tokens array and possibly an options object - it’s just a container for all static data used by a parser. Parser position, on the other hand, has its own life cycle and is passed around separately.
A Parser is a function that accepts Data object and a parser position, looks into the tokens array at the given position and returns either a Match with a parsed value (use null if there is no value) and a new position or a NonMatch.
A Matcher is a special case of Parser that never fails and always returns a Match.
Result object from a Parser can be either a Match or a NonMatch.
Match is a result of successful parsing - it contains a parsed value and a new parser position.
NonMatch is a result of unsuccessful parsing. It doesn’t have any data attached to it.
TToken can be any type.
TOptions can be any type. Use it to make your parser customizable. Or set it as undefined and type as unknown if not needed.
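
The convention above can be sketched with two hand-written building blocks: a Parser that may fail and a Matcher that never does. This is only an illustration under stated assumptions. The type aliases are repeated locally so the snippet runs without the package, and `exactly` and `withDefault` are hypothetical helpers, not functions exported by peberminta.

```typescript
// Local copies of the type aliases from this section.
type Data<TToken, TOptions> = { tokens: TToken[]; options: TOptions };
type Match<TValue> = { matched: true; position: number; value: TValue };
type NonMatch = { matched: false };
type Result<TValue> = Match<TValue> | NonMatch;
type Parser<TToken, TOptions, TValue> =
  (data: Data<TToken, TOptions>, i: number) => Result<TValue>;
type Matcher<TToken, TOptions, TValue> =
  (data: Data<TToken, TOptions>, i: number) => Match<TValue>;

// Hypothetical Parser: matches one token strictly equal to `expected`.
function exactly<TToken, TOptions>(
  expected: TToken
): Parser<TToken, TOptions, TToken> {
  return (data, i) => {
    if (i < data.tokens.length && data.tokens[i] === expected) {
      // Success: report the value and the position after the consumed token.
      return { matched: true, position: i + 1, value: expected };
    }
    return { matched: false };
  };
}

// Hypothetical Matcher: wraps a Parser so it never fails.
// On a NonMatch it yields `fallback` and leaves the position unchanged.
function withDefault<TToken, TOptions, TValue>(
  parser: Parser<TToken, TOptions, TValue>,
  fallback: TValue
): Matcher<TToken, TOptions, TValue> {
  return (data, i) => {
    const result = parser(data, i);
    if (result.matched) {
      return result;
    }
    return { matched: true, position: i, value: fallback };
  };
}

// Options are not needed here: set as undefined, typed as unknown.
const data: Data<string, unknown> = { tokens: ['a', 'b'], options: undefined };

const parseA = exactly<string, unknown>('a');
const hit = parseA(data, 0);                        // Match at position 1
const miss = parseA(data, 1);                       // NonMatch
const recovered = withDefault(parseA, '?')(data, 1); // Match, nothing consumed
```

Note that the position is passed in and returned explicitly, while `data` is shared untouched between all calls.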

Building blocks

Core blocks


ab	abc	action	ahead
all	and	any	chain
chainReduce	choice	condition	decide
discard	emit	end	eof
error	fail	flatten	flatten1
left	leftAssoc1	leftAssoc2	longest
lookAhead	make	many	many1
map	map1	middle	not
of	option	or	otherwise
peek	recursive	reduceLeft	reduceRight
right	rightAssoc1	rightAssoc2	satisfy
sepBy	sepBy1	skip	some
start	takeUntil	takeUntilP	takeWhile
takeWhileP	token

Core utilities


match	parse	parserPosition	remainingTokensNumber
tryParse

Char blocks


anyOf	char	charTest	concat
noneOf	oneOf	str

Char utilities


match	parse	parserPosition	tryParse

What about …?

performance - The code is very simple but I won’t put any unverified assumptions here. I’d be grateful to anyone who can set up a good benchmark project to compare different parser combinators.
stable release - Current release is well thought out and tested, but some supplied functions may still need an incompatible change. Before version 1.0.0 this will be done without a deprecation cycle.
streams/iterators - Maybe some day, if the need to parse a stream of non-string data arises. For now I don’t have a task that would force me to think well on how to design it. It would require a significant trade-off and may end up being a separate module (like char) at best or even a separate package.
Fantasy Land - You can find some familiar ideas here, especially when compared to Static Land. But I’m not concerned about compatibility with that spec - see “Practicality over “purity”” entry above. What I think might make sense is to add separate tests for laws applicable in the context of this package. Low priority though.

Some other parser combinator packages

arcsecond;
parsimmon;
chevrotain;
prsc.js;
lop;
parser-lang;
and more, with varied level of maintenance.