Module

x/netsaur/packages/utilities/mod.ts > SplitTokenizer

Powerful machine learning, accelerated by WebGPU
class SplitTokenizer
import { SplitTokenizer } from "https://deno.land/x/netsaur@0.4.0-patch/packages/utilities/mod.ts";

Tokenize text by splitting on a separator (whitespace).

Constructors

new SplitTokenizer(options?: Partial<BaseTokenizerOptions & { indices: boolean; }>)
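
A minimal construction sketch using only what this page documents: the indices flag comes from the signature above, and the remaining BaseTokenizerOptions fields are not listed here, so they are omitted; skipWords is set through the documented property.

```ts
import { SplitTokenizer } from "https://deno.land/x/netsaur@0.4.0-patch/packages/utilities/mod.ts";

// Default construction: no options.
const tokenizer = new SplitTokenizer();

// The `indices` flag appears in the constructor signature above; the other
// BaseTokenizerOptions fields are not shown on this page, so none are passed.
const withIndices = new SplitTokenizer({ indices: true });

// skipWords is a documented property: "english" for common English stop words,
// a custom string[] of words, or false to keep every token.
withIndices.skipWords = "english";
```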

Properties

readonly lastToken: number
skipWords: "english" | false | string[]

Words to exclude from the vocabulary

vocabulary: Map<string, number>

Mapping of tokens to their numeric indices in the vocabulary

Methods

fit(text: string | string[]): this

Construct a vocabulary from the given text.
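
A short sketch of building a vocabulary, assuming only the signatures on this page: fit accepts a string or an array of strings and returns this, and vocabulary is a Map<string, number>.

```ts
import { SplitTokenizer } from "https://deno.land/x/netsaur@0.4.0-patch/packages/utilities/mod.ts";

const tokenizer = new SplitTokenizer();

// fit accepts a single string or an array of strings and returns `this`.
tokenizer.fit(["hello world", "hello deno"]);

// vocabulary maps tokens to numeric ids
// (the exact ids assigned are an implementation detail).
console.log(tokenizer.vocabulary);
```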

split(text: string): string[]

Split a string into whitespace-separated tokens.
transform(text: string | string[]): number[][]

Convert one or more documents (a string or an array of strings) into numeric vectors.
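
An end-to-end sketch of split, fit, and transform, using only the documented signatures; the exact numbers in the output depend on how the vocabulary assigns indices internally.

```ts
import { SplitTokenizer } from "https://deno.land/x/netsaur@0.4.0-patch/packages/utilities/mod.ts";

const tokenizer = new SplitTokenizer();

// split breaks a single string into whitespace-separated tokens.
console.log(tokenizer.split("the quick brown fox")); // ["the", "quick", "brown", "fox"]

// Build the vocabulary, then vectorize documents: one number[] per input document.
tokenizer.fit(["the quick brown fox", "the lazy dog"]);
const vectors: number[][] = tokenizer.transform(["the quick dog", "the brown fox"]);
console.log(vectors);
```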