Skip to main content

Deno Port of GPT-3-Encoder

BPE Encoder Decoder for GPT-2 / GPT-3

About

i needed gpt tokenizer for a personal peroject, all others projects(gpt_2_3_tokenizer,clip_bpe) had issue,
they would break at encoding “constructor” word,
so i ported a working gpt-3-encoder module from nodejs(js) to deno(ts) and reformed the internals abit

Usage

deno 1.30.2
v8 10.9.194.5
typescript 4.9.4

// `deno run --allow-read --allow-write example.ts`
import {
  decode,
  encode,
} from "https://deno.land/x/gpt@1.0/mod.ts";

const str = 'encode: biji heval, contrusctor'
const encoded = encode(str)
console.log('tokenized result: ', encoded)

console.log('We can look at each token and what it represents')
for(let token of encoded){
  console.log({token, string: decode([token])})
}

const decoded = decode(encoded)
console.log('decoded:', decoded)