decancer
A portable module that removes common confusables from strings without using regular expressions. Available for Rust, Node.js, Deno, and the browser.
Pros:
- Extremely fast, no use of regex whatsoever!
- No dependencies.
- Simple to use: just a single function.
- Supports code points across the full Unicode range (up to UTF-32), such as emojis and zalgo text.
- While this project may not be perfect, it should cover the vast majority of confusables.
Con:
- This project is not perfect; false positives may happen.
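To illustrate why code-point support matters: many confusables, such as the mathematical letters used in the examples in this README, lie outside the Basic Multilingual Plane and occupy two UTF-16 code units in a JavaScript string. The sketch below is plain JavaScript and does not use decancer; the helper name `codePoints` is made up for illustration.

```javascript
// Hypothetical helper (not part of decancer): list the Unicode code points in a string.
function codePoints(text) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) => ch.codePointAt(0));
}

const fancy = '𝔽𝕌';

// Number of UTF-16 code units (each of these letters needs a surrogate pair).
console.log(fancy.length);
// Number of actual code points.
console.log(codePoints(fancy).length);
// Whether every code point lies above U+FFFF.
console.log(codePoints(fancy).every((cp) => cp > 0xffff));
```

Code that walks a string one UTF-16 unit at a time would see each of these letters as two separate halves, which is why handling whole code points is needed for this kind of input.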
installation
Rust
In your Cargo.toml:
decancer = "1.4.0"
Node.js
In your shell:
$ npm install decancer
In your code:
const decancer = require('decancer');
Deno
In your code:
import init from "https://deno.land/x/decancer@v1.4.0/mod.ts";
const decancer = await init();
Browser
In your code:
import init from "https://cdn.jsdelivr.net/gh/null8626/decancer@v1.4.0/decancer.min.js";
const decancer = await init();
examples
NOTE: cured output will ALWAYS be in lowercase.
JavaScript
const noCancer = decancer('vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣');
console.log(noCancer); // 'very funny text'
Rust
extern crate decancer;
use decancer::Decancer;

fn main() {
    let instance = Decancer::new();
    let output = instance.cure("vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣");
    assert_eq!(output, String::from("very funny text"));
}
If you want to check whether the decancered string contains a certain keyword, I recommend using the contains function instead of a plain substring check, since mistranslations can happen (e.g. mistaking the number 0 for the letter O):
JavaScript
const noCancer = decancer(someString);
if (decancer.contains(noCancer, 'no-no-word')) console.log('LANGUAGE!!!');
Rust
extern crate decancer;
use decancer::Decancer;

fn main() {
    let instance = Decancer::new();
    let output = instance.cure("vEⓡ𝔂 𝔽𝕌Ňℕy ţ乇𝕏𝓣");
    if instance.contains(&output, "funny") {
        println!("i found the funny");
    }
}
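To make the 0-versus-O point concrete, here is a minimal sketch of how a look-alike-tolerant substring check could work. It is not decancer's implementation: the `SIMILAR_GROUPS` table and the `canonicalize` and `lenientContains` helpers are hypothetical names chosen for this example, and the real library may group characters differently.

```javascript
// Hypothetical groups of look-alike characters; decancer's real table may differ.
const SIMILAR_GROUPS = ['0o', '1il', '5s'];

// Map every character in a group to the group's first character.
const CANONICAL = new Map();
for (const group of SIMILAR_GROUPS) {
  for (const ch of group) {
    CANONICAL.set(ch, group[0]);
  }
}

// Lowercase the text, then replace each character with its group representative (if any).
function canonicalize(text) {
  return Array.from(text.toLowerCase(), (ch) => CANONICAL.get(ch) ?? ch).join('');
}

// Hypothetical helper: substring search that treats look-alikes as equal.
function lenientContains(haystack, needle) {
  return canonicalize(haystack).includes(canonicalize(needle));
}

console.log('n0-n0-w0rd'.includes('no-no-word'));          // false
console.log(lenientContains('n0-n0-w0rd', 'no-no-word'));  // true
```

A plain `includes` misses the keyword as soon as one letter is cured to a look-alike digit, while comparing canonical forms on both sides still finds it. The trade-off is more false positives, since genuinely different strings can collapse to the same canonical form.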
Web app example
<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Decancerer!!! (tm)</title>
    <style>
      textarea {
        font-size: 30px;
      }
      #cure {
        font-size: 20px;
        padding: 5px 30px;
      }
    </style>
  </head>
  <body>
    <h3>Input cancerous text here:</h3>
    <textarea rows="10" cols="30" style="font-size: 30px;"></textarea>
    <br />
    <button style="font-size: 20px; padding: 5px 30px" onclick="cure()">cure!</button>
    <script type="module">
      import init from "https://cdn.jsdelivr.net/gh/null8626/decancer@v1.4.0/decancer.min.js";
      const decancer = await init();
      window.cure = function () {
        const textarea = document.querySelector("textarea");
        if (!textarea.value.length) {
          return alert("There's no text!!!");
        }
        textarea.value = decancer(textarea.value);
      }
    </script>
  </body>
</html>
contributions
All contributions are welcome. Feel free to fork the project on GitHub! <3
If you want to add, remove, modify, or view the list of supported confusables, you can clone the GitHub repository and edit the list directly with Node.js, either through a script or from the REPL:
const reader = await import('./contrib/index.mjs');
const data = reader.default('./core/bin/confusables.bin');
// do something with data...
data.save('./core/bin/confusables.bin');
special thanks
These are the primary resources that made this project possible.