Rustling is a blazingly fast library for computational linguistics. It is written in Rust, with Python bindings.
- Language Models: N-gram language models with smoothing
  - `MLE`: Maximum Likelihood Estimation (no smoothing)
  - `Lidstone`: Lidstone (additive) smoothing
  - `Laplace`: Laplace (add-one) smoothing
- Word Segmentation: Models for segmenting unsegmented text into words
  - `LongestStringMatching`: Greedy left-to-right longest-match segmenter
  - `RandomSegmenter`: Random baseline segmenter
- Part-of-speech Tagging
  - `AveragedPerceptronTagger`: Averaged perceptron tagger
- CHAT Parsing: Parser for CHAT transcription files (CHILDES/TalkBank)
  - `CHAT`: Read and query CHAT data from directories, files, strings, or ZIP archives
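The three language-model estimators differ only in how they turn n-gram counts into probabilities. The following plain-Python bigram sketch assumes the textbook additive (Lidstone) formula `P(w | h) = (c(h, w) + γ) / (c(h) + γ·|V|)`, where MLE is the special case γ = 0 and Laplace is γ = 1. It also assumes that |V| counts distinct token types, including the `<s>`/`</s>` padding symbols. The functions `train_bigrams` and `bigram_prob` are hypothetical illustrations, not Rustling's API, and Rustling's own handling of padding and vocabulary may differ.

```python
from collections import Counter


def train_bigrams(sentences):
    """Count bigrams and their left contexts, padding each sentence with <s> and </s>."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>", *sent, "</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def bigram_prob(word, prev, bigrams, contexts, vocab, gamma=0.0):
    """Additive estimate of P(word | prev): gamma=0 is MLE, gamma=1 is Laplace."""
    denominator = contexts[prev] + gamma * len(vocab)
    if denominator == 0:  # unseen context under MLE: no probability mass to give
        return 0.0
    return (bigrams[(prev, word)] + gamma) / denominator


sents = [["a", "b"], ["a", "c"]]
counts = train_bigrams(sents)
print(bigram_prob("b", "a", *counts))             # MLE
print(bigram_prob("b", "a", *counts, gamma=0.5))  # Lidstone
print(bigram_prob("b", "a", *counts, gamma=1.0))  # Laplace
```

With five token types, MLE gives P(b | a) = 1/2, Lidstone with γ = 0.5 gives 1.5/4.5 = 1/3, and Laplace gives 2/7. Only the smoothed estimates assign nonzero probability to unseen bigrams such as (b, a).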
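The two word segmenters can be sketched in plain Python as follows. The first is greedy left-to-right longest match against a lexicon collected from already-segmented training sentences, which falls back to a single character when no lexicon word matches. The second is a random baseline that places a word boundary at each gap between characters with probability `prob`. The names `build_lexicon`, `longest_match_segment`, and `random_segment`, and their parameters `max_word_length` and `prob`, are hypothetical. This is not Rustling's API.

```python
import random


def build_lexicon(segmented_sentences):
    """Collect every word type seen in already-segmented training data."""
    return {word for sent in segmented_sentences for word in sent}


def longest_match_segment(text, lexicon, max_word_length=5):
    """Greedy left-to-right longest match; unmatched material falls back to one character."""
    words, i = [], 0
    while i < len(text):
        # Try the longest allowed candidate first, shrinking down to a single character.
        for length in range(min(max_word_length, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


def random_segment(text, prob=0.5, rng=None):
    """Baseline: insert a boundary at each gap between characters with probability prob."""
    rng = rng or random.Random()
    if not text:
        return []
    words, start = [], 0
    for i in range(1, len(text)):
        if rng.random() < prob:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


lexicon = build_lexicon([["the", "cat"], ["there"], ["cats"]])
print(longest_match_segment("thecatsatthere", lexicon))
print(random_segment("thecatsatthere", prob=0.3, rng=random.Random(0)))
```

The first call returns `['the', 'cats', 'a', 't', 'there']`. Greedily taking `cats` leaves `a` and `t` stranded as single characters, which is the characteristic failure mode of longest matching.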
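An averaged perceptron tagger works in three steps:

1. It scores each candidate tag as a sum of feature weights.
2. After every mistake, it moves the weights toward the gold tag and away from the guessed tag.
3. After training, it replaces each weight with that weight's average over all training steps, which tends to generalize better than the final weights alone.

The sketch below uses the lazy-averaging bookkeeping from Matthew Honnibal's well-known design, which NLTK's `PerceptronTagger` also follows. The feature set and the `features`, `train`, and `tag` helpers are hypothetical and deliberately tiny. This is not Rustling's `AveragedPerceptronTagger` implementation.

```python
import random
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose weights are averaged over every training step."""

    def __init__(self):
        self.weights = defaultdict(dict)   # feature -> {tag: weight}
        self.classes = set()
        self._totals = defaultdict(float)  # (feature, tag) -> weight summed over past steps
        self._tstamps = defaultdict(int)   # (feature, tag) -> step of last change
        self.i = 0                         # training instances seen so far

    def predict(self, features):
        scores = defaultdict(float)
        for feat in features:
            for tag, weight in self.weights.get(feat, {}).items():
                scores[tag] += weight
        # Highest score wins; ties are broken by tag name so results are deterministic.
        return max(self.classes, key=lambda tag: (scores[tag], tag))

    def update(self, truth, guess, features):
        self.i += 1
        if truth == guess:
            return
        for feat in features:
            for tag, delta in ((truth, 1.0), (guess, -1.0)):
                key = (feat, tag)
                current = self.weights[feat].get(tag, 0.0)
                # Lazily credit the old weight for every step it went unchanged.
                self._totals[key] += (self.i - self._tstamps[key]) * current
                self._tstamps[key] = self.i
                self.weights[feat][tag] = current + delta

    def average(self):
        if self.i == 0:
            return
        for feat, tag_weights in self.weights.items():
            for tag, weight in tag_weights.items():
                key = (feat, tag)
                total = self._totals[key] + (self.i - self._tstamps[key]) * weight
                tag_weights[tag] = total / self.i


def features(words, i, prev_tag):
    """A deliberately tiny, hypothetical feature set."""
    word = words[i].lower()
    prev_word = words[i - 1].lower() if i > 0 else "<s>"
    return {"bias", "word=" + word, "suffix3=" + word[-3:],
            "prev_tag=" + prev_tag, "prev_word=" + prev_word}


def train(tagged_sents, n_iter=5, seed=0):
    model = AveragedPerceptron()
    model.classes = {tag for sent in tagged_sents for _, tag in sent}
    data, rng = list(tagged_sents), random.Random(seed)
    for _ in range(n_iter):
        rng.shuffle(data)
        for sent in data:
            words = [word for word, _ in sent]
            prev = "<s>"
            for i, (_, gold) in enumerate(sent):
                feats = features(words, i, prev)
                guess = model.predict(feats)
                model.update(gold, guess, feats)
                prev = guess  # condition on the model's own guess, as at tagging time
    model.average()
    return model


def tag(model, words):
    tags, prev = [], "<s>"
    for i in range(len(words)):
        prev = model.predict(features(words, i, prev))
        tags.append(prev)
    return tags


train_data = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
model = train(train_data, n_iter=10)
print(tag(model, ["the", "cat", "runs"]))
```

Lazy averaging records when each weight last changed. The weight's running total can then be brought up to date only when that weight is touched, instead of summing every weight at every step.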
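CHAT files are line-oriented, following four rules:

- Lines starting with `@` are headers.
- Lines starting with `*SPEAKER:` are main-tier utterances.
- Lines starting with `%tier:` (such as `%mor`) are dependent tiers attached to the preceding utterance.
- Lines that begin with a tab continue the previous line.

The sketch below parses only those four rules into utterance records and then extracts words. It ignores most of the CHAT specification (retracings, events, time alignment, repeated `@ID` headers, and so on). `parse_chat`, `Utterance`, and `extract_words` are hypothetical names, not Rustling's `CHAT` API.

```python
from dataclasses import dataclass, field

TERMINATORS = {".", "?", "!"}  # simplified; real CHAT defines more utterance terminators


@dataclass
class Utterance:
    participant: str
    tokens: list[str]
    tiers: dict[str, str] = field(default_factory=dict)


def _join_continuations(text):
    """Merge tab-initial continuation lines into the line they continue."""
    lines = []
    for raw in text.splitlines():
        if raw.startswith("\t") and lines:
            lines[-1] += " " + raw.strip()
        elif raw.strip():
            lines.append(raw.rstrip())
    return lines


def parse_chat(text):
    """Return (headers, utterances); repeated headers keep only their last value here."""
    headers, utterances = {}, []
    for line in _join_continuations(text):
        if line.startswith("@"):
            name, _, value = line[1:].partition(":")
            headers[name] = value.strip()
        elif line.startswith("*"):
            participant, _, content = line[1:].partition(":")
            utterances.append(Utterance(participant, content.split()))
        elif line.startswith("%") and utterances:
            tier, _, content = line[1:].partition(":")
            utterances[-1].tiers[tier] = content.strip()
    return headers, utterances


def extract_words(utterances, participant=None):
    """Main-tier tokens minus terminators, optionally for one participant."""
    return [tok for utt in utterances
            if participant is None or utt.participant == participant
            for tok in utt.tokens if tok not in TERMINATORS]


sample = (
    "@Begin\n@Languages:\teng\n@Participants:\tCHI Target_Child, MOT Mother\n"
    "*MOT:\twhat do you want ?\n%mor:\tpro:int|what v|do pro|you v|want ?\n"
    "*CHI:\tI want\n\tcookie .\n@End\n"
)
headers, utts = parse_chat(sample)
print(headers["Languages"], extract_words(utts, participant="CHI"))
```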
Rustling is benchmarked against the pure-Python implementations in NLTK, wordseg (v0.0.5), and pylangacq (v0.19.1).
See `benchmarks/` for full details and reproduction scripts.
| Component | Task | Speedup | Baseline |
|---|---|---|---|
| Language Models | Fit | 10x | NLTK |
| Language Models | Score | 2x | NLTK |
| Language Models | Generate | 80–112x | NLTK |
| Word Segmentation | `LongestStringMatching` | 9x | wordseg |
| Word Segmentation | `RandomSegmenter` | 1.1x | wordseg |
| POS Tagging | Training | 5x | NLTK |
| POS Tagging | Tagging | 7x | NLTK |
| CHAT Parsing | `from_dir` | 55x | pylangacq |
| CHAT Parsing | `from_zip` | 48x | pylangacq |
| CHAT Parsing | `from_files` | 63x | pylangacq |
| CHAT Parsing | `from_strs` | 116x | pylangacq |
| CHAT Parsing | `words()` | 3x | pylangacq |
| CHAT Parsing | `utterances()` | 15x | pylangacq |
Python: `pip install rustling`

Rust: `cargo add rustling`

MIT License