Package: rtiktoken 0.11.0-1

David Zimmermann-Kollenda

rtiktoken: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models

A thin wrapper around the tiktoken-rs crate that encodes text into Byte-Pair-Encoding (BPE) tokens and decodes tokens back to text. This is useful for understanding how Large Language Models (LLMs) perceive text.

Authors: David Zimmermann-Kollenda [aut, cre], Roger Zurawicki [aut], Authors of the dependent Rust crates [aut]

# Install 'rtiktoken' in R:

install.packages('rtiktoken', repos = c('https://davzim.r-universe.dev', 'https://cloud.r-project.org'))

Bug tracker: https://github.com/davzim/rtiktoken/issues

Pkgdown/docs site: https://davzim.github.io

bpe openai rust tokenization cargo

4.15 score, 14 stars, 7 scripts, 259 downloads, 4 exports, 0 dependencies

Last updated from: 52e7f16a92. Checks: 12 OK, 1 FAIL. Indexed: yes.

Target	Result	Time
linux-devel-arm64	OK	159
linux-devel-x86_64	OK	163
source / vignettes	OK	196
linux-release-arm64	OK	204
linux-release-x86_64	OK	162
macos-release-arm64	OK	142
macos-release-x86_64	OK	263
macos-oldrel-arm64	OK	126
macos-oldrel-x86_64	OK	244
windows-devel	OK	168
windows-release	OK	171
windows-oldrel	OK	184
wasm-release	FAIL	155

Exports: decode_tokens, get_token_count, get_tokens, model_to_tokenizer

Dependencies: none

Readme and manuals

Help Manual

Help page	Topics
Decodes tokens back to text	decode_tokens
Returns the number of tokens in a text	get_token_count
Converts text to tokens	get_tokens
Gets the name of the tokenizer used by a model	model_to_tokenizer
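
The four exported functions listed above can be combined into an encode/decode round trip. The following is a minimal sketch, not taken from this page: it assumes each function takes the model name as its second argument, and the model name "gpt-4o" is an illustrative choice; check the help pages for the exact signatures and supported models.

```r
# Minimal sketch; the argument order and the model name "gpt-4o"
# are assumptions, not confirmed by this page.
library(rtiktoken)

text <- "Hello World, this is a short example text."
model <- "gpt-4o"  # assumed to be a supported model name

# Look up the name of the tokenizer that the model uses
tokenizer <- model_to_tokenizer(model)

# Encode the text into BPE token ids
tokens <- get_tokens(text, model)

# Count tokens directly, without keeping the ids
n_tokens <- get_token_count(text, model)

# Decode the token ids back into text
decoded <- decode_tokens(tokens, model)

print(tokenizer)
print(tokens)
print(n_tokens)
print(decoded)
```

Decoding the tokens of a valid UTF-8 string should reproduce the original text, which makes the round trip a convenient sanity check after installation.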