Package: rtiktoken 0.11.0-1

rtiktoken: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models
A thin wrapper around the tiktoken-rs crate that encodes text into Byte-Pair-Encoding (BPE) tokens and decodes tokens back to text. This is useful for understanding how Large Language Models (LLMs) perceive text.
Authors: David Zimmermann-Kollenda
rtiktoken_0.11.0-1.tar.gz
rtiktoken_0.11.0-1.zip (r-4.7) | rtiktoken_0.11.0-1.zip (r-4.6) | rtiktoken_0.11.0-1.zip (r-4.5)
rtiktoken_0.11.0-1.tgz (r-4.6-x86_64) | rtiktoken_0.11.0-1.tgz (r-4.6-arm64) | rtiktoken_0.11.0-1.tgz (r-4.5-x86_64) | rtiktoken_0.11.0-1.tgz (r-4.5-arm64)
rtiktoken_0.11.0-1.tar.gz (r-4.7-arm64) | rtiktoken_0.11.0-1.tar.gz (r-4.7-x86_64) | rtiktoken_0.11.0-1.tar.gz (r-4.6-arm64) | rtiktoken_0.11.0-1.tar.gz (r-4.6-x86_64)
rtiktoken_0.11.0.tgz (r-4.5-emscripten)
manual.pdf | manual.html
card.svg | card.png
rtiktoken/json (API)
NEWS
# Install 'rtiktoken' in R:
install.packages('rtiktoken', repos = c('https://davzim.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/davzim/rtiktoken/issues
Pkgdown/docs site: https://davzim.github.io
bpe, openai, rust, tokenization, cargo
Last updated from: 52e7f16a92. Checks: 12 OK, 1 FAIL. Indexed: yes.
| Target | Result | Time |
|---|---|---|
| linux-devel-arm64 | OK | 160 |
| linux-devel-x86_64 | OK | 136 |
| source / vignettes | OK | 199 |
| linux-release-arm64 | OK | 149 |
| linux-release-x86_64 | OK | 149 |
| macos-release-arm64 | OK | 123 |
| macos-release-x86_64 | OK | 309 |
| macos-oldrel-arm64 | OK | 116 |
| macos-oldrel-x86_64 | OK | 246 |
| windows-devel | OK | 177 |
| windows-release | OK | 169 |
| windows-oldrel | OK | 174 |
| wasm-release | FAIL | 153 |
Exports: decode_tokens, get_token_count, get_tokens, model_to_tokenizer
Dependencies:
Readme and manuals
Help Manual
| Help page | Topics |
|---|---|
| Decodes tokens back to text | decode_tokens |
| Returns the number of tokens in a text | get_token_count |
| Converts text to tokens | get_tokens |
| Gets the name of the tokenizer used by a model | model_to_tokenizer |
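The four help topics above can be exercised together in a short round trip. The following is a minimal sketch, not taken from this page: it assumes each function takes the text (or tokens) as its first argument and a model name as its second, and that "gpt-4o" is an accepted model name. Both are assumptions to check against the help pages. The sample text is purely illustrative.

```r
# Illustrative inputs (hypothetical; not from the package documentation).
text  <- "Hello World, this is a text that we are going to tokenize!"
model <- "gpt-4o"  # assumed to be a model name the package recognizes

# Only run the calls when the package is actually installed.
if (requireNamespace("rtiktoken", quietly = TRUE)) {
  # Name of the tokenizer used for the model.
  print(rtiktoken::model_to_tokenizer(model))

  # Encode the text into BPE tokens (assumed signature: text, model).
  tokens <- rtiktoken::get_tokens(text, model)
  print(tokens)

  # Number of tokens in the text.
  print(rtiktoken::get_token_count(text, model))

  # Decode the tokens back to text (assumed signature: tokens, model).
  print(rtiktoken::decode_tokens(tokens, model))
}
```

Comparing the decoded string with the original input is a quick way to see how the tokenizer splits a given piece of text.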