Package 'rtiktoken'

Title: A Byte-Pair-Encoding (BPE) Tokenizer for OpenAI's Large Language Models
Description: A thin wrapper around the tiktoken-rs crate that encodes text into Byte-Pair-Encoding (BPE) tokens and decodes tokens back to text. This is useful for understanding how Large Language Models (LLMs) perceive text.
Authors: David Zimmermann-Kollenda [aut, cre], Roger Zurawicki [aut] (tiktoken-rs Rust library), Authors of the dependent Rust crates [aut] (see AUTHORS file)
Maintainer: David Zimmermann-Kollenda <[email protected]>
License: MIT + file LICENSE
Version: 0.0.6
Built: 2024-11-07 05:34:51 UTC
Source: https://github.com/davzim/rtiktoken
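
A minimal sketch of the typical workflow, assuming the package is installed and attached; the three functions used here are documented in the help index below:

library(rtiktoken)

# encode a text into BPE tokens for a given model
tokens <- get_tokens("Hello World", "gpt-4o")

# count tokens directly, e.g., to estimate prompt length
get_token_count("Hello World", "gpt-4o")

# decode the tokens back into text
decode_tokens(tokens, "gpt-4o")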

Help Index


Decodes tokens back to text

Description

Decodes tokens back to text

Usage

decode_tokens(tokens, model)

Arguments

tokens

a vector of tokens to decode, or a list of token vectors

model

a model to use for tokenization, either a model name, e.g., gpt-4o, or a tokenizer, e.g., o200k_base. See also the available tokenizers.

Value

a character string of the decoded tokens, or a vector of strings

See Also

model_to_tokenizer(), get_tokens()

Examples

tokens <- get_tokens("Hello World", "gpt-4o")
tokens
decode_tokens(tokens, "gpt-4o")

tokens <- get_tokens(c("Hello World", "Alice Bob Charlie"), "gpt-4o")
tokens
decode_tokens(tokens, "gpt-4o")
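
# As described under Arguments, the model argument also accepts a tokenizer
# name directly. A sketch of the same round trip with o200k_base, assuming
# encoding and decoding use the same tokenizer:
tokens <- get_tokens("Hello World", "o200k_base")
decode_tokens(tokens, "o200k_base")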

Returns the number of tokens in a text

Description

Returns the number of tokens in a text

Usage

get_token_count(text, model)

Arguments

text

a character string to encode into tokens; can be a character vector

model

a model to use for tokenization, either a model name, e.g., gpt-4o, or a tokenizer, e.g., o200k_base. See also the available tokenizers.

Value

the number of tokens in the text, as an integer vector

See Also

model_to_tokenizer(), get_tokens()

Examples

get_token_count("Hello World", "gpt-4o")

Converts text to tokens

Description

Converts text to tokens

Usage

get_tokens(text, model)

Arguments

text

a character string to encode into tokens; can be a character vector

model

a model to use for tokenization, either a model name, e.g., gpt-4o, or a tokenizer, e.g., o200k_base. See also the available tokenizers.

Value

an integer vector of tokens for the given text

See Also

model_to_tokenizer(), decode_tokens()

Examples

get_tokens("Hello World", "gpt-4o")
get_tokens("Hello World", "o200k_base")

Gets the name of the tokenizer used by a model

Description

Gets the name of the tokenizer used by a model

Usage

model_to_tokenizer(model)

Arguments

model

the model to use, e.g., gpt-4o

Value

the name of the tokenizer used by the model

Examples

model_to_tokenizer("gpt-4o")
model_to_tokenizer("gpt-4-1106-preview")
model_to_tokenizer("text-davinci-002")
model_to_tokenizer("text-embedding-ada-002")
model_to_tokenizer("text-embedding-3-small")