Token

The atomic unit of language — a single bead on the Universal Translator's abacus.

Tokens are the basic units that large language models process when reading input and generating output. A token is not exactly a word; more precisely, it is a word fragment or common character sequence as defined by the model's tokenizer. In English, common short words ("the," "is," "a") are typically single tokens; longer words are often split into several tokens (for example, "tokenization" → ["token", "ization"]); numbers and special characters may each be their own token. As a rough rule of thumb for GPT-style tokenizers, 100 tokens ≈ 75 English words (about 750 words per 1,000 tokens), though the ratio varies by tokenizer and by language: many languages written in non-Latin scripts tokenize less efficiently and use more tokens per word.
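The rule of thumb above can be turned into a quick estimator. This is a minimal sketch, assuming the 100 tokens ≈ 75 words ratio and a plain whitespace word count; the function names `estimate_tokens` and `estimate_words` are hypothetical, and a real tokenizer will report different counts, especially for non-English text.

```python
def estimate_tokens(text: str) -> int:
    """Roughly estimate a token count from a whitespace word count.

    Assumption: 100 tokens ≈ 75 English words. This is a heuristic,
    not a substitute for running the model's actual tokenizer.
    """
    word_count = len(text.split())
    # Integer ceiling of word_count * 100 / 75.
    return (word_count * 100 + 74) // 75


def estimate_words(token_count: int) -> int:
    """Inverse estimate: roughly how many English words fit in token_count tokens."""
    return token_count * 75 // 100


if __name__ == "__main__":
    sample = "Tokens are the basic units that large language models process."
    print(estimate_tokens(sample))
    print(estimate_words(1_000))  # 750
```

An estimate like this is only suitable for early planning; for billing or hard limits, count tokens with the tokenizer that belongs to the model in use.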

Token count determines two critical parameters for LLM usage: context window capacity (how much total text, input plus output, a model can handle in a single call) and API cost (virtually all commercial LLM APIs charge per input and output token, so token efficiency is tied directly to cost). By the rule of thumb above, a model with a 200,000-token context window can process approximately 150,000 words in a single call, roughly a 500-page book. Models with smaller context windows (4,000 to 8,000 tokens) require chunking strategies for longer documents. For cost-sensitive applications, prompt efficiency (providing the necessary context concisely rather than verbosely) can meaningfully reduce token consumption without sacrificing output quality.
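One simple chunking strategy for a small context window can be sketched as follows. It assumes the same 100 tokens ≈ 75 words heuristic and splits on whitespace; the function name `chunk_for_context` and the `reserved_tokens` parameter (space held back for the prompt and the model's output) are hypothetical. Production systems usually split on sentence or section boundaries and count tokens with the real tokenizer.

```python
def chunk_for_context(text: str, context_window: int, reserved_tokens: int) -> list[str]:
    """Split text into word-based chunks that should fit one model call.

    context_window:  total tokens the model accepts (input plus output).
    reserved_tokens: tokens held back for instructions and the response.
    Assumption: 100 tokens ≈ 75 English words.
    """
    budget = context_window - reserved_tokens
    if budget <= 0:
        raise ValueError("reserved_tokens must be smaller than context_window")

    # Convert the token budget into an approximate word budget.
    words_per_chunk = max(1, budget * 75 // 100)
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


if __name__ == "__main__":
    document = " ".join(["word"] * 7_000)
    chunks = chunk_for_context(document, context_window=4_000, reserved_tokens=1_000)
    print(len(chunks))  # 4
```

Reserving part of the window matters because the limit covers input and output together: a chunk that fills the whole window leaves no room for the response.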

For B2B teams building AI workflows, understanding tokens is essential for cost estimation, context window planning, and performance optimization. A team processing customer support tickets with an LLM needs to know how many tokens a typical support ticket contains, how large the response template is, and therefore what each processed ticket costs. A team building a RAG system needs to know how many document chunks fit in context at once, given the model's window size. Token literacy translates directly into practical decisions about model selection, prompt design, and infrastructure costs, and those decisions determine whether an AI application is economically viable to deploy at scale.
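Both questions reduce to simple arithmetic once token counts are known. The sketch below is illustrative only: the function names are hypothetical, and the token counts and per-million-token prices in the usage example are made-up placeholders, not any provider's actual rates.

```python
def cost_per_call(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Cost of one call when input and output tokens are priced separately."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


def max_chunks_in_context(
    context_window: int,
    prompt_tokens: int,
    reserved_output_tokens: int,
    tokens_per_chunk: int,
) -> int:
    """How many retrieved chunks fit alongside the prompt and the reserved output."""
    available = context_window - prompt_tokens - reserved_output_tokens
    return max(0, available // tokens_per_chunk)


if __name__ == "__main__":
    # Hypothetical support ticket: 600 ticket tokens + 200 instruction tokens in,
    # 300 tokens out, at placeholder prices of $3 and $15 per million tokens.
    print(cost_per_call(800, 300, 3.0, 15.0))

    # Hypothetical RAG setup: 8,000-token window, 500-token prompt,
    # 1,000 tokens reserved for the answer, 400-token chunks.
    print(max_chunks_in_context(8_000, 500, 1_000, 400))  # 16
```

Multiplying the per-call cost by expected monthly volume gives a first-order budget figure that can be compared across candidate models.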

token, LLM, tokenization, context window, API, GPT
