April 30, 2026tokens

Building With LLMs — A 4-Post Series · Part 3

Tokens and Temperature, in Plain English

Tokens and Temperature, in Plain English — explore insights on tokens, temperature and more.

tokenstemperatureplainenglish

Most of the LLM cost and quality issues I've seen in production trace back to two things people don't intuit about how these models work. One is *tokens* — the unit of measurement everyone gets billed in but few think about in human terms. The other is *temperature* — the single setting that controls whether your model is reliable or random. Get these two right and a lot of the harder optimization work gets easier.

This post is short by design. If you already understand both, skip ahead to the next post. If not, this is the vocabulary you need before the playbook will make sense.

## What a token actually is

A token is a chunk of text the model treats as a single unit. It's usually a word, a piece of a word, or a punctuation mark. The exact split depends on the tokenizer, but the rule of thumb that's stuck around for English is:

**1 token ≈ 4 characters ≈ 0.75 words.**

So the conversions look roughly like this:

| You have | That's about |
|---|---|
| 100 tokens | ~75 words · ~400 characters · ~400 bytes of English |
| 1,000 tokens | ~750 words · ~4 KB · 1.5 pages of a novel |
| 10,000 tokens | ~7,500 words · ~40 KB · a longish blog post |
| 100,000 tokens | ~75,000 words · ~400 KB · a short novel |
| 1 million tokens | ~750,000 words · ~4 MB · 3 long novels |
| 1 GB of English text | ~250 million tokens — far beyond any context window |

To put this in app-relevant terms: Claude Sonnet's 200K context window holds about 800 KB of English, or 150,000 words. That's roughly six hundred paperback pages. Plenty for almost any single-call task — the limit usually isn't space, it's cost and the model's attention quality at the long end.

A few caveats. **Code is denser than prose** because the tokenizer doesn't have learned merges for things like `}`, `_`, or `</div>` — so JSON, HTML, and code typically run 1.3–1.5× more tokens per character. URLs and IDs are even worse, sometimes 2× denser. **Non-English languages range** from "similar to English" (most European languages) to "much denser" (Chinese, Japanese, Korean — sometimes 2–3 tokens per character). If you're processing those, the byte-to-token ratios above are optimistic.

For the audit numbers in the next post to feel concrete: when I say a particular system prompt is "1,400 tokens," that's about 5.6 KB of text — four single-spaced pages. The model providers bill in *per million tokens* — so $3 per million for input means roughly $3 to feed the model 4 MB of English. About a tenth of a cent per page.

That framing makes the small numbers less abstract. When you forget to cap an input and accidentally send a 50-page customer document into your prompt, you've sent ~12,500 tokens — about 4 cents at Sonnet input pricing. Which is nothing on a single call. The problem is when it happens on every call, for every customer, every day.

## What temperature actually does

When the model is generating its next word, it's not picking one option — it's looking at *every possible next token* and assigning each one a probability. "The cat sat on the ___" might give "mat" 35%, "floor" 18%, "couch" 9%, and so on down a long tail of weirder options.

**Temperature controls how strictly the model follows those probabilities.**

At **temperature 0**, the model always picks the single highest-probability token. Same input gives the same output, near-deterministically. Best for tasks where there's a right answer: classification, code generation, structured extraction.

At **temperature 1.0** (the default on most APIs), the model samples roughly in proportion to the probabilities. "Mat" gets picked 35% of the time, but sometimes you get "couch" or "floor." This produces variety, which is great for creative writing and dreadful for anything that needs consistency.

At **temperature above 1**, the model amplifies its less-likely options. The probability distribution flattens. You get wilder, more surprising — and often less coherent — outputs.

A rough mapping for what you'd use in a real app:

| Temperature | Use it for |
|---|---|
| 0.0 | Pure classification. CSV column mapping. Image picking. Code generation. |
| 0.1–0.3 | Analytical answers. Validation passes that correct rather than invent. Factual extraction. |
| 0.3–0.5 | Brand-voice writing. Editing in a defined style. Captions where you want some variety. |
| 0.5–0.7 | Marketing copy. Longer-form generation. Writing that should feel human. |
| 0.7–1.0 | Brainstorming. Creative ideation. Deliberately surprising outputs. |
| Above 1.0 | Almost never in production — usually a sampling experiment. |

The single most common mistake I see: **leaving temperature at the default (1.0) for an analytical task.** The model produces correct outputs most of the time, but every now and then it picks a weird tail option and gives someone an answer they can't reproduce. You debug for an hour, change nothing, and the bug "fixes itself" because the next call rolled the dice differently. The fix is one line.

The intuition that's stuck with me: temperature is *how willing the model is to take a less-obvious path*. For tasks where you want the obvious right answer, keep it low. For tasks where the obvious answer is boring, turn it up.

---

## Coming next in the series

That's the foundation. **[Post 4: Twelve Steps to a Cheaper, Better LLM App](blog-series-4-twelve-step-playbook.md)** is the tactical playbook — twelve concrete moves, in roughly the order you'd want to do them, that routinely cut LLM costs in half and improve output quality at the same time. None take more than a couple of hours each.

← Previous: [Buy vs. Build After AI](blog-series-2-buy-vs-build.md) · [Series index](blog-series-index.md) · Next post → [Twelve Steps to a Cheaper, Better LLM App](blog-series-4-twelve-step-playbook.md)

Turn your brand into content like this

Narratr reads your website and generates SEO-optimised blog posts that sound like you.

Try Narratr free →