How I built a typing trainer that practices what you actually mistype

Most typing trainers give you random words. KeyDown tracks the specific character sequences where you make mistakes and generates practice text weighted toward those weak spots. Here's how the adaptive engine works under the hood.

Most typing trainers give you random words or quotes. You type, get your WPM, move on. A hundred sessions later you're still stumbling on the same key transitions, and the trainer has no idea.

I built one that does. It's called KeyDown. This post covers how the adaptive engine works: from collecting errors to generating text that's skewed toward your weak spots.

The problem: errors aren't random

Your typing errors aren't evenly distributed. Everyone has their own set of difficult transitions. Some people consistently fumble e→r, others drop letters in tion, others misfire on t→h. This isn't randomness — it's specific finger mechanics.

A regular typing trainer ignores this. You get the same text regardless of which transitions give you trouble, so most of your practice time goes to keys you already type fine.

KeyDown's idea: track the n-grams (character sequences) where you make mistakes, and generate text with a higher concentration of those sequences.

N-grams in the context of typing

An n-gram is a sequence of N characters. For typing, n-grams of length 2 through 5 matter.

The bigram th is one transition. The trigram the is two transitions in a row. The 4-gram ther and 5-gram there are longer chains that catch persistent error patterns.

When a user types monster and hits i instead of o, the engine records the error not against the word but against the n-grams that end at the expected character, built from the buffer of recent correct characters. Here the buffer holds only m, so the error lands on mo. Miss the s later in the same word and it lands on ns, ons, and mons. The word doesn't matter. The transition mo appears in hundreds of words: moment, money, month, more. Train one transition, fix the error everywhere.
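To make the buffer-plus-expected-character idea concrete, here is a minimal sketch. The helper name ngramsEndingAt and its maxLen parameter are hypothetical, made up for this example; the real tracker code appears further down.

```typescript
// Hypothetical helper: every n-gram (length 2..maxLen) that ends at `expected`,
// built from the recently typed correct characters in `buffer`.
function ngramsEndingAt(buffer: string[], expected: string, maxLen = 5): string[] {
  const result: string[] = [];
  for (let n = 2; n <= maxLen; n++) {
    if (buffer.length < n - 1) break; // not enough history for this length
    result.push([...buffer.slice(-(n - 1)), expected].join(""));
  }
  return result;
}

// Typing "month": "mon" was typed correctly, then the expected "t" was missed.
console.log(ngramsEndingAt(["m", "o", "n"], "t")); // ["nt", "ont", "mont"]
```

Every n-gram in the result ends at the character the user was supposed to type, so one slip is charged to the bigram, the trigram, and so on, up to whatever the buffer allows.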

The dictionary: two lists sorted by frequency

KeyDown's dictionary is two JSON files: nouns and verbs. Both are sorted by descending frequency in the COCA corpus (Corpus of Contemporary American English — one billion words, balanced across genres: fiction, newspapers, academic texts, spoken language).

Frequency data comes from wordfrequency.info. You practice transitions on words you actually encounter in everyday English, not on pneumonoultramicroscopicsilicovolcanoconiosis.

The frequencyThreshold parameter controls how much of the dictionary is available to the generator:

  • frequencyThreshold = 1 — pool is limited to the most frequent words (exactly numberOfWords of them)
  • frequencyThreshold = 0 — full dictionary, including rare words
  • Values in between expand the pool linearly

Pool size formula:

poolSize = max(numberOfWords, ceil(numberOfWords + (totalAvailable - numberOfWords) × (1 - frequencyThreshold)))

With the slider at maximum, you get time, people, work, make. Lower it, and glimpse, drought, reluctant start showing up.
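The formula translates directly into code. This is a sketch rather than the actual gentext() source: the function name computePoolSize is hypothetical, and the dictionary size of 1000 is an arbitrary number picked for the example.

```typescript
// Hypothetical helper implementing the pool size formula above.
function computePoolSize(
  numberOfWords: number,
  totalAvailable: number,
  frequencyThreshold: number, // 1 = most frequent words only, 0 = full dictionary
): number {
  const extra = (totalAvailable - numberOfWords) * (1 - frequencyThreshold);
  return Math.max(numberOfWords, Math.ceil(numberOfWords + extra));
}

// 20 words requested from an assumed 1000-word dictionary.
console.log(computePoolSize(20, 1000, 1));   // 20
console.log(computePoolSize(20, 1000, 0.5)); // 510
console.log(computePoolSize(20, 1000, 0));   // 1000
```

The outer max() only matters when the dictionary is smaller than the requested word count, in which case the pool size never drops below numberOfWords.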

How the generator works

The function gentext() is the engine's core. It takes parameters and returns a string of words for typing.

Step 1: Build the word pool

A pool is formed from the dictionary. Size is determined by frequencyThreshold. For nouns-and-verbs generation, the pool is split between nouns and verbs using nounToVerbRatio:

const nounCount = Math.min(
  Math.max(1, Math.round(poolSize * nounToVerbRatio)),
  nouns.length
);
const verbCount = Math.min(
  Math.max(1, poolSize - nounCount),
  verbs.length
);

Default is 50/50. Users can shift the slider toward nouns or verbs.

Step 2: Filter and backfill

Words from excludeWords are removed — these are words already typed in the current session. If filtering drops the pool below numberOfWords, additional words are pulled from the rest of the dictionary (backfill).
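A minimal sketch of this step, under assumptions: the function name filterAndBackfill is hypothetical, and I'm assuming backfill walks the rest of the dictionary in frequency order, which the real implementation may or may not do.

```typescript
// Hypothetical sketch of step 2. `pool` is the frequency-limited slice,
// `dictionary` is the full frequency-sorted list it was cut from.
function filterAndBackfill(
  pool: string[],
  dictionary: string[],
  excludeWords: string[],
  numberOfWords: number,
): string[] {
  const excluded = new Set(excludeWords);
  const filtered = pool.filter((w) => !excluded.has(w));
  if (filtered.length >= numberOfWords) return filtered;

  // Backfill: pull words that are neither in the pool nor excluded.
  const inPool = new Set(pool);
  for (const word of dictionary) {
    if (filtered.length >= numberOfWords) break;
    if (!inPool.has(word) && !excluded.has(word)) filtered.push(word);
  }
  return filtered;
}

const dictionary = ["time", "people", "work", "make", "glimpse", "drought"];
const pool = dictionary.slice(0, 3);
console.log(filterAndBackfill(pool, dictionary, ["time", "work"], 3));
// ["people", "make", "glimpse"]
```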

Step 3: Without n-grams

If the ngrams array is empty, adaptation is off. The pool is shuffled with the Fisher-Yates (Knuth) shuffle, the first actualCount words are taken, and the result is joined with spaces:

return shuffle(filtered).slice(0, actualCount).join(" ");
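For reference, a Fisher-Yates shuffle fits in a few lines. This is a sketch, not necessarily the shuffle() that gentext ships: returning a copy and accepting an injectable random source are my assumptions, the latter only so the example can be tested deterministically.

```typescript
// Fisher-Yates shuffle: returns a shuffled copy and leaves the input untouched.
// `random` must return a number in [0, 1), like Math.random.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1)); // 0 <= j <= i
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

const words = ["time", "people", "work", "make", "month"];
console.log(shuffle(words).slice(0, 3).join(" ")); // three random words
```

Walking from the end and swapping each position with a random earlier-or-equal index gives every permutation the same probability, which a naive sort-by-random-comparator does not.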

Step 4: With n-grams

The ngrams array arrives from outside, sorted by priority: first n-gram is the most problematic.

The generator distributes slots with decreasing weights. Three n-grams ["mo", "th", "er"] get weights 3, 2, 1. Total weight: 6. Out of 20 words, the first n-gram gets floor(20 × 3/6) = 10, the second floor(20 × 2/6) = 6, the third floor(20 × 1/6) = 3:

const N = ngrams.length;
const totalWeight = (N * (N + 1)) / 2;

for (let i = 0; i < N; i++) {
  const ngram = ngrams[i];
  const weight = N - i; // earlier (more problematic) n-grams weigh more
  const targetCount = Math.max(1, Math.floor(actualCount * weight / totalWeight));
  const matching = shuffledPool.filter(w => w.includes(ngram) && !usedSet.has(w));
  const picked = matching.slice(0, targetCount);
  // ...
}

For each n-gram, the generator looks for words containing that substring: w.includes(ngram). month contains mo — match. other contains th — match. Already-selected words aren't reused.
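The slot arithmetic can be checked in isolation. The helper below is a hypothetical extraction of the targetCount expression from the loop above, not a function that exists in gentext.

```typescript
// Slots per n-gram for a chunk of `wordCount` words, using the same
// linearly decreasing weights (N, N-1, ..., 1) as the loop above.
function allocateSlots(ngramCount: number, wordCount: number): number[] {
  const totalWeight = (ngramCount * (ngramCount + 1)) / 2;
  const slots: number[] = [];
  for (let i = 0; i < ngramCount; i++) {
    const weight = ngramCount - i;
    slots.push(Math.max(1, Math.floor((wordCount * weight) / totalWeight)));
  }
  return slots;
}

console.log(allocateSlots(3, 20)); // [10, 6, 3]
```

Because of the floor, the three targets add up to 19 rather than 20. The leftover slot is not lost: it falls through to the regular-word fill described next.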

If there aren't enough words for a given n-gram, remaining slots are filled with regular words:

if (selected.length < actualCount) {
  const rest = shuffledPool.filter(w => !usedSet.has(w));
  selected.push(...rest.slice(0, actualCount - selected.length));
}

The final set is shuffled again so n-gram words don't cluster together.

The n-gram tracker: what happens outside gentext()

gentext() knows nothing about errors. It receives an array of n-grams and generates text. The prioritization logic lives in TrainContext — a React context that manages the typing session.

Correct-character buffer

The tracker maintains a buffer of recently correct characters. On each correct keystroke, the buffer grows, and n-grams of length 2 through 5 are built from it:

const recordNgramCorrect = (key: string) => {
  const now = Date.now();
  const buf = lastCorrectCharsRef.current;

  if (/[a-z]/i.test(key) && lastCorrectTimeRef.current !== null) {
    const ms = now - lastCorrectTimeRef.current;
    for (let n = 2; n <= TRAIN_NGRAM_MAX; n++) {
      if (buf.length < n - 1) break;
      const ngram = [...buf.slice(-(n - 1)), key].join("");
      const entry = ngramStatsRef.current.get(ngram) ?? { attempts: 0, errors: 0, totalMs: 0 };
      ngramStatsRef.current.set(ngram, {
        ...entry,
        attempts: entry.attempts + 1,
        totalMs: entry.totalMs + ms,
      });
    }
  }

  lastCorrectTimeRef.current = now;
  lastCorrectCharsRef.current =
    key === " " || !/[a-z]/i.test(key) ? [] : [...buf.slice(-(TRAIN_NGRAM_MAX - 2)), key];
};

Spaces, and any other non-letter characters, reset the buffer, so n-grams don't cross word boundaries.
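Here is a simplified, timing-free model of that buffer, so the boundary behavior is easy to see. The function name countNgramAttempts is hypothetical, and the sketch assumes every character is typed correctly.

```typescript
// Simplified model of the correct-character buffer: counts attempts for
// every n-gram of length 2..maxLen, with no timing and no error handling.
function countNgramAttempts(text: string, maxLen = 5): Map<string, number> {
  const attempts = new Map<string, number>();
  let buffer: string[] = [];
  for (const key of text) {
    if (!/^[a-z]$/i.test(key)) {
      buffer = []; // space or punctuation: start over
      continue;
    }
    for (let n = 2; n <= maxLen; n++) {
      if (buffer.length < n - 1) break;
      const ngram = [...buffer.slice(-(n - 1)), key].join("");
      attempts.set(ngram, (attempts.get(ngram) ?? 0) + 1);
    }
    buffer = [...buffer.slice(-(maxLen - 2)), key]; // keep at most maxLen - 1 chars
  }
  return attempts;
}

console.log(Array.from(countNgramAttempts("the at").keys()));
// ["th", "he", "the", "at"]
```

Note what is missing from the output: nothing like "ea" or "e a" is recorded, because the space between the two words empties the buffer.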

Beyond the attempt count, the tracker records the time since the previous keystroke (totalMs). This data isn't used in generation yet, but it's already being collected for per-n-gram speed analysis.

Recording errors

On a wrong keystroke, the tracker builds n-grams using the expected character, not the one the user actually pressed:

const recordNgramError = (expected: string) => {
  const buf = lastCorrectCharsRef.current;
  if (buf.length === 0 || !/[a-z]/i.test(expected)) return null;

  let longestNgram: string | null = null;
  for (let n = 2; n <= TRAIN_NGRAM_MAX; n++) {
    if (buf.length < n - 1) break;
    const ngram = [...buf.slice(-(n - 1)), expected].join("");
    const entry = ngramStatsRef.current.get(ngram) ?? { attempts: 0, errors: 0, totalMs: 0 };
    ngramStatsRef.current.set(ngram, { ...entry, errors: entry.errors + 1 });
    longestNgram = ngram;
  }

  lastCorrectCharsRef.current = [];
  lastCorrectTimeRef.current = null;
  return longestNgram;
};

After an error, the buffer resets. One mistake can't "infect" neighboring n-grams — only the transition where the user stumbled gets counted.

Accumulation across chunks

A session is 20 words, but text is generated in chunks. When the user finishes the current chunk, the next one is generated. N-gram statistics accumulate across chunks.

When generating a new chunk, buildNgramPriority() merges errors from the previous session and the current one:

function buildNgramPriority(
  snapshots: TNgramSnapshot[],
  statsMap: Map<string, TNgramStats>,
): string[] {
  const scores = new Map<string, number>();

  snapshots.forEach((s, i) => {
    scores.set(s.ngram, snapshots.length - i);
  });

  for (const [ngram, stats] of statsMap.entries()) {
    if (stats.errors > 0) {
      scores.set(ngram, (scores.get(ngram) ?? 0) + stats.errors);
    }
  }

  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([ngram]) => ngram);
}

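N-grams from the previous session are scored by rank, not by raw error count: the top snapshot of N gets N points, the last gets 1. Each error in the current session adds one point. So if mo was the only problematic n-gram carried over from the previous session (score 1) and th picks up 4 errors in the current one, th moves up. Adaptation happens in real time, mid-session.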
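A runnable version with sample data shows the merge in action. The two type definitions are assumptions: they contain only the fields this function reads, and the real TNgramSnapshot may carry more.

```typescript
// Assumed shapes: only the fields buildNgramPriority touches.
type TNgramSnapshot = { ngram: string };
type TNgramStats = { attempts: number; errors: number; totalMs: number };

function buildNgramPriority(
  snapshots: TNgramSnapshot[],
  statsMap: Map<string, TNgramStats>,
): string[] {
  const scores = new Map<string, number>();
  // Previous session: rank-based score, first snapshot scores highest.
  snapshots.forEach((s, i) => scores.set(s.ngram, snapshots.length - i));
  // Current session: one extra point per recorded error.
  statsMap.forEach((stats, ngram) => {
    if (stats.errors > 0) scores.set(ngram, (scores.get(ngram) ?? 0) + stats.errors);
  });
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([ngram]) => ngram);
}

const previous: TNgramSnapshot[] = [{ ngram: "mo" }, { ngram: "er" }];
const current = new Map<string, TNgramStats>([
  ["th", { attempts: 9, errors: 4, totalMs: 1800 }],
  ["er", { attempts: 5, errors: 2, totalMs: 900 }],
  ["in", { attempts: 7, errors: 0, totalMs: 1100 }],
]);
console.log(buildNgramPriority(previous, current)); // ["th", "er", "mo"]
```

In this made-up data, mo starts at 2 and er at 1. After the current chunk, th has 4, er has 1 + 2 = 3, and in is ignored because it has no errors.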

Deduplication

Typed words are passed to gentext() as excludeWords. The word month won't appear twice in a session.

What gets saved to the server

After a session, only a summary is persisted: WPM, accuracy, word count, duration, per-second WPM chart snapshots, error markers, and up to 8 problematic n-grams. Per-character data stays in browser memory.

N-gram error counts are stored in Firestore at two levels: per-day (statistics/{dateKey}) and all-time (statistics/ngrams). Format: { ngram: { count } }. This powers the error trend charts in the profile: which n-grams were problematic last week vs. today.
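As an illustration of that document shape, here is a small in-memory sketch. The type name NgramCounts and the helper mergeNgramCounts are hypothetical, and the sketch deliberately makes no Firestore calls: how KeyDown actually writes these documents isn't covered here.

```typescript
// Document shape as described above: n-gram -> { count }.
type NgramCounts = Record<string, { count: number }>;

// Hypothetical pure helper: fold one session's error counts into a stored document.
function mergeNgramCounts(stored: NgramCounts, session: Record<string, number>): NgramCounts {
  const merged: NgramCounts = { ...stored };
  for (const ngram of Object.keys(session)) {
    const previous = merged[ngram]?.count ?? 0;
    merged[ngram] = { count: previous + (session[ngram] ?? 0) };
  }
  return merged;
}

const allTime: NgramCounts = { mo: { count: 12 }, th: { count: 7 } };
console.log(mergeNgramCounts(allTime, { th: 4, er: 1 }));
// { mo: { count: 12 }, th: { count: 11 }, er: { count: 1 } }
```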

Offline

Users who aren't logged in don't lose data: sessions are saved to localStorage. When they sign in, accumulated sessions sync to the server automatically.

How WPM is calculated

WPM in KeyDown is a median, not a mean. At session end, each word's WPM is computed (60 divided by the seconds spent typing that word), and the median is taken from that array:

const wpm = median(state.wordSnapshots.map(s => s.wpm));

export function median(vals: number[]) {
  if (vals.length === 0) return 0;
  const sorted = [...vals].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? Math.round((sorted[mid - 1] + sorted[mid]) / 2)
    : sorted[mid];
}

One word typed over 10 seconds (while you got distracted) won't tank the result. One word typed in 0.2 seconds won't inflate it.
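A quick comparison with made-up numbers shows the difference. The mean() helper and the sample values are mine, added only for contrast; median() is the same function as above.

```typescript
function median(vals: number[]): number {
  if (vals.length === 0) return 0;
  const sorted = [...vals].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? Math.round(((sorted[mid - 1] ?? 0) + (sorted[mid] ?? 0)) / 2)
    : sorted[mid] ?? 0;
}

// Hypothetical helper for comparison only.
function mean(vals: number[]): number {
  if (vals.length === 0) return 0;
  return vals.reduce((sum, v) => sum + v, 0) / vals.length;
}

// Four words at a steady pace, plus one word that took 10 seconds (60 / 10 = 6 WPM).
const perWordWpm = [60, 62, 58, 61, 6];
console.log(median(perWordWpm)); // 60
console.log(mean(perWordWpm));   // 49.4
```

One distracted word drags the mean down by more than 10 WPM, while the median stays at the pace the user actually held.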

What the user sees

After each 20-word session, a stats page shows:

  • Four cards: words, WPM with goal indicator, accuracy with goal indicator, errors.
  • Performance chart: per-second WPM with red error markers on the timeline. Hover shows time, WPM, and the word.
  • Weak sequences: bars showing n-grams ranked by error count. Tooltip: c→o · 1 mistype.
  • Key heatmap: QWERTY layout with error keys highlighted.

The profile shows aggregated data for a week, month, or year: median WPM, accuracy, speed-vs-accuracy scatter plot, n-gram trends.

Comparison with VerseQ

Adaptive generation isn't a new idea. VerseQ used Markov chains to generate syllables based on errors. How KeyDown differs:

  • Real words instead of syllables. VerseQ generated pseudo-words. KeyDown picks from a frequency-ranked dictionary. The text reads like English, and the skill transfers to real typing.
  • Web instead of desktop. Progress syncs through an account.
  • Transparency. VerseQ didn't show which transitions it was training. KeyDown shows your problematic n-grams after every session and in aggregated stats.
  • Any layout. KeyDown compares the typed character to the expected one — QWERTY, Colemak, Dvorak, whatever you use.

Stack

  • Frontend: Next.js, TypeScript, Tailwind CSS, DaisyUI, Recharts
  • Backend: Firebase (Auth with Google/GitHub OAuth, Firestore)
  • State: React Query + React Context
  • Payments: Stripe (Checkout Sessions + Webhooks → Firestore)

The adaptive engine runs entirely on the client. Only aggregated session results go to the server. Per-keystroke data never leaves the browser.

The text generator is open source: github.com/iamursky/gentext.

Try it

Trainer, stats, profile, paid features — all live. Try keydown.io and tell me what's broken. First 100 users get a free year of Pro. Coupon code: HWVVYl7P.