Count whitespace-separated tokens in input

009-word-count. Read all bytes from standard input.

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

# Task 009 — word-count

## Problem

Read all bytes from standard input. A **word** is a maximal run of
non-whitespace bytes. The whitespace bytes are:

- space (0x20)
- tab (0x09)
- newline (0x0A)
- carriage return (0x0D)

Write the total word count as a decimal integer to standard output,
followed by exactly one newline. Exit with status 0.

## Edge cases

- Empty input → write `0\n`.
- All-whitespace input → write `0\n`.
- Single word with no trailing whitespace → write `1\n`.
- Words separated by any mix of whitespace bytes count by maximal
  non-whitespace runs only; consecutive whitespace bytes never produce
  an empty word.

## Acceptance

The stdout produced by your program must match the expected bytes
exactly for every test case (public and hidden). Stderr must be empty
and the process must exit 0.

## Input convention

- **stdin** (TypeScript, Rust, Go, Python): read all bytes until EOF.
- **argv[1]** (Zero 0.1.2 has no exposed stdin capability): read the
  bytes from the first command-line argument.

## Token budget

1200 tokens.

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case)	true
stderr (exact bytes)	""
exit code	0
wall time max (ms)	5000
tags	io, parsing, counting

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model	Zero	TypeScript	Rust	Go	Python
gpt-4o	compile	✓	✓	✓	✓
gpt-4o-mini	compile	✓	✓	wrong output	wrong output
gpt-5	compile	✓	✓	✓	✓
opus	compile	✓	wrong output	✓	✓
sonnet	wrong output	✓	✓	✓	✓

Failure excerpts

8 of 25 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

gpt-4o Zero compile

ref.zero:6:8 PAR100: expected '{' before block

gpt-4o-mini Zero compile

ref.zero:3:17 PAR100: expected '{' before block

gpt-4o-mini Go wrong output
```
(no diagnostic captured)
```
gpt-4o-mini Python wrong output
```
(no diagnostic captured)
```

gpt-5 Zero compile

ref.zero:3:9 PAR100: expected '{' before block

opus Zero compile

ref.zero:1:8 PAR100: expected '{' before block

opus Rust wrong output
```
(no diagnostic captured)
```
sonnet Zero wrong output
```
(no diagnostic captured)
```

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero n/a

No reference implementation in this language.

TypeScript 23 lines

// Count whitespace-separated tokens in input, TypeScript reference.

import { readFileSync } from "node:fs";

function main(): void {
  const data = readFileSync(0);
  let count = 0;
  let inWord = false;
  for (let i = 0; i < data.length; i++) {
    const b = data[i];
    const isWS = b === 32 || b === 9 || b === 10 || b === 13;
    if (isWS) {
      inWord = false;
    } else if (!inWord) {
      count++;
      inWord = true;
    }
  }
  process.stdout.write(`${count}\n`);
}

main();

Rust 22 lines

// Count whitespace-separated tokens in input, Rust reference.

use std::io::{self, Read, Write};

fn main() {
    let mut data = Vec::new();
    io::stdin().read_to_end(&mut data).expect("read stdin");
    let mut count: u32 = 0;
    let mut in_word = false;
    for &b in &data {
        let is_ws = b == 32 || b == 9 || b == 10 || b == 13;
        if is_ws {
            in_word = false;
        } else if !in_word {
            count += 1;
            in_word = true;
        }
    }
    let mut stdout = io::stdout();
    writeln!(stdout, "{}", count).expect("write stdout");
}

Go 29 lines

// Count whitespace-separated tokens in input, Go reference.

package main

import (
	"io"
	"os"
	"strconv"
)

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		os.Exit(1)
	}
	count := 0
	inWord := false
	for _, b := range data {
		isWS := b == 32 || b == 9 || b == 10 || b == 13
		if isWS {
			inWord = false
		} else if !inWord {
			count++
			inWord = true
		}
	}
	os.Stdout.WriteString(strconv.Itoa(count) + "\n")
}

Python 23 lines

#!/usr/bin/env python3
"""Count whitespace-separated tokens in input, Python reference."""

import sys


def main() -> None:
    data = sys.stdin.buffer.read()
    count = 0
    in_word = False
    for b in data:
        is_ws = b == 32 or b == 9 or b == 10 or b == 13
        if is_ws:
            in_word = False
        elif not in_word:
            count += 1
            in_word = True
    sys.stdout.write(f"{count}\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/009-word-count/notes.md.

Algorithm

One-pass state machine over the input bytes:

in_word starts false, count starts 0.
For each byte b: if b is whitespace, set in_word = false; otherwise if in_word is false, increment count and set in_word = true.
Whitespace bytes: space (0x20), tab (0x09), newline (0x0A), CR (0x0D).
After the loop, emit count as decimal followed by one newline.

Edge cases

Empty input → count stays 0 → output 0\n.
All-whitespace input → no non-whitespace bytes → output 0\n.
Trailing whitespace (including a trailing newline from stdin) does not create a phantom word; the state machine simply stays at in_word = false past the end.
Single word with no trailing newline → output 1\n.
Tab, newline, and CR all act as separators identically to space.

Zero-specific notes

argv[1] is the input bytes; an absent argv falls back to an empty span. Embedded tabs and newlines inside argv[1] (passed as $'a\tb\nc' from bash) are preserved literally.
The whitespace classification is written as four independent if branches that each set is_ws = true. This avoids assuming || short-circuits in the Zero 0.1.2 direct backend; even though none of the byte comparisons here would trap, the explicit-flag pattern matches the safety shape used elsewhere in the corpus.
Zero 0.1.2 does not have a ! prefix operator for booleans. The natural shape if !in_word { ... } trips PAR100 at the !. Use the explicit comparison if in_word == false { ... } instead. This is the fourth Zero direct-backend quirk surfaced by the corpus; it joins the 2^32-divisor SIGFPE (task 002), the i32-vs-usize and bool-vs-Bool TYP002 quirks (task 004), and the && non-short-circuit trap (task 008) as silent-under-zero check codegen gaps.
The decimal renderer uses small-divisor u32 divmod (% 10_u32 / / 10_u32). This dodges the 2^32-divisor narrowing-cast SIGFPE documented in task 002 notes; 10 fits in u32 so the cast pattern is never triggered.

Cross-implementation parity

All five references treat the four whitespace bytes identically and walk the input byte-by-byte (no Unicode-aware tokenization). TypeScript and Python read raw byte buffers (readFileSync(0) and sys.stdin.buffer); Rust uses read_to_end for the same reason. Go uses io.ReadAll and ranges over the byte slice with _, b := range which yields bytes from a []byte. Byte-exact agreement holds on every case.

Cost

Model	Prompt tokens	Completion tokens	API ms
gpt-4o	2,580	655	7,003
gpt-4o-mini	2,580	581	10,779
gpt-5	2,575	11,976	113,233
opus	14	873	81,850
sonnet	14	852	62,836

Tokens and API ms are summed across the five languages this model attempted for this task.

Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 · opus · sonnet . Back to the leaderboard and methodology.

Truffle · 009-word-count · run captured 2026-06-27