agentlang-index · task easy

Count whitespace-separated tokens in input

009-word-count. Read all bytes from standard input.

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

# Task 009 — word-count

## Problem

Read all bytes from standard input. A **word** is a maximal run of
non-whitespace bytes. The whitespace bytes are:

- space (0x20)
- tab (0x09)
- newline (0x0A)
- carriage return (0x0D)

Write the total word count as a decimal integer to standard output,
followed by exactly one newline. Exit with status 0.

## Edge cases

- Empty input → write `0\n`.
- All-whitespace input → write `0\n`.
- Single word with no trailing whitespace → write `1\n`.
- Words separated by any mix of whitespace bytes count by maximal
  non-whitespace runs only; consecutive whitespace bytes never produce
  an empty word.

## Acceptance

The stdout produced by your program must match the expected bytes
exactly for every test case (public and hidden). Stderr must be empty
and the process must exit 0.

## Input convention

- **stdin** (TypeScript, Rust, Go, Python): read all bytes until EOF.
- **argv[1]** (Zero 0.1.2 has no exposed stdin capability): read the
  bytes from the first command-line argument.

## Token budget

1200 tokens.

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case) true
stderr (exact bytes) ""
exit code 0
wall time max (ms) 5000
tags io, parsing, counting

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model ZeroTypeScriptRustGoPython
gpt-4o compile
gpt-4o-mini compile wrong output wrong output
gpt-5 compile

Failure excerpts

5 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

  1. gpt-4o Zero compile
    ref.zero:6:8 PAR100: expected '{' before block
  2. gpt-4o-mini Zero compile
    ref.zero:3:17 PAR100: expected '{' before block
  3. gpt-4o-mini Go wrong output
    (no diagnostic captured)
  4. gpt-4o-mini Python wrong output
    (no diagnostic captured)
  5. gpt-5 Zero compile
    ref.zero:3:9 PAR100: expected '{' before block

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero 79 lines
// Count whitespace-separated tokens in input, Zero 0.1.2 direct backend.
//
// argv[1] is the input bytes (Zero 0.1.2 has no exposed stdin capability).
// All logic stays inside `pub fun main` to avoid Span/MutSpan parameter
// restrictions.
//
// One-pass state machine: track `in_word`. Each transition from
// whitespace to non-whitespace increments the count. Whitespace bytes
// are 0x20, 0x09, 0x0A, 0x0D.
//
// Decimal renderer uses small-divisor u32 divmod (avoids the 2^32-
// divisor SIGFPE narrowing-cast bug surfaced in earlier corpus work).
pub fun main(world: World) -> Void raises {
    let line_opt = std.args.get(1)
    let mut bytes: Span<u8> = std.mem.span("")
    if line_opt.has {
        bytes = std.mem.span(line_opt.value)
    }
    let n: usize = std.mem.len(bytes)

    let mut count: u32 = 0_u32
    let mut in_word: Bool = false
    let mut i: usize = 0
    while i < n {
        let b: u8 = bytes[i]
        let mut is_ws: Bool = false
        if b == 32_u8 {
            is_ws = true
        }
        if b == 9_u8 {
            is_ws = true
        }
        if b == 10_u8 {
            is_ws = true
        }
        if b == 13_u8 {
            is_ws = true
        }
        if is_ws {
            in_word = false
        } else {
            if in_word == false {
                count = count + 1_u32
                in_word = true
            }
        }
        i = i + 1
    }

    let mut out_buf: [16]u8 = [0_u8; 16]
    let mut written: usize = 0
    if count == 0_u32 {
        out_buf[0] = 48_u8
        written = 1
    } else {
        let mut tmp: [16]u8 = [0_u8; 16]
        let mut tn: usize = 0
        let mut c: u32 = count
        while c > 0_u32 {
            let d: u32 = c % 10_u32
            tmp[tn] = (48_u32 + d) as u8
            tn = tn + 1
            c = c / 10_u32
        }
        let mut j: usize = 0
        while j < tn {
            out_buf[j] = tmp[tn - 1 - j]
            j = j + 1
        }
        written = tn
    }
    out_buf[written] = 10_u8
    written = written + 1

    let out: Span<u8> = out_buf[0..written]
    check world.out.write(out)
    return
}
TypeScript 23 lines
// Count whitespace-separated tokens in input, TypeScript reference.

import { readFileSync } from "node:fs";

function main(): void {
  const data = readFileSync(0);
  let count = 0;
  let inWord = false;
  for (let i = 0; i < data.length; i++) {
    const b = data[i];
    const isWS = b === 32 || b === 9 || b === 10 || b === 13;
    if (isWS) {
      inWord = false;
    } else if (!inWord) {
      count++;
      inWord = true;
    }
  }
  process.stdout.write(`${count}\n`);
}

main();
Rust 22 lines
// Count whitespace-separated tokens in input, Rust reference.

use std::io::{self, Read, Write};

fn main() {
    let mut data = Vec::new();
    io::stdin().read_to_end(&mut data).expect("read stdin");
    let mut count: u32 = 0;
    let mut in_word = false;
    for &b in &data {
        let is_ws = b == 32 || b == 9 || b == 10 || b == 13;
        if is_ws {
            in_word = false;
        } else if !in_word {
            count += 1;
            in_word = true;
        }
    }
    let mut stdout = io::stdout();
    writeln!(stdout, "{}", count).expect("write stdout");
}
Go 29 lines
// Count whitespace-separated tokens in input, Go reference.

package main

import (
	"io"
	"os"
	"strconv"
)

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		os.Exit(1)
	}
	count := 0
	inWord := false
	for _, b := range data {
		isWS := b == 32 || b == 9 || b == 10 || b == 13
		if isWS {
			inWord = false
		} else if !inWord {
			count++
			inWord = true
		}
	}
	os.Stdout.WriteString(strconv.Itoa(count) + "\n")
}
Python 23 lines
#!/usr/bin/env python3
"""Count whitespace-separated tokens in input, Python reference."""

import sys


def main() -> None:
    data = sys.stdin.buffer.read()
    count = 0
    in_word = False
    for b in data:
        is_ws = b == 32 or b == 9 or b == 10 or b == 13
        if is_ws:
            in_word = False
        elif not in_word:
            count += 1
            in_word = True
    sys.stdout.write(f"{count}\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/009-word-count/notes.md.

Algorithm

One-pass state machine over the input bytes:

  • in_word starts false, count starts 0.
  • For each byte b: if b is whitespace, set in_word = false; otherwise if in_word is false, increment count and set in_word = true.
  • Whitespace bytes: space (0x20), tab (0x09), newline (0x0A), CR (0x0D).
  • After the loop, emit count as decimal followed by one newline.

Edge cases

  • Empty input → count stays 0 → output 0\n.
  • All-whitespace input → no non-whitespace bytes → output 0\n.
  • Trailing whitespace (including a trailing newline from stdin) does not create a phantom word; the state machine simply stays at in_word = false past the end.
  • Single word with no trailing newline → output 1\n.
  • Tab, newline, and CR all act as separators identically to space.

Zero-specific notes

  • argv[1] is the input bytes; an absent argv falls back to an empty span. Embedded tabs and newlines inside argv[1] (passed as $'a\tb\nc' from bash) are preserved literally.
  • The whitespace classification is written as four independent if branches that each set is_ws = true. This avoids assuming || short-circuits in the Zero 0.1.2 direct backend; even though none of the byte comparisons here would trap, the explicit-flag pattern matches the safety shape used elsewhere in the corpus.
  • Zero 0.1.2 does not have a ! prefix operator for booleans. The natural shape if !in_word { ... } trips PAR100 at the !. Use the explicit comparison if in_word == false { ... } instead. This is the fourth Zero direct-backend quirk surfaced by the corpus; it joins the 2^32-divisor SIGFPE (task 002), the i32-vs-usize and bool-vs-Bool TYP002 quirks (task 004), and the && non-short-circuit trap (task 008) as silent-under-zero check codegen gaps.
  • The decimal renderer uses small-divisor u32 divmod (% 10_u32 / / 10_u32). This dodges the 2^32-divisor narrowing-cast SIGFPE documented in task 002 notes; 10 fits in u32 so the cast pattern is never triggered.

Cross-implementation parity

All five references treat the four whitespace bytes identically and walk the input byte-by-byte (no Unicode-aware tokenization). TypeScript and Python read raw byte buffers (readFileSync(0) and sys.stdin.buffer); Rust uses read_to_end for the same reason. Go uses io.ReadAll and ranges over the byte slice with _, b := range which yields bytes from a []byte. Byte-exact agreement holds on every case.


Cost

Model Prompt tokens Completion tokens API ms
gpt-4o 2,580 655 7,003
gpt-4o-mini 2,580 581 10,779
gpt-5 2,575 11,976 113,233

Tokens and API ms are summed across the five languages this model attempted for this task.


Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 . Back to the leaderboard and methodology.