agentlang-index · task easy
Count whitespace-separated tokens in input
009-word-count. Read all bytes from standard input.
Prompt
This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.
# Task 009 — word-count
## Problem
Read all bytes from standard input. A **word** is a maximal run of
non-whitespace bytes. The whitespace bytes are:
- space (0x20)
- tab (0x09)
- newline (0x0A)
- carriage return (0x0D)
Write the total word count as a decimal integer to standard output,
followed by exactly one newline. Exit with status 0.
## Edge cases
- Empty input → write `0\n`.
- All-whitespace input → write `0\n`.
- Single word with no trailing whitespace → write `1\n`.
- Words separated by any mix of whitespace bytes count by maximal
non-whitespace runs only; consecutive whitespace bytes never produce
an empty word.
## Acceptance
The stdout produced by your program must match the expected bytes
exactly for every test case (public and hidden). Stderr must be empty
and the process must exit 0.
## Input convention
- **stdin** (TypeScript, Rust, Go, Python): read all bytes until EOF.
- **argv[1]** (Zero 0.1.2 has no exposed stdin capability): read the
bytes from the first command-line argument.
## Token budget
1200 tokens.
Acceptance
A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."
| Field | Value |
|---|---|
| stdout (byte-exact, per case) | true |
| stderr (exact bytes) | "" |
| exit code | 0 |
| wall time max (ms) | 5000 |
| tags | io, parsing, counting |
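The acceptance fields above can be read as a three-way equality check plus a timeout. The following is a minimal sketch of such a check, not the harness's actual code: the names `Case` and `passes` are hypothetical, and it assumes the candidate is launched as a subprocess that receives its input on stdin.

```python
import subprocess
import sys
from dataclasses import dataclass

WALL_TIME_MAX_S = 5.0  # 5000 ms, taken from the acceptance table


@dataclass
class Case:
    """One test case: the bytes fed to stdin and the exact stdout expected."""
    stdin: bytes
    expected_stdout: bytes


def passes(cmd: list[str], case: Case) -> bool:
    """Return True only if stdout, stderr, and exit code all match exactly."""
    try:
        proc = subprocess.run(
            cmd,
            input=case.stdin,
            capture_output=True,
            timeout=WALL_TIME_MAX_S,
        )
    except subprocess.TimeoutExpired:
        # Exceeding the wall-time limit counts as a failure.
        return False
    return (
        proc.stdout == case.expected_stdout  # byte-exact, no trimming
        and proc.stderr == b""               # stderr must be empty
        and proc.returncode == 0             # exit status 0
    )
```

Under these assumptions, a call such as `passes([sys.executable, "solution.py"], Case(b"a b\n", b"2\n"))` would be repeated for every public and hidden case, and the task would count as passed only if every call returned `True`. The file name `solution.py` is illustrative.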
Results
Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero.
| Model | Zero | TypeScript | Rust | Go | Python |
|---|---|---|---|---|---|
| gpt-4o | compile | ✓ | ✓ | ✓ | ✓ |
| gpt-4o-mini | compile | ✓ | ✓ | wrong output | wrong output |
| gpt-5 | compile | ✓ | ✓ | ✓ | ✓ |
Failure excerpts
5 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.
- ref.zero:6:8 PAR100: expected '{' before block
- ref.zero:3:17 PAR100: expected '{' before block
- (no diagnostic captured)
- (no diagnostic captured)
- ref.zero:3:9 PAR100: expected '{' before block
Reference implementations
The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.
Zero 79 lines
// Count whitespace-separated tokens in input, Zero 0.1.2 direct backend.
//
// argv[1] is the input bytes (Zero 0.1.2 has no exposed stdin capability).
// All logic stays inside `pub fun main` to avoid Span/MutSpan parameter
// restrictions.
//
// One-pass state machine: track `in_word`. Each transition from
// whitespace to non-whitespace increments the count. Whitespace bytes
// are 0x20, 0x09, 0x0A, 0x0D.
//
// Decimal renderer uses small-divisor u32 divmod (avoids the 2^32-
// divisor SIGFPE narrowing-cast bug surfaced in earlier corpus work).
pub fun main(world: World) -> Void raises {
    let line_opt = std.args.get(1)
    let mut bytes: Span<u8> = std.mem.span("")
    if line_opt.has {
        bytes = std.mem.span(line_opt.value)
    }
    let n: usize = std.mem.len(bytes)
    let mut count: u32 = 0_u32
    let mut in_word: Bool = false
    let mut i: usize = 0
    while i < n {
        let b: u8 = bytes[i]
        let mut is_ws: Bool = false
        if b == 32_u8 {
            is_ws = true
        }
        if b == 9_u8 {
            is_ws = true
        }
        if b == 10_u8 {
            is_ws = true
        }
        if b == 13_u8 {
            is_ws = true
        }
        if is_ws {
            in_word = false
        } else {
            if in_word == false {
                count = count + 1_u32
                in_word = true
            }
        }
        i = i + 1
    }
    let mut out_buf: [16]u8 = [0_u8; 16]
    let mut written: usize = 0
    if count == 0_u32 {
        out_buf[0] = 48_u8
        written = 1
    } else {
        let mut tmp: [16]u8 = [0_u8; 16]
        let mut tn: usize = 0
        let mut c: u32 = count
        while c > 0_u32 {
            let d: u32 = c % 10_u32
            tmp[tn] = (48_u32 + d) as u8
            tn = tn + 1
            c = c / 10_u32
        }
        let mut j: usize = 0
        while j < tn {
            out_buf[j] = tmp[tn - 1 - j]
            j = j + 1
        }
        written = tn
    }
    out_buf[written] = 10_u8
    written = written + 1
    let out: Span<u8> = out_buf[0..written]
    check world.out.write(out)
    return
}
TypeScript 23 lines
// Count whitespace-separated tokens in input, TypeScript reference.
import { readFileSync } from "node:fs";
function main(): void {
  const data = readFileSync(0);
  let count = 0;
  let inWord = false;
  for (let i = 0; i < data.length; i++) {
    const b = data[i];
    const isWS = b === 32 || b === 9 || b === 10 || b === 13;
    if (isWS) {
      inWord = false;
    } else if (!inWord) {
      count++;
      inWord = true;
    }
  }
  process.stdout.write(`${count}\n`);
}
main();
Rust 22 lines
// Count whitespace-separated tokens in input, Rust reference.
use std::io::{self, Read, Write};
fn main() {
    let mut data = Vec::new();
    io::stdin().read_to_end(&mut data).expect("read stdin");
    let mut count: u32 = 0;
    let mut in_word = false;
    for &b in &data {
        let is_ws = b == 32 || b == 9 || b == 10 || b == 13;
        if is_ws {
            in_word = false;
        } else if !in_word {
            count += 1;
            in_word = true;
        }
    }
    let mut stdout = io::stdout();
    writeln!(stdout, "{}", count).expect("write stdout");
}
Go 29 lines
// Count whitespace-separated tokens in input, Go reference.
package main
import (
	"io"
	"os"
	"strconv"
)
func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		os.Exit(1)
	}
	count := 0
	inWord := false
	for _, b := range data {
		isWS := b == 32 || b == 9 || b == 10 || b == 13
		if isWS {
			inWord = false
		} else if !inWord {
			count++
			inWord = true
		}
	}
	os.Stdout.WriteString(strconv.Itoa(count) + "\n")
}
Python 23 lines
#!/usr/bin/env python3
"""Count whitespace-separated tokens in input, Python reference."""
import sys
def main() -> None:
    data = sys.stdin.buffer.read()
    count = 0
    in_word = False
    for b in data:
        is_ws = b == 32 or b == 9 or b == 10 or b == 13
        if is_ws:
            in_word = False
        elif not in_word:
            count += 1
            in_word = True
    sys.stdout.write(f"{count}\n")
if __name__ == "__main__":
    main()
Design notes
Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/009-word-count/notes.md.
Algorithm
One-pass state machine over the input bytes:
- `in_word` starts `false`, `count` starts `0`.
- For each byte `b`: if `b` is whitespace, set `in_word = false`; otherwise if `in_word` is false, increment `count` and set `in_word = true`.
- Whitespace bytes: space (0x20), tab (0x09), newline (0x0A), CR (0x0D).
- After the loop, emit `count` as decimal followed by one newline.
Edge cases
- Empty input → `count` stays 0 → output `0\n`.
- All-whitespace input → no non-whitespace bytes → output `0\n`.
- Trailing whitespace (including a trailing newline from stdin) does not create a phantom word; the state machine simply stays at `in_word = false` past the end.
- Single word with no trailing newline → output `1\n`.
- Tab, newline, and CR all act as separators identically to space.
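The edge cases listed here lend themselves to a table-driven check. The sketch below restates the state machine as a hypothetical `count_words` helper and runs it over a handful of illustrative inputs; these inputs are assumptions chosen to mirror the bullets, not the actual public or hidden test cases.

```python
# The four whitespace bytes from the task: 0x20, 0x09, 0x0A, 0x0D.
WHITESPACE = frozenset(b" \t\n\r")


def count_words(data: bytes) -> int:
    """Count maximal runs of non-whitespace bytes with a one-pass state machine."""
    count = 0
    in_word = False
    for b in data:
        if b in WHITESPACE:
            in_word = False
        elif not in_word:
            # Transition from whitespace (or start of input) into a word.
            count += 1
            in_word = True
    return count


# Illustrative inputs only; each pairs raw input bytes with the expected count.
EDGE_CASES = [
    (b"", 0),             # empty input
    (b" \t\r\n", 0),      # all whitespace
    (b"word", 1),         # single word, no trailing newline
    (b"a b\n", 2),        # trailing newline creates no phantom word
    (b"a \t\r\n b", 2),   # mixed separators collapse into one gap
]

for data, expected in EDGE_CASES:
    assert count_words(data) == expected, (data, expected)
```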
Zero-specific notes
- argv[1] is the input bytes; an absent argv falls back to an empty span. Embedded tabs and newlines inside argv[1] (passed as `$'a\tb\nc'` from bash) are preserved literally.
- The whitespace classification is written as four independent `if` branches that each set `is_ws = true`. This avoids assuming `||` short-circuits in the Zero 0.1.2 direct backend; even though none of the byte comparisons here would trap, the explicit-flag pattern matches the safety shape used elsewhere in the corpus.
- Zero 0.1.2 does not have a `!` prefix operator for booleans. The natural shape `if !in_word { ... }` trips PAR100 at the `!`. Use the explicit comparison `if in_word == false { ... }` instead. This is the fourth Zero direct-backend quirk surfaced by the corpus; it joins the 2^32-divisor SIGFPE (task 002), the i32-vs-usize and bool-vs-Bool TYP002 quirks (task 004), and the `&&` non-short-circuit trap (task 008) as silent-under-`zero check` codegen gaps.
- The decimal renderer uses small-divisor u32 divmod (`% 10_u32` / `/ 10_u32`). This dodges the 2^32-divisor narrowing-cast SIGFPE documented in task 002 notes; 10 fits in u32 so the cast pattern is never triggered.
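The decimal renderer described in the last bullet can be sketched outside Zero as well. The Python version below is an illustration of the same digit-peeling shape (repeated `% 10` and `// 10`, then a reversal), with a hypothetical function name `render_decimal`; it does not reproduce Zero's fixed-size buffers or its integer types.

```python
def render_decimal(count: int) -> bytes:
    """Render a non-negative integer as ASCII decimal digits plus one newline."""
    if count == 0:
        # Zero is special-cased because the digit loop below never runs for it.
        return b"0\n"
    tmp = bytearray()
    c = count
    while c > 0:
        tmp.append(48 + c % 10)  # 48 is ASCII '0'; digits come out least significant first
        c //= 10
    tmp.reverse()                # restore most-significant-first order
    tmp.append(10)               # trailing newline (0x0A)
    return bytes(tmp)
```

For the largest u32 value, 4294967295, this produces ten digits plus the newline, eleven bytes in total, which is why a 16-byte output buffer is sufficient in the Zero reference.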
Cross-implementation parity
All five references treat the four whitespace bytes identically and
walk the input byte-by-byte (no Unicode-aware tokenization). TypeScript
and Python read raw byte buffers (`readFileSync(0)` and
`sys.stdin.buffer`); Rust uses `read_to_end` for the same reason. Go
uses `io.ReadAll` and ranges over the byte slice with `_, b := range`,
which yields bytes from a `[]byte`. Byte-exact agreement holds on
every case.
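Walking bytes by hand matters because library tokenizers often use a wider whitespace set than the task's four bytes. As one illustration, Python's `bytes.split()` with no separator also splits on vertical tab (0x0B) and form feed (0x0C). The sketch below contrasts it with a regex restatement of the task definition. Both function names are hypothetical, and whether any test case contains such bytes is not stated in the source, so this is a possible divergence rather than a diagnosis of the recorded failures.

```python
import re

# A word per the task: a maximal run of bytes other than 0x20, 0x09, 0x0A, 0x0D.
TASK_WORD = re.compile(rb"[^ \t\n\r]+")


def count_task_words(data: bytes) -> int:
    """Count words using exactly the task's four whitespace bytes."""
    return len(TASK_WORD.findall(data))


def count_split_words(data: bytes) -> int:
    """Count words with bytes.split(), which also treats 0x0B and 0x0C as whitespace."""
    return len(data.split())


# The two counters agree when only the task's whitespace bytes appear...
plain = b"a b\tc\r\nd"
assert count_task_words(plain) == count_split_words(plain) == 4

# ...but disagree once a vertical tab shows up inside a word.
tricky = b"a\x0bb c"
assert count_task_words(tricky) == 2
assert count_split_words(tricky) == 3
```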
Cost
| Model | Prompt tokens | Completion tokens | API ms |
|---|---|---|---|
| gpt-4o | 2,580 | 655 | 7,003 |
| gpt-4o-mini | 2,580 | 581 | 10,779 |
| gpt-5 | 2,575 | 11,976 | 113,233 |
Tokens and API ms are summed across the five languages each model attempted for this task.