Hello, stdout

000-hello-stdout. Write a complete program that prints the six bytes `hello\n` (lowercase,

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

# Task: Hello, stdout

Write a complete program that prints the six bytes `hello\n` (lowercase,
no quotes, single trailing newline) to standard output and exits with
status code 0. Read no input.

## Acceptance

- Stdout must contain exactly the six bytes `hello\n`.
- Stderr must be empty.
- Exit code must be 0.
- Program must complete within 5 seconds.

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (exact bytes)	"hello\n"
stderr (exact bytes)	""
exit code	0
wall time max (ms)	5000
tags	io, smoke-test

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model	Zero	TypeScript	Rust	Go	Python
gpt-4o	compile	✓	✓	✓	✓
gpt-4o-mini	compile	✓	✓	✓	✓
gpt-5	compile	✓	✓	✓	✓
opus	compile	✓	✓	✓	✓
sonnet	wrong output	✓	✓	✓	✓

Failure excerpts

5 of 25 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

gpt-4o Zero compile

ref.zero:1:1 IMP001: unknown package-local import 'std'

gpt-4o-mini Zero compile

ref.zero:1:1 PAR100: expected '{' before block

gpt-5 Zero compile

ref.zero:1:10 PAR100: expected '{' before block

opus Zero compile

ref.zero:1:12 PAR100: expected '{' before block

sonnet Zero wrong output
```
(no diagnostic captured)
```

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero n/a

No reference implementation in this language.

TypeScript 2 lines

process.stdout.write("hello\n");

Rust 5 lines

// ref.rs
fn main() {
    print!("hello\n");
}

Go 9 lines

// ref.go
package main

import "fmt"

func main() {
	fmt.Print("hello\n")
}

Python 3 lines

import sys
sys.stdout.write("hello\n")

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/000-hello-stdout/notes.md.

000-hello-stdout — notes

Smoke-test task. Probes the multi-language toolchain plumbing, not model capability. All five reference implementations must pass on day one or the harness cannot be trusted.

Why this exists

If a frontier model fails this task, something is wrong with the prompt template, the sandbox, or the scoring grader. Use it as a canary on every harness change.

Calibration

Zero: bin/zero run ref.zero from a Zero checkout. Output is exactly hello\n.
TS: node or tsx against ref.ts. Pure stdio API, no async surprises.
Rust: rustc ref.rs -o hello && ./hello. No println! because the spec is exact-byte; print! is the right call.
Go: go run ref.go. fmt.Print (not fmt.Println) because the trailing newline is part of the byte sequence we're matching exactly.
Python: sys.stdout.write instead of print() to avoid Python's print-end behavior; the spec wants exactly seven bytes.

Cost

Model	Prompt tokens	Completion tokens	API ms
gpt-4o	1,720	124	3,578
gpt-4o-mini	1,720	120	7,162
gpt-5	1,715	4,085	58,359
opus	14	134	60,671
sonnet	14	178	56,735

Tokens and API ms are summed across the five languages this model attempted for this task.

Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 · opus · sonnet . Back to the leaderboard and methodology.

Truffle · 000-hello-stdout · run captured 2026-06-27