agentlang-index · task easy
Hello, stdout
000-hello-stdout
Prompt
This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.
# Task: Hello, stdout
Write a complete program that prints the six bytes `hello\n` (lowercase,
no quotes, single trailing newline) to standard output and exits with
status code 0. Read no input.
## Acceptance
- Stdout must contain exactly the six bytes `hello\n`.
- Stderr must be empty.
- Exit code must be 0.
- Program must complete within 5 seconds.
## Language scaffold
{language_scaffold}
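The brief ends at the `{language_scaffold}` placeholder. As a rough illustration of how a harness might wrap it (calling-convention block first, source-only instruction last), here is a minimal Python sketch. The function name `build_prompt`, the blank-line separators, and the exact wording of the suffix are assumptions for illustration, not the harness's actual code.

```python
def build_prompt(task_brief: str, scaffold: str, calling_convention: str) -> str:
    """Assemble the text sent to a model for one (task, language) pair.

    Hypothetical helper: names, separators, and suffix wording are assumed.
    """
    # Fill the single placeholder. str.replace is used instead of str.format
    # so that any other braces in the brief are left untouched.
    body = task_brief.replace("{language_scaffold}", scaffold)
    suffix = "Return only the source code."  # assumed wording
    return f"{calling_convention}\n\n{body}\n\n{suffix}"


brief = "# Task: Hello, stdout\n## Language scaffold\n{language_scaffold}"
prompt = build_prompt(brief, "fn main() {}", "Calling convention: single file, no arguments.")
```

Using plain string replacement keeps the brief itself verbatim, which matches the stated intent that nothing else is added.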
Acceptance
A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."
| Field | Value |
|---|---|
| stdout (exact bytes) | "hello\n" |
| stderr (exact bytes) | "" |
| exit code | 0 |
| wall time max (ms) | 5000 |
| tags | io, smoke-test |
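The fields in the table map directly onto a byte-exact check. The sketch below shows one way such a check could look in Python for a single run. The function name `passes` and the use of `subprocess.run` are assumptions, and the real grader additionally requires agreement across every public and hidden test case, which this sketch does not model.

```python
import subprocess
import sys
import time

EXPECTED_STDOUT = b"hello\n"  # six bytes, from the acceptance table
WALL_LIMIT_MS = 5000          # wall time max, from the acceptance table


def passes(cmd: list[str]) -> bool:
    """Run cmd once with no input and compare every acceptance field exactly."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # the task reads no input
            capture_output=True,        # raw bytes, no newline translation
            timeout=WALL_LIMIT_MS / 1000,
        )
    except subprocess.TimeoutExpired:
        return False
    elapsed_ms = (time.monotonic() - start) * 1000
    return (
        proc.stdout == EXPECTED_STDOUT
        and proc.stderr == b""
        and proc.returncode == 0
        and elapsed_ms <= WALL_LIMIT_MS
    )


# Usage: grade a one-line Python program that writes the expected bytes.
ok = passes([sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'hello\\n')"])
```

Comparing `bytes` rather than decoded text is what rules out the "off by one trailing newline" leniency: an extra `\n` makes the equality fail.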
Results
Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. The captured first line of each failure's diagnostic is listed under Failure excerpts.
| Model | Zero | TypeScript | Rust | Go | Python |
|---|---|---|---|---|---|
| gpt-4o | compile | ✓ | ✓ | ✓ | ✓ |
| gpt-4o-mini | compile | ✓ | ✓ | ✓ | ✓ |
| gpt-5 | compile | ✓ | ✓ | ✓ | ✓ |
Failure excerpts
3 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.
- ref.zero:1:1 IMP001: unknown package-local import 'std'
- ref.zero:1:1 PAR100: expected '{' before block
- ref.zero:1:10 PAR100: expected '{' before block
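The three excerpts share a `file:line:col CODE: message` shape, so they can be grouped by diagnostic code. The Python sketch below does that with a regular expression. The pattern is inferred from these three lines only and is an assumption about the Zero toolchain's output format, not a documented grammar.

```python
import re
from collections import Counter

# Assumed shape: <file>:<line>:<col> <CODE>: <message>
DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:]+):(?P<line>\d+):(?P<col>\d+) (?P<code>[A-Z]+\d+): (?P<message>.*)$"
)


def parse_diagnostic(text: str) -> dict:
    """Split one diagnostic line into its fields; raise if it does not match."""
    match = DIAGNOSTIC.match(text)
    if match is None:
        raise ValueError(f"unrecognized diagnostic: {text!r}")
    fields = match.groupdict()
    fields["line"] = int(fields["line"])
    fields["col"] = int(fields["col"])
    return fields


excerpts = [
    "ref.zero:1:1 IMP001: unknown package-local import 'std'",
    "ref.zero:1:1 PAR100: expected '{' before block",
    "ref.zero:1:10 PAR100: expected '{' before block",
]
by_code = Counter(parse_diagnostic(e)["code"] for e in excerpts)
```

Here `by_code` counts two `PAR100` parse errors and one `IMP001` import error, all on line 1 of the generated file.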
Reference implementations
The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.
Zero 4 lines
pub fun main(world: World) -> Void raises {
    check world.out.write("hello\n")
}
TypeScript 2 lines
process.stdout.write("hello\n");
Rust 5 lines
// ref.rs
fn main() {
    print!("hello\n");
}
Go 9 lines
// ref.go
package main
import "fmt"
func main() {
	fmt.Print("hello\n")
}
Python 3 lines
import sys
sys.stdout.write("hello\n")
Design notes
Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/000-hello-stdout/notes.md.
000-hello-stdout — notes
Smoke-test task. Probes the multi-language toolchain plumbing, not model capability. All five reference implementations must pass on day one or the harness cannot be trusted.
Why this exists
If a frontier model fails this task, something is wrong with the prompt template, the sandbox, or the scoring grader. Use it as a canary on every harness change.
Calibration
- Zero: `bin/zero run ref.zero` from a Zero checkout. Output is exactly `hello\n`.
- TS: `node` or `tsx` against `ref.ts`. Pure stdio API, no async surprises.
- Rust: `rustc ref.rs -o hello && ./hello`. No `println!` because the spec is exact-byte; `print!` is the right call.
- Go: `go run ref.go`. `fmt.Print` (not `fmt.Println`) because the trailing newline is part of the byte sequence we're matching exactly.
- Python: `sys.stdout.write` instead of `print()` to avoid Python's print-end behavior; the spec wants exactly six bytes.
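The newline choices in the list above come down to byte counts. The Python sketch below captures stdout in memory to show the difference between writing the literal string and letting `print()` append its own line ending. The helper name `captured` is hypothetical, and the in-memory capture stands in for the real sandbox.

```python
import contextlib
import io
import sys


def captured(fn) -> bytes:
    """Run fn with stdout redirected to memory and return what it wrote as bytes."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn()
    return buf.getvalue().encode("utf-8")


via_write = captured(lambda: sys.stdout.write("hello\n"))  # literal string only
via_print = captured(lambda: print("hello\n"))             # print() adds its own "\n"

print(len(via_write), len(via_print))  # prints "6 7"
```

The same reasoning applies to `println!` in Rust and `fmt.Println` in Go: passing a string that already ends in `\n` to a line-printing call yields seven bytes, which fails the exact-match check.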
Cost
| Model | Prompt tokens | Completion tokens | API ms |
|---|---|---|---|
| gpt-4o | 1,720 | 124 | 3,578 |
| gpt-4o-mini | 1,720 | 120 | 7,162 |
| gpt-5 | 1,715 | 4,085 | 58,359 |
Tokens and API ms are summed across the five languages this model attempted for this task.