agentlang-index · task easy

Hello, stdout

000-hello-stdout. Write a complete program that prints the six bytes `hello\n` (lowercase, no quotes, single trailing newline) to standard output and exits with status code 0.

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

# Task: Hello, stdout

Write a complete program that prints the six bytes `hello\n` (lowercase,
no quotes, single trailing newline) to standard output and exits with
status code 0. Read no input.

## Acceptance

- Stdout must contain exactly the six bytes `hello\n`.
- Stderr must be empty.
- Exit code must be 0.
- Program must complete within 5 seconds.

## Language scaffold

{language_scaffold}
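
A minimal sketch of how a harness might assemble the final prompt from this brief, following the description at the top of this section (calling-convention block first, brief in the middle, "return only the source code" instruction last). The function name, the sample strings, and the use of `str.replace` to fill `{language_scaffold}` are assumptions for illustration; the real harness code is not shown on this page.

```python
SCAFFOLD_SLOT = "{language_scaffold}"  # placeholder that appears in the brief


def assemble_prompt(brief: str, calling_convention: str,
                    scaffold: str, suffix: str) -> str:
    """Hypothetical helper: prefix + brief (scaffold filled in) + suffix."""
    # str.replace rather than str.format, so any other braces in the brief
    # (for example in code samples) are left untouched.
    filled = brief.replace(SCAFFOLD_SLOT, scaffold)
    parts = [calling_convention.strip(), filled.strip(), suffix.strip()]
    return "\n\n".join(parts)


if __name__ == "__main__":
    brief = "# Task: Hello, stdout\n\n## Language scaffold\n\n{language_scaffold}\n"
    prompt = assemble_prompt(
        brief,
        calling_convention="(language-specific calling-convention block)",
        scaffold="(scaffold for the target language)",
        suffix="Return only the source code.",
    )
    print(prompt)
```

Keeping the brief itself untouched between the prefix and suffix is what makes the "verbatim" claim checkable: the only substitution is the scaffold slot.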

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (exact bytes): "hello\n"
stderr (exact bytes): ""
exit code: 0
wall time max (ms): 5000
tags: io, smoke-test
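
The acceptance fields above translate directly into a byte-exact check. The following is a minimal sketch, assuming the attempt can be launched as an ordinary command; `check_attempt` and its reason strings are hypothetical names, not the harness's own grader.

```python
import subprocess
import sys

# Expected values copied from the acceptance fields for this task.
EXPECTED_STDOUT = b"hello\n"
EXPECTED_STDERR = b""
EXPECTED_EXIT_CODE = 0
WALL_TIME_MAX_MS = 5000


def check_attempt(argv):
    """Run one attempt and return (passed, reason). Hypothetical helper."""
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,  # the task reads no input
            capture_output=True,       # raw bytes, no text decoding
            timeout=WALL_TIME_MAX_MS / 1000,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    # Byte-exact comparisons: no stripping, no newline normalisation.
    if proc.stdout != EXPECTED_STDOUT:
        return False, "stdout mismatch"
    if proc.stderr != EXPECTED_STDERR:
        return False, "stderr not empty"
    if proc.returncode != EXPECTED_EXIT_CODE:
        return False, f"exit code {proc.returncode}"
    return True, "ok"


if __name__ == "__main__":
    # Assumes a POSIX-style platform where "\n" is written as a single byte.
    attempt = [sys.executable, "-c", "import sys; sys.stdout.write('hello\\n')"]
    print(check_attempt(attempt))
```

Comparing `bytes` objects directly, rather than decoded and stripped strings, is what rules out the "off by one trailing newline is fine" class of false passes.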

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero.

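| Model | Zero | TypeScript | Rust | Go | Python |
|---|---|---|---|---|---|
| gpt-4o | fail (compile) | pass | pass | pass | pass |
| gpt-4o-mini | fail (compile) | pass | pass | pass | pass |
| gpt-5 | fail (compile) | pass | pass | pass | pass |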

Failure excerpts

3 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

  1. gpt-4o Zero compile
    ref.zero:1:1 IMP001: unknown package-local import 'std'
  2. gpt-4o-mini Zero compile
    ref.zero:1:1 PAR100: expected '{' before block
  3. gpt-5 Zero compile
    ref.zero:1:10 PAR100: expected '{' before block
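
The three excerpts share a `file:line:col CODE: message` shape, which makes them easy to tally by diagnostic code. This is a sketch under that assumption only: the pattern is inferred from the excerpts above, not taken from any Zero compiler documentation, and the helper names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the three excerpts on this page (an assumption).
DIAGNOSTIC = re.compile(
    r"^(?P<file>\S+?):(?P<line>\d+):(?P<col>\d+)\s+"
    r"(?P<code>[A-Z]+\d+):\s+(?P<message>.+)$"
)

EXCERPTS = [
    "ref.zero:1:1 IMP001: unknown package-local import 'std'",
    "ref.zero:1:1 PAR100: expected '{' before block",
    "ref.zero:1:10 PAR100: expected '{' before block",
]


def parse_diagnostic(text):
    """Return the diagnostic fields as a dict, or None if the line does not match."""
    match = DIAGNOSTIC.match(text.strip())
    return match.groupdict() if match else None


def tally_codes(lines):
    """Count how often each diagnostic code appears among the parsable lines."""
    parsed = (parse_diagnostic(line) for line in lines)
    return Counter(d["code"] for d in parsed if d is not None)


if __name__ == "__main__":
    for code, count in sorted(tally_codes(EXCERPTS).items()):
        print(code, count)
    # IMP001 1
    # PAR100 2
```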

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.


Zero 4 lines

```
pub fun main(world: World) -> Void raises {
    check world.out.write("hello\n")
}
```

TypeScript 2 lines

```
process.stdout.write("hello\n");
```

Rust 5 lines

```
// ref.rs
fn main() {
    print!("hello\n");
}
```

Go 9 lines

```
// ref.go
package main

import "fmt"

func main() {
	fmt.Print("hello\n")
}
```

Python 3 lines

```
import sys
sys.stdout.write("hello\n")
```

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/000-hello-stdout/notes.md.

000-hello-stdout — notes

Smoke-test task. Probes the multi-language toolchain plumbing, not model capability. All five reference implementations must pass on day one or the harness cannot be trusted.

Why this exists

If a frontier model fails this task, something is wrong with the prompt template, the sandbox, or the grader. Use it as a canary on every harness change.

Calibration

  • Zero: bin/zero run ref.zero from a Zero checkout. Output is exactly hello\n.
  • TS: node or tsx against ref.ts. Pure stdio API, no async surprises.
  • Rust: rustc ref.rs -o hello && ./hello. No println! because the spec is exact-byte; print! is the right call.
  • Go: go run ref.go. fmt.Print (not fmt.Println) because the trailing newline is part of the byte sequence we're matching exactly.
  • Python: sys.stdout.write instead of print(), because print() appends its own newline by default (end="\n") and would turn "hello\n" into seven bytes; the spec wants exactly six bytes.
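
The newline pitfalls in the notes above are easy to verify by counting bytes. A minimal sketch for the Python case, capturing what each call writes to stdout; `emitted_bytes` and the `CANDIDATES` table are illustrative names, not part of the harness.

```python
import contextlib
import io
import sys


def emitted_bytes(emit):
    """Run emit() with stdout redirected and return what it wrote, as bytes."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        emit()
    return buf.getvalue().encode("utf-8")


# Four ways to print the greeting; only the last one breaks the six-byte spec.
CANDIDATES = {
    'sys.stdout.write("hello\\n")': lambda: sys.stdout.write("hello\n"),
    'print("hello")': lambda: print("hello"),
    'print("hello\\n", end="")': lambda: print("hello\n", end=""),
    'print("hello\\n")': lambda: print("hello\n"),
}

if __name__ == "__main__":
    for label, emit in CANDIDATES.items():
        out = emitted_bytes(emit)
        print(f"{label:32} {len(out)} bytes  {out!r}")
    # sys.stdout.write("hello\n")      6 bytes  b'hello\n'
    # print("hello")                   6 bytes  b'hello\n'
    # print("hello\n", end="")         6 bytes  b'hello\n'
    # print("hello\n")                 7 bytes  b'hello\n\n'
```

The same reasoning applies to the Rust and Go references: pairing an explicit `\n` in the string with a call that also appends a newline would produce the seven-byte variant.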

Cost

| Model | Prompt tokens | Completion tokens | API ms |
|---|---|---|---|
| gpt-4o | 1,720 | 124 | 3,578 |
| gpt-4o-mini | 1,720 | 120 | 7,162 |
| gpt-5 | 1,715 | 4,085 | 58,359 |

Tokens and API ms are summed across the five languages each model attempted for this task.

