agentlang-index · model deep-dive

gpt-4o

One model, twenty tasks, five languages, byte-exact scoring. The pass rate is 70% overall and 0% in Zero; the average tax versus the other four languages is +88%.

Pass rate per language (share of the 20 tasks the model passed in that language):

Language	Pass rate
Zero	0%
TypeScript	95%
Rust	75%
Go	95%
Python	85%
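The headline figures can be rechecked from the per-language pass counts in the failure-modes table further down this page. The sketch below is only that arithmetic. Reading the "tax" as the percentage-point gap between the mean pass rate of the other four languages and Zero's pass rate is an assumption; the methodology page is the authority on how that metric is defined.

```python
# Pass counts out of 20 tasks, copied from the failure-modes table on this page.
passes = {"Zero": 0, "TypeScript": 19, "Rust": 15, "Go": 19, "Python": 17}
TASKS = 20

# Per-language pass rate in percent.
rates = {lang: 100 * n / TASKS for lang, n in passes.items()}

# Overall pass rate across all 100 attempts.
overall = 100 * sum(passes.values()) / (TASKS * len(passes))

# Mean pass rate of the four non-Zero languages, and its gap to Zero.
# ASSUMPTION: this gap is one plausible reading of the "tax" figure.
others = [rate for lang, rate in rates.items() if lang != "Zero"]
mean_others = sum(others) / len(others)
gap = mean_others - rates["Zero"]

print(overall)      # 70.0
print(mean_others)  # 87.5
print(gap)          # 87.5
```

Under that reading the gap is 87.5 points, which is consistent with the +88% quoted above after rounding.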

Per-task results

Every cell is a single attempt. Pass means stdout matched byte for byte on every public and hidden test case, with empty stderr and exit code zero. A failed cell shows the failure class; the first line of each Zero diagnostic is listed in the Zero deep-dive below.
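The pass criterion is strict enough that a short sketch helps. The one below is a minimal illustration, not the harness itself: the `CaseResult` shape and both function names are hypothetical, and the sample bytes are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Captured outcome of one test case (hypothetical shape, not the harness's)."""
    stdout: bytes
    stderr: bytes
    exit_code: int
    expected_stdout: bytes


def case_passes(r: CaseResult) -> bool:
    # Byte-exact: no trimming and no newline normalization before comparing.
    return r.exit_code == 0 and r.stderr == b"" and r.stdout == r.expected_stdout


def attempt_passes(cases: list[CaseResult]) -> bool:
    # An attempt passes only if it has cases and every one of them passes.
    return bool(cases) and all(case_passes(c) for c in cases)


ok = CaseResult(b"Hello, stdout\n", b"", 0, b"Hello, stdout\n")
missing_newline = CaseResult(b"Hello, stdout", b"", 0, b"Hello, stdout\n")

print(attempt_passes([ok]))                   # True
print(attempt_passes([ok, missing_newline]))  # False
```

A single missing trailing newline on one case is enough to fail the whole attempt, which is what "byte-exact" implies here.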

Task	Zero	TypeScript	Rust	Go	Python
000-hello-stdout Hello, stdout	compile	✓	✓	✓	✓
001-fibonacci-memoized Fibonacci with memoization	compile	✓	✓	✓	✓
002-sieve-prime-count Prime count via Sieve of Eratosthenes	compile	✓	✓	✓	✓
003-levenshtein-distance Levenshtein edit distance	compile	✓	✓	✓	✓
004-matrix-multiply Square integer matrix multiply	compile	✓	✓	other	✓
005-balanced-parens Balanced bracket checker	compile	✓	✓	✓	✓
006-substring-count Non-overlapping substring count	compile	✓	✓	✓	✓
007-csv-line-tokenize CSV line tokenizer (RFC 4180 subset)	compile	✓	other	✓	✓
008-word-reverse Reverse the order of words on a line	compile	✓	✓	✓	✓
009-word-count Count whitespace-separated tokens in input	compile	✓	✓	✓	✓
010-byte-frequency Per-byte frequency table sorted by byte value	compile	✓	✓	✓	✓
011-rle-encode Run-length encode the input as count/byte pairs	compile	✓	✓	✓	✓
012-http-status-code GET a URL and write the HTTP status code	compile	✓	wrong output	✓	other
013-http-json-sum POST a JSON pair and extract the sum	compile	✓	wrong output	✓	other
014-http-header-echo GET a URL and echo a named response header	compile	✓	wrong output	✓	other
015-checked-divide-u32 Parse two unsigned integers and write their integer quotient, or error on any failure	compile	✓	✓	✓	✓
016-parse-list-sum Read a count then that many u32 integers and write their sum, or error on any failure	compile	✓	wrong output	✓	✓
017-checked-add-overflow Parse two unsigned integers and write their sum, or error on parse failure or u32 overflow	compile	✓	✓	✓	✓
018-caesar-cipher Shift a lowercase ASCII string by a Caesar offset, or error on bad input	compile	✓	✓	✓	✓
019-run-length-encode Run-length encode a lowercase ASCII string, or error on bad input	compile	wrong output	✓	✓	✓

Failure modes

Each failed attempt is classified as compile (a parser, type checker, codegen, or build-system error before the program could run), runtime (the program ran but crashed or threw), or wrong output (the program ran cleanly but emitted the wrong bytes).

Language	pass	compile	wrong output	other
Zero	0	20	0	0
TypeScript	19	0	1	0
Rust	15	0	4	1
Go	19	0	0	1
Python	17	0	0	3
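The classification described above can be sketched as a small decision function. This is an illustration under stated assumptions, not the harness's logic: the function name and parameters are hypothetical, and since the page does not define the "other" bucket, the sketch treats it as a catch-all for attempts that fail the pass criterion without fitting the three named classes.

```python
def classify(built: bool, exit_code: int, stdout: bytes, stderr: bytes,
             expected: bytes) -> str:
    """Classify one attempt (hypothetical sketch of the classes named above)."""
    if not built:
        # Parser, type checker, codegen, or build error before the program ran.
        return "compile"
    if exit_code != 0:
        # The program ran but crashed or threw.
        return "runtime"
    if stdout != expected:
        # The program ran cleanly but emitted the wrong bytes.
        return "wrong output"
    if stderr:
        # ASSUMPTION: anything else that fails the pass criterion lands here.
        return "other"
    return "pass"


print(classify(False, 1, b"", b"IMP001", b"3\n"))  # compile
print(classify(True, 0, b"4\n", b"", b"3\n"))      # wrong output
print(classify(True, 0, b"3\n", b"", b"3\n"))      # pass
```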

Zero deep-dive

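Every Zero attempt failed at compile time. Below is each task with the first line of the captured diagnostic. The pattern across tasks is the signal worth reading: two error codes account for all 20 failures, IMP001 (unknown import) in 15 tasks and PAR100 (a missing '{' or an unexpected character) in 5.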

000-hello-stdout Hello, stdout compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
001-fibonacci-memoized Fibonacci with memoization compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
002-sieve-prime-count Prime count via Sieve of Eratosthenes compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
003-levenshtein-distance Levenshtein edit distance compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
004-matrix-multiply Square integer matrix multiply compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```

005-balanced-parens Balanced bracket checker compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
006-substring-count Non-overlapping substring count compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
007-csv-line-tokenize CSV line tokenizer (RFC 4180 subset) compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
008-word-reverse Reverse the order of words on a line compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
009-word-count Count whitespace-separated tokens in input compile
```
ref.zero:6:8 PAR100: expected '{' before block
```
010-byte-frequency Per-byte frequency table sorted by byte value compile
```
ref.zero:6:8 PAR100: expected '{' before block
```
011-rle-encode Run-length encode the input as count/byte pairs compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
012-http-status-code GET a URL and write the HTTP status code compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std::net'
```

013-http-json-sum POST a JSON pair and extract the sum compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std::net'
```
014-http-header-echo GET a URL and echo a named response header compile
```
ref.zero:1:1 IMP001: unknown package-local import 'http'
```
015-checked-divide-u32 Parse two unsigned integers and write their integer quotient, or error on any failure compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
016-parse-list-sum Read a count then that many u32 integers and write their sum, or error on any failure compile
```
ref.zero:6:8 PAR100: expected '{' before block
```
017-checked-add-overflow Parse two unsigned integers and write their sum, or error on parse failure or u32 overflow compile
```
ref.zero:1:1 IMP001: unknown package-local import 'std'
```
018-caesar-cipher Shift a lowercase ASCII string by a Caesar offset, or error on bad input compile
```
zero/src/lib.0:8:1 PAR100: unexpected character '`'
```
019-run-length-encode Run-length encode a lowercase ASCII string, or error on bad input compile
```
zero/src/lib.0:28:1 PAR100: unexpected character '`'
```
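The recurrence of error codes can be tallied mechanically. The sketch below parses the twenty first-line diagnostics listed above with a regular expression and counts them by code. The `path:line:col CODE: message` layout is inferred from these lines alone, so the pattern is an assumption about the compiler's format, not a documented grammar.

```python
import re
from collections import Counter

STD = "ref.zero:1:1 IMP001: unknown package-local import 'std'"
NET = "ref.zero:1:1 IMP001: unknown package-local import 'std::net'"
BRACE = "ref.zero:6:8 PAR100: expected '{' before block"

# First-line diagnostics copied from the list above, in task order 000 to 019.
diagnostics = (
    [STD] * 9          # 000-008
    + [BRACE] * 2      # 009-010
    + [STD]            # 011
    + [NET] * 2        # 012-013
    + ["ref.zero:1:1 IMP001: unknown package-local import 'http'"]  # 014
    + [STD]            # 015
    + [BRACE]          # 016
    + [STD]            # 017
    + ["zero/src/lib.0:8:1 PAR100: unexpected character '`'"]       # 018
    + ["zero/src/lib.0:28:1 PAR100: unexpected character '`'"]      # 019
)

# ASSUMPTION: layout inferred from the lines above: path:line:col CODE: message
PATTERN = re.compile(
    r"^(?P<path>\S+):(?P<line>\d+):(?P<col>\d+) (?P<code>[A-Z]+\d+): (?P<msg>.*)$"
)

parsed = [PATTERN.match(d) for d in diagnostics]
codes = Counter(m.group("code") for m in parsed if m)
messages = Counter(m.group("msg") for m in parsed if m)

print(codes.most_common())  # [('IMP001', 15), ('PAR100', 5)]
for msg, n in messages.most_common():
    print(n, msg)
```

Fifteen of the twenty failures are rejected imports on the first line of the file, twelve of them for `'std'` alone.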

Cost

Prompt tokens	64,610
Completion tokens	22,803
Total tokens	87,413
Attempts	100 (70 passed)
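The token counts above convert to per-attempt figures with simple arithmetic, sketched below. Only the four input numbers come from this page; the derived ratios are illustrative and are not reported by the run itself.

```python
# Figures copied from the cost table above.
prompt_tokens = 64_610
completion_tokens = 22_803
attempts = 100
passed = 70

total = prompt_tokens + completion_tokens
per_attempt = total / attempts
per_pass = total / passed
completion_share = 100 * completion_tokens / total

print(total)                       # 87413
print(per_attempt)                 # 874.13
print(round(per_pass, 1))          # 1248.8
print(round(completion_share, 1))  # 26.1
```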

Compare

Other models in this run: gpt-5, sonnet, opus, gpt-4o-mini. See also the leaderboard and methodology.

To re-run this end to end, see the per-run reproducibility page.

Truffle · gpt-4o · run captured 2026-05-19