agentlang-index · model deep-dive

gpt-4o-mini

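One model, twenty tasks, five languages, byte-exact scoring. The overall pass rate is 56% (56 of 100 attempts), and the rate in Zero is 0%. Zero's average tax versus the other four languages is +70 percentage points: those four average a 70% pass rate, while Zero scores 0%.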

Pass rate per language (tasks passed out of 20 in each language):

Language	passed	pass rate
Zero	0	0%
TypeScript	14	70%
Rust	13	65%
Go	15	75%
Python	14	70%
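The headline figures follow from these counts. The Python sketch below assumes that "tax" means the mean pass rate of the other four languages minus Zero's, measured in percentage points. That is the reading that reproduces the +70 figure. The names `PASSED`, `pass_rates` and `zero_tax` are illustrative only.

```python
# Pass counts per language, transcribed from the table above (20 tasks each).
PASSED = {"Zero": 0, "TypeScript": 14, "Rust": 13, "Go": 15, "Python": 14}
TASKS_PER_LANGUAGE = 20


def pass_rates(passed: dict[str, int], tasks: int) -> dict[str, float]:
    """Pass rate per language, as a percentage of tasks."""
    return {lang: 100 * n / tasks for lang, n in passed.items()}


def zero_tax(rates: dict[str, float], target: str = "Zero") -> float:
    """Assumed definition: mean rate of the other languages minus the target's,
    in percentage points."""
    others = [r for lang, r in rates.items() if lang != target]
    return sum(others) / len(others) - rates[target]


rates = pass_rates(PASSED, TASKS_PER_LANGUAGE)
overall = 100 * sum(PASSED.values()) / (TASKS_PER_LANGUAGE * len(PASSED))
print(f"overall {overall:.0f}%")              # overall 56%
print(f"Zero tax {zero_tax(rates):+.0f} pp")  # Zero tax +70 pp
```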

Per-task results

Every cell is a single attempt. A pass requires three things on every public and hidden test case: stdout matched the expected bytes exactly, stderr was empty, and the exit code was zero.
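The pass criterion can be sketched as a check over every test case. This is a minimal illustration, not the index's harness. Several details are assumptions: the names `Case` and `passes`, the 10-second timeout, and running the program as a subprocess command list.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Case:
    """One test case, public or hidden (hypothetical structure)."""
    stdin: bytes
    expected_stdout: bytes


def passes(cmd: list[str], cases: list[Case], timeout: float = 10.0) -> bool:
    """True only if every case exits 0, writes nothing to stderr, and writes
    exactly the expected stdout bytes (no trimming, no newline normalization)."""
    for case in cases:
        try:
            proc = subprocess.run(cmd, input=case.stdin, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # assumption: a hang counts as a failure
        if proc.returncode != 0 or proc.stderr or proc.stdout != case.expected_stdout:
            return False
    return True


# Example program: uppercase stdin, byte for byte.
upper = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
print(passes(upper, [Case(b"hi\n", b"HI\n")]))  # True
print(passes(upper, [Case(b"hi\n", b"HI")]))    # False: output has an extra trailing newline
```

Comparing raw `bytes` rather than decoded text is what makes the check byte-exact. A trailing newline or a stray carriage return is enough to fail.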

Task	Zero	TypeScript	Rust	Go	Python
000-hello-stdout Hello, stdout	compile	✓	✓	✓	✓
001-fibonacci-memoized Fibonacci with memoization	compile	✓	✓	✓	✓
002-sieve-prime-count Prime count via Sieve of Eratosthenes	compile	✓	other	✓	✓
003-levenshtein-distance Levenshtein edit distance	compile	other	✓	✓	✓
004-matrix-multiply Square integer matrix multiply	compile	✓	✓	other	✓
005-balanced-parens Balanced bracket checker	compile	✓	✓	✓	✓
006-substring-count Non-overlapping substring count	compile	✓	✓	✓	✓
007-csv-line-tokenize CSV line tokenizer (RFC 4180 subset)	compile	✓	✓	other	other
008-word-reverse Reverse the order of words on a line	compile	✓	✓	✓	✓
009-word-count Count whitespace-separated tokens in input	compile	✓	✓	wrong output	wrong output
010-byte-frequency Per-byte frequency table sorted by byte value	compile	✓	wrong output	✓	wrong output
011-rle-encode Run-length encode the input as count/byte pairs	compile	✓	✓	✓	✓
012-http-status-code GET a URL and write the HTTP status code	compile	wrong output	wrong output	✓	other
013-http-json-sum POST a JSON pair and extract the sum	compile	wrong output	wrong output	✓	other
014-http-header-echo GET a URL and echo a named response header	compile	runtime	wrong output	✓	other
015-checked-divide-u32 Parse two unsigned integers and write their integer quotient, or error on any failure	compile	other	✓	✓	✓
016-parse-list-sum Read a count then that many u32 integers and write their sum, or error on any failure	compile	✓	wrong output	✓	✓
017-checked-add-overflow Parse two unsigned integers and write their sum, or error on parse failure or u32 overflow	compile	✓	✓	✓	✓
018-caesar-cipher Shift a lowercase ASCII string by a Caesar offset, or error on bad input	compile	✓	✓	other	✓
019-run-length-encode Run-length encode a lowercase ASCII string, or error on bad input	compile	wrong output	other	wrong output	✓

Failure modes

Each failed attempt is classified into one of three categories:

- compile: a parser, type-checker, codegen, or build-system error stopped the program before it could run.
- runtime: the program ran but crashed or threw.
- wrong output: the program ran cleanly but emitted the wrong bytes.

Failures that fit none of these three are counted as other.
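One way to turn these definitions into code is sketched below. Each attempt is summarized across its test cases. The `Attempt` fields are hypothetical. Sending timeouts, and clean exits with non-empty stderr, to other is an assumption rather than a documented rule of the index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attempt:
    """Hypothetical summary of one attempt; field names are illustrative."""
    built: bool                      # parse, type check, codegen and build all succeeded
    exit_code: Optional[int] = None  # None: program never finished (e.g. timeout)
    stderr: bytes = b""
    stdout_matched: bool = False     # byte-exact match on every test case


def classify(a: Attempt) -> str:
    if not a.built:
        return "compile"       # failed before the program could run
    if a.exit_code is None:
        return "other"         # assumption: hangs fit none of the three named modes
    if a.exit_code != 0:
        return "runtime"       # ran but crashed or threw
    if a.stderr:
        return "other"         # assumption: clean exit, but stderr was not empty
    if not a.stdout_matched:
        return "wrong output"  # ran cleanly, emitted the wrong bytes
    return "pass"


print(classify(Attempt(built=False)))                                   # compile
print(classify(Attempt(built=True, exit_code=1)))                       # runtime
print(classify(Attempt(built=True, exit_code=0)))                       # wrong output
print(classify(Attempt(built=True, exit_code=0, stdout_matched=True)))  # pass
```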

Language	pass	compile	runtime	wrong output	other
Zero	0	20	0	0	0
TypeScript	14	0	1	3	2
Rust	13	0	0	5	2
Go	15	0	0	2	3
Python	14	0	0	2	4
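The failure-mode counts can be cross-checked against the per-task table. The one-letter encoding below is a hypothetical hand transcription of that table, not a format the index publishes.

```python
from collections import Counter

# One letter per task, tasks 000..019 in order, transcribed from the per-task table.
# P = pass, C = compile, R = runtime, W = wrong output, O = other.
GRID = {
    "Zero":       "C" * 20,
    "TypeScript": "PPPOPPPPPPPPWWROPPPW",
    "Rust":       "PPOPPPPPPPWPWWWPWPPO",
    "Go":         "PPPPOPPOPWPPPPPPPPOW",
    "Python":     "PPPPPPPOPWWPOOOPPPPP",
}
COLUMNS = ["P", "C", "R", "W", "O"]  # same order as the failure-mode table


def failure_table(grid: dict[str, str]) -> dict[str, list[int]]:
    """Count each outcome per language, in the failure-mode table's column order."""
    table = {}
    for lang, row in grid.items():
        if len(row) != 20:
            raise ValueError(f"{lang}: expected 20 tasks, got {len(row)}")
        counts = Counter(row)
        table[lang] = [counts[c] for c in COLUMNS]  # Counter returns 0 for missing keys
    return table


for lang, counts in failure_table(GRID).items():
    print(lang, *counts, sep="\t")
```

Each printed row should reproduce the corresponding row of the failure-mode table.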

Zero deep-dive

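Every Zero attempt failed at compile time. Each task below is listed with the first line of its captured diagnostic. The pattern across tasks is the signal worth reading: only two error codes appear.

- PAR100, whose messages are all parse errors, accounts for 16 tasks. In 11 of those the message is "expected '{' before block".
- IMP001, an unknown package-local import, accounts for the remaining 4. These include all three HTTP tasks.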

000-hello-stdout Hello, stdout compile
```
ref.zero:1:1 PAR100: expected '{' before block
```

001-fibonacci-memoized Fibonacci with memoization compile
```
ref.zero:4:1 PAR100: expected '{' before block
```
002-sieve-prime-count Prime count via Sieve of Eratosthenes compile
```
ref.zero:4:1 PAR100: expected '{' before block
```
003-levenshtein-distance Levenshtein edit distance compile
```
ref.zero:9:1 PAR100: expected '{' before block
```
004-matrix-multiply Square integer matrix multiply compile
```
ref.zero:3:10 PAR100: expected '{' before block
```
005-balanced-parens Balanced bracket checker compile
```
ref.zero:3:18 PAR100: expected expression
```
006-substring-count Non-overlapping substring count compile
```
ref.zero:1:1 PAR100: expected '{' before block
```
007-csv-line-tokenize CSV line tokenizer (RFC 4180 subset) compile
```
ref.zero:1:15 PAR100: expected expression
```
008-word-reverse Reverse the order of words on a line compile
```
ref.zero:3:8 PAR100: expected '{' before block
```
009-word-count Count whitespace-separated tokens in input compile
```
ref.zero:3:17 PAR100: expected '{' before block
```
010-byte-frequency Per-byte frequency table sorted by byte value compile
```
ref.zero:1:15 PAR100: expected expression
```
011-rle-encode Run-length encode the input as count/byte pairs compile
```
ref.zero:1:15 PAR100: expected expression
```
012-http-status-code GET a URL and write the HTTP status code compile
```
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
```

013-http-json-sum POST a JSON pair and extract the sum compile
```
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
```

014-http-header-echo GET a URL and echo a named response header compile
```
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
```
015-checked-divide-u32 Parse two unsigned integers and write their integer quotient, or error on any failure compile
```
ref.zero:3:13 PAR100: expected '{' before block
```
016-parse-list-sum Read a count then that many u32 integers and write their sum, or error on any failure compile
```
ref.zero:3:13 PAR100: expected '{' before block
```
017-checked-add-overflow Parse two unsigned integers and write their sum, or error on parse failure or u32 overflow compile
```
ref.zero:3:13 PAR100: expected '{' before block
```
018-caesar-cipher Shift a lowercase ASCII string by a Caesar offset, or error on bad input compile
```
zero/src/main.0:1:1 IMP001: unknown package-local import '"src/lib.0"'
```
019-run-length-encode Run-length encode a lowercase ASCII string, or error on bad input compile
```
zero/src/lib.0:23:1 PAR100: unexpected character '`'
```
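The error-code pattern can be checked by tallying the first diagnostic lines above. The regular expression assumes every line has the shape `path:line:col CODE: message`. That shape holds for all twenty lines shown, but it is inferred rather than documented.

```python
import re
from collections import Counter

# First diagnostic line of each Zero attempt, tasks 000..019, copied from the list above.
DIAGNOSTICS = """\
ref.zero:1:1 PAR100: expected '{' before block
ref.zero:4:1 PAR100: expected '{' before block
ref.zero:4:1 PAR100: expected '{' before block
ref.zero:9:1 PAR100: expected '{' before block
ref.zero:3:10 PAR100: expected '{' before block
ref.zero:3:18 PAR100: expected expression
ref.zero:1:1 PAR100: expected '{' before block
ref.zero:1:15 PAR100: expected expression
ref.zero:3:8 PAR100: expected '{' before block
ref.zero:3:17 PAR100: expected '{' before block
ref.zero:1:15 PAR100: expected expression
ref.zero:1:15 PAR100: expected expression
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
ref.zero:1:1 IMP001: unknown package-local import 'lib http'
ref.zero:3:13 PAR100: expected '{' before block
ref.zero:3:13 PAR100: expected '{' before block
ref.zero:3:13 PAR100: expected '{' before block
zero/src/main.0:1:1 IMP001: unknown package-local import '"src/lib.0"'
zero/src/lib.0:23:1 PAR100: unexpected character '`'
""".splitlines()

# Assumed shape: path:line:col CODE: message
DIAG_RE = re.compile(r"^(?P<path>\S+):(?P<line>\d+):(?P<col>\d+) (?P<code>[A-Z]+\d+): (?P<msg>.*)$")


def tally(lines: list[str]) -> tuple[Counter, Counter]:
    """Count diagnostics by error code and by (code, message) pair."""
    codes, messages = Counter(), Counter()
    for line in lines:
        m = DIAG_RE.match(line)
        if m is None:
            raise ValueError(f"unrecognized diagnostic: {line!r}")
        codes[m["code"]] += 1
        messages[(m["code"], m["msg"])] += 1
    return codes, messages


codes, messages = tally(DIAGNOSTICS)
print(codes.most_common())  # [('PAR100', 16), ('IMP001', 4)]
for (code, msg), n in messages.most_common():
    print(n, code, msg)
```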

Cost

Prompt tokens	64,610
Completion tokens	21,038
Total tokens	85,648
Attempts	100 (56 passed)
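The cost figures are internally consistent: prompt tokens plus completion tokens equal the reported total. The snippet below also divides that total by the attempt and pass counts. The page itself does not report those per-attempt figures, so they are illustrative derivations only.

```python
PROMPT_TOKENS = 64_610
COMPLETION_TOKENS = 21_038
ATTEMPTS = 100
PASSED = 56

total = PROMPT_TOKENS + COMPLETION_TOKENS
print(total)                       # 85648
print(round(total / ATTEMPTS, 1))  # 856.5 tokens per attempt
print(round(total / PASSED))       # 1529 tokens spent per passing attempt
```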

Compare

Other models in this run: gpt-5, sonnet, opus, gpt-4o. Or go back to the leaderboard and methodology.

To re-run this end-to-end, see this run's reproducibility page.

Truffle · gpt-4o-mini · run captured 2026-05-19