Square integer matrix multiply

004-matrix-multiply. Read two `N x N` integer matrices `A` and `B` from standard input, compute the matrix product `C = A * B`, and write `C` to standard output.

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

## Task: Square integer matrix multiply

Read two `N x N` integer matrices `A` and `B` from standard input, compute the
matrix product `C = A * B`, and write `C` to standard output.

**Input format** (stdin):

```
N
a_11 a_12 ... a_1N
a_21 a_22 ... a_2N
...
a_N1 a_N2 ... a_NN
b_11 b_12 ... b_1N
b_21 b_22 ... b_2N
...
b_N1 b_N2 ... b_NN
```

- `1 <= N <= 5`.
- Each matrix entry is an integer in `[-99, 99]`.
- Each row is on its own line; entries on a row are separated by single spaces.

**Output format** (stdout): `N` lines. Each line is one row of `C`, written as
`N` decimal integers separated by single spaces, terminated by a single newline.
The output ends with a newline after the last row.

## Acceptance

- Stdout: byte-exact match per test case.
- Stderr: empty.
- Exit code: 0.
- The program must complete each test case within 5 seconds.

## Examples

### Example 1

Input:
```
1
2
3
```

Output:
```
6
```

### Example 2

Input:
```
2
1 2
3 4
5 6
7 8
```

Output:
```
19 22
43 50
```

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case)	true
stderr (exact bytes)	""
exit code	0
wall time max (ms)	5000
tags	computation, matrix, loops
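
The acceptance fields above could be checked per test case roughly as follows. This is a minimal sketch, not the harness's actual code: the function name `check_case`, the `Verdict` type, and the reason strings are assumptions made for illustration, and compile failures are not modelled.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    passed: bool
    reason: str  # hypothetical labels, not the harness's real categories


def check_case(cmd: list[str], stdin_bytes: bytes, expected_stdout: bytes,
               timeout_s: float = 5.0) -> Verdict:
    """Run one test case and compare it against the four acceptance fields."""
    try:
        proc = subprocess.run(cmd, input=stdin_bytes,
                              capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return Verdict(False, "timeout")       # wall time max exceeded
    if proc.returncode != 0:
        return Verdict(False, "exit code")     # exit code must be 0
    if proc.stderr != b"":
        return Verdict(False, "stderr")        # stderr must be empty
    if proc.stdout != expected_stdout:
        return Verdict(False, "wrong output")  # byte-exact comparison, no trimming
    return Verdict(True, "pass")
```

Because the comparison is on raw bytes, a missing or extra trailing newline is enough to fail a case.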

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero.

Model	Zero	TypeScript	Rust	Go	Python
gpt-4o	compile	✓	✓	other	✓
gpt-4o-mini	compile	✓	✓	other	✓
gpt-5	compile	✓	✓	✓	✓
opus	wrong output	✓	✓	✓	✓
sonnet	wrong output	✓	✓	✓	✓

Failure excerpts

7 of 25 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

gpt-4o Zero compile

ref.zero:1:1 IMP001: unknown package-local import 'std'

gpt-4o Go other
```
# command-line-arguments
```

gpt-4o-mini Zero compile

ref.zero:3:10 PAR100: expected '{' before block

gpt-4o-mini Go other
```
# command-line-arguments
```

gpt-5 Zero compile

ref.zero:1:1 IMP001: unknown package-local import 'std'

opus Zero wrong output
```
(no diagnostic captured)
```

sonnet Zero wrong output
```
(no diagnostic captured)
```

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Zero n/a

No reference implementation in this language.

TypeScript 38 lines

// Square integer matrix multiply, TypeScript reference.
// Reads N, then N rows of A, then N rows of B, prints rows of C = A * B.
import { readFileSync } from "node:fs";

const raw = readFileSync(0, "utf8");
const tokens = raw.split(/\s+/).filter((t) => t.length > 0);
let pos = 0;
const n = Number.parseInt(tokens[pos++], 10);
if (!Number.isInteger(n) || n < 1 || n > 5) {
  process.stderr.write("N must be in [1, 5]\n");
  process.exit(1);
}

const a: number[][] = [];
const b: number[][] = [];
for (let i = 0; i < n; i++) {
  const row: number[] = [];
  for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
  a.push(row);
}
for (let i = 0; i < n; i++) {
  const row: number[] = [];
  for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
  b.push(row);
}

const rows: string[] = [];
for (let i = 0; i < n; i++) {
  const cells: string[] = [];
  for (let j = 0; j < n; j++) {
    let s = 0;
    for (let k = 0; k < n; k++) s += a[i][k] * b[k][j];
    cells.push(String(s));
  }
  rows.push(cells.join(" "));
}
process.stdout.write(rows.join("\n") + "\n");

Rust 62 lines

// Square integer matrix multiply, Rust reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
use std::io::{self, Read, Write};
use std::process::ExitCode;

fn main() -> ExitCode {
    let mut input = String::new();
    if io::stdin().read_to_string(&mut input).is_err() {
        let _ = writeln!(io::stderr(), "failed to read stdin");
        return ExitCode::from(1);
    }
    let mut iter = input.split_ascii_whitespace();
    let n: i64 = match iter.next().and_then(|t| t.parse().ok()) {
        Some(v) if (1..=5).contains(&v) => v,
        _ => {
            let _ = writeln!(io::stderr(), "N must be in [1, 5]");
            return ExitCode::from(1);
        }
    };
    let n = n as usize;
    let read_matrix = |it: &mut std::str::SplitAsciiWhitespace| -> Option<Vec<Vec<i32>>> {
        let mut m = vec![vec![0i32; n]; n];
        for i in 0..n {
            for j in 0..n {
                let v: i32 = it.next()?.parse().ok()?;
                m[i][j] = v;
            }
        }
        Some(m)
    };
    let a = match read_matrix(&mut iter) {
        Some(m) => m,
        None => {
            let _ = writeln!(io::stderr(), "failed to read A");
            return ExitCode::from(1);
        }
    };
    let b = match read_matrix(&mut iter) {
        Some(m) => m,
        None => {
            let _ = writeln!(io::stderr(), "failed to read B");
            return ExitCode::from(1);
        }
    };
    let mut out = String::new();
    for i in 0..n {
        for j in 0..n {
            let mut s: i32 = 0;
            for k in 0..n {
                s += a[i][k] * b[k][j];
            }
            if j > 0 {
                out.push(' ');
            }
            out.push_str(&s.to_string());
        }
        out.push('\n');
    }
    let _ = io::stdout().write_all(out.as_bytes());
    ExitCode::SUCCESS
}

Go 72 lines

// Square integer matrix multiply, Go reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

func main() {
	raw, err := io.ReadAll(bufio.NewReader(os.Stdin))
	if err != nil {
		fmt.Fprintln(os.Stderr, "failed to read stdin")
		os.Exit(1)
	}
	tokens := strings.Fields(string(raw))
	pos := 0
	if len(tokens) == 0 {
		fmt.Fprintln(os.Stderr, "missing input")
		os.Exit(1)
	}
	n, err := strconv.Atoi(tokens[pos])
	if err != nil || n < 1 || n > 5 {
		fmt.Fprintln(os.Stderr, "N must be in [1, 5]")
		os.Exit(1)
	}
	pos++
	a := make([][]int, n)
	b := make([][]int, n)
	for i := 0; i < n; i++ {
		a[i] = make([]int, n)
		for j := 0; j < n; j++ {
			a[i][j], err = strconv.Atoi(tokens[pos])
			if err != nil {
				fmt.Fprintln(os.Stderr, "failed to read A")
				os.Exit(1)
			}
			pos++
		}
	}
	for i := 0; i < n; i++ {
		b[i] = make([]int, n)
		for j := 0; j < n; j++ {
			b[i][j], err = strconv.Atoi(tokens[pos])
			if err != nil {
				fmt.Fprintln(os.Stderr, "failed to read B")
				os.Exit(1)
			}
			pos++
		}
	}
	var sb strings.Builder
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			s := 0
			for k := 0; k < n; k++ {
				s += a[i][k] * b[k][j]
			}
			if j > 0 {
				sb.WriteByte(' ')
			}
			sb.WriteString(strconv.Itoa(s))
		}
		sb.WriteByte('\n')
	}
	fmt.Print(sb.String())
}

Python 38 lines

"""Square integer matrix multiply, Python reference.

Reads N, then N rows of A, then N rows of B. Writes the N rows of C = A * B,
each row as space-separated decimal integers terminated by a single newline.
"""
import sys


def main() -> None:
    tokens = sys.stdin.read().split()
    pos = 0
    n = int(tokens[pos]); pos += 1
    if n < 1 or n > 5:
        sys.stderr.write("N must be in [1, 5]\n")
        sys.exit(1)
    a = [[0] * n for _ in range(n)]
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a[i][j] = int(tokens[pos]); pos += 1
    for i in range(n):
        for j in range(n):
            b[i][j] = int(tokens[pos]); pos += 1
    out_rows: list[str] = []
    for i in range(n):
        row_vals = []
        for j in range(n):
            s = 0
            for k in range(n):
                s += a[i][k] * b[k][j]
            row_vals.append(str(s))
        out_rows.append(" ".join(row_vals))
    sys.stdout.write("\n".join(out_rows) + "\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/004-matrix-multiply/notes.md.

Why this task

Fourth and last pure-computation task. Differs from 001-003 in two shape changes:

Multi-value input. Tasks 001 and 002 took one integer; task 003 took two strings. This one takes one integer and 2*N*N more integers. The model must read a variable amount of input, tokenize it, and place each value in the right matrix slot. Off-by-one errors in the read loop translate directly into wrong outputs.
Multi-line, multi-column output. Tasks 001-003 all emitted a single decimal followed by \n. This one emits an N x N table. The model must space-separate within a row, newline-terminate every row, and not append an extra blank line after the last row. A model that pretty-prints with extra whitespace fails the byte-exact check even with the right math.

Difficulty: easy as math, but the I/O surface is broader than in 001-003. The triple-nested loop is canonical algorithm material.

Input convention

Zero takes the flat sequence on argv: argv[1] = N, argv[2..2+N*N-1] = A row-major, argv[2+N*N..2+2*N*N-1] = B row-major. With the spec cap of N <= 5 the worst-case argv length is 1 + 50 + 1 (program name) = 52 tokens, well inside the shell limit. The other four languages read the user-friendly multi-line stdin schema.
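
To make the two conventions concrete, here is a small sketch of how the multi-line stdin schema maps onto the flat argv layout described above. The helper names `stdin_to_argv` and `split_argv` and the program name `"ref"` are illustrative assumptions, not part of the harness.

```python
def stdin_to_argv(text: str, program: str = "ref") -> list[str]:
    """Flatten the multi-line stdin schema into the argv layout.

    argv[0] is the program name, argv[1] is N, then A row-major, then B row-major.
    """
    tokens = text.split()
    n = int(tokens[0])
    if len(tokens) != 1 + 2 * n * n:
        raise ValueError("expected N followed by 2*N*N matrix entries")
    return [program] + tokens


def split_argv(argv: list[str]) -> tuple[int, list[int], list[int]]:
    """Recover N and the two flat row-major matrices from argv."""
    n = int(argv[1])
    a = [int(t) for t in argv[2 : 2 + n * n]]
    b = [int(t) for t in argv[2 + n * n : 2 + 2 * n * n]]
    return n, a, b
```

For Example 2 of the prompt this yields `N = 2`, `A = [1, 2, 3, 4]` and `B = [5, 6, 7, 8]`; at `N = 5` the argv list has the 52 entries counted above.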

Edge cases probed

N = 1 (the trivial scalar case [a] * [b] = [a*b]).
Identity-times-arbitrary (catches models that swap row/column indices in the inner loop, because the output then ceases to be B).
Mixed positive and negative entries (catches models that try to use unsigned ints or skip the sign in the decimal renderer).
N = 5 at the spec cap (the largest matrix in the suite exercises the full flat indexing pattern; a model that used the wrong row stride fails here).
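
The identity probe can be illustrated with a short sketch. `matmul_swapped` is a deliberately buggy variant written for this example (it reads `b[j][k]` instead of `b[k][j]`); neither function is taken from the test suite.

```python
def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Correct square product: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matmul_swapped(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Buggy variant for illustration: swaps the row/column indices of b."""
    n = len(a)
    return [[sum(a[i][k] * b[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n: int) -> list[list[int]]:
    """n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]
```

With `A = identity(n)` the correct product returns `B` unchanged, while the swapped variant returns the transpose of `B`, so any non-symmetric `B` exposes the bug.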

Zero implementation notes

Two flat [25]i32 buffers (5*5 = 25 = spec cap; using flat instead of [5][5] keeps it inside the direct backend's single-element array support). Element type i32 because entries are in [-99, 99] and worst-case inner-product sum is 5 * 99 * 99 = 49005, well inside i32 range.
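
As a rough illustration of the flat-buffer layout, the following Python sketch multiplies two matrices stored row-major in 25-element lists. It assumes the row stride is `N` (not the cap of 5), which the notes do not state explicitly; `matmul_flat` and `CAP` are names chosen for this example.

```python
CAP = 5  # spec cap on N, so each flat buffer holds CAP * CAP = 25 entries


def matmul_flat(n: int, a: list[int], b: list[int]) -> list[int]:
    """Multiply two n x n matrices stored row-major in flat 25-entry buffers.

    Assumption: element (i, j) lives at index i * n + j.
    """
    c = [0] * (CAP * CAP)
    for i in range(n):
        for j in range(n):
            s = 0
            for k in range(n):
                s += a[i * n + k] * b[k * n + j]
            c[i * n + j] = s
    return c
```

The largest magnitude any cell can reach is `5 * 99 * 99 = 49005`, far below the i32 limit of `2**31 - 1`.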

The integer parser handles a leading - and emits an i32. The decimal renderer handles negative i32 by computing (0 - s) as u32 under guard s < 0 and emitting a - byte before the digits. The divisor in the renderer is 10 (u32), so it stays clear of the 2^32-divisor codegen SIGFPE.
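
The parser and renderer described above might look like this when transcribed to Python. It is a sketch under stated assumptions: the function names are hypothetical, and the `& 0xFFFFFFFF` mask only emulates the u32 arithmetic that the Zero version is described as using.

```python
def parse_i32(token: str) -> int:
    """Parse an optional leading '-' followed by ASCII decimal digits."""
    negative = token.startswith("-")
    digits = token[1:] if negative else token
    if not digits or not all("0" <= ch <= "9" for ch in digits):
        raise ValueError(f"not a decimal integer: {token!r}")
    value = 0
    for ch in digits:
        value = value * 10 + (ord(ch) - ord("0"))
    return -value if negative else value


def render_i32(s: int) -> bytes:
    """Render a signed 32-bit value as ASCII decimal bytes."""
    out = bytearray()
    if s < 0:
        out.append(ord("-"))
        magnitude = (0 - s) & 0xFFFFFFFF  # emulate the (0 - s) as u32 step
    else:
        magnitude = s
    digits = bytearray()
    while True:
        digits.append(ord("0") + magnitude % 10)  # divisor is always 10
        magnitude //= 10
        if magnitude == 0:
            break
    digits.reverse()  # digits were produced least significant first
    return bytes(out + digits)
```

Taking the magnitude as an unsigned value means even the most negative i32 renders correctly, although the task's value range never gets close to it.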

The output buffer is [512]u8. In the worst case (N=5, every cell 6 characters including a leading -), each row is 5 * 6 cell characters + 4 separating spaces + 1 newline = 35 bytes, so the total is 5 * 35 = 175 bytes; 512 is a comfortable cap.
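
The buffer arithmetic can be double-checked by actually building the worst-case output. The sketch below uses plain Python strings rather than the Zero renderer, and the constant and function names are chosen for this example only.

```python
N = 5
LIMIT = 99
BUFFER_BYTES = 512


def worst_case_sizes() -> tuple[int, int, int]:
    """Return (cell width, row bytes, total bytes) for the worst case."""
    cell = len(str(-(N * LIMIT * LIMIT)))  # "-49005"
    row = N * cell + (N - 1) + 1           # cells + spaces + newline
    return cell, row, N * row


def worst_case_output() -> bytes:
    """Render A = all 99, B = all -99, where every cell is -49005."""
    a = [[LIMIT] * N for _ in range(N)]
    b = [[-LIMIT] * N for _ in range(N)]
    rows = []
    for i in range(N):
        cells = []
        for j in range(N):
            cells.append(str(sum(a[i][k] * b[k][j] for k in range(N))))
        rows.append(" ".join(cells))
    return ("\n".join(rows) + "\n").encode("ascii")
```

Both the closed-form count and the rendered output come to 175 bytes, about a third of the 512-byte buffer.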

Future revisions

Bigger N. The spec caps at 5 to make argv ergonomic and to keep the test fixtures readable. Lifting to N=20 (so 800 argv tokens for the two matrices) would still fit, but the test JSON files would balloon. Defer to v0.2 when the spec gains an alternative input convention.
Floating point. A future variant could swap i32 for f64 and check that the model preserves precision in the sum. Catch: f64 printing conventions diverge across languages (Python's repr vs Rust's Display), so the spec would need a strict format string.
Non-square. The current spec uses NxN matrices to halve the required input length. A non-square (MxK * KxN) variant would test whether the model truly understands the contraction dimension versus just copying a "do three nested loops" recipe.
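
For the non-square variant, a sketch of what the reference computation could look like is given below. The function name `matmul_rect` and its error message are assumptions; the point is only that the inner loop runs over the shared dimension K, which need not equal M or N.

```python
def matmul_rect(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Multiply an M x K matrix by a K x N matrix, giving an M x N result."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):  # contraction over the shared dimension K
                c[i][j] += a[i][p] * b[p][j]
    return c
```

A model that hard-codes a single `N` for all three loops produces the right answer on square inputs but the wrong shape, or an index error, here.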

Cost

Model	Prompt tokens	Completion tokens	API ms
gpt-4o	2,990	1,678	18,506
gpt-4o-mini	2,990	1,508	32,411
gpt-5	2,985	16,193	169,045
opus	12	1,217	168,821
sonnet	12	1,340	116,638

Tokens and API ms are summed across the five languages each model attempted for this task.

Truffle · 004-matrix-multiply · run captured 2026-06-27