agentlang-index · task easy

Square integer matrix multiply

004-matrix-multiply. Read two `N x N` integer matrices `A` and `B` from standard input, compute the

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

## Task: Square integer matrix multiply

Read two `N x N` integer matrices `A` and `B` from standard input, compute the
matrix product `C = A * B`, and write `C` to standard output.

**Input format** (stdin):

```
N
a_11 a_12 ... a_1N
a_21 a_22 ... a_2N
...
a_N1 a_N2 ... a_NN
b_11 b_12 ... b_1N
b_21 b_22 ... b_2N
...
b_N1 b_N2 ... b_NN
```

- `1 <= N <= 5`.
- Each matrix entry is an integer in `[-99, 99]`.
- Each row is on its own line; entries on a row are separated by single spaces.

**Output format** (stdout): `N` lines. Each line is one row of `C`, written as
`N` decimal integers separated by single spaces, terminated by a single newline.
The output ends with a newline after the last row.

## Acceptance

- Stdout: byte-exact match per test case.
- Stderr: empty.
- Exit code: 0.
- The program must complete each test case within 5 seconds.

## Examples

### Example 1

Input:
```
1
2
3
```

Output:
```
6
```

### Example 2

Input:
```
2
1 2
3 4
5 6
7 8
```

Output:
```
19 22
43 50
```

## Language scaffold

{language_scaffold}

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case) true
stderr (exact bytes) ""
exit code 0
wall time max (ms) 5000
tags computation, matrix, loops

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model ZeroTypeScriptRustGoPython
gpt-4o compile other
gpt-4o-mini compile other
gpt-5 compile

Failure excerpts

5 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

  1. gpt-4o Zero compile
    ref.zero:1:1 IMP001: unknown package-local import 'std'
  2. gpt-4o Go other
    # command-line-arguments
  3. gpt-4o-mini Zero compile
    ref.zero:3:10 PAR100: expected '{' before block
  4. gpt-4o-mini Go other
    # command-line-arguments
  5. gpt-5 Zero compile
    ref.zero:1:1 IMP001: unknown package-local import 'std'

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero 190 lines
// Square integer matrix multiply, Zero 0.1.2 direct backend.
//
// Same input convention as 001-003: arguments come from argv because
// Zero 0.1.2 has no exposed stdin capability. argv[1] is N; the next
// N*N args are A row-major; the next N*N args are B row-major. All
// logic stays inside `pub fun main` to avoid the Span/MutSpan
// parameter restriction.
//
// Matrices are stored as flat [25]i32 buffers (the spec caps N at 5
// so 5*5 = 25 fixes both upper bound and stack size). Element type
// i32 because entries are in [-99, 99] and worst-case product sum is
// 5 * 99 * 99 = 49005, comfortably inside i32 range. The decimal
// renderer handles negative values and emits N lines of space-
// separated decimals terminated by \n.
pub fun main(world: World) -> Void raises {
    let n_opt = std.args.get(1)
    if n_opt.has == false {
        check world.err.write("usage: pass N as argv[1] and the matrix entries as argv[2..]\n")
        return
    }
    let n_bytes: Span<u8> = std.mem.span(n_opt.value)
    let n_str_len: usize = std.mem.len(n_bytes)
    if n_str_len == 0 {
        check world.err.write("N must be in [1, 5]\n")
        return
    }
    let mut np: usize = 0
    let mut n_val: u32 = 0_u32
    while np < n_str_len {
        let ch: u8 = n_bytes[np]
        if ch < 48_u8 || ch > 57_u8 {
            check world.err.write("N must be in [1, 5]\n")
            return
        }
        n_val = n_val * 10_u32 + ((ch - 48_u8) as u32)
        np = np + 1
    }
    if n_val < 1_u32 || n_val > 5_u32 {
        check world.err.write("N must be in [1, 5]\n")
        return
    }
    let n: usize = n_val as usize
    let nn: usize = n * n

    let mut a: [25]i32 = [0_i32; 25]
    let mut b: [25]i32 = [0_i32; 25]

    let mut idx: usize = 0
    while idx < nn {
        let arg = std.args.get(2_usize + idx)
        if arg.has == false {
            check world.err.write("missing matrix entry\n")
            return
        }
        let bytes: Span<u8> = std.mem.span(arg.value)
        let bl: usize = std.mem.len(bytes)
        if bl == 0 {
            check world.err.write("empty matrix entry\n")
            return
        }
        let mut p: usize = 0
        let mut neg: Bool = false
        if bytes[0] == 45_u8 {
            neg = true
            p = 1
        }
        if p >= bl {
            check world.err.write("invalid matrix entry\n")
            return
        }
        let mut v: i32 = 0_i32
        while p < bl {
            let c: u8 = bytes[p]
            if c < 48_u8 || c > 57_u8 {
                check world.err.write("invalid matrix entry\n")
                return
            }
            v = v * 10_i32 + ((c - 48_u8) as i32)
            p = p + 1
        }
        if neg {
            v = 0_i32 - v
        }
        a[idx] = v
        idx = idx + 1
    }

    let mut jdx: usize = 0
    while jdx < nn {
        let arg = std.args.get(2_usize + nn + jdx)
        if arg.has == false {
            check world.err.write("missing matrix entry\n")
            return
        }
        let bytes: Span<u8> = std.mem.span(arg.value)
        let bl: usize = std.mem.len(bytes)
        if bl == 0 {
            check world.err.write("empty matrix entry\n")
            return
        }
        let mut p: usize = 0
        let mut neg: Bool = false
        if bytes[0] == 45_u8 {
            neg = true
            p = 1
        }
        if p >= bl {
            check world.err.write("invalid matrix entry\n")
            return
        }
        let mut v: i32 = 0_i32
        while p < bl {
            let c: u8 = bytes[p]
            if c < 48_u8 || c > 57_u8 {
                check world.err.write("invalid matrix entry\n")
                return
            }
            v = v * 10_i32 + ((c - 48_u8) as i32)
            p = p + 1
        }
        if neg {
            v = 0_i32 - v
        }
        b[jdx] = v
        jdx = jdx + 1
    }

    let mut out_buf: [512]u8 = [0_u8; 512]
    let mut written: usize = 0

    let mut i: usize = 0
    while i < n {
        let mut j: usize = 0
        while j < n {
            let mut s: i32 = 0_i32
            let mut k: usize = 0
            while k < n {
                s = s + a[i * n + k] * b[k * n + j]
                k = k + 1
            }
            // Render s into the buffer.
            if j > 0 {
                out_buf[written] = 32_u8
                written = written + 1
            }
            let mut is_neg: Bool = false
            let mut u: u32 = 0_u32
            if s < 0_i32 {
                is_neg = true
                u = (0_i32 - s) as u32
            } else {
                u = s as u32
            }
            if is_neg {
                out_buf[written] = 45_u8
                written = written + 1
            }
            if u == 0_u32 {
                out_buf[written] = 48_u8
                written = written + 1
            } else {
                let mut digits: [16]u8 = [0_u8; 16]
                let mut rem: u32 = u
                let mut dc: usize = 0
                while rem > 0_u32 {
                    let dq: u32 = rem / 10_u32
                    let dr: u32 = rem % 10_u32
                    digits[dc] = (dr as u8) + 48_u8
                    rem = dq
                    dc = dc + 1
                }
                let mut o: usize = 0
                while o < dc {
                    out_buf[written] = digits[dc - 1 - o]
                    written = written + 1
                    o = o + 1
                }
            }
            j = j + 1
        }
        out_buf[written] = 10_u8
        written = written + 1
        i = i + 1
    }

    let out: Span<u8> = out_buf[0..written]
    check world.out.write(out)
    return
}
TypeScript 38 lines
// Square integer matrix multiply, TypeScript reference.
// Reads N, then N rows of A, then N rows of B, prints rows of C = A * B.
import { readFileSync } from "node:fs";

const raw = readFileSync(0, "utf8");
const tokens = raw.split(/\s+/).filter((t) => t.length > 0);
let pos = 0;
const n = Number.parseInt(tokens[pos++], 10);
if (!Number.isInteger(n) || n < 1 || n > 5) {
  process.stderr.write("N must be in [1, 5]\n");
  process.exit(1);
}

const a: number[][] = [];
const b: number[][] = [];
for (let i = 0; i < n; i++) {
  const row: number[] = [];
  for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
  a.push(row);
}
for (let i = 0; i < n; i++) {
  const row: number[] = [];
  for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
  b.push(row);
}

const rows: string[] = [];
for (let i = 0; i < n; i++) {
  const cells: string[] = [];
  for (let j = 0; j < n; j++) {
    let s = 0;
    for (let k = 0; k < n; k++) s += a[i][k] * b[k][j];
    cells.push(String(s));
  }
  rows.push(cells.join(" "));
}
process.stdout.write(rows.join("\n") + "\n");
Rust 62 lines
// Square integer matrix multiply, Rust reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
use std::io::{self, Read, Write};
use std::process::ExitCode;

fn main() -> ExitCode {
    let mut input = String::new();
    if io::stdin().read_to_string(&mut input).is_err() {
        let _ = writeln!(io::stderr(), "failed to read stdin");
        return ExitCode::from(1);
    }
    let mut iter = input.split_ascii_whitespace();
    let n: i64 = match iter.next().and_then(|t| t.parse().ok()) {
        Some(v) if (1..=5).contains(&v) => v,
        _ => {
            let _ = writeln!(io::stderr(), "N must be in [1, 5]");
            return ExitCode::from(1);
        }
    };
    let n = n as usize;
    let read_matrix = |it: &mut std::str::SplitAsciiWhitespace| -> Option<Vec<Vec<i32>>> {
        let mut m = vec![vec![0i32; n]; n];
        for i in 0..n {
            for j in 0..n {
                let v: i32 = it.next()?.parse().ok()?;
                m[i][j] = v;
            }
        }
        Some(m)
    };
    let a = match read_matrix(&mut iter) {
        Some(m) => m,
        None => {
            let _ = writeln!(io::stderr(), "failed to read A");
            return ExitCode::from(1);
        }
    };
    let b = match read_matrix(&mut iter) {
        Some(m) => m,
        None => {
            let _ = writeln!(io::stderr(), "failed to read B");
            return ExitCode::from(1);
        }
    };
    let mut out = String::new();
    for i in 0..n {
        for j in 0..n {
            let mut s: i32 = 0;
            for k in 0..n {
                s += a[i][k] * b[k][j];
            }
            if j > 0 {
                out.push(' ');
            }
            out.push_str(&s.to_string());
        }
        out.push('\n');
    }
    let _ = io::stdout().write_all(out.as_bytes());
    ExitCode::SUCCESS
}
Go 72 lines
// Square integer matrix multiply, Go reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
	"strings"
)

func main() {
	raw, err := io.ReadAll(bufio.NewReader(os.Stdin))
	if err != nil {
		fmt.Fprintln(os.Stderr, "failed to read stdin")
		os.Exit(1)
	}
	tokens := strings.Fields(string(raw))
	pos := 0
	if len(tokens) == 0 {
		fmt.Fprintln(os.Stderr, "missing input")
		os.Exit(1)
	}
	n, err := strconv.Atoi(tokens[pos])
	if err != nil || n < 1 || n > 5 {
		fmt.Fprintln(os.Stderr, "N must be in [1, 5]")
		os.Exit(1)
	}
	pos++
	a := make([][]int, n)
	b := make([][]int, n)
	for i := 0; i < n; i++ {
		a[i] = make([]int, n)
		for j := 0; j < n; j++ {
			a[i][j], err = strconv.Atoi(tokens[pos])
			if err != nil {
				fmt.Fprintln(os.Stderr, "failed to read A")
				os.Exit(1)
			}
			pos++
		}
	}
	for i := 0; i < n; i++ {
		b[i] = make([]int, n)
		for j := 0; j < n; j++ {
			b[i][j], err = strconv.Atoi(tokens[pos])
			if err != nil {
				fmt.Fprintln(os.Stderr, "failed to read B")
				os.Exit(1)
			}
			pos++
		}
	}
	var sb strings.Builder
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			s := 0
			for k := 0; k < n; k++ {
				s += a[i][k] * b[k][j]
			}
			if j > 0 {
				sb.WriteByte(' ')
			}
			sb.WriteString(strconv.Itoa(s))
		}
		sb.WriteByte('\n')
	}
	fmt.Print(sb.String())
}
Python 38 lines
"""Square integer matrix multiply, Python reference.

Reads N, then N rows of A, then N rows of B. Writes the N rows of C = A * B,
each row as space-separated decimal integers terminated by a single newline.
"""
import sys


def main() -> None:
    tokens = sys.stdin.read().split()
    pos = 0
    n = int(tokens[pos]); pos += 1
    if n < 1 or n > 5:
        sys.stderr.write("N must be in [1, 5]\n")
        sys.exit(1)
    a = [[0] * n for _ in range(n)]
    b = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            a[i][j] = int(tokens[pos]); pos += 1
    for i in range(n):
        for j in range(n):
            b[i][j] = int(tokens[pos]); pos += 1
    out_rows: list[str] = []
    for i in range(n):
        row_vals = []
        for j in range(n):
            s = 0
            for k in range(n):
                s += a[i][k] * b[k][j]
            row_vals.append(str(s))
        out_rows.append(" ".join(row_vals))
    sys.stdout.write("\n".join(out_rows) + "\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/004-matrix-multiply/notes.md.

Why this task

Fourth and last pure-computation task. Differs from 001-003 in two shape changes:

  1. Multi-value input. Tasks 001 and 002 took one integer; task 003 took two strings. This one takes one integer and 2NN more integers. The model must read a variable amount of input, tokenize it, and dispatch into the right matrix slot. Off-by-ones in the read loop translate directly into wrong outputs.
  2. Multi-line, multi-column output. Tasks 001-003 all emitted a single decimal followed by \n. This one emits an N x N table. The model must space-separate within a row, newline-terminate per row, and not append a trailing line. A model that pretty-prints with extra whitespace produces a byte-exact failure even with the right math.

Difficulty: easy as math, but the I/O matrix is broader than 001-003. The triple-nested loop is canonical algorithm material.

Input convention

Zero takes the flat sequence on argv: argv[1] = N, argv[2..2+N*N-1] = A row-major, argv[2+N*N..2+2*N*N-1] = B row-major. With the spec cap of N <= 5 the worst-case argv length is 1 + 50 + 1 (program name) = 52 tokens, well inside the shell limit. The other four languages read the user-friendly multi-line stdin schema.

Edge cases probed

  • N = 1 (the trivial scalar case [a] * [b] = [a*b]).
  • Identity-times-arbitrary (catches models that swap row/column indices in the inner loop, because the output then ceases to be B).
  • Mixed positive and negative entries (catches models that try to use unsigned ints or skip the sign in the decimal renderer).
  • N = 5 at the spec cap (the largest matrix in the matrix exercises the full DP-flat indexing pattern; if the model used the wrong row stride it falls over here).

Zero implementation notes

Two flat [25]i32 buffers (5*5 = 25 = spec cap; using flat instead of [5][5] keeps it inside the direct backend's single-element array support). Element type i32 because entries are in [-99, 99] and worst-case inner-product sum is 5 * 99 * 99 = 49005, well inside i32 range.

The integer parser handles a leading - and emits an i32. The decimal renderer handles negative i32 by computing (0 - s) as u32 under guard s < 0 and emitting a - byte before the digits. The divisor in the renderer is 10 (u32), so it stays clear of the 2^32-divisor codegen SIGFPE.

The output buffer is [512]u8. For the worst case (N=5, every cell 6 chars including a leading -), the row length is 5 * 7 - 1 + 1 (space-separated + newline) = 35 bytes; total = 175 bytes; 512 is a comfortable cap.

Future revisions

  • Bigger N. The spec caps at 5 to make argv ergonomic and to keep the test fixtures readable. Lifting to N=20 (so 800 argv tokens for the two matrices) would still fit, but the test JSON files would balloon. Defer to v0.2 when the spec gains an alternative input convention.
  • Floating point. A future variant could swap i32 for f64 and check that the model preserves precision in the sum. Catch: f64 printing conventions diverge across languages (Python's repr vs Rust's Display), so the spec would need a strict format string.
  • Non-square. The current spec uses NxN matrices to halve the required input length. A non-square (MxK * KxN) variant would test whether the model truly understands the contraction dimension versus just copying a "do three nested loops" recipe.

Cost

Model Prompt tokens Completion tokens API ms
gpt-4o 2,990 1,678 18,506
gpt-4o-mini 2,990 1,508 32,411
gpt-5 2,985 16,193 169,045

Tokens and API ms are summed across the five languages this model attempted for this task.


Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 . Back to the leaderboard and methodology.