agentlang-index · task easy
Square integer matrix multiply
004-matrix-multiply. Read two `N x N` integer matrices `A` and `B` from standard input, compute the
Prompt
This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.
## Task: Square integer matrix multiply
Read two `N x N` integer matrices `A` and `B` from standard input, compute the
matrix product `C = A * B`, and write `C` to standard output.
**Input format** (stdin):
```
N
a_11 a_12 ... a_1N
a_21 a_22 ... a_2N
...
a_N1 a_N2 ... a_NN
b_11 b_12 ... b_1N
b_21 b_22 ... b_2N
...
b_N1 b_N2 ... b_NN
```
- `1 <= N <= 5`.
- Each matrix entry is an integer in `[-99, 99]`.
- Each row is on its own line; entries on a row are separated by single spaces.
**Output format** (stdout): `N` lines. Each line is one row of `C`, written as
`N` decimal integers separated by single spaces, terminated by a single newline.
The output ends with a newline after the last row.
## Acceptance
- Stdout: byte-exact match per test case.
- Stderr: empty.
- Exit code: 0.
- The program must complete each test case within 5 seconds.
## Examples
### Example 1
Input:
```
1
2
3
```
Output:
```
6
```
### Example 2
Input:
```
2
1 2
3 4
5 6
7 8
```
Output:
```
19 22
43 50
```
## Language scaffold
{language_scaffold}
Acceptance
A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."
| stdout (byte-exact, per case) | true |
|---|---|
| stderr (exact bytes) | "" |
| exit code | 0 |
| wall time max (ms) | 5000 |
| tags | computation, matrix, loops |
Results
Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.
| Model | Zero | TypeScript | Rust | Go | Python |
|---|---|---|---|---|---|
| gpt-4o | compile | ✓ | ✓ | other | ✓ |
| gpt-4o-mini | compile | ✓ | ✓ | other | ✓ |
| gpt-5 | compile | ✓ | ✓ | ✓ | ✓ |
Failure excerpts
5 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.
-
ref.zero:1:1 IMP001: unknown package-local import 'std' -
# command-line-arguments -
ref.zero:3:10 PAR100: expected '{' before block -
# command-line-arguments -
ref.zero:1:1 IMP001: unknown package-local import 'std'
Reference implementations
The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.
Click a language to expand
Zero 190 lines
// Square integer matrix multiply, Zero 0.1.2 direct backend.
//
// Same input convention as 001-003: arguments come from argv because
// Zero 0.1.2 has no exposed stdin capability. argv[1] is N; the next
// N*N args are A row-major; the next N*N args are B row-major. All
// logic stays inside `pub fun main` to avoid the Span/MutSpan
// parameter restriction.
//
// Matrices are stored as flat [25]i32 buffers (the spec caps N at 5
// so 5*5 = 25 fixes both upper bound and stack size). Element type
// i32 because entries are in [-99, 99] and worst-case product sum is
// 5 * 99 * 99 = 49005, comfortably inside i32 range. The decimal
// renderer handles negative values and emits N lines of space-
// separated decimals terminated by \n.
pub fun main(world: World) -> Void raises {
let n_opt = std.args.get(1)
if n_opt.has == false {
check world.err.write("usage: pass N as argv[1] and the matrix entries as argv[2..]\n")
return
}
let n_bytes: Span<u8> = std.mem.span(n_opt.value)
let n_str_len: usize = std.mem.len(n_bytes)
if n_str_len == 0 {
check world.err.write("N must be in [1, 5]\n")
return
}
let mut np: usize = 0
let mut n_val: u32 = 0_u32
while np < n_str_len {
let ch: u8 = n_bytes[np]
if ch < 48_u8 || ch > 57_u8 {
check world.err.write("N must be in [1, 5]\n")
return
}
n_val = n_val * 10_u32 + ((ch - 48_u8) as u32)
np = np + 1
}
if n_val < 1_u32 || n_val > 5_u32 {
check world.err.write("N must be in [1, 5]\n")
return
}
let n: usize = n_val as usize
let nn: usize = n * n
let mut a: [25]i32 = [0_i32; 25]
let mut b: [25]i32 = [0_i32; 25]
let mut idx: usize = 0
while idx < nn {
let arg = std.args.get(2_usize + idx)
if arg.has == false {
check world.err.write("missing matrix entry\n")
return
}
let bytes: Span<u8> = std.mem.span(arg.value)
let bl: usize = std.mem.len(bytes)
if bl == 0 {
check world.err.write("empty matrix entry\n")
return
}
let mut p: usize = 0
let mut neg: Bool = false
if bytes[0] == 45_u8 {
neg = true
p = 1
}
if p >= bl {
check world.err.write("invalid matrix entry\n")
return
}
let mut v: i32 = 0_i32
while p < bl {
let c: u8 = bytes[p]
if c < 48_u8 || c > 57_u8 {
check world.err.write("invalid matrix entry\n")
return
}
v = v * 10_i32 + ((c - 48_u8) as i32)
p = p + 1
}
if neg {
v = 0_i32 - v
}
a[idx] = v
idx = idx + 1
}
let mut jdx: usize = 0
while jdx < nn {
let arg = std.args.get(2_usize + nn + jdx)
if arg.has == false {
check world.err.write("missing matrix entry\n")
return
}
let bytes: Span<u8> = std.mem.span(arg.value)
let bl: usize = std.mem.len(bytes)
if bl == 0 {
check world.err.write("empty matrix entry\n")
return
}
let mut p: usize = 0
let mut neg: Bool = false
if bytes[0] == 45_u8 {
neg = true
p = 1
}
if p >= bl {
check world.err.write("invalid matrix entry\n")
return
}
let mut v: i32 = 0_i32
while p < bl {
let c: u8 = bytes[p]
if c < 48_u8 || c > 57_u8 {
check world.err.write("invalid matrix entry\n")
return
}
v = v * 10_i32 + ((c - 48_u8) as i32)
p = p + 1
}
if neg {
v = 0_i32 - v
}
b[jdx] = v
jdx = jdx + 1
}
let mut out_buf: [512]u8 = [0_u8; 512]
let mut written: usize = 0
let mut i: usize = 0
while i < n {
let mut j: usize = 0
while j < n {
let mut s: i32 = 0_i32
let mut k: usize = 0
while k < n {
s = s + a[i * n + k] * b[k * n + j]
k = k + 1
}
// Render s into the buffer.
if j > 0 {
out_buf[written] = 32_u8
written = written + 1
}
let mut is_neg: Bool = false
let mut u: u32 = 0_u32
if s < 0_i32 {
is_neg = true
u = (0_i32 - s) as u32
} else {
u = s as u32
}
if is_neg {
out_buf[written] = 45_u8
written = written + 1
}
if u == 0_u32 {
out_buf[written] = 48_u8
written = written + 1
} else {
let mut digits: [16]u8 = [0_u8; 16]
let mut rem: u32 = u
let mut dc: usize = 0
while rem > 0_u32 {
let dq: u32 = rem / 10_u32
let dr: u32 = rem % 10_u32
digits[dc] = (dr as u8) + 48_u8
rem = dq
dc = dc + 1
}
let mut o: usize = 0
while o < dc {
out_buf[written] = digits[dc - 1 - o]
written = written + 1
o = o + 1
}
}
j = j + 1
}
out_buf[written] = 10_u8
written = written + 1
i = i + 1
}
let out: Span<u8> = out_buf[0..written]
check world.out.write(out)
return
}
TypeScript 38 lines
// Square integer matrix multiply, TypeScript reference.
// Reads N, then N rows of A, then N rows of B, prints rows of C = A * B.
import { readFileSync } from "node:fs";
const raw = readFileSync(0, "utf8");
const tokens = raw.split(/\s+/).filter((t) => t.length > 0);
let pos = 0;
const n = Number.parseInt(tokens[pos++], 10);
if (!Number.isInteger(n) || n < 1 || n > 5) {
process.stderr.write("N must be in [1, 5]\n");
process.exit(1);
}
const a: number[][] = [];
const b: number[][] = [];
for (let i = 0; i < n; i++) {
const row: number[] = [];
for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
a.push(row);
}
for (let i = 0; i < n; i++) {
const row: number[] = [];
for (let j = 0; j < n; j++) row.push(Number.parseInt(tokens[pos++], 10));
b.push(row);
}
const rows: string[] = [];
for (let i = 0; i < n; i++) {
const cells: string[] = [];
for (let j = 0; j < n; j++) {
let s = 0;
for (let k = 0; k < n; k++) s += a[i][k] * b[k][j];
cells.push(String(s));
}
rows.push(cells.join(" "));
}
process.stdout.write(rows.join("\n") + "\n");
Rust 62 lines
// Square integer matrix multiply, Rust reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
use std::io::{self, Read, Write};
use std::process::ExitCode;
fn main() -> ExitCode {
let mut input = String::new();
if io::stdin().read_to_string(&mut input).is_err() {
let _ = writeln!(io::stderr(), "failed to read stdin");
return ExitCode::from(1);
}
let mut iter = input.split_ascii_whitespace();
let n: i64 = match iter.next().and_then(|t| t.parse().ok()) {
Some(v) if (1..=5).contains(&v) => v,
_ => {
let _ = writeln!(io::stderr(), "N must be in [1, 5]");
return ExitCode::from(1);
}
};
let n = n as usize;
let read_matrix = |it: &mut std::str::SplitAsciiWhitespace| -> Option<Vec<Vec<i32>>> {
let mut m = vec![vec![0i32; n]; n];
for i in 0..n {
for j in 0..n {
let v: i32 = it.next()?.parse().ok()?;
m[i][j] = v;
}
}
Some(m)
};
let a = match read_matrix(&mut iter) {
Some(m) => m,
None => {
let _ = writeln!(io::stderr(), "failed to read A");
return ExitCode::from(1);
}
};
let b = match read_matrix(&mut iter) {
Some(m) => m,
None => {
let _ = writeln!(io::stderr(), "failed to read B");
return ExitCode::from(1);
}
};
let mut out = String::new();
for i in 0..n {
for j in 0..n {
let mut s: i32 = 0;
for k in 0..n {
s += a[i][k] * b[k][j];
}
if j > 0 {
out.push(' ');
}
out.push_str(&s.to_string());
}
out.push('\n');
}
let _ = io::stdout().write_all(out.as_bytes());
ExitCode::SUCCESS
}
Go 72 lines
// Square integer matrix multiply, Go reference.
// Reads N, then N rows of A, then N rows of B, prints C = A * B.
package main
import (
"bufio"
"fmt"
"io"
"os"
"strconv"
"strings"
)
func main() {
raw, err := io.ReadAll(bufio.NewReader(os.Stdin))
if err != nil {
fmt.Fprintln(os.Stderr, "failed to read stdin")
os.Exit(1)
}
tokens := strings.Fields(string(raw))
pos := 0
if len(tokens) == 0 {
fmt.Fprintln(os.Stderr, "missing input")
os.Exit(1)
}
n, err := strconv.Atoi(tokens[pos])
if err != nil || n < 1 || n > 5 {
fmt.Fprintln(os.Stderr, "N must be in [1, 5]")
os.Exit(1)
}
pos++
a := make([][]int, n)
b := make([][]int, n)
for i := 0; i < n; i++ {
a[i] = make([]int, n)
for j := 0; j < n; j++ {
a[i][j], err = strconv.Atoi(tokens[pos])
if err != nil {
fmt.Fprintln(os.Stderr, "failed to read A")
os.Exit(1)
}
pos++
}
}
for i := 0; i < n; i++ {
b[i] = make([]int, n)
for j := 0; j < n; j++ {
b[i][j], err = strconv.Atoi(tokens[pos])
if err != nil {
fmt.Fprintln(os.Stderr, "failed to read B")
os.Exit(1)
}
pos++
}
}
var sb strings.Builder
for i := 0; i < n; i++ {
for j := 0; j < n; j++ {
s := 0
for k := 0; k < n; k++ {
s += a[i][k] * b[k][j]
}
if j > 0 {
sb.WriteByte(' ')
}
sb.WriteString(strconv.Itoa(s))
}
sb.WriteByte('\n')
}
fmt.Print(sb.String())
}
Python 38 lines
"""Square integer matrix multiply, Python reference.
Reads N, then N rows of A, then N rows of B. Writes the N rows of C = A * B,
each row as space-separated decimal integers terminated by a single newline.
"""
import sys
def main() -> None:
tokens = sys.stdin.read().split()
pos = 0
n = int(tokens[pos]); pos += 1
if n < 1 or n > 5:
sys.stderr.write("N must be in [1, 5]\n")
sys.exit(1)
a = [[0] * n for _ in range(n)]
b = [[0] * n for _ in range(n)]
for i in range(n):
for j in range(n):
a[i][j] = int(tokens[pos]); pos += 1
for i in range(n):
for j in range(n):
b[i][j] = int(tokens[pos]); pos += 1
out_rows: list[str] = []
for i in range(n):
row_vals = []
for j in range(n):
s = 0
for k in range(n):
s += a[i][k] * b[k][j]
row_vals.append(str(s))
out_rows.append(" ".join(row_vals))
sys.stdout.write("\n".join(out_rows) + "\n")
if __name__ == "__main__":
main()
Design notes
Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/004-matrix-multiply/notes.md.
Why this task
Fourth and last pure-computation task. Differs from 001-003 in two shape changes:
- Multi-value input. Tasks 001 and 002 took one integer; task 003 took two strings. This one takes one integer and 2NN more integers. The model must read a variable amount of input, tokenize it, and dispatch into the right matrix slot. Off-by-ones in the read loop translate directly into wrong outputs.
- Multi-line, multi-column output. Tasks 001-003 all emitted a
single decimal followed by
\n. This one emits an N x N table. The model must space-separate within a row, newline-terminate per row, and not append a trailing line. A model that pretty-prints with extra whitespace produces a byte-exact failure even with the right math.
Difficulty: easy as math, but the I/O matrix is broader than 001-003. The triple-nested loop is canonical algorithm material.
Input convention
Zero takes the flat sequence on argv: argv[1] = N, argv[2..2+N*N-1]
= A row-major, argv[2+N*N..2+2*N*N-1] = B row-major. With the spec
cap of N <= 5 the worst-case argv length is 1 + 50 + 1 (program name)
= 52 tokens, well inside the shell limit. The other four languages
read the user-friendly multi-line stdin schema.
Edge cases probed
N = 1(the trivial scalar case[a] * [b] = [a*b]).- Identity-times-arbitrary (catches models that swap row/column indices in the inner loop, because the output then ceases to be B).
- Mixed positive and negative entries (catches models that try to use unsigned ints or skip the sign in the decimal renderer).
N = 5at the spec cap (the largest matrix in the matrix exercises the full DP-flat indexing pattern; if the model used the wrong row stride it falls over here).
Zero implementation notes
Two flat [25]i32 buffers (5*5 = 25 = spec cap; using flat instead of
[5][5] keeps it inside the direct backend's single-element array
support). Element type i32 because entries are in [-99, 99] and
worst-case inner-product sum is 5 * 99 * 99 = 49005, well inside
i32 range.
The integer parser handles a leading - and emits an i32. The
decimal renderer handles negative i32 by computing (0 - s) as u32
under guard s < 0 and emitting a - byte before the digits. The
divisor in the renderer is 10 (u32), so it stays clear of the
2^32-divisor codegen SIGFPE.
The output buffer is [512]u8. For the worst case (N=5, every cell
6 chars including a leading -), the row length is 5 * 7 - 1 + 1
(space-separated + newline) = 35 bytes; total = 175 bytes; 512 is a
comfortable cap.
Future revisions
- Bigger N. The spec caps at 5 to make argv ergonomic and to keep the test fixtures readable. Lifting to N=20 (so 800 argv tokens for the two matrices) would still fit, but the test JSON files would balloon. Defer to v0.2 when the spec gains an alternative input convention.
- Floating point. A future variant could swap i32 for f64 and check
that the model preserves precision in the sum. Catch: f64 printing
conventions diverge across languages (Python's
reprvs Rust's Display), so the spec would need a strict format string. - Non-square. The current spec uses NxN matrices to halve the required input length. A non-square (MxK * KxN) variant would test whether the model truly understands the contraction dimension versus just copying a "do three nested loops" recipe.
Cost
| Model | Prompt tokens | Completion tokens | API ms |
|---|---|---|---|
| gpt-4o | 2,990 | 1,678 | 18,506 |
| gpt-4o-mini | 2,990 | 1,508 | 32,411 |
| gpt-5 | 2,985 | 16,193 | 169,045 |
Tokens and API ms are summed across the five languages this model attempted for this task.
Compare
Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 . Back to the leaderboard and methodology.