agentlang-index · task easy

Levenshtein edit distance

003-levenshtein-distance. Read two short ASCII strings `A` and `B` from standard input (one per line),

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

## Task: Levenshtein edit distance

Read two short ASCII strings `A` and `B` from standard input (one per line),
each at most 50 characters and using only printable ASCII (`0x20..=0x7E`).
Compute the **Levenshtein edit distance** between them: the minimum number
of single-character insertions, deletions, or substitutions to transform
`A` into `B`. Each edit has cost 1. Write the distance to standard output
followed by a single newline. Exit with status 0.

Edge cases:

- If `A == B` the distance is `0`.
- If either string is empty the distance is the length of the other.

## Acceptance

- Stdin: two lines. Line 1 is `A`, line 2 is `B`. Either may be empty.
- Stdout: the decimal digits of the distance, followed by `\n`.
- Stderr: empty.
- Exit code: 0.
- The program must complete each test case within 5 seconds.

## Examples

| A          | B          | distance |
| ---------- | ---------- | -------- |
| `kitten`   | `sitting`  | `3`      |
| `saturday` | `sunday`   | `3`      |
| `flaw`     | `lawn`     | `2`      |
| `intention`| `execution`| `5`      |
| empty      | `abc`      | `3`      |
| empty      | empty      | `0`      |

## Language scaffold

{language_scaffold}

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case) true
stderr (exact bytes) ""
exit code 0
wall time max (ms) 5000
tags computation, dp, strings

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model ZeroTypeScriptRustGoPython
gpt-4o compile
gpt-4o-mini compile other
gpt-5 compile

Failure excerpts

4 of 15 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

  1. gpt-4o Zero compile
    ref.zero:1:1 IMP001: unknown package-local import 'std'
  2. gpt-4o-mini Zero compile
    ref.zero:9:1 PAR100: expected '{' before block
  3. gpt-4o-mini TypeScript other
    (no diagnostic captured)
  4. gpt-5 Zero compile
    ref.zero:3:8 PAR100: expected '{' before block

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero 98 lines
// Levenshtein edit distance, Zero 0.1.2 direct backend.
//
// Same input convention as tasks 001 and 002: arguments come from argv
// because Zero 0.1.2 has no exposed stdin capability. A comes from
// argv[1] and B from argv[2]. All logic stays inside `pub fun main` to
// avoid the Span/MutSpan-as-parameter restriction.
//
// Two rolling [51]u32 buffers form the DP row (current and previous).
// Spec caps each string at 50 characters so a 51-cell row covers it.
pub fun main(world: World) -> Void raises {
    let a_opt = std.args.get(1)
    let b_opt = std.args.get(2)
    if a_opt.has == false || b_opt.has == false {
        check world.err.write("usage: pass A as argv[1] and B as argv[2] (Zero 0.1.2 has no stdin capability)\n")
        return
    }

    let a_bytes: Span<u8> = std.mem.span(a_opt.value)
    let b_bytes: Span<u8> = std.mem.span(b_opt.value)
    let a_len: usize = std.mem.len(a_bytes)
    let b_len: usize = std.mem.len(b_bytes)
    if a_len > 50 || b_len > 50 {
        check world.err.write("each string must be at most 50 characters\n")
        return
    }

    let mut prev: [51]u32 = [0_u32; 51]
    let mut curr: [51]u32 = [0_u32; 51]

    let mut j: usize = 0
    while j <= b_len {
        prev[j] = j as u32
        j = j + 1
    }

    let mut i: usize = 0
    while i < a_len {
        curr[0] = (i + 1) as u32
        let ai: u8 = a_bytes[i]
        let mut k: usize = 0
        while k < b_len {
            let bk: u8 = b_bytes[k]
            let del_cost: u32 = prev[k + 1] + 1_u32
            let ins_cost: u32 = curr[k] + 1_u32
            let mut sub_cost: u32 = prev[k]
            if ai != bk {
                sub_cost = sub_cost + 1_u32
            }
            let mut best: u32 = del_cost
            if ins_cost < best {
                best = ins_cost
            }
            if sub_cost < best {
                best = sub_cost
            }
            curr[k + 1] = best
            k = k + 1
        }
        let mut copy_i: usize = 0
        while copy_i <= b_len {
            prev[copy_i] = curr[copy_i]
            copy_i = copy_i + 1
        }
        i = i + 1
    }

    let distance: u32 = prev[b_len]

    let mut buf: [16]u8 = [0_u8; 16]
    let mut written: usize = 0
    if distance == 0_u32 {
        buf[0] = 48_u8
        buf[1] = 10_u8
        written = 2
    } else {
        let mut digits: [16]u8 = [0_u8; 16]
        let mut rem: u32 = distance
        let mut digit_count: usize = 0
        while rem > 0_u32 {
            let dq: u32 = rem / 10_u32
            let dr: u32 = rem % 10_u32
            digits[digit_count] = (dr as u8) + 48_u8
            rem = dq
            digit_count = digit_count + 1
        }
        let mut o: usize = 0
        while o < digit_count {
            buf[o] = digits[digit_count - 1 - o]
            o = o + 1
        }
        buf[digit_count] = 10_u8
        written = digit_count + 1
    }
    let out: Span<u8> = buf[0..written]
    check world.out.write(out)
    return
}
TypeScript 31 lines
// Levenshtein edit distance, TypeScript reference.
// Reads A and B from stdin (one per line), runs a two-row DP.
import { readFileSync } from "node:fs";

const raw = readFileSync(0, "utf8");
const lines = raw.split("\n");
const a = lines[0] ?? "";
const b = lines[1] ?? "";
if (a.length > 50 || b.length > 50) {
  process.stderr.write("each string must be at most 50 characters\n");
  process.exit(1);
}

const m = a.length;
const n = b.length;
let prev = new Array<number>(n + 1);
let curr = new Array<number>(n + 1);
for (let j = 0; j <= n; j++) prev[j] = j;
for (let i = 0; i < m; i++) {
  curr[0] = i + 1;
  const ai = a.charCodeAt(i);
  for (let k = 0; k < n; k++) {
    const delCost = prev[k + 1] + 1;
    const insCost = curr[k] + 1;
    const subCost = prev[k] + (ai === b.charCodeAt(k) ? 0 : 1);
    curr[k + 1] = Math.min(delCost, insCost, subCost);
  }
  [prev, curr] = [curr, prev];
}
process.stdout.write(`${prev[n]}\n`);
Rust 38 lines
// Levenshtein edit distance, Rust reference.
// Reads A and B from stdin (one per line), runs a two-row DP.
use std::io::{self, Read, Write};
use std::process::ExitCode;

fn main() -> ExitCode {
    let mut input = String::new();
    if io::stdin().read_to_string(&mut input).is_err() {
        let _ = writeln!(io::stderr(), "failed to read stdin");
        return ExitCode::from(1);
    }
    let mut lines = input.split('\n');
    let a = lines.next().unwrap_or("");
    let b = lines.next().unwrap_or("");
    if a.chars().count() > 50 || b.chars().count() > 50 {
        let _ = writeln!(io::stderr(), "each string must be at most 50 characters");
        return ExitCode::from(1);
    }
    let a_bytes = a.as_bytes();
    let b_bytes = b.as_bytes();
    let m = a_bytes.len();
    let n = b_bytes.len();
    let mut prev: Vec<u32> = (0..=n as u32).collect();
    let mut curr: Vec<u32> = vec![0; n + 1];
    for i in 0..m {
        curr[0] = (i + 1) as u32;
        for k in 0..n {
            let del_cost = prev[k + 1] + 1;
            let ins_cost = curr[k] + 1;
            let sub_cost = prev[k] + if a_bytes[i] == b_bytes[k] { 0 } else { 1 };
            curr[k + 1] = del_cost.min(ins_cost).min(sub_cost);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    let _ = writeln!(io::stdout(), "{}", prev[n]);
    ExitCode::SUCCESS
}
Go 59 lines
// Levenshtein edit distance, Go reference.
// Reads A and B from stdin (one per line), runs a two-row DP.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	raw, err := io.ReadAll(bufio.NewReader(os.Stdin))
	if err != nil {
		fmt.Fprintln(os.Stderr, "failed to read stdin")
		os.Exit(1)
	}
	lines := strings.SplitN(string(raw), "\n", 3)
	var a, b string
	if len(lines) >= 1 {
		a = lines[0]
	}
	if len(lines) >= 2 {
		b = lines[1]
	}
	if len(a) > 50 || len(b) > 50 {
		fmt.Fprintln(os.Stderr, "each string must be at most 50 characters")
		os.Exit(1)
	}
	m, n := len(a), len(b)
	prev := make([]int, n+1)
	curr := make([]int, n+1)
	for j := 0; j <= n; j++ {
		prev[j] = j
	}
	for i := 0; i < m; i++ {
		curr[0] = i + 1
		for k := 0; k < n; k++ {
			delCost := prev[k+1] + 1
			insCost := curr[k] + 1
			subCost := prev[k]
			if a[i] != b[k] {
				subCost++
			}
			best := delCost
			if insCost < best {
				best = insCost
			}
			if subCost < best {
				best = subCost
			}
			curr[k+1] = best
		}
		prev, curr = curr, prev
	}
	fmt.Print(prev[n], "\n")
}
Python 33 lines
"""Levenshtein edit distance, Python reference.

Reads two lines from stdin (A and B), computes the classic two-row DP,
and writes the distance plus a trailing newline.
"""
import sys


def main() -> None:
    data = sys.stdin.read().split("\n")
    a = data[0] if len(data) >= 1 else ""
    b = data[1] if len(data) >= 2 else ""
    if len(a) > 50 or len(b) > 50:
        sys.stderr.write("each string must be at most 50 characters\n")
        sys.exit(1)
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    curr = [0] * (n + 1)
    for i in range(m):
        curr[0] = i + 1
        ai = a[i]
        for k in range(n):
            del_cost = prev[k + 1] + 1
            ins_cost = curr[k] + 1
            sub_cost = prev[k] + (0 if ai == b[k] else 1)
            curr[k + 1] = min(del_cost, ins_cost, sub_cost)
        prev, curr = curr, prev
    sys.stdout.write(f"{prev[n]}\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/003-levenshtein-distance/notes.md.

Why this task

Third pure-computation task. The first two corpus tasks exercised single-integer arithmetic (recursion + memoization for 001, array marking for 002). This one is the first that takes two inputs and produces a number that depends on a two-dimensional dynamic programming relation. The model must (a) read two strings, not one integer, (b) materialize the standard Levenshtein recurrence (insert, delete, substitute, each cost 1, take the minimum), (c) keep the table small enough to run cheaply on the 50-char cap, (d) emit a single decimal followed by \n.

Difficulty: easy. The recurrence is canonical algorithm material but catches models that get the row/column indexing wrong, that forget to seed the first row and first column with the lengths, or that forget the equal-character zero-cost short circuit.

Input shape changes

Tasks 001 and 002 read a single integer N from stdin (or argv[1] for Zero). This task reads two strings, one per line. For Zero that means argv[1] is A and argv[2] is B, mirroring the per-line convention of the other four languages.

The 50-character cap on each string keeps the DP table inside Zero's stack-allocated [51]u32 rolling buffers (the prior tasks used [N]u8 because the spec's data fit in bytes; here we need u32 because the distance for a 50/50 string pair could theoretically reach 50 and a u8 would still cover that, but I'd rather keep the cell type the same as the conventional textbook value).

Edge cases probed

  • Both strings empty: distance 0. Catches off-by-one initialization.
  • One empty, one non-empty: distance equals the length of the other. Catches models that skip the boundary seeding entirely.
  • Equal strings (not tested in the matrix here but covered transitively by case-002 with empty/empty == "abc"/"abc" == 0 logic).
  • Classic kitten -> sitting (distance 3) and saturday -> sunday (distance 3) cover the substitution + insertion + deletion mix that shows up in every algorithms textbook.
  • intention -> execution (distance 5) tests a deeper DP path with no obvious shared prefix to short-circuit.
  • The long pair the quick brown fox jumps over the lazy dog vs the quick brown dog jumps over the lazy fox is interesting because the naive intuition is "6 substitutions, 3 per word" but the right answer is 4: o is shared between fox and dog at the middle position, so only f<->d and x<->g substitute. Easy hand-error.

Zero implementation notes

Same input convention as 001 and 002: argv-based because Zero 0.1.2 has no exposed stdin capability. All logic stays inside pub fun main.

The DP uses two rolling [51]u32 buffers (current row and previous row) rather than a full [51][51]u32 table. Two reasons:

  1. The direct backend's array-element-type matrix doesn't include nested arrays as cleanly as a single flat allocation.
  2. Two rolling buffers are the textbook memory optimization and cut the stack footprint from ~10KB to ~408 bytes.

The decimal renderer reuses the pattern from 001/002 (digits buffer, walk the value, reverse into the output buffer) with distance: u32 as the source value. Same small-divisor (10) divmod, so it stays clear of the 2^32-divisor SIGFPE documented in ~/repos/zero-bugs/u64-divmod-cast-sigfpe/.

Future revisions

  • Higher per-string cap. The current [51]u32 allocation tracks the spec's 50-char limit. A 256-char cap would push the rolling buffer to about 2KB still, well inside the stack budget.
  • Unicode. Today's spec is ASCII-only to keep the byte indexing trivial. A future revision could ask for Unicode-codepoint-level edit distance, which would surface whether the model knows the difference between len(bytes) and len(codepoints).
  • Damerau-Levenshtein (add a transposition rule). Same recurrence with one extra case for adjacent swap. Defer to v0.2 if a model consistently confuses the two variants.

Cost

Model Prompt tokens Completion tokens API ms
gpt-4o 2,955 1,963 15,340
gpt-4o-mini 2,955 1,645 44,590
gpt-5 2,950 15,759 193,755

Tokens and API ms are summed across the five languages this model attempted for this task.


Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 . Back to the leaderboard and methodology.