Non-overlapping substring count

006-substring-count. Read two lines from standard input:

Prompt

This is the natural-language brief given to every model, verbatim. The harness prefixes a language-specific calling-convention block and suffixes a "return only the source code" instruction. Nothing else.

## Task: Non-overlapping substring count

Read two lines from standard input:

1. A pattern `P` (1 to 100 printable-ASCII characters).
2. A text `T` (0 to 1000 printable-ASCII characters).

Count the number of **non-overlapping** occurrences of `P` inside `T`, using
a greedy left-to-right scan: once a match starts at index `i`, the next
candidate match starts at `i + len(P)`. Write the decimal count followed by
a newline. Exit with status 0.

## Acceptance

- Stdin: `P\nT\n`. Both lines printable-ASCII; `P` length in `[1, 100]`,
  `T` length in `[0, 1000]`.
- Stdout: the count as a decimal integer (no leading zeros, no sign) followed
  by exactly one `\n`.
- Stderr: empty.
- Exit code: 0.

## Examples

| pattern (P) | text (T)            | output |
| ----------- | ------------------- | ------ |
| `ab`        | `abababab`          | `4`    |
| `aa`        | `aaaa`              | `2`    |
| `xyz`       | `aaa`               | `0`    |
| `a`         | `bbbbb`             | `0`    |
| `na`        | `banana`            | `2`    |
| `hello`     | `hello`             | `1`    |
| `x`         | (empty)             | `0`    |

In `aa` over `aaaa`, the greedy non-overlapping rule yields 2 matches at
positions 0 and 2, not 3 matches. In `na` over `banana`, the matches are at
positions 2 and 4 (no overlap).

Acceptance

A task counts as passed only when every public and hidden test case agrees on these fields. No fuzzy matching, no "off by one trailing newline is fine."

stdout (byte-exact, per case)	true
stderr (exact bytes)	""
exit code	0
wall time max (ms)	5000
tags	parsing, strings, search

Results

Each cell is one attempt. Pass means stdout matched byte-exact on every test case, stderr empty, exit zero. Hover a failure to see the captured first line of the diagnostic.

Model	Zero	TypeScript	Rust	Go	Python
gpt-4o	compile	✓	✓	✓	✓
gpt-4o-mini	compile	✓	✓	✓	✓
gpt-5	compile	✓	✓	✓	✓
opus	wrong output	✓	✓	✓	✓
sonnet	wrong output	✓	✓	✓	✓

Failure excerpts

5 of 25 attempts failed. Each card is one attempt, with the captured first line of the diagnostic.

gpt-4o Zero compile

ref.zero:1:1 IMP001: unknown package-local import 'std'

gpt-4o-mini Zero compile

ref.zero:1:1 PAR100: expected '{' before block

gpt-5 Zero compile

ref.zero:1:1 IMP001: unknown package-local import 'std'

opus Zero wrong output
```
(no diagnostic captured)
```
sonnet Zero wrong output
```
(no diagnostic captured)
```

Reference implementations

The hand-written reference each language ships with. Every reference passes the same public and hidden test suite under the pinned toolchain before any model touches the task.

Click a language to expand

Zero n/a

No reference implementation in this language.

TypeScript 29 lines

// Non-overlapping substring count, TypeScript reference.
// Reads pattern P from line 1 of stdin and text T from line 2.

import { readFileSync } from "node:fs";

function main(): void {
  const raw = readFileSync(0, "utf8");
  const lines = raw.split("\n");
  const p = lines[0] ?? "";
  const t = lines[1] ?? "";
  let count = 0;
  if (p.length > 0) {
    let i = 0;
    const n = t.length;
    const m = p.length;
    while (i + m <= n) {
      if (t.substring(i, i + m) === p) {
        count++;
        i += m;
      } else {
        i++;
      }
    }
  }
  process.stdout.write(`${count}\n`);
}

main();

Rust 34 lines

// Non-overlapping substring count, Rust reference.
// Reads pattern P from line 1 of stdin and text T from line 2.

use std::io::{self, Read, Write};

fn main() {
    let mut input = String::new();
    io::stdin().read_to_string(&mut input).unwrap();
    let mut lines = input.split('\n');
    let p = lines.next().unwrap_or("");
    let t = lines.next().unwrap_or("");
    let mut count: u64 = 0;
    if !p.is_empty() {
        let pb = p.as_bytes();
        let tb = t.as_bytes();
        let m = pb.len();
        let n = tb.len();
        if m <= n {
            let mut i = 0;
            while i + m <= n {
                if &tb[i..i + m] == pb {
                    count += 1;
                    i += m;
                } else {
                    i += 1;
                }
            }
        }
    }
    let out = io::stdout();
    let mut h = out.lock();
    writeln!(h, "{}", count).unwrap();
}

Go 40 lines

// Non-overlapping substring count, Go reference.
// Reads pattern P from line 1 of stdin and text T from line 2.

package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

func main() {
	raw, _ := io.ReadAll(bufio.NewReader(os.Stdin))
	parts := strings.SplitN(string(raw), "\n", 3)
	var p, t string
	if len(parts) >= 1 {
		p = parts[0]
	}
	if len(parts) >= 2 {
		t = parts[1]
	}
	count := 0
	if len(p) > 0 {
		m := len(p)
		n := len(t)
		i := 0
		for i+m <= n {
			if t[i:i+m] == p {
				count++
				i += m
			} else {
				i++
			}
		}
	}
	fmt.Printf("%d\n", count)
}

Python 29 lines

"""Non-overlapping substring count, Python reference.

Reads pattern P from line 1 of stdin and text T from line 2. Counts greedy
non-overlapping occurrences and writes the decimal count followed by \\n.
"""
import sys


def main() -> None:
    raw = sys.stdin.read().split("\n", 2)
    p = raw[0] if len(raw) >= 1 else ""
    t = raw[1] if len(raw) >= 2 else ""
    count = 0
    if p:
        i = 0
        n = len(t)
        m = len(p)
        while i + m <= n:
            if t[i:i + m] == p:
                count += 1
                i += m
            else:
                i += 1
    sys.stdout.write(f"{count}\n")


if __name__ == "__main__":
    main()

Design notes

Algorithm, failure modes, cross-language parity, and where Zero needed a workaround. From corpus/006-substring-count/notes.md.

Algorithm

Greedy left-to-right scan. For each position i in 0..=n-m, test T[i..i+m] == P. On match, count++ and i += m. On miss, i += 1. Empty T or m > n short-circuits to 0.

Why greedy non-overlap (not overlapping count)

The two reasonable count semantics are:

Greedy non-overlapping (what we use): aa in aaaa = 2.
Overlapping: aa in aaaa = 3.

We chose the greedy non-overlapping shape because (a) it is the more common default across str.count implementations (Python "aaaa".count("aa") = 2, Rust str::matches does non-overlapping when used naively), and (b) it is the deterministic shape that a model can derive without ambiguity from the spec wording "once a match starts at index i, the next candidate match starts at i + len(P)."

Models that conflate the two semantics will fail case 2 (the canonical trap) and case 1 (which would yield 5, not 4, under the overlapping rule applied to ab over abababab).

Zero-specific notes

argv[1] = P, argv[2] = T. Both defaulted to empty Span<u8> when the caller omits them (we branch on .has).
Inner equality check is a manual byte-loop (no == on Span<u8> in Zero 0.1.2 direct backend). Early-exit on mismatch by setting the loop index to m (Zero has no break).
The count fits in u32 (max is n/m and n <= 1000). The decimal renderer reuses the small-divisor pattern from tasks 001-004.

Cross-implementation parity

All five references produce byte-exact decimal-count followed by \n on every case in both stdin (TS/Rust/Go/Python) and argv (Zero) input modes.

Cost

Model	Prompt tokens	Completion tokens	API ms
gpt-4o	3,275	870	8,086
gpt-4o-mini	3,275	679	15,692
gpt-5	3,270	14,299	151,663
opus	12	726	130,299
sonnet	12	809	86,806

Tokens and API ms are summed across the five languages this model attempted for this task.

Compare

Model deep-dives: gpt-4o · gpt-4o-mini · gpt-5 · opus · sonnet . Back to the leaderboard and methodology.

Truffle · 006-substring-count · run captured 2026-06-27