A red fox trotting through fresh snow at the edge of a pine forest, l...
The prompt
A red fox trotting through fresh snow at the edge of a pine forest, low golden morning light
What the agent wrote
A lone red fox trots steadily across fresh untouched snow at the edge of a pine forest, breath puffing in the cold air, bushy tail held low and swaying with each step, paws kicking up small sprays of powder. Low golden morning sunlight rakes through the dark evergreens, casting long blue shadows across the snow and lighting the fox's rust-colored fur with a warm rim glow. The camera tracks smoothly alongside the fox at a slow steady pace, keeping it centered as it moves left to right. Crisp, serene, wintry mood.
Why this framing: 16:9 suits the lateral tracking motion across a wide snowy forest edge, giving room for the fox's left-to-right travel and the long shadows.
The eval receipt
Every Take clip ships with the measurements that admitted it. Deterministic gates and CV lanes run first; the VLM judge only rules on what those lanes cannot measure, and it must justify each level before naming it. How the cascade works.
L0 · Deterministic gates
| Gate | Result | Reading | Check |
|---|---|---|---|
| decodes | pass | h264 | ffprobe parsed a video stream |
| duration | pass | 5.04s | expected 5s ± 1s |
| resolution | pass | 960x540 | expected height 540 |
| framerate | pass | 24.00 fps | sane range 12-60 |
| not black | pass | mean luma 62.9 | 0/121 frames under 12.0 |
| not frozen | pass | mean frame diff 7.652 | static-cheat detector, threshold 0.35 |
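The metadata gates in the table can be sketched as plain comparisons against one video stream entry from ffprobe's JSON output. This is a minimal illustration, not the pipeline's actual code: the function name `run_gates` and its parameter names are hypothetical, and the `probe` dictionary is hand-written to mirror the readings above rather than captured from a real run. The pixel-based gates (not black, not frozen) need decoded frames and are left out.

```python
from fractions import Fraction

def run_gates(stream, expected_duration=5.0, tolerance=1.0,
              expected_height=540, fps_range=(12.0, 60.0)):
    """Check one ffprobe video stream entry against the table's expectations.

    `stream` is assumed to look like an item of the "streams" list printed by
    `ffprobe -show_streams -of json`.
    """
    # ffprobe reports frame rate as a rational string such as "24/1".
    fps = float(Fraction(stream["avg_frame_rate"]))
    duration = float(stream["duration"])
    return {
        "decodes": stream.get("codec_type") == "video",
        "duration": abs(duration - expected_duration) <= tolerance,
        "resolution": stream["height"] == expected_height,
        "framerate": fps_range[0] <= fps <= fps_range[1],
    }

# Hand-written stand-in for the clip described in the table.
probe = {"codec_type": "video", "codec_name": "h264", "width": 960,
         "height": 540, "avg_frame_rate": "24/1", "duration": "5.040000"}
results = run_gates(probe)
```

Returning one boolean per gate, rather than a single pass/fail, keeps the result in the same shape as the table.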
L1 · CV lanes
mean abs luma diff between consecutive frames
How much each frame differs from the next, checked on every frame. A high average usually just means a busy scene; sudden spikes far above that average indicate strobing.
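As a rough sketch of this lane, the code below computes the mean absolute luma difference for each consecutive frame pair and flags pairs that sit far above the clip average. Frames are assumed to be flat sequences of luma values in the 0-255 range. The spike multiplier `factor=4.0` is a hypothetical choice for illustration; the page does not state how far above the average counts as a spike.

```python
def mean_abs_diffs(frames):
    """One mean absolute luma difference per consecutive frame pair."""
    diffs = []
    for prev, cur in zip(frames, frames[1:]):
        diffs.append(sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur))
    return diffs

def strobe_spikes(diffs, factor=4.0):
    """Indices of frame pairs whose diff exceeds `factor` times the average.

    `factor` is an assumed value, not a threshold taken from the source.
    """
    average = sum(diffs) / len(diffs)
    return [i for i, d in enumerate(diffs) if d > factor * average]

# Toy clip: twelve tiny 4-pixel frames that brighten slowly, with one flash.
frames = [[10 + 2 * i] * 4 for i in range(12)]
frames[6] = [250] * 4
diffs = mean_abs_diffs(frames)
spikes = strobe_spikes(diffs)  # the pairs entering and leaving the flash
```

A single flash produces two spikes, one on the way in and one on the way out, which is why the toy clip flags two adjacent pairs.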
Farneback flow magnitude, motion energy per frame pair
How far pixels move between frames: one number for how much is actually happening. Under 0.3 is basically a still image; 2 to 8 is normal motion.
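One way to get a single motion number per frame pair is to average the magnitude of OpenCV's dense Farneback flow field. The sketch below assumes that reading of "motion energy"; the page does not spell out the exact reduction. The Farneback parameters are the commonly used example values, not settings confirmed by the source, and the names `flow_energy` and `band_for` are hypothetical.

```python
def flow_energy(prev_gray, next_gray):
    """Mean Farneback flow magnitude, in pixels, between two grayscale frames."""
    # Imported lazily so band_for() stays usable without OpenCV installed.
    import cv2
    import numpy as np
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, next_gray, None,
        0.5,   # pyr_scale
        3,     # levels
        15,    # winsize
        3,     # iterations
        5,     # poly_n
        1.2,   # poly_sigma
        0,     # flags
    )
    # flow has shape (H, W, 2); take the per-pixel vector length, then average.
    return float(np.linalg.norm(flow, axis=2).mean())

def band_for(energy):
    """Map a motion energy onto the two bands the text names."""
    if energy < 0.3:
        return "still"
    if 2.0 <= energy <= 8.0:
        return "normal"
    # The text gives no label for 0.3 to 2 or for values above 8.
    return "unlabelled"
```

The gaps between the named bands are returned as `"unlabelled"` rather than guessed at.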
cosine of CLIP ViT-B/32 prompt and frame embeddings
How well the frames match what you typed, scored by CLIP. 0.30 and up is well on-prompt; below 0.24 the model probably wandered off.
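The scoring step can be sketched independently of the model: given a prompt embedding and one embedding per sampled frame, take the cosine of each pair and compare the result with the bands above. The embeddings are assumed to come from CLIP ViT-B/32 upstream; here they are toy vectors. Averaging across frames is an assumption, and the label `"borderline"` for the 0.24 to 0.30 gap is hypothetical, since the text names only the two outer bands.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def prompt_adherence(prompt_vec, frame_vecs):
    """Mean prompt-to-frame cosine and the band it falls into."""
    scores = [cosine(prompt_vec, f) for f in frame_vecs]
    mean = sum(scores) / len(scores)
    if mean >= 0.30:
        band = "on-prompt"
    elif mean < 0.24:
        band = "off-prompt"
    else:
        band = "borderline"  # assumed label for the unnamed middle range
    return mean, band

# Toy 2-D vectors standing in for real CLIP embeddings.
mean, band = prompt_adherence([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```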
DINOv2 cosine of each sampled frame against frame 0
Whether the subject stays the same subject, with every frame compared against the first. Watch the min: a single frame below 0.5 means identity broke.
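The "watch the min" rule can be illustrated with a few lines that compare every sampled frame against frame 0 and report the weakest match. The sketch assumes the DINOv2 embeddings are already L2-normalised, so the cosine reduces to a dot product; `identity_drift` and its return keys are hypothetical names.

```python
def identity_drift(frame_vecs, floor=0.5):
    """Compare each frame embedding with frame 0 and report the weakest match.

    Assumes every vector is already L2-normalised, so dot product == cosine.
    """
    reference = frame_vecs[0]
    sims = [sum(x * y for x, y in zip(reference, v)) for v in frame_vecs]
    worst = min(range(len(sims)), key=sims.__getitem__)
    return {
        "min": sims[worst],
        "worst_frame": worst,
        "broke": sims[worst] < floor,  # one frame under the floor is enough
    }

# Toy unit vectors that rotate away from the first frame.
report = identity_drift([[1.0, 0.0], [0.8, 0.6], [0.6, 0.8]])
```

Reporting the index of the worst frame alongside the minimum makes it easier to see where in the clip the subject drifted.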
These bands come from a 14-clip calibration set we ran before launch. See the full thresholds and the clips that set them.
L2 · Judge verdict
A serene, well-lit wintry clip that delivers the prompt faithfully: a rust-colored fox trots left-to-right across snow at a pine forest edge under golden rim light with long blue shadows. Per-frame quality and motion are solid with no visible anatomy errors, and the only weakness is the fox drifting into a darker, less-lit pocket in the final third (reflected in the falling identity score), which slightly dims the back half.
What the judge saw
The timestamped contact sheet, exactly as handed to the judge. ffmpeg pulled eight evenly spaced frames from the clip, stamped each with its timestamp, and tiled them into this one grid; it is the only image the judge reads. Why a grid beats a video.
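The sampling behind such a sheet can be sketched without touching ffmpeg: pick eight timestamps across the clip and assign each one a cell in the grid. Taking the midpoint of eight equal segments and laying the cells out four across are both assumptions; the page says only that the frames are evenly spaced and tiled into one grid.

```python
def sheet_timestamps(duration, count=8):
    """Midpoints of `count` equal segments of the clip, in seconds."""
    step = duration / count
    return [round((i + 0.5) * step, 3) for i in range(count)]

def grid_cells(count=8, columns=4):
    """(row, column) position of each sampled frame in the tiled sheet."""
    return [(i // columns, i % columns) for i in range(count)]

stamps = sheet_timestamps(5.04)
cells = grid_cells()
for stamp, (row, col) in zip(stamps, cells):
    print(f"row {row} col {col}: {stamp:.3f}s")
```

Sampling at segment midpoints avoids the very first and very last frames, which are the ones most likely to be fade-ins or encoder artefacts.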