The prompt
An old wooden rowboat tied to a misty lake dock at dawn, ripples spreading slowly as a heron lands on the bow
What the agent wrote
An old weathered wooden rowboat tied with a frayed rope to a misty lake dock at dawn, soft pink and gold light filtering through low fog over still gray water. A great gray heron glides down and lands gently on the bow, its wings folding as the boat dips slightly, sending slow concentric ripples spreading outward across the glassy surface. The camera holds in a slow steady dolly-in toward the boat, mist drifting lazily. Calm, serene, painterly atmosphere with gentle reflections and dewy reeds swaying faintly at the water's edge.
Why this framing: A wide tranquil lakescape with horizontal water and drifting mist reads best in 16:9, giving room for the heron's descent and the ripples to spread.
The eval receipt
Every Take clip ships with the measurements that admitted it. Deterministic gates and CV lanes run first; the VLM judge only rules on what those lanes cannot measure, and it must justify each level before naming it.
L0 · Deterministic gates
| Gate | Result | Reading | Check |
|---|---|---|---|
| decodes | pass | h264 | ffprobe parsed a video stream |
| duration | pass | 5.04s | expected 5s ± 1s |
| resolution | pass | 960x540 | expected height 540 |
| framerate | pass | 24.00 fps | sane range 12-60 |
| not black | pass | mean luma 86.6 | 0/121 frames under 12.0 |
| not frozen | pass | mean frame diff 3.744 | static-cheat detector, threshold 0.35 |
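The checks in the table can be sketched as plain comparisons against readings that have already been extracted from the clip. This is a minimal illustration, not the pipeline's actual code: `ClipReadings` and `run_gates` are hypothetical names, and two pass rules are assumptions because the table does not state them (zero dark frames allowed for "not black", and a mean frame diff strictly above the threshold for "not frozen").

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ClipReadings:
    """Hypothetical container for measurements taken from one clip."""
    codec: Optional[str]      # None when no video stream could be parsed
    duration_s: float
    height: int
    fps: float
    dark_frames: int          # frames whose mean luma is under 12.0
    mean_frame_diff: float


def run_gates(r: ClipReadings,
              expected_duration_s: float = 5.0,
              expected_height: int = 540) -> Dict[str, bool]:
    """Return pass/fail per gate, using the thresholds from the Check column."""
    return {
        "decodes": r.codec is not None,
        "duration": abs(r.duration_s - expected_duration_s) <= 1.0,
        "resolution": r.height == expected_height,
        "framerate": 12.0 <= r.fps <= 60.0,
        # Assumption: a single dark frame is enough to fail the gate.
        "not black": r.dark_frames == 0,
        # Assumption: the clip passes when the diff is above the threshold.
        "not frozen": r.mean_frame_diff > 0.35,
    }


# Readings copied from the table above.
readings = ClipReadings("h264", 5.04, 540, 24.0, 0, 3.744)
results = run_gates(readings)
admitted = all(results.values())
```

With the readings from the table, every gate passes, which matches the Result column.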
L1 · CV lanes
mean abs luma diff between consecutive frames
How much each frame differs from the next, checked on every frame. A high average usually just means a busy scene; sudden spikes far above that average indicate strobing.
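A rough sketch of this lane, assuming the frames are already decoded into a grayscale NumPy array. The function names and the spike factor of 3 are illustrative assumptions; the page does not say how far above the average a spike must be to count as strobing.

```python
import numpy as np


def luma_diffs(frames: np.ndarray) -> np.ndarray:
    """Mean absolute luma difference for each consecutive frame pair.

    frames has shape (N, H, W); the result has N - 1 entries.
    """
    f = frames.astype(np.float32)  # avoid uint8 wraparound when subtracting
    return np.abs(f[1:] - f[:-1]).mean(axis=(1, 2))


def strobe_spikes(diffs: np.ndarray, factor: float = 3.0) -> np.ndarray:
    """Indices of frame pairs whose diff exceeds factor x the average diff.

    The factor is an assumed value, chosen only for this example.
    """
    return np.flatnonzero(diffs > factor * diffs.mean())


# Synthetic clip: flat gray frames with one sudden flash at frame 5.
levels = [10, 11, 12, 13, 14, 200, 15, 16, 17, 18]
frames = np.stack([np.full((4, 4), v, dtype=np.uint8) for v in levels])
diffs = luma_diffs(frames)
spikes = strobe_spikes(diffs)  # pairs entering and leaving the flash
```

The flash shows up as two adjacent spikes, one for the jump into the bright frame and one for the jump out of it.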
Farneback flow magnitude, motion energy per frame pair
How far pixels move between frames: one number for how much is actually happening. Under 0.3 is basically a still image; 2 to 8 is normal motion.
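As an illustration of how a dense flow field can be reduced to one number, the sketch below takes an array of shape (H, W, 2), which is the shape OpenCV's `cv2.calcOpticalFlowFarneback` returns, and averages the per-pixel displacement. Using the mean as the reduction is an assumption, as are the function names; the page only gives the two bands quoted above, so values between them are left unlabelled.

```python
import numpy as np


def motion_energy(flow: np.ndarray) -> float:
    """Mean per-pixel displacement magnitude of a (H, W, 2) flow field."""
    return float(np.linalg.norm(flow, axis=2).mean())


def motion_band(energy: float) -> str:
    """Map a motion energy to the bands described in the text."""
    if energy < 0.3:
        return "basically still"
    if 2.0 <= energy <= 8.0:
        return "normal motion"
    return "outside the described bands"


# Toy flow field: every pixel moves 3 px right and 4 px down (magnitude 5).
flow = np.zeros((6, 8, 2), dtype=np.float32)
flow[..., 0] = 3.0
flow[..., 1] = 4.0

energy = motion_energy(flow)
band = motion_band(energy)
```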
cosine of CLIP ViT-B/32 prompt and frame embeddings
How well the frames match what you typed, scored by CLIP. 0.30 and up is well on-prompt; below 0.24 the model probably wandered off.
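The scoring step can be sketched without loading a model, assuming the prompt and frame embeddings have already been produced by CLIP ViT-B/32. The function names, the "borderline" label for the gap between 0.24 and 0.30, and averaging across frames are assumptions; the toy vectors below are two-dimensional and do not resemble real CLIP scores.

```python
import numpy as np


def prompt_frame_scores(text_emb: np.ndarray, frame_embs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one prompt embedding (D,) and N frame embeddings (N, D)."""
    text_unit = text_emb / np.linalg.norm(text_emb)
    frame_units = frame_embs / np.linalg.norm(frame_embs, axis=1, keepdims=True)
    return frame_units @ text_unit


def prompt_band(score: float) -> str:
    """Apply the bands quoted in the text to a single score."""
    if score >= 0.30:
        return "on-prompt"
    if score < 0.24:
        return "probably off-prompt"
    return "borderline"  # assumed label; the text does not name this range


# Toy embeddings, for illustration only.
text_emb = np.array([1.0, 0.0])
frame_embs = np.array([[1.0, 1.0], [1.0, 0.0]])

scores = prompt_frame_scores(text_emb, frame_embs)
clip_mean = float(scores.mean())  # assumption: frames are averaged
```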
DINOv2 cosine of each sampled frame against frame 0
Whether the subject stays the same subject, with every frame compared against the first. Watch the min: a single frame below 0.5 means identity broke.
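A small sketch of the "watch the min" rule, assuming one DINOv2 embedding per sampled frame is already available as a row of a NumPy array. The function names are hypothetical; the 0.5 floor is the one quoted above.

```python
import numpy as np


def similarity_to_first(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity of each frame embedding (N, D) against frame 0."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit[0]


def identity_broke(sims: np.ndarray, floor: float = 0.5) -> bool:
    """True when any single frame falls below the floor."""
    return bool(sims.min() < floor)


# Toy embeddings: the last frame points in a different direction from frame 0.
embeddings = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
sims = similarity_to_first(embeddings)
flagged = identity_broke(sims)
```

A flag from this check is a prompt for review rather than proof of morphing: a large pose change or a camera move can lower the similarity even when the subject is intact.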
These bands come from a 14-clip calibration set we ran before launch. See the full thresholds and the clips that set them.
L2 · Judge verdict
A serene, painterly clip of a gray heron gliding down and landing on the bow of a weathered rowboat, with believable wing mechanics and ripples spreading across still misty water. Scene consistency is solid despite a low dino_drift number, which is an artifact of the bird's pose change and the dolly move rather than morphing. The main gap is mood lighting — it reads cool and gray instead of the warm pink-and-gold dawn the prompt calls for, and the dock and frayed rope aren't clearly shown.
What the judge saw
The timestamped contact sheet, exactly as handed to the judge. ffmpeg pulled eight evenly spaced frames from the clip, stamped each with its timestamp, and tiled them into this one grid; it is the only image the judge reads.
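The sampling arithmetic behind such a sheet can be sketched as follows. This covers only the choice of timestamps and grid cells, not the ffmpeg invocation itself. Two details are assumptions because the page does not state them: frames are taken at the midpoints of eight equal segments, and the grid is four columns wide. The function names are hypothetical.

```python
from typing import List, Tuple


def sample_timestamps(duration_s: float, count: int = 8) -> List[float]:
    """Evenly spaced timestamps, one at the midpoint of each equal segment."""
    step = duration_s / count
    return [round((i + 0.5) * step, 3) for i in range(count)]


def grid_cells(count: int = 8, columns: int = 4) -> List[Tuple[int, int]]:
    """(row, column) position of each frame when tiled left to right."""
    return [divmod(i, columns) for i in range(count)]


# Duration taken from the gate table above.
timestamps = sample_timestamps(5.04)
labels = [f"{t:.3f}s" for t in timestamps]   # text stamped on each frame
cells = grid_cells(len(timestamps))
```

Midpoint sampling avoids the very first and very last frames, which are the ones most likely to be blank or partially rendered; a pipeline that samples from 0 s instead would only need a different formula in `sample_timestamps`.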