Take

A red fox trotting through fresh snow at the edge of a pine forest, l...

eval 19/24 · 540p / 5s / 16:9 · 1 attempt · 159s wall

The prompt

A red fox trotting through fresh snow at the edge of a pine forest, low golden morning light

What the agent wrote

A lone red fox trots steadily across fresh untouched snow at the edge of a pine forest, breath puffing in the cold air, bushy tail held low and swaying with each step, paws kicking up small sprays of powder. Low golden morning sunlight rakes through the dark evergreens, casting long blue shadows across the snow and lighting the fox's rust-colored fur with a warm rim glow. The camera tracks smoothly alongside the fox at a slow steady pace, keeping it centered as it moves left to right. Crisp, serene, wintry mood.

Why this framing: 16:9 suits the lateral tracking motion across a wide snowy forest edge, giving room for the fox's left-to-right travel and the long shadows.

The eval receipt

Every Take clip ships with the measurements that admitted it. Deterministic gates and CV lanes run first; the VLM judge only rules on what those lanes cannot measure, and it must justify each level before naming it.
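The ordering described above can be sketched as a small driver. This is an illustration under assumptions, not the actual Take pipeline: `run_cascade`, its arguments, and the early rejection on a failed gate are all hypothetical; the receipt only states that gates and lanes run before the judge.

```python
from typing import Callable, Dict


def run_cascade(
    gates: Dict[str, Callable[[], bool]],
    lanes: Dict[str, Callable[[], float]],
    judge: Callable[[Dict[str, float]], Dict[str, str]],
) -> dict:
    """Run cheap checks first and call the judge last (hypothetical driver)."""
    gate_results: Dict[str, bool] = {}
    for name, check in gates.items():
        passed = check()
        gate_results[name] = passed
        if not passed:
            # Assumption: a failed gate rejects the clip before any
            # lane or judge work is done. The receipt does not say this.
            return {"gates": gate_results, "lanes": {}, "verdict": None}

    # CV lanes produce plain numbers such as flow or CLIPScore means.
    lane_readings = {name: measure() for name, measure in lanes.items()}

    # The judge receives the lane readings so it can cite them.
    verdict = judge(lane_readings)
    return {"gates": gate_results, "lanes": lane_readings, "verdict": verdict}
```

Passing the lane readings into the judge mirrors how the verdict below quotes the flow mean and the CLIPScore mean.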

L0 · Deterministic gates

Gate        Result  Reading                 Check
decodes     pass    h264                    ffprobe parsed a video stream
duration    pass    5.04s                   expected 5s ± 1s
resolution  pass    960x540                 expected height 540
framerate   pass    24.00 fps               sane range 12-60
not black   pass    mean luma 62.9          0/121 frames under 12.0
not frozen  pass    mean frame diff 7.652   static-cheat detector, threshold 0.35
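As a rough illustration, most of these gates reduce to simple comparisons once the readings are in hand. The sketch below assumes the readings were already extracted (for example with ffprobe); `check_gates` and its parameters are hypothetical names. The thresholds come from the table, except the not-black rule: the receipt reports a dark-frame count but not the cutoff, so the majority rule here is an assumption. The decodes gate is left out because it depends on the probe itself.

```python
def check_gates(duration_s, height, fps, frame_lumas, frame_diffs,
                expected_duration_s=5.0, expected_height=540):
    """Return a pass/fail flag per gate from pre-extracted readings."""
    # Frames whose mean luma falls under 12.0 count as dark.
    dark_frames = sum(1 for luma in frame_lumas if luma < 12.0)
    mean_diff = sum(frame_diffs) / len(frame_diffs)
    return {
        # Duration must land within one second of the request.
        "duration": abs(duration_s - expected_duration_s) <= 1.0,
        "resolution": height == expected_height,
        "framerate": 12.0 <= fps <= 60.0,
        # Assumption: fail only when at least half the frames are dark.
        "not black": dark_frames * 2 < len(frame_lumas),
        # A clip that barely changes between frames counts as frozen.
        "not frozen": mean_diff > 0.35,
    }
```

Fed the readings from this receipt (5.04 s, height 540, 24 fps, no dark frames, mean frame diff 7.652), every flag comes back true.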

L1 · CV lanes

Flicker 7.65
min 6.89 · max 8.33 · σ 0.26

mean abs luma diff between consecutive frames

How much each frame differs from the next, checked on every frame. A high average usually just means a busy scene; sudden spikes far above the average mean strobing.
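A minimal sketch of this lane in plain Python, assuming each frame is a flat sequence of luma values of equal length. The function names are hypothetical, and the spike rule (mean plus `k` standard deviations) is an assumption; the receipt only says that spikes far above the average indicate strobing.

```python
from statistics import mean, pstdev


def flicker_series(frames):
    """Mean absolute luma difference for each pair of consecutive frames."""
    series = []
    for prev, cur in zip(frames, frames[1:]):
        total = sum(abs(a - b) for a, b in zip(prev, cur))
        series.append(total / len(cur))
    return series


def strobe_spikes(series, k=4.0):
    """Indices of frame pairs that sit more than k sigma above the mean.

    k is a hypothetical cutoff, not a value taken from the receipt.
    """
    mu = mean(series)
    sigma = pstdev(series)
    return [i for i, value in enumerate(series) if value > mu + k * sigma]
```

A steadily busy clip like this one (σ 0.26 around a mean of 7.65) would produce no spikes under such a rule, while a single flash frame would stand out.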

Optical flow 5.80
min 4.69 · max 7.92 · σ 0.86

Farneback flow magnitude, motion energy per frame pair

How far pixels move between frames: one number for how much is actually happening. Under 0.3 is basically a still image; 2 to 8 is normal motion.
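The bands quoted above can be written down directly. The sketch assumes the flow field has already been computed (the receipt names Farneback) and is passed in as per-pixel `(dx, dy)` pairs; both function names are hypothetical. The receipt does not label the ranges between 0.3 and 2 or above 8, so the code leaves them unlabelled rather than guessing.

```python
import math


def mean_flow_magnitude(flow):
    """Average displacement length over a field of (dx, dy) vectors."""
    vectors = list(flow)
    return sum(math.hypot(dx, dy) for dx, dy in vectors) / len(vectors)


def motion_band(magnitude):
    """Map a mean flow magnitude onto the bands named in the receipt."""
    if magnitude < 0.3:
        return "still"
    if 2.0 <= magnitude <= 8.0:
        return "normal"
    # The receipt gives no label for the remaining ranges.
    return "unlabelled"
```

With this clip's mean of 5.80, `motion_band` returns "normal".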

CLIPScore 0.347
min 0.320 · max 0.372 · σ 0.016

cosine of CLIP ViT-B/32 prompt and frame embeddings

How well the frames match what you typed, scored by CLIP. 0.30 and up is well on-prompt; below 0.24 the model probably wandered off.
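For illustration, the card's summary line and band can be derived from per-frame scores as follows. The scores are assumed to be precomputed cosines; `summarise_clip_scores` is a hypothetical name, the "borderline" label for the gap between 0.24 and 0.30 is an assumption, and so is the use of the population standard deviation for σ.

```python
from statistics import mean, pstdev


def summarise_clip_scores(scores):
    """Summarise per-frame CLIP cosines and place the mean in a band."""
    average = mean(scores)
    if average >= 0.30:
        band = "on-prompt"
    elif average < 0.24:
        band = "off-prompt"
    else:
        # Assumption: the receipt does not name the 0.24 to 0.30 gap.
        band = "borderline"
    return {
        "mean": average,
        "min": min(scores),
        "max": max(scores),
        "sigma": pstdev(scores),
        "band": band,
    }
```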

DINO drift 0.876
min 0.708 · max 1.000 · σ 0.097

DINOv2 cosine of each sampled frame against frame 0

Whether the subject stays the same subject, with every frame compared against the first. Watch the min: a single frame below 0.5 means identity broke.
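A small sketch of the comparison against frame 0, assuming the frame embeddings are already available as plain lists of floats (in the receipt they come from DINOv2). The function names are hypothetical; the 0.5 floor is the one quoted above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def drift_against_first(embeddings):
    """Similarity of every sampled frame to the first sampled frame."""
    first = embeddings[0]
    return [cosine(first, emb) for emb in embeddings]


def identity_broke(similarities, floor=0.5):
    """True when any single frame falls below the identity floor."""
    return min(similarities) < floor
```

The first entry is always 1.0, which is why the max on this card reads 1.000. This clip's min of 0.708 stays above the floor, so `identity_broke` would return False.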

These bands come from a 14-clip calibration set we ran before launch. See the full thresholds and the clips that set them.

L2 · Judge verdict

fidelity · good · Across #0-#9 the fox holds plausible quadruped anatomy: four legs resolve in the trotting poses (#1, #5, #9) with no obvious extra or fused limbs, and the fur texture reads naturally with warm rim light. At 960x540 fine detail is soft and the leg geometry blurs slightly during fast gait phases, but I see no mangled anatomy or garbled texture. The darker shadowed frames (#6-#8) lose some definition on the fox's body but introduce no visible artifacts.
aesthetics · good · Strong cinematic composition: low golden side-light rims the fox's rust fur against dark evergreens, with cool blue snow shadows providing color contrast exactly as a wintry dawn scene should. The fox stays well-placed on the snow ridge with the forest as a clean backdrop. A cinematographer would happily keep #0-#5; the later frames (#6-#9) drift the fox into a darker, less lit pocket that flattens the appeal slightly.
consistency · good · The subject remains a single rust-colored fox throughout, and the snow ridge / pine backdrop is stable. dino_drift falls from 1.0 to 0.71 by #9, which matches what I see: the fox shrinks and shifts into shadow in #6-#9 and changes pose (from upright trot to lower crouched walk), so the identity-cosine drop reflects scale/lighting/pose change rather than the animal morphing into something else. No subject swap is visible.
motion · good · Flow mean 5.80 (range 4.7-7.9) indicates steady normal motion, consistent with a tracking shot alongside a trotting fox. The sampled poses progress believably through a trot/walk cycle (legs extend and gather plausibly between #1, #5, and #9) and the body translates left-to-right as required. Poses look natural with no frozen or teleporting limbs in the samples.
semantics · excellent · CLIPScore mean 0.347 (peaking 0.373) confirms strong prompt alignment. Every required element is present: lone red fox (#0-#9), fresh snow, edge of a pine/evergreen forest, low golden raking sunlight, long blue shadows, warm rim glow on the fur, tail held low, and a smooth left-to-right tracking move. Breath puffs are not clearly resolvable at this resolution but everything else is depicted.
physics · good · The fox maintains believable ground contact on the snow ridge across frames, with the body weight and leg planting reading correctly for a trot. Lighting direction is consistent: rim light from the low sun on the fox's near side and long cool shadows cast across the snow. No floating, no object permanence breaks, no gravity violations visible in the samples.

A serene, well-lit wintry clip that delivers the prompt faithfully: a rust-colored fox trots left-to-right across snow at a pine forest edge under golden rim light with long blue shadows. Per-frame quality and motion are solid with no visible anatomy errors, and the only weakness is the fox drifting into a darker, less-lit pocket in the final third (reflected in the falling identity score), which slightly dims the back half.
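The receipt does not state how levels turn into the 19/24 total. One mapping consistent with it is 4 points for excellent and 3 for good across the six dimensions; the sketch below treats that as an inference, not the documented rubric. `LEVEL_POINTS` and `total_score` are hypothetical names, and levels that do not appear in this receipt are deliberately left out.

```python
# Assumption: inferred from 5 x good + 1 x excellent = 19 out of 24.
LEVEL_POINTS = {"excellent": 4, "good": 3}


def total_score(verdict):
    """Return (points, maximum) for a mapping of dimension -> level."""
    points = sum(LEVEL_POINTS[level] for level in verdict.values())
    maximum = max(LEVEL_POINTS.values()) * len(verdict)
    return points, maximum


verdict = {
    "fidelity": "good",
    "aesthetics": "good",
    "consistency": "good",
    "motion": "good",
    "semantics": "excellent",
    "physics": "good",
}
```

Under this assumed mapping, `total_score(verdict)` reproduces the 19/24 shown in the header.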

What the judge saw

The timestamped contact sheet, exactly as handed to the judge. ffmpeg pulled eight evenly spaced frames from the clip, stamped each with its timestamp, and tiled them into this one grid; it is the only image the judge reads.
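The sampling step can be sketched as follows. The exact spacing rule is not given, so taking the midpoint of each equal slice is an assumption, and the frame count is left as a parameter. `sample_timestamps` and `extract_command` are hypothetical helpers; the command only grabs a single frame with standard ffmpeg options, and the stamping and tiling steps are left out.

```python
def sample_timestamps(duration_s, count):
    """Midpoints of `count` equal slices of the clip, in seconds."""
    step = duration_s / count
    return [round((i + 0.5) * step, 3) for i in range(count)]


def extract_command(clip_path, timestamp_s, out_path):
    """Argument list that asks ffmpeg for one frame at a timestamp."""
    return [
        "ffmpeg",
        "-ss", f"{timestamp_s:.3f}",  # seek before decoding
        "-i", clip_path,
        "-frames:v", "1",             # write exactly one video frame
        out_path,
    ]
```

The argument list can be handed to `subprocess.run` as is; building it as a list avoids shell quoting issues with file names.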

Timestamped contact sheet

Attempt history

Attempt 1 · accepted · score 19/24
gen 87.9s · eval 12.3s