Take


eval 21/24 · 720p / 5s / 1:1 · 1 attempt · 168s wall

The prompt

Square 1:1 format. Camera mounted directly overhead, viewing the cup surface from above. Macro close-up of a barista pouring steamed milk into a ceramic cup of espresso, a rosetta leaf pattern blooming face-on in the caramel crema, wisps of steam rising, warm cafe light, ultra realistic top-down footage

What the agent wrote

Overhead macro shot, camera mounted directly above a small white ceramic cup filled with dark espresso, looking straight down at the crema. A thin stream of steamed milk pours steadily from the upper edge into the center, the white milk swirling and feathering outward as the barista wiggles and pulls, a rosetta leaf pattern blooming face-on across the caramel crema in real time. Delicate wisps of steam curl upward through warm tungsten cafe light, the surface rippling gently. Static top-down camera, shallow focus, photorealistic, soft golden highlights on the glossy crema.

Why this framing: The cup viewed dead-on from above is a centered, radially symmetric subject, so a square 1:1 frame keeps the rosetta perfectly composed as requested.

The eval receipt

Every Take clip ships with the measurements that admitted it. Deterministic gates and CV lanes run first; the VLM judge only rules on what those lanes cannot measure, and it must justify each level before naming it. See how the cascade works.
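The ordering matters because the cheap stages can reject a clip before the expensive judge is ever called. The sketch below shows that short-circuit. The stage signatures (a gate returns `(passed, reading)`, a lane returns a float, the judge receives the lane numbers as evidence) and the name `run_cascade` are hypothetical and are not taken from Take's code.

```python
from typing import Callable

# Hypothetical stage signatures, chosen for illustration only.
Gate = Callable[[str], tuple[bool, str]]          # clip path -> (passed, reading)
Lane = Callable[[str], float]                     # clip path -> measured number
Judge = Callable[[str, dict[str, float]], dict[str, str]]


def run_cascade(clip: str, gates: dict[str, Gate],
                lanes: dict[str, Lane], judge: Judge) -> dict:
    # L0: the first failing deterministic gate rejects the clip outright.
    for name, gate in gates.items():
        passed, reading = gate(clip)
        if not passed:
            return {"stage": "L0", "rejected_by": name, "reading": reading}
    # L1: measure everything that can be measured, once.
    metrics = {name: lane(clip) for name, lane in lanes.items()}
    # L2: the judge sees the numbers and rules only on what they cannot capture.
    verdict = judge(clip, metrics)
    return {"stage": "L2", "metrics": metrics, "verdict": verdict}
```

A clip that fails to decode never reaches the judge, so a broken render costs a few milliseconds of gate time instead of a VLM call.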

L0 · Deterministic gates

Gate · Result · Reading · Check
decodes · pass · h264 · ffprobe parsed a video stream
duration · pass · 5.04 s · expected 5 s ± 1 s
resolution · pass · 960x960 · expected height ~960 (±48)
framerate · pass · 24.00 fps · sane range 12-60 fps
not black · pass · mean luma 93.4 · 0/121 frames under 12.0
not frozen · pass · mean frame diff 1.789 · static-cheat detector, threshold 0.35
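The numeric gates reduce to simple threshold checks. The sketch below applies the thresholds listed in the table to already-measured values; the `decodes` gate is left out because it needs an actual ffprobe run. The function name and arguments are hypothetical, and so is the "not black" rule: the table only reports a count of dark frames, so this sketch assumes a `max_dark_frames` allowance of zero.

```python
def check_gates(duration_s, height, fps, frame_lumas, mean_frame_diff, *,
                expected_duration_s=5.0, expected_height=960, max_dark_frames=0):
    """Apply the L0 thresholds to measured values. frame_lumas holds one mean
    luma value per frame (assumed input format)."""
    dark_frames = sum(1 for y in frame_lumas if y < 12.0)
    return {
        # Expected 5 s, tolerance ±1 s.
        "duration": abs(duration_s - expected_duration_s) <= 1.0,
        # 960 * 960 == 1280 * 720, so a 720p pixel budget at 1:1 lands at
        # 960x960; the gate allows ±48 px (5%) on the height.
        "resolution": abs(height - expected_height) <= 48,
        "framerate": 12.0 <= fps <= 60.0,
        # Assumed rule: fail once more than max_dark_frames frames are dark.
        "not_black": dark_frames <= max_dark_frames,
        # Static-cheat detector: near-zero change between frames means frozen.
        "not_frozen": mean_frame_diff > 0.35,
    }
```

Fed this clip's readings (5.04 s, height 960, 24 fps, 121 frames at mean luma 93.4, mean frame diff 1.789), every check passes, matching the table.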

L1 · CV lanes

Flicker 1.79
min 1.08 · max 2.96 · σ 0.36

mean abs luma diff between consecutive frames

How much each frame differs from the next, checked on every frame. A high average usually just means a busy scene; sudden spikes far above the average indicate strobing.
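As a sketch, the flicker lane is a mean absolute difference over luma planes, one value per consecutive frame pair, summarized the way the receipt shows it (mean, min, max, σ). The receipt's 1.79 and the "not frozen" gate's 1.789 line up, which suggests the same measurement feeds both, though the page does not say so. The spike rule (`k` standard deviations above the mean) and the use of the population σ are our assumptions.

```python
import numpy as np


def frame_diffs(luma: np.ndarray) -> np.ndarray:
    """luma: array of shape (T, H, W), one luma plane per frame.
    Returns T-1 values: mean |Y[t+1] - Y[t]| over all pixels."""
    luma = luma.astype(np.float64)  # avoid uint8 wrap-around when subtracting
    return np.abs(np.diff(luma, axis=0)).mean(axis=(1, 2))


def summarize(values: np.ndarray) -> dict[str, float]:
    # sigma is the population standard deviation (an assumption).
    return {"mean": float(values.mean()), "min": float(values.min()),
            "max": float(values.max()), "sigma": float(values.std())}


def strobe_spikes(diffs: np.ndarray, k: float = 3.0) -> list[int]:
    """Indices of frame pairs far above the clip's own average (hypothetical rule)."""
    threshold = diffs.mean() + k * diffs.std()
    return [int(i) for i in np.flatnonzero(diffs > threshold)]
```

Judging spikes against the clip's own mean and σ keeps a busy but steady scene from being flagged, which is the distinction the paragraph above draws.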

Optical flow 3.84
min 2.86 · max 4.57 · σ 0.57

Farneback flow magnitude, motion energy per frame pair

How far pixels move between frames: one number for how much is actually happening. Under 0.3 is basically a still image; 2 to 8 is normal motion.

CLIPScore 0.340
min 0.331 · max 0.352 · σ 0.007

cosine similarity between CLIP ViT-B/32 prompt and frame embeddings

How well the frames match what you typed, scored by CLIP. 0.30 and up is well on-prompt; below 0.24 the model probably wandered off.
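Once the embeddings exist, the score itself is plain cosine similarity. The sketch assumes the prompt and frame embeddings have already been produced by CLIP ViT-B/32's text and image encoders (not shown) and uses the raw cosine with no rescaling, which is consistent with readings like 0.340. `clip_band` encodes only the two thresholds quoted above.

```python
import numpy as np


def cosine_similarities(prompt_emb, frame_embs) -> np.ndarray:
    """One cosine similarity per frame: frame embedding vs prompt embedding."""
    p = np.asarray(prompt_emb, dtype=np.float64)
    f = np.asarray(frame_embs, dtype=np.float64)
    p = p / np.linalg.norm(p)                          # unit-length prompt vector
    f = f / np.linalg.norm(f, axis=1, keepdims=True)   # unit-length frame vectors
    return f @ p


def clip_band(score: float) -> str:
    if score >= 0.30:
        return "well on-prompt"
    if score < 0.24:
        return "probably wandered off"
    return "between bands"
```

With this clip's mean of 0.340 and a minimum of 0.331, every sampled frame sits in the on-prompt band.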

DINO drift 0.946
min 0.904 · max 1.000 · σ 0.033

DINOv2 cosine similarity of each sampled frame against frame 0

Whether the subject stays the same subject, with every frame compared against the first. Watch the min: a single frame below 0.5 means identity broke.
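The drift lane differs from CLIPScore in what each frame is compared against: frame 0 instead of the prompt. The sketch assumes DINOv2 embeddings have already been extracted for the sampled frames. Because frame 0 is scored against itself, its similarity is exactly 1, which would explain the receipt's max of 1.000. The 0.5 floor comes from the paragraph above; the function names are hypothetical.

```python
import numpy as np


def dino_drift(frame_embs) -> np.ndarray:
    """Cosine similarity of every frame embedding against frame 0."""
    e = np.asarray(frame_embs, dtype=np.float64)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)  # unit-length rows
    return e @ e[0]  # sims[0] is frame 0 against itself, i.e. 1.0


def identity_broke(sims: np.ndarray, floor: float = 0.5) -> bool:
    # Watch the min: a single frame below the floor counts as a break.
    return bool(sims.min() < floor)
```

Here the minimum is 0.904, well above the floor, which is why the judge reports no morphing.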

These bands come from a 14-clip calibration set we ran before launch. See the full thresholds and the clips that set them.

L2 · Judge verdict

fidelity · good · Photorealistic crema texture and crisply feathered milk leaf throughout #0-#9. No mangled anatomy (no hands/limbs in frame to count). The milk stream and rosetta tines hold clean edges; only mild softness in the outer feathering and faint AI sheen on the glossy surface. No tearing or warped geometry.
aesthetics · excellent · Warm tungsten/golden palette with soft highlights on the glossy crema, centered overhead composition, pleasant bokeh ring of cafe lights around the cup rim (#0, #4). The face-on rosetta is well-placed and the lighting is flattering; a cinematographer would keep this.
consistency · excellent · Same cup, scene, and lighting across all samples; dino_drift mean 0.946 (min 0.904) confirms no morph. The pattern evolves coherently rather than jumping, and the gentle downward drift in DINO simply tracks the leaf growing/pour stopping, not identity loss.
motion · good · flow mean 3.84 (2.86-4.57) = healthy normal motion, fitting a live pour. Pose progression is plausible: stream present and pattern building #0-#6, stream withdrawn and pattern settling by #8-#9, with the rosetta tines accumulating naturally outward. No teleporting or frozen passages.
semantics · excellent · clipscore mean 0.34 (good band). Every required element is present: top-down view, light ceramic cup, dark espresso/caramel crema, steamed-milk stream pouring into center, and a rosetta leaf pattern blooming face-on. Steam wisps are not clearly resolvable in these stills but the core scene matches strongly.
physics · good · Pour stream contacts at the cup center and the white milk feathers outward in a believable fluid pattern; the leaf spine and tines form with correct surface flow. Stream tapers and stops plausibly by #8. No floating objects or gravity violations visible in-frame.

A convincing overhead latte-art pour: a rosetta leaf blooms face-on across warm caramel crema under golden tungsten light, with steady normal motion and rock-solid scene consistency. Visual quality is high with only mild softness in the milk feathering, and all prompt elements are clearly depicted.
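The page does not say how the six verdicts become 21/24, but one mapping reproduces it exactly: six dimensions on a four-point scale, with "excellent" worth 4 and "good" worth 3 (3 × 4 + 3 × 3 = 21). The sketch below encodes that assumed mapping; the point values and the name `take_score` are hypothetical, and the values for lower levels are not shown on this page, so they are left out.

```python
# Assumed point values; only these two levels appear on this page.
LEVEL_POINTS = {"good": 3, "excellent": 4}
TOP_POINTS = 4  # assumed top of the per-dimension scale


def take_score(verdict: dict[str, str]) -> tuple[int, int]:
    """Return (earned, possible) under the assumed mapping."""
    earned = sum(LEVEL_POINTS[level] for level in verdict.values())
    return earned, TOP_POINTS * len(verdict)


verdict = {"fidelity": "good", "aesthetics": "excellent", "consistency": "excellent",
           "motion": "good", "semantics": "excellent", "physics": "good"}
```

Under this mapping, `take_score(verdict)` returns `(21, 24)`, matching the receipt.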

What the judge saw

The timestamped contact sheet, exactly as handed to the judge. ffmpeg pulled eight evenly spaced frames from the clip, stamped each with its timestamp, and tiled them into this one grid; it is the only image the judge reads. See why a grid beats a video.
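One way to build such a sheet with stock ffmpeg filters is `select` to keep chosen frame indices, `drawtext` to stamp each frame's timestamp, and `tile` to pack them into a grid. This is a sketch under assumptions: the 4x2 layout, the drawtext styling, and index-based selection are our choices rather than Take's confirmed command, and `drawtext` without a `fontfile` relies on an ffmpeg build with fontconfig. The code only builds the argument list; pass it to `subprocess.run` to execute it.

```python
def evenly_spaced_indices(n_frames: int, k: int = 8) -> list[int]:
    """k frame indices spread from the first frame to the last."""
    if k == 1:
        return [0]
    return [round(i * (n_frames - 1) / (k - 1)) for i in range(k)]


def contact_sheet_cmd(src: str, dst: str, indices: list[int],
                      cols: int = 4, rows: int = 2) -> list[str]:
    # Commas inside the select expression are escaped so the filtergraph
    # parser does not treat them as filter separators.
    select = "select=" + "+".join(f"eq(n\\,{i})" for i in indices)
    stamp = (r"drawtext=text='%{pts\:hms}':x=8:y=8:fontsize=24:"
             "fontcolor=white:box=1:boxcolor=black@0.6")
    vf = f"{select},{stamp},tile={cols}x{rows}"
    # One output frame: the tiled grid of all selected frames.
    return ["ffmpeg", "-i", src, "-vf", vf, "-frames:v", "1", dst]
```

For this clip's 121 frames (5.04 s at 24 fps), `evenly_spaced_indices(121)` picks frames 0, 17, 34, 51, 69, 86, 103 and 120, so the first and last frames are always on the sheet.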

Timestamped contact sheet

Attempt history

Attempt 1 · accepted · score 21/24
gen 104.5s · eval 12.6s