New on Truffle

Type a scene.
The agent shoots it.
The eval grades it.

Take is a small text-to-video studio where the evaluation loop is the product. Describe a shot. An agent writes the direction, Luma Ray3.2 renders it, and a four-level cascade decides whether the take survives: deterministic gates, computer-vision lanes, a vision-language judge, and a bounded retake policy. You watch every step happen live.

The cascade

Generation is the easy half. Take's pipeline treats every render as a claim that has to survive four levels of scrutiny before it gets published.

L0
Deterministic gates

ffprobe decode, duration, resolution, framerate. Then pixel gates: not black, not frozen. A clip that fails here never wastes a judge call.
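
A minimal sketch of what gates like these could look like, not Take's actual code. It assumes the stream metadata has already been read from ffprobe's JSON output (the `duration`, `width`, `height`, and `avg_frame_rate` fields) and that sampled frames arrive as flat lists of grayscale values from 0 to 255. The function names and every threshold are hypothetical.

```python
def parse_rate(rate):
    """Turn an ffprobe rate string such as '30000/1001' into frames per second."""
    num, _, den = rate.partition("/")
    den_value = float(den) if den else 1.0
    # ffprobe reports '0/0' when the rate is unknown; treat that as 0 fps.
    return float(num) / den_value if den_value else 0.0


def probe_failures(stream, min_duration=1.0, min_width=640, min_height=360, min_fps=12.0):
    """Check container-level facts. All thresholds are illustrative assumptions."""
    failures = []
    if float(stream.get("duration", 0)) < min_duration:
        failures.append("duration")
    if int(stream.get("width", 0)) < min_width or int(stream.get("height", 0)) < min_height:
        failures.append("resolution")
    if parse_rate(stream.get("avg_frame_rate", "0/0")) < min_fps:
        failures.append("framerate")
    return failures


def pixel_failures(frames, black_mean=8.0, frozen_diff=0.5):
    """Check pixel-level facts on sampled grayscale frames."""
    failures = []
    # Black: every sampled frame has a very low mean brightness.
    if all(sum(f) / len(f) <= black_mean for f in frames):
        failures.append("black")
    # Frozen: no pair of consecutive frames differs by more than a tiny amount.
    diffs = [
        sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]
    if diffs and max(diffs) <= frozen_diff:
        failures.append("frozen")
    return failures


def run_gates(stream, frames):
    """Return the names of all failed gates; an empty list means the clip passes."""
    return probe_failures(stream) + pixel_failures(frames)
```

Because every check is plain arithmetic on metadata and pixels, a failing clip can be rejected before any model is called.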

L1
Computer-vision lanes

Frame-difference flicker, Farneback optical flow, CLIPScore prompt alignment, DINOv2 identity drift against frame zero. The temporal questions a vision-language model answers at chance level are owned by code.
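
As an illustration of a temporal question answered by code, here is one possible frame-difference flicker metric. It is an assumption, not the metric Take uses: it takes the mean brightness of each frame and averages the absolute second difference of that series, so a steady fade scores zero while a brightness oscillation scores high. Frames are assumed to be flat lists of grayscale values, and `flicker_score` is a hypothetical name.

```python
def mean_luma(frame):
    """Average brightness of one grayscale frame given as a flat list."""
    return sum(frame) / len(frame)


def flicker_score(frames):
    """Mean absolute second difference of per-frame brightness.

    A linear brightness ramp (a fade) gives 0.0; alternating bright and dark
    frames give a large value. Needs at least three frames to say anything.
    """
    luma = [mean_luma(f) for f in frames]
    if len(luma) < 3:
        return 0.0
    second = [
        abs(luma[i - 1] - 2 * luma[i] + luma[i + 1])
        for i in range(1, len(luma) - 1)
    ]
    return sum(second) / len(second)
```

The optical-flow, CLIPScore, and DINOv2 lanes need model or vision libraries and are not sketched here.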

L2
Vision-language judge

A timestamped contact sheet goes to a frontier judge that writes its rationale before its rating, on six axes: fidelity, aesthetics, consistency, motion, semantics, physics. Discrete levels, not fake-precision decimals.
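
A sketch of how a judge reply with this shape could be validated, assuming the judge returns JSON with one object per axis. The JSON layout and the level names are hypothetical; only the six axes and the "poor" level come from the description above. The check relies on `json.loads` preserving key order, so it can confirm the rationale was written before the rating.

```python
import json

AXES = ("fidelity", "aesthetics", "consistency", "motion", "semantics", "physics")
# Hypothetical level names, worst to best; only "poor" appears in the text.
LEVELS = ("bad", "poor", "fair", "good", "excellent")


def parse_verdict(text):
    """Validate a judge reply and return {axis: level}.

    Raises ValueError on a missing or extra axis, an empty rationale, a rating
    that is not a discrete level, or a rating written before its rationale.
    """
    data = json.loads(text)
    extra = set(data) - set(AXES)
    if extra:
        raise ValueError(f"unexpected axes: {sorted(extra)}")
    ratings = {}
    for axis in AXES:
        if axis not in data:
            raise ValueError(f"missing axis: {axis}")
        entry = data[axis]
        keys = list(entry)
        if "rationale" not in keys or "rating" not in keys:
            raise ValueError(f"{axis}: needs both rationale and rating")
        if keys.index("rationale") > keys.index("rating"):
            raise ValueError(f"{axis}: rationale must come before rating")
        if not str(entry["rationale"]).strip():
            raise ValueError(f"{axis}: empty rationale")
        if entry["rating"] not in LEVELS:
            raise ValueError(f"{axis}: rating must be one of {LEVELS}")
        ratings[axis] = entry["rating"]
    return ratings
```

A reply that scores an axis as `7.5` is rejected here rather than rounded, which keeps decimal scores out of the decision step.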

L3
Bounded decision

Deterministic accept-or-retake. Any axis at poor or below triggers a retake with the judge's advice folded into the next composition. Three attempts, then the best surviving take stands.
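
The policy above can be sketched as a small loop. This is an illustration under stated assumptions, not the shipped logic: the level names other than "poor" are hypothetical, `render_and_judge` is a hypothetical callback that renders one take and returns its ratings plus the judge's advice, and "best" is ranked here by worst axis first, then by the sum of all axes.

```python
AXES = ("fidelity", "aesthetics", "consistency", "motion", "semantics", "physics")
# Hypothetical level names, worst to best; only "poor" appears in the text.
LEVELS = ("bad", "poor", "fair", "good", "excellent")
POOR = LEVELS.index("poor")


def needs_retake(ratings):
    """True if any axis is rated poor or below."""
    return any(LEVELS.index(level) <= POOR for level in ratings.values())


def rank(ratings):
    """Ordering key for takes: worst axis first, then total. An assumption."""
    ranks = [LEVELS.index(level) for level in ratings.values()]
    return (min(ranks), sum(ranks))


def run_takes(render_and_judge, max_attempts=3):
    """Accept the first take with no axis at poor or below, else keep the best.

    render_and_judge(attempt, advice) -> (ratings, advice_for_next_attempt)
    """
    best = None
    advice = None  # the first attempt has no prior judge advice
    for attempt in range(1, max_attempts + 1):
        ratings, next_advice = render_and_judge(attempt, advice)
        take = {"attempt": attempt, "ratings": ratings}
        if not needs_retake(ratings):
            return take, "accepted"
        if best is None or rank(ratings) > rank(best["ratings"]):
            best = take
        advice = next_advice  # fold the judge's advice into the next composition
    return best, "exhausted"
```

The loop has no randomness and a fixed attempt cap, so the same ratings always produce the same decision and the cost per shot is bounded.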

Why it's built this way: the research behind the eval →

Gallery

Every tile is a take that survived the cascade. Hover to play; open for the full eval receipt.
