I Rebuilt One Lost Garden Scene Three Times - The Generating Was the Cheap Part

2026-07-01 16:00:48 · Author: hackernoon.com

There is a corridor in Lost Garden, the dark-fantasy anime series I am building solo, where the main character walks toward a sound she cannot place. Torchlight. Wet stone. Her face half in shadow. On paper, it is one of the simplest shots in the whole sequence. A character walks down a hall.

I rebuilt that scene three times. Not three takes. Three full rebuilds, across the better part of three weeks. And here is the part nobody tells you when they sell you on AI filmmaking: generating the shots was the fastest, cheapest, least painful thing I did the entire time.

I want to walk through exactly what happened, because the breakdown of where my hours went is the most useful thing I can hand another AI filmmaker. The numbers were not where I expected them.

The scene and the first version.

The corridor sequence is about forty seconds of finished screen time. Six shots. A wide establishing the hall, a medium of her walking, two inserts (the torch, her hand on the wall), a close-up as she stops, and a reverse on whatever she is walking toward.

For the first version, I did what almost everyone does on their first real project. I opened a generator and started prompting. I had the script. I had a mood in my head. I figured I would describe each shot, pick the good takes, and cut them together.

The individual clips came out fast, and they looked great. That is the trap. A single shot from a current model can look like a frame from a real film. The problem is never one shot. The problem is the second shot and the third.

The first version fell apart not because any clip was bad, but because no two clips agreed on what they were showing.

Her hair changed length between the wide and the medium. The corridor was three torches wide in one shot and a narrow passage in the next. Her eyes drifted from amber to grey. The torch flame jumped sides. Stitched together, it did not read as one place or one person. It read as six beautiful postcards from six different films.

I had spent maybe an afternoon generating and two days trying to cut it into something coherent. I could not. So I threw it out.

Where the hours actually went.

Before I describe the rebuilds, here is the honest accounting for that one forty-second scene, version one through version three. I logged it because I did not believe my own gut estimate.

Generating clips: roughly 15% of my time. The actual act of running prompts and waiting for output. The thing the demos show you.
Selecting and discarding: roughly 30%. Watching variations, killing the ones that drift. This is the silent tax of AI video, and it is enormous.
Continuity and fixing: roughly 35%. Hunting down why shot four contradicts shot two, then deciding whether to fix it in the prompt, at the cut, or in post.
Planning and setup: roughly 20%. The work I should have done first and did last: locking the character, the world, the palette, the rules.

That top line is the whole lesson. The cheap, fun, marketable part of AI filmmaking is about a sixth of the real job. Everything else is decisions and bookkeeping.
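The accounting above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the percentages are the ones listed, while the function name `hours_by_category` and the 40-hour budget in the example are assumptions, not figures from the log.

```python
# Shares of logged time for the corridor scene, as listed in the breakdown.
TIME_SHARE_PCT = {
    "generating": 15,
    "selecting_and_discarding": 30,
    "continuity_and_fixing": 35,
    "planning_and_setup": 20,
}

total = sum(TIME_SHARE_PCT.values())
non_generation = total - TIME_SHARE_PCT["generating"]


def hours_by_category(total_hours: float) -> dict[str, float]:
    """Split a hypothetical hour budget using the same shares."""
    return {
        name: total_hours * pct / total
        for name, pct in TIME_SHARE_PCT.items()
    }


if __name__ == "__main__":
    print(f"shares add up to {total}%")
    print(f"everything except generating: {non_generation}%")
    # 40 hours is an invented budget, used only to make the split concrete.
    for name, hours in hours_by_category(40).items():
        print(f"{name}: {hours:.1f} h")
```

Whatever budget goes in, generating stays at 15%, a little under one sixth, and the other three categories together take 85%.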

[Figure: Where my time actually went on one 40-second scene, across three rebuilds.]

The discard rate alone surprised me, and it turns out it is normal. Working filmmakers using these tools commonly generate twenty to fifty variations per shot before they keep one, and a typical short can mean eighty to a hundred and fifty individual clips to assemble a final cut. I was not doing it wrong. I was doing the normal thing, which is to say I was drowning in selects.
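To put the selecting load in numbers for a scene this size, here is a small sketch. It assumes every one of the six corridor shots falls inside the twenty-to-fifty range quoted above and that exactly one take per shot is kept. Both are assumptions for illustration, and the function names are hypothetical.

```python
SHOTS_IN_SCENE = 6              # the corridor sequence's shot count
VARIATIONS_PER_SHOT = (20, 50)  # range quoted for working filmmakers


def clips_generated(shots: int, variations: tuple[int, int]) -> tuple[int, int]:
    """Low and high estimate of clips generated for the whole scene."""
    low, high = variations
    return shots * low, shots * high


def discard_rate(generated: int, kept: int) -> float:
    """Fraction of generated takes that are thrown away."""
    return (generated - kept) / generated


if __name__ == "__main__":
    low, high = clips_generated(SHOTS_IN_SCENE, VARIATIONS_PER_SHOT)
    print(f"clips to watch for one scene: {low} to {high}")
    # Assumes one kept take per shot.
    print(f"discard rate at 20 takes per shot: {discard_rate(20, 1):.0%}")
    print(f"discard rate at 50 takes per shot: {discard_rate(50, 1):.0%}")
```

Under those assumptions, a forty-second scene means watching somewhere between 120 and 300 clips and throwing away 95% or more of them, which is why selecting outweighs generating in the log.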

Version Two: I anchored the character and still drifted.

For the second version, I fixed the most obvious failure first. The character.

This is the part of the toolchain that has genuinely matured. Instead of re-describing her face in every prompt and praying, you train a reusable identity from reference images. Higgsfield’s Soul ID is the clearest example I use. You upload a handful of reference photos, including varied angles and at least one full-height shot. It trains a persistent identity in a few minutes, and that character becomes available across your image and video generations with a stable face, skin tone, and proportions.

So, version two had a consistent protagonist. Her face held across all six shots. I thought I had solved it.

I had solved one of four problems. The corridor was still drifting. The torch was still jumping. The palette warmed and cooled depending on which take I kept. The character was an island of consistency floating in a world that kept rebuilding itself shot to shot. Tools like Runway’s Gen-4 have made real progress on holding a look across shots, but if you need the same thing to appear identically across a dozen scenes, you are still iterating and cherry-picking. Consistency is not a setting you flip on. It is a discipline you carry.

Version two was better and still unusable. The seams had just moved.

Version Three: Build the whole world once, then generate.

The third version is the one that worked, and the change was not a better prompt or a better model. It was a sequence. I built every fixed thing before I generated a single final shot.

The character, locked. Same trained identity from version two, kept.
The world, locked. I pinned the corridor itself: a reference still of the exact hall, its width, the number and placement of torches, and the stone texture. That still became the anchor I fed into every shot in the scene.
The palette, locked. One color decision for the whole sequence. Cold blue shadow, warm amber pool around each torch, nothing in between. Written down, not remembered.
The grammar, locked. Lens feel, eye-level, how the camera moves. Decided once, so the six shots feel cut from one roll.

Then, and only then, I generated. The discard rate dropped because the model had less room to invent. The continuity work dropped because there was less to reconcile. The scene held together on roughly the second assembly instead of never.
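One way to picture "lock first, then generate" is a written-down record that every shot prompt is built from. The sketch below is a minimal illustration, not the workflow of any particular tool: the `SceneLock` class, its field wording, and the prompt format are all hypothetical, and no generator API is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: locked values cannot be changed mid-scene
class SceneLock:
    character: str
    world: str
    palette: str
    grammar: str

    def prompt_for(self, shot: str) -> str:
        """Attach every locked decision to a single shot description."""
        return " | ".join(
            [shot, self.character, self.world, self.palette, self.grammar]
        )


# Wording of each field is illustrative, paraphrased from the four locks.
CORRIDOR = SceneLock(
    character="protagonist: trained identity kept from version two",
    world="corridor: match the pinned reference still of the hall",
    palette="cold blue shadow, warm amber pool around each torch",
    grammar="eye-level camera, one lens feel for all six shots",
)

SHOTS = [
    "wide establishing the hall",
    "medium of her walking",
    "insert: the torch",
    "insert: her hand on the wall",
    "close-up as she stops",
    "reverse on what she is walking toward",
]

prompts = [CORRIDOR.prompt_for(shot) for shot in SHOTS]

if __name__ == "__main__":
    for prompt in prompts:
        print(prompt)
```

The design point is that only the shot description varies. Everything else comes from one record, so the model is handed the same character, world, palette, and grammar six times instead of six recollections of them.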

[Figure: The fix was sequence: lock character, world, palette, and grammar before generating.]

Generation is the part that got cheap. Continuity is the part that never did, and you pay for it either up front in planning or afterward in fixing. There is no version where you do not pay.

The reason I now treat that planning layer as the actual product is simple. The plan is the only thing in an AI pipeline that has a memory. The models do not. Each clip is generated fresh, with no knowledge of the last one. The continuity has to live somewhere outside the model, and if it does not live in a deliberate, written-down plan, it lives nowhere, which means it lives in your head until your head loses it three weeks in.

[Figure: In an AI pipeline, the written-down plan is the only thing that remembers. The models do not.]

That gap is exactly what I am building ScreenWeaver to close: a place where the character, the world, the palette, and the rules of a project are held as the source of truth, so the plan is a real thing the production runs against instead of a memory that drifts. Version three of the corridor was the first time I ran a scene that way on purpose. It is the only reason it shipped.

What I would tell myself before version one.

If I could hand the three-weeks-ago version of me one note, it would not be about prompting. It would be this:

Assume the shot is the easy part. Budget your time for selecting, reconciling, and planning, because that is where five-sixths of it goes.
Lock before you generate. Character, world, palette, camera grammar. Locking is not bureaucracy. It is the thing that makes the cheap part stay cheap.
Treat continuity as external. The model will not remember. Write down what must not change and feed it back in every time.
Count your discards honestly. If you are tossing twenty takes a shot, you are normal. Build a system for it instead of being surprised by it every time.
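The note about treating continuity as external can be sketched as a plain lookup: what must not change is written down once, and each kept take is checked against it by hand-logged observations. Everything here is hypothetical, including the attribute names, the locked values (the article does not say, for instance, which side the torch sits on), and the `find_drift` helper.

```python
# What must not change, written down once. Values are illustrative.
LOCKED = {
    "eye_color": "amber",
    "torch_side": "left",
    "corridor_width": "three torches wide",
}


def find_drift(
    locked: dict[str, str],
    observed_by_shot: dict[str, dict[str, str]],
) -> list[tuple[str, str, str, str]]:
    """Return (shot, attribute, expected, observed) for each contradiction.

    Attributes that were not logged for a shot are skipped, not flagged.
    """
    drift = []
    for shot, observed in observed_by_shot.items():
        for attribute, expected in locked.items():
            actual = observed.get(attribute)
            if actual is not None and actual != expected:
                drift.append((shot, attribute, expected, actual))
    return drift


# Hand-logged notes on two kept takes, echoing the drift seen in version one.
observations = {
    "wide": {"eye_color": "amber", "corridor_width": "three torches wide"},
    "medium": {"eye_color": "grey", "torch_side": "right"},
}

report = find_drift(LOCKED, observations)

if __name__ == "__main__":
    for shot, attribute, expected, actual in report:
        print(f"{shot}: {attribute} should be {expected!r}, saw {actual!r}")
```

Nothing in this check is clever. Its value is that the list of things that must not change exists outside anyone's memory, so a contradiction shows up as a line in a report rather than as a feeling at the cut.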

The promise of AI filmmaking is real. One person genuinely can make something that used to take a crew. But the promise that gets sold and the work that actually fills your week are two different things. The generation is the magic in the demo. The case study is everything around it.

Lost Garden is still in production. The corridor scene is done. It took three rebuilds to learn that the rebuilds were never about the shots.

Source: https://hackernoon.com/i-rebuilt-one-lost-garden-scene-three-times-the-generating-was-the-cheap-part?source=rss