Behind the Build · 7 min read

Today's AI Specialist: The Image Library Steward. The Agent Who Decides Whether the Visual Identity Holds.

For about six weeks in the spring, the blog had a visual identity drift problem.

Two posts published on the same day looked like they were written by different companies. One had the muted-navy editorial palette I had specified at the start of the build. The other had crept into a warmer, more golden look that was not wrong on its own but did not belong to the same publication. By the end of the week, scrolling the blog index felt like scrolling four different blogs that happened to share a domain.

This was not a prompt problem. The image prompts were fine. It was a coherence problem, and coherence is a property that lives across images, not inside any single one.

So I added an agent whose only job is to decide whether the next image belongs.

What is the Image Library Steward?

The Image Library Steward holds the visual identity of the entire EnglishFluency.Online and Behind the Build publication surface — every blog header, every persona portrait on the team page, every social card, every PDF cover for Custom Conversation Packs. Roughly two thousand image slots across the active publication.

The Steward does not generate images. The image generators do that. The Steward governs the brief that the generators receive, and audits the output before it is allowed to be persisted to the library.

You can think of the role as a creative director who only ever asks one question of a candidate image: *does this belong in the room with the others?*

Why was a prompt template not enough?

The first version of the system was a prompt template. Every image got the same opening line about photorealistic editorial photography, muted navy and warm neutral palette, candid framing, no text, no logos. That worked for about a hundred images.

It stopped working at about a hundred and twenty, for reasons I had not anticipated. The generators are stochastic. Two runs of the same prompt produce two different images. The drift is small in any one image — a slightly warmer light, a slightly different angle, a subtly different mood — but it compounds across a library.

A template constrains the brief. It does not constrain the result. The library was drifting because the briefs were consistent and the results were not.

The Steward was the moment I accepted that consistency requires audit, not just specification.

What does the Steward actually do per image?

Three things in sequence.

First, the Steward composes the brief. The brief is not just the prompt. It is the prompt plus the visual anchors that are already in the library and that the new image must sit alongside. For a blog header about a French account director on a transatlantic call, the brief includes references to the three or four nearest existing images in the library (same subject type, same mood band), so the generator is briefed toward continuity rather than novelty. The generator does not see those references directly. Instead, the Steward bakes the relevant constraints into the prompt language, for example: *Maintain the cooler tonal register of the existing Paris-office set. Avoid the warm golden cast that drifted into the May images.*
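A minimal sketch of the brief composition step, assuming a library record that stores a subject type, a mood band, and one sentence of constraint language per image. The record fields, function names, and sample entries are hypothetical illustrations, not the actual system.

```python
from dataclasses import dataclass

# Opening line of the original prompt template, as described earlier in the post.
BASE_STYLE = (
    "Photorealistic editorial photography, muted navy and warm neutral "
    "palette, candid framing, no text, no logos."
)


@dataclass(frozen=True)
class LibraryImage:
    """Hypothetical library record; the field names are assumptions."""
    image_id: str
    subject_type: str
    mood_band: str
    constraint: str  # prompt language this image contributes as an anchor


def nearest_anchors(library, subject_type, mood_band, limit=4):
    """Pick up to `limit` images with the same subject type and mood band."""
    matches = [
        img for img in library
        if img.subject_type == subject_type and img.mood_band == mood_band
    ]
    return matches[:limit]


def compose_brief(subject, anchors):
    """Fold anchor constraints into the prompt text, skipping repeats."""
    constraints = []
    for anchor in anchors:
        if anchor.constraint not in constraints:
            constraints.append(anchor.constraint)
    return " ".join([BASE_STYLE, subject, *constraints])


# Illustrative entries only; the identifiers and the third constraint are made up.
library = [
    LibraryImage("paris-01", "account-director", "cool",
                 "Maintain the cooler tonal register of the existing Paris-office set."),
    LibraryImage("paris-02", "account-director", "cool",
                 "Avoid the warm golden cast that drifted into the May images."),
    LibraryImage("team-07", "persona-portrait", "warm",
                 "Keep the portrait framing of the team page."),
]

anchors = nearest_anchors(library, "account-director", "cool")
brief = compose_brief("A French account director on a transatlantic call.", anchors)
print(brief)
```

The point of the design is that the generator only ever receives text. The anchors shape the wording of the brief and are never passed along as images.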

Second, the Steward audits the candidate. When the generator returns an image, the Steward checks it against a set of structured coherence questions before it can be persisted. Does the colour palette sit inside the publication band? Does the lighting feel like the same time of day as the rest of the recent set? Does the subject's framing match the editorial register? Are there any failure modes that the generator is known to drift into: text artefacts, logos, group shots, smiling at camera, stock-photo composition? If any one of those fails, the candidate is rejected and the brief is re-tuned for a regeneration. Most of the audit work is rejection, not generation.
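The audit can be sketched as a list of yes/no checks where a single failure rejects the candidate. This assumes the answers to the coherence questions have already been produced somewhere upstream. The `CandidateReport` structure and the reason strings are hypothetical.

```python
from dataclasses import dataclass, field

# Failure modes named in the post; how each is detected is out of scope here.
KNOWN_FAILURE_MODES = frozenset({
    "text artefacts", "logos", "group shot",
    "smiling at camera", "stock-photo composition",
})


@dataclass
class CandidateReport:
    """Hypothetical answers to the coherence questions for one candidate."""
    palette_in_band: bool
    lighting_matches_recent_set: bool
    framing_matches_register: bool
    detected_failure_modes: set = field(default_factory=set)


def audit(report):
    """Return (accepted, reasons); a single failed question rejects."""
    reasons = []
    if not report.palette_in_band:
        reasons.append("palette outside the publication band")
    if not report.lighting_matches_recent_set:
        reasons.append("lighting does not match the recent set")
    if not report.framing_matches_register:
        reasons.append("framing does not match the editorial register")
    for mode in sorted(report.detected_failure_modes & KNOWN_FAILURE_MODES):
        reasons.append(f"known failure mode: {mode}")
    return (not reasons, reasons)


clean = audit(CandidateReport(True, True, True))
flawed = audit(CandidateReport(True, False, True, {"logos"}))
print(clean)
print(flawed)
```

Returning the reasons alongside the verdict gives the re-tuning step something concrete to work from when a candidate is rejected.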

Third, the Steward records the decision. Every accepted image is logged with the brief that produced it, the audit answers, and a reference link to the three or four neighbour images that anchored it. That log is what allows future images to be briefed against the library rather than against a fixed template. The library becomes self-referential — the more images it has, the better-anchored the next image is.
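One way to picture the decision log is as one JSON line per accepted image, holding the brief, the audit answers, and the neighbour references. This is a sketch under that assumption. The field names are hypothetical, and an in-memory list stands in for whatever storage the real system uses.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DecisionRecord:
    """Hypothetical log entry for one accepted image."""
    image_id: str
    brief: str
    audit_answers: dict
    neighbour_ids: list


def append_record(log_lines, record):
    """Serialise the record as one JSON line and append it to the log."""
    log_lines.append(json.dumps(asdict(record), sort_keys=True))


def neighbours_of(log_lines, image_id):
    """Look up which images anchored a given accepted image."""
    for line in log_lines:
        entry = json.loads(line)
        if entry["image_id"] == image_id:
            return entry["neighbour_ids"]
    return []


log_lines = []
append_record(log_lines, DecisionRecord(
    image_id="paris-03",  # illustrative identifier
    brief="Maintain the cooler tonal register of the existing Paris-office set.",
    audit_answers={"palette_in_band": True, "lighting_matches_recent_set": True},
    neighbour_ids=["paris-01", "paris-02"],
))
print(neighbours_of(log_lines, "paris-03"))
```

Because each entry names its neighbours, a later brief can start from the log instead of from a fixed template.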

Who consults the Image Library Steward?

This is one of the busier consultation graphs in the team.

The Steward is consulted by every Writer agent before an image prompt is finalised — the Fluency Educator, the Conversion Strategist, the AI Coach Showcase Writer, the Build Narrator, and the Conversation Pack Writer. The Steward also consults the Persona Library Curator on subject continuity for any image that features a recurring persona, the Brand Voice Custodian on register questions where the image is doing tonal work the copy cannot do, and the Frontend SEO Engineer on technical constraints around file size, alt text, and Open Graph cropping.

The Steward is itself consulted by the Editorial Director on weekly publication coherence reviews and by the Cartography Team's Operations Surveyor on the work-medium map for visual production.
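The consultation graph described above can be written down as a list of directed edges. This is only an illustrative encoding of the relationships named in the text. The `CONSULTATIONS` list and helper functions are hypothetical.

```python
STEWARD = "Image Library Steward"

# Each pair reads (who asks, who is asked), taken from the description above.
CONSULTATIONS = [
    ("Fluency Educator", STEWARD),
    ("Conversion Strategist", STEWARD),
    ("AI Coach Showcase Writer", STEWARD),
    ("Build Narrator", STEWARD),
    ("Conversation Pack Writer", STEWARD),
    ("Editorial Director", STEWARD),
    ("Operations Surveyor", STEWARD),
    (STEWARD, "Persona Library Curator"),
    (STEWARD, "Brand Voice Custodian"),
    (STEWARD, "Frontend SEO Engineer"),
]


def asked_by(agent):
    """Agents that consult `agent`."""
    return sorted(asker for asker, asked in CONSULTATIONS if asked == agent)


def asks(agent):
    """Agents that `agent` consults."""
    return sorted(asked for asker, asked in CONSULTATIONS if asker == agent)


print(len(asked_by(STEWARD)), "agents consult the Steward")
print("The Steward consults:", asks(STEWARD))
```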

The Steward does not consult on whether an image is good. The Steward consults only on whether an image belongs. The two questions are different and the agent's value depends on keeping them separate.

What does this cost?

Three things.

It costs image generation time. Every image now passes through a brief composition step and an audit step. The blended cost per accepted image is roughly thirty percent higher than the unaudited version, partly because of the audit work and partly because rejected candidates have to be regenerated. The total number of generation runs per accepted image is between one and four. The median is two.
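The regeneration cost comes from a loop like the one sketched here: generate, audit, re-tune the brief on rejection, and try again. Treating four runs as a hard cap is an assumption, since the post only reports that accepted images took between one and four runs. The stand-in generator, audit, and re-tune functions are hypothetical and deterministic so the sketch runs on its own.

```python
def generate_until_accepted(generate, audit, retune, brief, max_runs=4):
    """Regenerate until the audit passes or the run budget is spent.

    Returns (candidate, runs_used); candidate is None if every run failed.
    """
    for run in range(1, max_runs + 1):
        candidate = generate(brief)
        accepted, reasons = audit(candidate)
        if accepted:
            return candidate, run
        brief = retune(brief, reasons)
    return None, max_runs


# Deterministic stand-ins so the sketch runs without an image generator.
WARM_CAST_RULE = "Avoid the warm golden cast."


def fake_generate(brief):
    """Pretend the image has a warm cast unless the brief forbids it."""
    return {"brief": brief, "warm_cast": WARM_CAST_RULE not in brief}


def fake_audit(candidate):
    """Reject any candidate that shows the warm cast."""
    if candidate["warm_cast"]:
        return False, ["warm golden cast"]
    return True, []


def fake_retune(brief, reasons):
    """Add the missing constraint to the brief after a warm-cast rejection."""
    if "warm golden cast" in reasons:
        return f"{brief} {WARM_CAST_RULE}"
    return brief


candidate, runs = generate_until_accepted(
    fake_generate, fake_audit, fake_retune, "Candid editorial portrait."
)
print(runs, candidate["brief"])
```

Each rejected run is paid for in full, which is why the cost per accepted image rises even when the audit itself is cheap.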

It costs me a class of image I would otherwise have published. Every now and then the generator produces something striking that the Steward rejects because it does not belong. That image, considered alone, would have been a good image. Considered as part of the library, it would have broken continuity. The Steward kills it. The death of a striking image always feels expensive in the moment and is usually correct by the end of the week.

It costs the ability to let any single Writer agent make their own visual call. Writers used to be able to specify a slightly different look if a particular post benefitted from it. That door is now closed. Visual specification belongs to the Steward. Writers can request a brief; they cannot override one.

The trade I am making is that the publication looks like one publication. Every reader who lands on a Behind the Build post and a Conversion Strategist post on the same day sees the same hand. That is the property the Steward exists to defend, and it is worth the three costs above.

TL;DR

For about six weeks the blog's visual identity drifted because consistent prompts produced inconsistent results. I added an agent whose only job is to compose briefs against the existing library, audit candidates for coherence, and reject any image that does not belong, regardless of whether it is a striking image in isolation. The Image Library Steward holds the visual identity of the entire publication surface. It costs me roughly thirty percent more per accepted image, the death of some striking-but-discontinuous candidates, and the end of per-Writer visual override. It buys me a publication that looks like one publication every day of the week.

If you are running an SME and any of this looks like the conversation you should be having about your own visual coherence, that is the side of things I help with. → /build

Learning Materials

Key Vocabulary

visual identity · B2

The consistent set of visual elements — colour, typography, imagery, composition — by which a brand is recognised.

“For about six weeks in the spring, the blog had a visual identity drift problem.”

drift · C1

A slow, unintended movement away from an original state or specification.

“The drift is small in any one image but it compounds across a library.”

coherence · C1

The quality of forming a unified, consistent whole.

“Coherence is a property that lives across images, not inside any single one.”

stochastic · C2

Having a random probabilistic element; producing different outputs from identical inputs.

“The generators are stochastic. Two runs of the same prompt produce two different images.”

to audit · C1

To carry out a formal review or examination of something for compliance or quality.

“The Steward audits the candidate before it can be persisted to the library.”

to persist (data) · C1

To save data so that it remains stored beyond the immediate process.

“The Steward audits the output before it is allowed to be persisted to the library.”

template · B2

A reusable pattern or framework used to produce consistent outputs.

“A template constrains the brief. It does not constrain the result.”

brief (noun) · B2

A set of instructions or specifications given to a creator before they produce work.

“The Steward composes the brief. The brief is not just the prompt.”

to compound · C1

To accumulate so that each small effect adds to and amplifies the previous one.

“The drift is small in any one image but it compounds across a library.”

anchor (visual) · C1

A reference point used to hold a new piece of work in alignment with existing material.

“The brief is the prompt plus the visual anchors that are already in the library.”

to govern · C1

To control and set the rules for something.

“The Steward governs the brief that the generators receive.”

candidate (image) · B2

An item being considered for selection or acceptance before final approval.

“If any one of those fails, the candidate is rejected and the brief is re-tuned for a regeneration.”

continuity · C1

The state of remaining unbroken and consistent across a sequence.

“The generator is briefed toward continuity rather than novelty.”

self-referential · C2

Referring to itself; a system whose own outputs become its later inputs.

“The library becomes self-referential — the more images it has, the better-anchored the next image is.”

to override · C1

To set aside or replace a default rule or decision with one's own.

“Writers can request a brief; they cannot override one.”

Grammar Notes

Anaphora: 'It costs X' as a repeated paragraph opener

The 'What does this cost?' section opens repeated paragraphs with 'It costs …' to itemise costs in parallel structure. This is a rhetorical pattern that makes a list feel like a measured argument rather than a list. Notice the bare subject 'it' referring back to the system as a whole — a typical analytical English move.

“It costs image generation time. Every image now passes through a brief composition step and an audit step.”

Present simple for system behaviour (technical writing)

When describing how a system works, English uses the present simple, not the present continuous and not the future. 'The Steward composes', 'the generator returns', 'the brief includes'. This signals general, repeatable behaviour. Using continuous forms ('is composing') would suggest a temporary action in progress rather than a standing rule.

“When the generator returns an image, the Steward checks it against a set of structured coherence questions.”

Italicised quoted instructions inside narrative prose

The post uses italicised direct phrases ('*Maintain the cooler tonal register*', '*does this belong in the room with the others*') to embed the actual language used inside the system into the narrative. In professional English writing, italics can stand in for quotation marks when the embedded phrase is a label, instruction, or characteristic line rather than a full quotation.

“Maintain the cooler tonal register of the existing Paris-office set. Avoid the warm golden cast that drifted into the May images.”

Contrastive 'X, not Y' clauses to define scope

The piece repeatedly uses 'X, not Y' to fix the precise scope of a claim — 'does this belong, not is this good'; 'consistency requires audit, not just specification'; 'most of the audit work is rejection, not generation'. This is a high-precision rhetorical device that closes off the wrong interpretation in the same breath as the right one.

“The Steward does not consult on whether an image is good. The Steward consults only on whether an image belongs.”

Comprehension Questions

1. What problem caused the author to add the Image Library Steward, and why does the post argue it was not a prompt problem?
2. Why did the original prompt template stop working at around 120 images?
3. Describe the three steps the Steward takes per image.
4. What distinction does the post draw between 'good' and 'belongs', and why is the distinction important?
5. What are the three costs of the Steward, and what trade does the author argue is being made?
