Prompt Syntax for Stable Diffusion: FAQ

Updated: Jul 10, 2023

Answers to Frequently Asked Questions (FAQ) regarding Stable Diffusion Prompt Syntax

TLDR: 🧠 Learn how to use the prompt syntax to control image generation 📝 Control emphasis using parentheses and brackets, specify numerical weights, handle long prompts, and other FAQs 🌟

What is the purpose of using parentheses and brackets in Stable Diffusion prompts?

Parentheses and brackets are used to modify the model's attention or emphasis on specific words in the prompt. Parentheses "()" increase attention, while brackets "[]" decrease attention. The level of attention can be further adjusted using numerical weights or by nesting the parentheses or brackets.

Example Prompt: "I want to see a (beautiful) [scary] forest."

In this example, the model will give more attention to the word "beautiful" and less attention to the word "scary."

How can I specify a numerical weight for attention in Stable Diffusion?

You can specify a numerical weight for attention by using the syntax (word:weight). For example, (word:1.5) increases attention to the word by a factor of 1.5, while (word:0.25) decreases attention by a factor of 4 (1 / 0.25). This syntax only works with parentheses, not with brackets.

Example Prompt: "I want to see a (beautiful:1.5) (scary:0.5) [haunted:0.25] forest."

In this example, the model will give 50% more attention to the word "beautiful" and 50% less attention to the word "scary." Note that the 0.25 on "haunted" is not applied as an attention weight, because the (word:weight) syntax does not work inside brackets.
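To make the parentheses-only rule concrete, here is a minimal Python sketch that extracts explicit (word:weight) pairs from a prompt. It is an illustration only, not the parser Stable Diffusion front ends actually use; the names WEIGHT_PATTERN and explicit_weights are hypothetical, and nesting and escape characters are ignored.

```python
import re

# Matches "(text:number)" in parentheses only. Bracketed forms such as
# "[haunted:0.25]" are deliberately not matched. (Hypothetical helper,
# not taken from any Stable Diffusion code base.)
WEIGHT_PATTERN = re.compile(r"\(([^():\[\]]+):([0-9]*\.?[0-9]+)\)")


def explicit_weights(prompt: str) -> dict[str, float]:
    """Return a mapping of weighted text to its numerical weight."""
    return {
        text.strip(): float(weight)
        for text, weight in WEIGHT_PATTERN.findall(prompt)
    }


weights = explicit_weights(
    "I want to see a (beautiful:1.5) (scary:0.5) [haunted:0.25] forest."
)
print(weights)  # {'beautiful': 1.5, 'scary': 0.5}
```

Only the two parenthesized terms are picked up; the bracketed term is skipped.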

How do I use escape characters for literal parentheses and brackets in Stable Diffusion?

If you want to use literal parentheses or brackets in the prompt, you can use the backslash "\" to escape them. This allows you to include the characters "()", "[]" as part of the text without modifying the model's attention.

Example Prompt: "I want to see an anime character with a move called \(SuperPunch\)."

In this example, the backslashes are used to escape the parentheses, so the model will interpret "(SuperPunch)" as part of the text without any change in attention.
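If prompts are assembled by a script, the escaping can be automated. The following is a small Python sketch under that assumption; escape_literal is a hypothetical helper name, and it escapes every parenthesis and bracket in the text it receives, so it should only be applied to text meant to be literal.

```python
import re


def escape_literal(text: str) -> str:
    """Prefix every parenthesis or bracket with a backslash."""
    # In the replacement, "\\" emits one backslash and "\1" re-inserts
    # the matched character.
    return re.sub(r"([()\[\]])", r"\\\1", text)


escaped = escape_literal("a move called (SuperPunch)")
print(escaped)  # a move called \(SuperPunch\)
```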

How does Stable Diffusion handle long prompts that exceed the standard 75-token limit?

Stable Diffusion can handle prompts longer than 75 tokens by breaking the prompt into chunks of 75 tokens each. Each chunk is processed independently by CLIP's Transformer neural network, and the results are concatenated before being fed into the next component of Stable Diffusion, the U-Net. The BREAK keyword (which must be uppercase) fills the rest of the current chunk with padding characters and starts a new chunk.

Example Prompt: "Once upon a time, in a land far, far away, there was a [long prompt with more than 75 tokens]... BREAK And so, the story continues with another chunk of text."

In this example, the prompt exceeds 75 tokens and is broken into chunks. The BREAK keyword is used to start a new chunk after the first part of the prompt.
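The chunking behaviour can be sketched in Python as follows. This is a simplified illustration, not the actual implementation: it treats whitespace-separated words as tokens (real CLIP tokenization splits text differently), and the function name split_into_chunks and the "<pad>" placeholder are assumptions made for the example.

```python
def split_into_chunks(
    prompt: str, chunk_size: int = 75, pad: str = "<pad>"
) -> list[list[str]]:
    """Split a prompt into fixed-size chunks, honouring the BREAK keyword.

    Words stand in for tokens here; this is an approximation only.
    """
    chunks: list[list[str]] = []
    current: list[str] = []

    def flush() -> None:
        # Pad the current chunk to full length, store it, start a new one.
        if current:
            chunks.append(current + [pad] * (chunk_size - len(current)))
            current.clear()

    for word in prompt.split():
        if word == "BREAK":  # uppercase keyword forces a new chunk
            flush()
            continue
        current.append(word)
        if len(current) == chunk_size:  # chunk is full
            flush()
    flush()  # store whatever remains
    return chunks


# A tiny chunk size keeps the demonstration readable.
demo = split_into_chunks("a b c d e BREAK f g", chunk_size=4)
for chunk in demo:
    print(chunk)
```

With a chunk size of 4, the demo yields three chunks: one full chunk, one padded chunk that ends at BREAK, and one padded final chunk.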

What is the effect of using multiple nested parentheses or brackets in Stable Diffusion?

Nesting multiple parentheses or brackets increases or decreases the model's attention to the enclosed words by a multiplier. Each additional set of parentheses increases attention by a factor of 1.1, while each additional set of brackets decreases attention by the same factor. The effect is cumulative.

Example Prompt: "I want to see a (((beautiful))) [[scary]] forest."

In this example, the model will give approximately 1.331 times the attention to the word "beautiful" (1.1 * 1.1 * 1.1) and approximately 0.826 times the attention to the word "scary" (1 / 1.1 / 1.1), since "beautiful" is wrapped in three sets of parentheses and "scary" in two sets of brackets.
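The cumulative multiplier is simply 1.1 raised to the nesting depth, or its reciprocal for brackets. A short Python sketch of the arithmetic, where nested_weight is a hypothetical helper name:

```python
def nested_weight(depth: int, increase: bool = True, step: float = 1.1) -> float:
    """Attention multiplier for `depth` nested parentheses or brackets.

    increase=True models parentheses, increase=False models brackets.
    """
    return step ** depth if increase else step ** -depth


print(round(nested_weight(3), 3))                  # (((word))) -> 1.331
print(round(nested_weight(2, increase=False), 3))  # [[word]]   -> 0.826
```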

How does NAI's implementation of attention modifiers differ from the Stable Diffusion implementation?

NAI (NovelAI) uses a similar implementation to Stable Diffusion, but with a few differences. NAI uses curly braces "{}" instead of parentheses "()" to increase attention, and the multiplier per level is 1.05 instead of 1.1. Brackets "[]" still decrease attention, by a factor of 1.05 per level. The conversion between the two implementations is as follows:

NAI's {word} is equivalent to Stable Diffusion's (word:1.05).
NAI's {{word}} is equivalent to Stable Diffusion's (word:1.1025).
NAI's [word] is equivalent to Stable Diffusion's (word:0.952) (0.952 = 1/1.05).
NAI's [[word]] is equivalent to Stable Diffusion's (word:0.907) (0.907 = 1/1.05/1.05).

Example Prompt (NAI): "I want to see a {beautiful} [scary] forest."

Example Prompt (Stable Diffusion): "I want to see a (beautiful:1.05) (scary:0.952) forest."

In these examples, the NAI prompt uses curly braces to increase attention to "beautiful" and brackets to decrease attention to "scary." The equivalent Stable Diffusion prompt uses numerical weights to achieve the same effect.
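The conversion can be automated with a short script. The Python sketch below is an illustration under simplifying assumptions: it handles only uniformly nested braces or brackets around plain text (no mixed nesting, no escape characters), the name nai_to_sd is hypothetical, and weights are rounded to four decimal places, so 1/1.05 appears as 0.9524 rather than 0.952.

```python
import re

NAI_STEP = 1.05  # per-level multiplier for NAI, as described above
CLOSERS = {"{": "}", "[": "]"}


def nai_to_sd(prompt: str) -> str:
    """Rewrite NAI-style {word} / [word] emphasis as (word:weight)."""

    def convert(match: re.Match) -> str:
        opening, text, closing = match.groups()
        balanced = (
            len(opening) == len(closing)
            and closing[0] == CLOSERS[opening[0]]
        )
        if not balanced:
            return match.group(0)  # leave malformed input untouched
        depth = len(opening)
        exponent = depth if opening[0] == "{" else -depth
        weight = round(NAI_STEP ** exponent, 4)
        return f"({text}:{weight:g})"

    return re.sub(r"(\{+|\[+)([^{}\[\]]+)(\}+|\]+)", convert, prompt)


converted = nai_to_sd("I want to see a {beautiful} [scary] forest.")
print(converted)
# I want to see a (beautiful:1.05) (scary:0.9524) forest.
```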