Creating Stunning AI Influencers: The Complete Stable Diffusion Guide for 1.5, SDXL, and FLUX

Welcome to the ultimate guide by Digital Divas on creating realistic and engaging AI influencers using Stable Diffusion. Whether you're new to AI or an experienced creator, this guide will help you craft incredible influencer content by mastering Stable Diffusion models 1.5, SDXL, and FLUX. We'll cover everything from prompting techniques and advanced workflows to SEO optimization to help your AI influencers stand out.

Why Prompting Matters for AI Influencers

Great AI influencers start with precise prompts. Whether you're creating photorealistic influencers, anime-inspired personalities, or cinematic-style imagery, clear and detailed prompts are crucial.

Stable Diffusion Models: Choosing the Right One for AI Influencers

Stable Diffusion 1.5: Offers vast creative flexibility, excelling particularly with artistic styles like anime or conceptual visuals.

Stable Diffusion XL (SDXL): Best for ultra-high-quality, realistic influencer images thanks to its enhanced detail and resolution.

FLUX.1: Ideal for influencers requiring sophisticated details, readable text, and accurate anatomy through natural language prompts.

1. Understanding the Models and Their Differences

Before crafting prompts, it’s important to know how SD 1.5, SDXL, and FLUX differ under the hood. These differences affect how you should prompt them:

  • Stable Diffusion 1.5 – The classic model (512×512 native resolution) using a CLIP text encoder (ViT-L/14). It often needs more prompt engineering to avoid artifacts (like extra limbs) at higher resolutions. Community models based on 1.5 often use “tag-based” prompts (inspired by anime image tags) for better results. Negative prompts and weighting are frequently used to refine outputs.
  • Stable Diffusion XL (1.0) – A newer model (1024×1024 native) with two text encoders (OpenCLIP ViT-G and CLIP ViT-L) for richer prompt understanding. Out of the box, SDXL can produce higher detail and larger images with fewer artifacts. It interprets longer, more descriptive prompts better than 1.5. It also supports an optional Refiner stage for adding extra detail after initial generation. Negative prompts significantly improve SDXL’s coherence and quality.
  • FLUX.1 (dev & schnell) – An advanced open-source model by Black Forest Labs (not available in standard A1111 yet, but usable via ComfyUI or Forge). FLUX uses dual encoding: CLIP-L (for keyword tags) and a T5-XXL language model (for natural language understanding) (see the ComfyUI Wiki's CLIPTextEncodeFlux page). This means it can understand complex instructions in plain English exceptionally well. FLUX.1 delivers state-of-the-art prompt adherence and handles things previous SD models struggle with, like rendering legible text and correct anatomy. It's also more forgiving: even simple prompts from beginners can yield impressive results.

Key Differences Summary:

| Model | Text Encoder(s) | Prompt Style | Supports Prompt Weights? | Strengths | Weaknesses / Notes |
|---|---|---|---|---|---|
| SD 1.5 | CLIP (ViT-L/14, 75-token limit) | Often tag-based or short phrases (e.g. masterpiece, best quality, 1girl, ...). Can also use descriptive sentences. | Yes (with (), [], or :n syntax in A1111) | Huge variety of custom models (anime, photorealistic, etc.). Responds well to prompt weighting and negative prompts for fine-tuning results. | Struggles with text (signs, letters). Needs more negative prompts to avoid artifacts. Native 512px; higher resolutions can show deformities. |
| SDXL 1.0 | 2× CLIP (OpenCLIP ViT-G + CLIP ViT-L) | Best with a descriptive sentence plus style tags (e.g. “A portrait of a warrior princess in a forest, intricate detail, 8k, photorealistic”). Handles longer prompts well. | Yes (same syntax as SD 1.5). Some UIs also allow two prompts, one per encoder. | Higher native resolution (1024px), better detail and prompt recognition. Dual encoders can improve fidelity. Refiner model can enhance details after the initial image. | Still not great at text (signs/logos). Requires more VRAM. Benefits from negative prompts. Fewer fine-tuned models than 1.5 (it’s newer). |
| FLUX.1 | CLIP-L + T5-XXL (dual encoding) | Best with natural-language prompts, as if explaining to a human (e.g. “A futuristic city skyline at sunset, with neon signs reflecting on wet streets.”). No weight syntax needed. | Not in the same way: FLUX doesn’t use () weights (getimg.ai FLUX.1 Prompt Guide); use phrasing like “with emphasis on…” instead. | Excellent prompt adherence; rarely needs heavy prompt “hacks.” Renders legible text and complex compositions well. Great anatomy (hands, faces) out of the box. Good results with simple prompts. | Not supported in standard A1111 (use ComfyUI or Forge). The large T5 encoder needs more RAM/VRAM. |

Tip: If you’re deciding which model to use for a task, consider the subject:

  • Need a photorealistic photo with accurate details or readable text? SDXL or FLUX might be better than base 1.5 (FLUX is particularly good with text).
  • Want a specific art style (anime, comic, concept art)? Many community 1.5 model merges or LoRAs exist for style – sometimes a fine-tuned 1.5 model like “AnythingV5” (anime) will outperform SDXL for that style.
  • For beginners, SDXL and FLUX are more forgiving with simple language prompting, whereas SD1.5 often needs more careful prompt tuning. For advanced users, all models can shine once you adapt your prompting to their quirks.

Next, we’ll cover how to craft prompts (positive and negative) effectively for each model, and then we’ll get into using the interfaces (AUTOMATIC1111 vs ComfyUI) with step-by-step instructions.

2. Crafting Effective Prompts for Each Model

No matter the model, a prompt is typically divided into a positive prompt (what you want to see) and a negative prompt (what you don’t want). Let’s break down best practices for each:

2.1 Positive Prompt Strategies

General Prompt Structure: Most prompts can be thought of as combining a description of the subject with details about style/appearance. A common formula is:

[Subject/scene] with [specific details], [style or medium], [lighting], [quality settings]

For example, “A wizard standing on a misty cliff, holding a glowing staff, digital painting, cinematic lighting, 4K detail”. Let’s see how to adjust prompt style per model:

Stable Diffusion 1.5 (and its derivatives): Originally, SD1.5 was trained on image captions, but the community found that terse tags often work better (especially for anime or art styles). For instance, an anime prompt might be: “1girl, blue hair, looking at viewer, masterpiece, best quality, UHD”. Here the first part lists subject details (1 girl, hair color, pose) and the latter part lists quality or style tags. Grammatically correct sentences are not necessary – you can string keywords separated by commas. However, for photorealistic outputs you might use more natural phrases like “portrait photo of a woman with soft lighting, 50mm lens, film grain”. Experiment with both styles (tag lists vs. descriptive phrases) depending on your model checkpoint (many 1.5-based checkpoints like AnythingV4 or DreamShaper expect tag-style prompts).

    • Emphasis (weights): SD1.5 supports increasing a token’s importance by wrapping it with parentheses () or appending a weight like :1.3. E.g. beautiful face (smiling:1.4) or beautiful face ((smiling)) – both tell 1.5 to prioritize “smiling” more. De-emphasis can be done with square brackets [] or weights below 1, e.g. (background:0.5). Use these to tune tricky aspects of the image. Note: Don’t overuse weighting – often a single set of parentheses is enough to hint importance.
    • Style Tokens: Leverage known artist names or styles for dramatic effect (e.g. “in the style of H.R. Giger”, “trending on ArtStation”, “Studio Ghibli style”). SD1.5 models know many famous styles. Keep in mind that some custom 1.5 models have their own style trigger words (check model info on sites like CivitAI for any special tokens).

Stable Diffusion XL: SDXL was trained with two text encoders and has a larger prompt capacity (up to 2048 tokens internally, though practical UI limits may be lower). It responds best to descriptive prompts written in natural language, followed by style tags. For example: “An astronaut walking on a distant planet, detailed clouds of dust in the air, realistic lighting, 8K photograph, fisheye lens.” This mixes a sentence describing the scene with some style keywords. Another example format: A [medium/style] of [subject] [doing something], [additional descriptors]. (tags...). The community often suggests a structure like: “(Style/medium) of (subject) (action/detail). Tags.” For instance: “Anime screencap of a woman with blue eyes wearing a tank top sitting in a bar. Studio Ghibli, masterpiece, pixiv, official art.” – notice the sentence followed by comma-separated style tags.

    • Using Dual Prompts (Advanced): Because SDXL has two encoders, some interfaces (ComfyUI, Diffusers) let you feed a different prompt into each. One could be a main scene prompt, and the second a refinement or style prompt. For example, Prompt 1: “A serene lake at sunrise, mountains in the distance” and Prompt 2: “photorealistic, ultra-detailed, golden hour lighting”. The output will try to satisfy both. In practice, if using AUTOMATIC1111 without special extensions, you can usually just write one prompt and SDXL will use it for both encoders. But in ComfyUI, you might see two text nodes for SDXL (one for each encoder). Tip: If supported, experiment with giving a shorter style-focused text to the second encoder; some users report one encoder tends to dominate (often the second one).
    • Emphasis: SDXL supports the same () weighting syntax as 1.5. Use it similarly for key elements. Because SDXL already tends to follow prompts well, you might not need as much weighting, but it can still help (e.g., emphasizing “high detail” or a specific object if it’s being ignored).

FLUX.1: The guiding principle with FLUX is to “write as if you’re talking to a human artist.” It parses language in a more sophisticated way, thanks to the T5 transformer, so you can be quite conversational or specific. For example: “Photorealistic close-up portrait of a medieval knight, intricate engravings on armor, background is a stormy battlefield.” You could even write this as multiple sentences or a run-on description – FLUX is robust to that. It doesn’t require the “telegraphic” style of comma-separated tags (though you can still provide short phrases if you want; FLUX will handle it). In fact, FLUX excels with detailed, precise descriptions. Feel free to add little details that CLIP models might ignore.

    • No Parenthesis Weights: FLUX (dev and schnell) does not support the () weight syntax in most common UIs (getimg.ai FLUX.1 Prompt Guide). If you include ++ or -- or () in a prompt, it will likely just confuse the T5 encoder or be ignored. Instead, if you want to emphasize something, just say it clearly, possibly with phrases like “with a focus on X” or “emphasizing X”. For example, instead of a garden with a single rose (highly detailed), you might say “a garden with a single rose, with a strong emphasis on fine details of the rose”. This approach guides FLUX via language rather than syntax.
    • Dual Nature of Prompt: Under the hood, FLUX splits your prompt into two parts for its two encoders – one part is treated like CLIP keywords, the other like a T5 sentence. In ComfyUI, the CLIPTextEncodeFlux node actually has two inputs: clip_l (for a “list” of tags) and t5xxl (for the sentence) (ComfyUI Wiki). Most users can simply feed the same prompt into both for convenience, but advanced prompting can involve giving a short list of keywords to CLIP and a detailed sentence to T5 (more on how to do this in ComfyUI later). As a beginner, you can ignore this complexity and just write a normal sentence – FLUX will internally make use of both encoders automatically.
    • Style: Because FLUX is so capable, you can mix styles in a very natural way, e.g. “A watercolor painting of an old ship at sea, but with photorealistic waves and dramatic, cinematic lighting.” FLUX can blend the concept of “watercolor” with “photorealistic” elements more deftly. Describe any artistic style, medium, or camera technique in words (it knows a lot of art and photography concepts).

Examples (Positive Prompt for each model):

  • SD 1.5 Example: portrait of a cyberpunk cityscape, neon signs, rain-soaked streets, **blade runner style**, **high detail**, **artstation**
  • SDXL Example: A cyberpunk city at night with rain-soaked streets and neon signs. **Photorealistic**, **ray tracing reflections**, **ultra detailed**, **4K**.
  • FLUX Example: A rainy night in a bustling cyberpunk city, neon signs reflecting on wet pavement. **Hyper-detailed** art style, with a dark, cinematic atmosphere and vibrant colors.

All three describe a similar scene, but notice the subtle differences in phrasing:

  • The SD1.5 version relies more on style tags (“blade runner style, artstation”) to convey look.
  • The SDXL version is a full sentence followed by some descriptive tags (“Photorealistic, ray tracing reflections…”).
  • The FLUX version reads like a rich description with adjectives woven in (“hyper-detailed art style, dark cinematic atmosphere…”).

2.2 Negative Prompt Strategies

Negative prompts help you tell the model what to avoid. They’re especially useful for removing common artifacts or undesired styles.

  • Why use negative prompts? Stable Diffusion models sometimes add unwanted elements (extra limbs, text artifacts, bad anatomy) or default to certain styles. By listing things you don’t want, you push the model away from those features via classifier-free guidance.

Here are best practices by model:

  • SD 1.5: Negative prompts are quite important to clean up outputs. A classic negative prompt for 1.5 (especially with realistic or character images) might include: low quality, ugly, blurry, extra limbs, extra fingers, mutated hands, deformed face, watermark, text, out of frame. You can chain many terms. For example, Emad (StabilityAI) once suggested: “ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, mutation, mutated, extra limbs, extra legs, extra arms, disfigured, deformed, cross-eye, body out of frame, blurry, bad anatomy, bad proportions, cloned face, watermark, grainy”. You don’t always need that entire list, but it gives an idea. Trim terms that aren’t relevant to your image, and add ones for problems you actually see – e.g., if the model keeps adding text or borders, put text, logo, watermark in negatives.

    • Use the same weighting syntax if needed: e.g. EasyNegative (a popular negative embedding) or writing (worst quality:1.4) in the negative prompt to strongly emphasize it.
    • SD1.5 tends to require more negatives to get a clean result compared to newer models. Many user-shared prompts for 1.5 have a huge negative prompt; you can start with a general one and refine over time.
  • SDXL: Negative prompting is even more crucial to get the best out of SDXL. Without any negatives, SDXL might sometimes give incoherent or oddly composed results. With a good negative prompt, SDXL’s output improves noticeably. The negative terms can be similar to 1.5’s list (since many common artifacts are the same). For example: ugly, low quality, extra limbs, out of frame, text, watermark, deformed, cartoon, 3d, sketch. (Here cartoon, 3d, sketch are added because we specifically want photorealism and the model might otherwise mix in an illustrated look.)

    • Because SDXL is better at understanding nuance, you might not need as many repetitive terms. Sometimes a concise negative like low quality, oversaturated, cartoonish suffices for a landscape, whereas a portrait might need the full “bad anatomy” list.
    • One difference: SDXL’s training included a lot of variety, so if you want to avoid a certain style, explicitly negate it. E.g. illustration, painting in negative if you want only photo style.
    • You can also use two negative prompts (one per text encoder) in advanced setups (similar to dual positive prompts). In practice, a single combined negative works fine in most UIs.
  • FLUX: Interestingly, FLUX.1 is less reliant on negative prompts for basic quality – it often produces good results with none. The developers note it’s “not very demanding regarding prompting” (getimg.ai FLUX.1 Prompt Guide), meaning even simple prompts come out well. That said, you can and should use negatives to refine as needed. For example, if you get unwanted text or an unwanted style, just put text, letters or cartoon, 3d in the negative. FLUX can interpret some higher-level negatives too; e.g. no text, no watermark should remove text elements.

    • Avoid negatives that confuse the T5. For instance, earlier versions of FLUX had a bug where the phrase “white background” in the positive prompt caused blur. If you wanted a plain white background, the workaround was to prompt “plain background” instead, or simply not mention the background. Putting white background in the negative prompt could likewise have odd side effects. In general, keep negatives to simple undesirable nouns or adjectives.
    • FLUX’s strength is understanding context, so you can sometimes be more specific in the negative. For example: “text, watermark, no blur, no distortion” – FLUX might actually understand “no blur” as avoiding blur. Standard SD models tend to ignore words like “no” in prompts, but FLUX’s T5 could grasp it. (Keep this as a tip to experiment with – results can vary.)

Negative Prompt Example: For a portrait on SDXL, you might use:

Negative prompt: ugly, poorly drawn face, extra fingers, text, watermark, out of frame, low contrast, boring, duplicated

This tells the model to avoid those pitfalls. You can reuse a well-crafted negative prompt across many jobs (some users even save their favorite negative prompt as a text file or use pretrained negative embeddings like “EasyNegative”).

Important: Don’t include things in the positive prompt that you absolutely don’t want, thinking “the model will know I mean the opposite.” Models don’t understand negation in the positive prompt well. For instance, prompting “a photo of a person without glasses” might actually produce a person with glasses (because it hears “person” and “glasses”). Instead, prompt “a photo of a person” and put glasses in the negative prompt. Always move undesired elements to the negative side explicitly.
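To make the “glasses” example concrete, here is a minimal sketch of how positive and negative prompts are passed separately when generating with the diffusers library; the model ID and settings are illustrative assumptions rather than part of any UI described above:

```python
# Minimal sketch (assumes the diffusers library and a CUDA GPU are available).
# The point: undesired elements go in negative_prompt, not as "without X" in the prompt.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    prompt="photo portrait of a person, soft studio lighting, detailed skin texture",
    negative_prompt="glasses, ugly, poorly drawn face, extra fingers, text, watermark, out of frame",
    guidance_scale=7.0,
    num_inference_steps=30,
).images[0]
image.save("portrait.png")
```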

2.3 Tips for Specific Styles

Now, some quick tips for achieving popular styles or subjects, and how each model handles them:

  • Photorealistic Photography: Use terms like “photorealistic, 35mm photograph, DSLR, film grain, high dynamic range, RAW”. Including camera jargon helps (e.g., bokeh, f/1.8 aperture, shutter speed 1/500 for action shots). SDXL and FLUX particularly excel at photorealism due to their detail and resolution. For SD1.5, consider using a photorealistic-focused model (like RealisticVision or analog-diffusion) or LoRAs trained for realism. Lighting keywords are key: “golden hour sunlight”, “soft studio lighting”, “cinematic backlight”. For example, “portrait of a man, photorealistic, detailed skin texture, soft diffused lighting, shallow depth of field, bokeh”.

    • FLUX note: FLUX can produce extremely detailed photorealism. If anything, you might need to dial back CFG or detail to avoid an uncanny level of detail on faces. It can handle complex lighting descriptions easily (e.g. “the scene is illuminated by neon blue and pink lights from the signs”).
    • Example Output Comparison: The prompt “a neon sign saying 'WELCOME TO THE FUTURE'” was given to several models. FLUX nails the legible neon text, whereas SDXL and SD 1.5 struggle to spell it correctly.

    Figure (from the getimg.ai “FLUX.1 vs Stable Diffusion” comparison): FLUX.1 produces clear, accurate neon text; SDXL’s letters come out jumbled; SD 1.5’s text is nearly illegible.

  • Anime/Manga Style: Many SD1.5 models are tuned for anime art (e.g. Anything, NovelAI, etc.). These typically respond well to Danbooru-style tags: e.g., 1girl, long hair, smile, school uniform, outdoors, cherry blossoms, anime art, pixiv, highly detailed. For anime, you often include tags for “masterpiece, best quality” (these came from NovelAI’s training) to boost quality, though their effect is debated. SDXL can do anime style too, but there are fewer anime-trained SDXL models as of now. If using base SDXL, try prompting like “anime style illustration of ...” and include anime-related terms. FLUX was trained more on photorealistic and general imagery, but it can mimic anime if you describe it (e.g. “a character in anime style, with flat shading and bold outlines”). Still, a dedicated anime model (SD1.5-based) is your best bet for authentic look.

    • LoRAs for style: LoRAs (Low-Rank Adaptations) are small add-on models that can impart a style or character. For anime, countless LoRAs exist (for specific artists’ styles, or particular character looks). In AUTOMATIC1111, you can apply a LoRA by adding a prompt like <lora:NameOfLoRA:0.8> in your prompt (0.8 being the strength). In ComfyUI, you’d use a “Load LoRA” node connected to the UNet. We’ll detail this in the UI sections later. LoRAs trained on SD1.5 generally only work on SD1.5 or its merges; LoRAs for SDXL are separate. FLUX being new means fewer LoRAs, but it is reported to handle LoRAs well (and even more consistently than SDXL does).
    • Example Prompt (SD1.5 anime model): 1girl, solo, full body, dynamic pose, hair fluttering, scenic background, (detailed eyes:1.3), anime illustration, trending on pixiv, vibrant colors
    • Example Prompt (SDXL anime attempt): An anime-style illustration of a girl in a dynamic pose, hair fluttering, detailed background. high detail, vibrant colors, clean lines, Studio Ghibli art style. – (if using an SDXL anime model checkpoint, you could also revert to tags).
  • Fantasy Art / Concept Art: Use imaginative descriptors and artist names. E.g., “A dragon circling a towering castle, epic fantasy art, artstation, by Greg Rutkowski, volumetric lighting, highly detailed, dramatic shadows.” SD1.5 models like Epic Diffusion or Deliberate are great for general fantasy scenes. SDXL can produce fantastic detail for fantasy as well, just be descriptive. FLUX will follow a narrative prompt nicely if you describe the scene like a storyteller: “An epic scene of a dragon circling a towering castle, under a stormy night sky with lightning. The style is like a digital painting, with dramatic, high-contrast lighting.” You might even split into sentences: FLUX won’t mind.

    • Weights for composition: If the castle is crucial, you might write (castle:1.2) in SD1.5. In FLUX, emphasize via wording: “with focus on the towering castle in the center”.
    • Control composition: Consider using ControlNet with depth or pose if you want to ensure a particular layout (more on ControlNet in section 5).
  • Cinematic / Lighting-focused: To get that cinematic look, include terms like “cinematic lighting, film still, ultra realistic, 4k UHD, intricate shadows, dramatic lighting”. Using movie references or directors can help (e.g. “Gregory Crewdson photography style” for dramatic suburb scenes, or “cinematography by Roger Deakins” for certain lighting). Aspect ratio also matters for cinematic shots (e.g., 16:9 wide). You can set a custom resolution (like 768×432) or use the “Aspect Ratio” setting in the UI if available.

    • SDXL’s higher resolution helps with cinematic wide shots. FLUX will understand even subtle things like “teal and orange cinematic color grading” as it has an LLM component.
    • If you specifically want lens flare or bokeh, say so. “Cinematic shot, anamorphic lens flare, bokeh in the background” – these models know those terms.
    • Example: “A lone figure standing under a streetlamp in the rain, at night. Cinematic composition, film noir lighting, high contrast, long shadows, 8K detail.” – This prompt would work well on SDXL or FLUX to get a moody, filmic image. SD1.5 can do it too with the right model (maybe add artist tags like “Greg Rutkowski” for fantasy cinematic or “photorealistic” for noir).

Table: Quick Style Keywords

| Style | Useful Keywords (add to positive prompt) |
|---|---|
| Photorealism | photorealistic, DSLR, 35mm, realistic, high detail, 8k, ultra high res, sharp focus, bokeh |
| Anime | masterpiece, best quality, anime illustration, clean line art, flat colors, 2D, character design (plus specific tags for features, e.g. blue hair, school uniform) |
| Fantasy Art | digital painting, concept art, epic, fantasy, highly detailed, trending on ArtStation, matte painting, dramatic (and artist names like Greg Rutkowski, John Avon, etc.) |
| Cinematic | cinematic, film still, dramatic lighting, volumetric light, fog, depth of field, motion blur, color graded, 4k |
| Comic/Cartoon | comic book style, ink outline, cel shading, pop art, Pixar style, Disney, cartoon, 2D |
| Vintage Photo | black and white, vintage, 35mm film, grainy, sepia tone, 1920s, Polaroid, overexposed edges |

Use these as inspiration and mix/match. Remember to also adjust your negative prompt if you are aiming for a specific style (e.g., if you want pure anime 2D look, put photorealistic in negative; if you want realistic, put drawing, illustration in negative to avoid cartoonish outputs).

3. Using AUTOMATIC1111 Web UI (Stable Diffusion WebUI)

Now that we’ve covered what to write in prompts, let’s go through how to use the interfaces to generate images. We’ll start with AUTOMATIC1111’s Web UI (often just called “A1111”), since it’s very popular and user-friendly, and then cover ComfyUI in the next section.

Assumption: You have AUTOMATIC1111 WebUI installed with access to the models (SD1.5, SDXL, etc.). If not, follow a Stable Diffusion WebUI installation guide first. Make sure to place your model .ckpt or .safetensors files in the models/Stable-diffusion directory and restart the UI so they appear.

3.1 Loading Models in A1111

Step 1: Launch the Web UI. You’ll see the interface with a text area for prompt, one for negative prompt, and options below.

Step 2: Select your model checkpoint. In the top left, there’s a drop-down (it might show “Stable Diffusion v1.5” or another model’s name). Click it and choose the model you want:

  • For SD 1.5, select the v1-5-pruned-emaonly.ckpt (or any custom 1.5-based model you’ve added).
  • For SDXL, select the sdxl_base_1.0.safetensors (and if you have the SDXL Refiner, there might be an option to apply it after, or use an extension).
    • Note: As of writing, base SDXL is supported. The Refiner can be used via an extension or script in A1111 (“SDXL Refinery” script) – if installed, you’d generate with base then refine. Alternatively, you might have a merged checkpoint that includes the refiner (some merges exist) – consult documentation if so.
  • FLUX in A1111? FLUX is not natively supported in standard A1111 because of its dual encoder architecture. However, there are forks like Stable Diffusion “Forge” that do support FLUX with an A1111-like interface. If you’re using Forge, loading FLUX is similar (select the flux model file). In pure A1111, you currently cannot load FLUX. Instead, use ComfyUI (Section 4) or a colab/other UI for FLUX. For the purpose of this guide, we assume you’ll use ComfyUI for FLUX.

Step 3: VAE (if needed). Some models (especially SD1.5 custom ones) require a VAE (Variational Autoencoder) for color fidelity. A1111 might auto-load a default one. If your outputs have strange colors or contrast, ensure the correct VAE is loaded (in Settings > Stable Diffusion > SD VAE or via the UI’s bottom drop-down if visible). SDXL has its own VAE built-in, so usually no action needed there.

3.2 Structuring Prompts and Settings in A1111

Now, type your positive prompt in the text area at the top, and your negative prompt in the bottom text area. Refer to Section 2 for content guidance. Let’s go over key settings:

  • Sampling method (sampler): This is the algorithm used to generate the image. Common ones:
    • Euler a (ancestral) – often good for quick experimentation, can produce varied results.
    • DPM++ 2M Karras – a very popular sampler for stable results; good detail.
    • DPM++ SDE – also great for quality, sometimes smoother.
    • UniPC – a newer sampler that can give good results fast.
    • For most purposes, DPM++ 2M Karras at 20-30 steps is a solid choice for SD1.5 and SDXL. FLUX often also does well with DPM++ or UniPC. Tip: When in doubt, start with the default that A1111 picks (which might be Euler A or whatever you last used), then try others if you want to fine-tune. Some samplers excel at certain looks; e.g. Euler a might introduce more chaos (which could be good for abstract art), whereas DPM++ is more refined.
  • Sampling Steps: This controls how many denoising steps the process runs. More steps can mean more refined images up to a point, but with diminishing returns (and slower speed).
    • SD1.5 usually looks good by ~20-30 steps; going beyond 50 often yields little improvement unless using a tricky prompt.
    • SDXL can also do well around 30 steps. It may benefit from slightly more in some cases (40-50) due to complexity, but again it depends.
    • FLUX in dev variant might reach good quality in ~30 steps too. If using the “schnell” (fast) variant, it’s optimized for speed so you might only need ~20.
    • Tip: A quick way to see if more steps help is to generate the same prompt with 20, 40, 60 steps and see if you notice improvement. Often, going from 20 to 30 helps, but 60 vs 30 might look almost the same for many models.
  • CFG Scale (Classifier-Free Guidance): This is very important. CFG controls how strongly the prompt (positive and negative) guides the image.
    • A lower CFG (e.g. 4 or 5) means the model has more freedom (it might take more artistic liberty and sometimes deviate from your prompt slightly, but could produce more “natural” looking results).
    • A higher CFG (e.g. 12 or 15) forces the model to stick to the prompt very closely, which can sometimes cause over-saturation or odd artifacts if too high.
    • Recommended ranges: For SD1.5, typically 7 to 12 is used. 9 is a common default. For SDXL, many users found 5 to 9 to work well (SDXL tends to get too contrasty above 9). For FLUX, the default in many workflows is around 7 (and they sometimes offer a “guidance” parameter separately for T5 vs CLIP, but in A1111/Forge it would be one CFG).
    • If an image looks too literal or overcooked, try lowering CFG. If it’s way off from the prompt, raise CFG.
  • Resolution: Decide your width and height.
    • For SD1.5, staying around 512×512 or 512×768, etc., is good for initial generations (since 1.5 was trained on 512). You can use the “Highres fix” in A1111 to then upscale with additional steps – this generates a smaller image and then upscales it, helping preserve composition (found in the pop-up options when you enable “Hires. fix”).
    • SDXL can handle 1024×1024 natively, so you can try that for square images. It’s generally better at landscapes (like 1024×576). But be careful with VRAM; larger sizes use more memory.
    • FLUX was trained on high-res as well (it can do 1024×1024 or more). If you have the VRAM, you could attempt 1024px or even larger. Start with 512 or 768 to get a quick idea, then go higher if needed.
    • Note: Always keep the aspect ratio relevant to your subject. For a single character full-body, a taller aspect (e.g. 512×768) is better. For landscapes, wider (e.g. 768×512) is better.
  • Batch Count / Size: If you want multiple images, you can set Batch count to e.g. 4, which will generate 4 images in one go (each with a different random seed). This is great for seeing variations. Keep Batch size at 1 (A1111 uses batch size differently – generally leave it at 1 unless you know you need it).
  • Seed: The seed determines the random starting noise. If you get a result you like and want to tweak the prompt slightly and regenerate the same composition, use the same seed. If you want a new random result, set seed to -1 (which gives a random new seed each time) or enter any random number to start from.

Step 4: Generate the image. Click the Generate button. Wait for the process to complete and your image will appear.

  • If the result is not what you hoped, adjust the prompt or settings and try again. Often a bit of iteration is needed. You can use the preview thumbnail or the “send to image-to-image” feature for further refining (like upscaling or inpainting).
  • Keep an eye on the console/log (if running locally) for any warnings – e.g. if the prompt is too long and got truncated (A1111 will truncate prompts beyond the model’s token limit, usually 75 tokens for SD1.5, longer for SDXL).
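If you ever want to reproduce the same settings outside the UI, here is a hedged sketch of how these A1111 controls map onto the diffusers library; the checkpoint ID, seed, and values shown are placeholders chosen to illustrate the mapping, not recommended defaults:

```python
# Rough diffusers equivalent of the A1111 settings above (a sketch, not A1111's internals).
# Sampler ~ scheduler, Sampling Steps ~ num_inference_steps, CFG ~ guidance_scale, Seed ~ generator.
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder; use your checkpoint
).to("cuda")

# "DPM++ 2M Karras" roughly corresponds to this scheduler configuration.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

generator = torch.Generator("cuda").manual_seed(1234)   # fixed seed = reproducible composition
image = pipe(
    prompt="portrait of a cyberpunk cityscape, neon signs, rain-soaked streets, high detail",
    negative_prompt="low quality, blurry, watermark, text",
    num_inference_steps=25,   # Sampling Steps
    guidance_scale=9.0,       # CFG Scale
    width=512, height=768,    # Resolution (taller aspect for a character/portrait)
    generator=generator,      # Seed
).images[0]
image.save("cyberpunk.png")
```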

3.3 Using Negative Embeddings (Optional Advanced)

A1111 allows using textual inversion embeddings (tiny .pt or .bin files that represent a concept or style) in prompts. For example, a popular negative embedding is “EasyNegative” which, when put in the negative prompt, can improve general quality of portraits. If you have such an embedding (usually you’d download it and place in embeddings folder), you just type its name in the negative prompt. E.g. negative prompt: EasyNegative, bad-hands-5, (low quality:1.3). The embedding names act like special tokens.

Likewise, there are positive embeddings to invoke styles or specific people. Use these carefully and ensure they match the model version (most embeddings are for SD1.5, they might not work well in SDXL). FLUX likely doesn’t support textual inversion (since its text encoder is different).

3.4 Applying LoRAs in A1111

If you want to use a LoRA model to apply a style or character, make sure the LoRA file (.safetensors) is placed in models/Lora. Then:

  • In your prompt, add the syntax: <lora:filename:weight>. For example, if the LoRA file is artstyle.safetensors, you might write <lora:artstyle:0.8> in the prompt. This will mix that LoRA’s learned style at 80% strength.
  • Some UIs have a separate tab or widget to add LoRAs without writing the tag manually (the “Extra networks” or the little 📄 icon in the prompt box of A1111 opens a LoRA picker).
  • Use weight 0.6 to 1.0 typically. Too high (like 1.2) can distort output.
  • Remember: LoRAs are usually version-specific. A LoRA trained on SD1.5 won’t look right on SDXL images (if it works at all), so use SDXL-specific LoRAs for SDXL (they exist but are newer). FLUX uses a different architecture again, so SD1.5/SDXL LoRAs won’t carry over – FLUX needs LoRAs trained specifically for it.
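Outside A1111, the same LoRA-plus-strength idea looks roughly like this in diffusers; this is a sketch where the file path and checkpoint are placeholders, and fuse_lora is just one of several ways to set the strength:

```python
# Sketch of applying a single LoRA in code (file name and path are placeholders).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Roughly the equivalent of <lora:artstyle:0.8> in the A1111 prompt box.
pipe.load_lora_weights("path/to/artstyle.safetensors")
pipe.fuse_lora(lora_scale=0.8)   # bake the LoRA in at 80% strength

image = pipe("a knight in ornate armor, artstyle illustration").images[0]
```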

3.5 ControlNet in A1111

ControlNet is an extension that lets you guide image generation with an input like a pose skeleton, sketch, depth map, or other conditions. For example, you can draw a rough pose stick figure and have SD generate a character in that exact pose.

To use ControlNet (after installing the extension):

  • Go to the ControlNet panel (usually below the Generate button, once enabled).
  • Check “Enable” and choose a preprocessor and model. For instance, to use a hand-drawn sketch as a map, you might select canny preprocessor and control_canny-fp16 [model].
  • Upload your reference image (e.g., a sketch or depth map) in the ControlNet image slot.
  • Adjust the Control Weight (how strongly to follow the control). Start with 1.0.
  • Now, when you hit Generate, the model will consider both your text prompt and the ControlNet condition.
  • SDXL and ControlNet: Make sure to use SDXL-specific ControlNet models when using SDXL (they’re often named with “sdxl” in them). SD1.5 ControlNets won’t work correctly on SDXL. As of SDXL’s release, ControlNets such as Canny, Depth, and OpenPose have SDXL versions (see “ControlNet with Stable Diffusion XL” on Hugging Face).
  • FLUX and ControlNet: Not natively supported in A1111 (since FLUX isn’t there), but in ComfyUI it is possible via special nodes (Section 4).

ControlNet is extremely powerful for achieving specific compositions, matching a reference, or doing things like inpainting (filling in part of an image), outpainting (expanding an image), etc. Covering all of ControlNet is beyond our scope, but many online guides exist. For now, remember it’s a tool to add when you need more control than prompts alone can offer.
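If you prefer scripting, the same ControlNet idea can be sketched with diffusers as below; the edge-map file and model IDs are illustrative assumptions, and controlnet_conditioning_scale plays the role of the UI's Control Weight:

```python
# Hedged sketch: a Canny edge map guides an SD 1.5 generation.
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edge_map = Image.open("pose_sketch_canny.png")   # pre-computed Canny edges of your sketch (placeholder)

image = pipe(
    prompt="a warrior princess in a forest, photorealistic, cinematic lighting",
    image=edge_map,
    controlnet_conditioning_scale=1.0,   # start at 1.0, lower it for a looser match
    num_inference_steps=25,
).images[0]
```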

3.6 High-Res Fix / Upscaling

If you want a larger final image or more detail:

  • Highres. fix (in txt2img): This A1111 option will generate a low-res image (e.g. 512×512), then upscales it (to e.g. 1024×1024) and performs additional diffusion passes to add detail. It’s great for SD1.5 to get large outputs without distortions. For SDXL, you may not need it for 1024×1024, but for going beyond 1024 it helps.
  • To use it, check “Hires. fix”. Set an Upscale by factor (2 means double the size) or directly set a target width/height. Set the Hires steps (often ~15-20) and pick an upscaler (Latent upscalers work in latent space; you can also use ESRGAN- or SwinIR-type upscalers after generation).
  • Alternatively, use the Extras tab after generation to upscale with a chosen algorithm (this won’t add new details from the model, just resize).
  • SDXL Refiner: If you have the SDXL refiner model, you can also upscale and refine in one go. There’s usually a script or separate interface to apply the refiner. You would generate with the base model, then the refiner uses the base’s output + prompt to enhance details (especially faces and textures). Keep the refiner steps low (around 10-15) or the image can become over-processed. (A code sketch of this base-plus-refiner handoff follows this list.)
  • FLUX doesn’t have a refiner model per se, but it has variants like FLUX [dev] vs [schnell]. [dev] is slower but highest quality (akin to a refined image), [schnell] is fast and slightly lower quality. You can generate with [schnell] for speed and switch to [dev] for the final hi-res version if desired.

By now, you should be comfortable using A1111 to generate images with each model. Next, we’ll explore ComfyUI, which might seem complex at first but offers powerful control – especially useful for things like SDXL’s dual prompts and FLUX.

4. Using ComfyUI for Advanced Workflows (and FLUX)

ComfyUI is a node-based interface for Stable Diffusion. It’s extremely flexible: you build a graph of nodes for your pipeline. This makes it perfect for advanced models like SDXL (with dual encoders and refiners) and FLUX (with its custom text encoder), and for doing things like model merges or complex ControlNet setups.

If you’re new to ComfyUI, the interface will look like a flowchart editor. You add nodes (each node might do something like “Load Checkpoint”, “Text Encode Prompt”, “Sampler Step”) and connect them. Thankfully, many community workflow files exist that you can load and just edit the prompts.

4.1 Setting Up and Loading a Model in ComfyUI

Step 1: Open ComfyUI. You’ll see an empty workspace (grid background). Typically, ComfyUI might come with an example workflow or you can load one.

Step 2: Load a workflow for the model you want. The easiest way to start is to use a pre-made workflow:

  • For SD1.5 or SDXL, you can find basic text2img workflows on the ComfyUI wiki or community. For example, a simple SDXL workflow will have nodes for SDXL base and refiner.
  • For FLUX.1, because it’s unique, find a FLUX-specific workflow. The Stable Diffusion Art site provides a FLUX workflow file. Also, the ComfyUI_examples repository on GitHub has some FLUX examples.
  • To load a workflow: drag-and-drop the workflow .json file (or an image with an embedded workflow) into the ComfyUI window, or use the Load button (folder icon) if available.
  • Once loaded, you’ll see many nodes connected in a graph.

Step 3: Identify key nodes to interact with:

  • Checkpoint Loader / Model Loader node: This node holds the model file. It might be named like “Load Checkpoint” for SD models or a custom loader for FLUX (since FLUX might need loading two parts). Click it and ensure it’s pointing to the correct model file (there may be a dropdown or text path). If the workflow came from someone else, you might have to select your local model. E.g., select sdxl_base_1.0 for base, and another node for sdxl_refiner_1.0 if present. For FLUX, the workflow could have one combined checkpoint node for the FP8 model, or separate UNet and text encoders nodes.
  • Text Encode nodes: These take your prompt text and produce an embedding. For SD1.5, a single CLIPTextEncode node is used for the positive prompt (and one for the negative). For SDXL, there might be two CLIPTextEncode nodes (one for OpenCLIP and another for the secondary CLIP), or a special SDXL text encoder node that outputs two embeddings. For FLUX, you’ll see a CLIPTextEncodeFlux node (see the ComfyUI Wiki), which has two text inputs (one for clip_l tags, one for the t5xxl sentence). Often, this node is preceded by a Combine Text node or just a Note telling you “enter prompt here”.
  • Sampler / KSampler node: This is akin to the part that does the diffusion steps, combining model + conditioning. It will have inputs for noise, model, conditioning (from text) and outputs the image.
  • VAE Decode node: At the end, to get an image, the latent output is passed through a decoder. Make sure a VAE is loaded (usually the Checkpoint loader automatically provides VAE, but in SDXL workflows sometimes a separate VAE node is used).
  • Image Save/Preview node: Comfy might have a node to preview or save the output. Ensure it’s connected.

Step 4: Enter your prompt in ComfyUI:

  • Many workflows have a special node for conveniently editing the prompt. For example, a “conditioning” node that wraps the text encoder. It might have a text area in its UI panel where you can type the positive prompt and negative prompt. In others, you actually click the Text Encode node and type the prompt there.
  • For SDXL workflows, if there are two text encoders, some workflows might ask you to provide two prompts. If you want to use the same prompt for both, you can often just link one text encode node to both inputs of the SDXL model.
  • For FLUX workflows, the CLIPTextEncodeFlux node will have two text boxes in its UI: one for clip_l (it expects “tags” or short descriptors) and one for t5xxl (a full sentence; see the ComfyUI Wiki). You can either:
    • Enter the same text in both (some workflows might auto-copy one to the other if left blank).
    • Or enter a concise list in clip_l (like cat, relaxing, windowsill, sunlight, detailed fur) and a descriptive sentence in t5xxl (like “A cat is relaxing on a windowsill, sunlight streaming through the window shows the rich detail of the cat’s fur.”). This dual prompting can yield very coherent results, as one encoder handles the “style/tags” and the other the “description”. It’s a bit advanced, but feel free to experiment once comfortable.

Step 5: Set other parameters in the nodes:

  • Find the Sampler node (often KSampler or similar). Set the steps, CFG (Guidance), and sampler method here. It will have fields for these when clicked. If not sure, use the values from the loaded workflow (they often preset e.g. 20 steps, CFG 7, Euler).
  • Check the resolution in the noise or sampler node. There might be a “Noise” node that has width/height, or the sampler itself has resolution fields. Adjust to your desired size.
  • Ensure the model is connected: The Checkpoint loader outputs a UNet (model) which should feed into the sampler. If the model didn’t load or connect, you might need to re-connect it (drag from model node output to sampler’s model input).
  • If using SDXL with refiner, ensure that after the base sampler, the refiner is set up: typically, the base produces an image, then the refiner takes that image latent plus prompt and does a few extra steps. Workflows might automate this; just double-check if there’s a second sampling stage for the refiner (and you may have another CFG/steps for it).
  • Check the LoRA or embeddings if any are in the workflow. If you want to add a LoRA in ComfyUI, you’d insert a Load LoRA node and connect it to the Checkpoint (and perhaps to text encoder if it’s a text lora, usually just UNet for style LoRAs). For simplicity, skip this until you have a basic image working.

Step 6: Run the workflow:

  • Press the “Execute” or play button (triangle icon in top menu). The workflow will start running through nodes. You’ll see each node highlight as it processes.
  • The output image will likely appear on a viewer node or be saved to the output folder of ComfyUI.
  • If there’s an error, read it: often it’s because a model file path was wrong or a node wasn’t connected. Fix any issues (e.g., load the correct model file, reconnect nodes) and try again.
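As a side note for automation-minded users: a locally running ComfyUI instance can also be driven over HTTP instead of the run button. The sketch below assumes the default port and a workflow exported in API format (e.g. via the “Save (API Format)” option with dev mode enabled); the node id and file name are hypothetical, so check your own JSON for the right keys:

```python
# Queue a saved workflow programmatically instead of clicking the run button.
import json
import urllib.request

with open("sdxl_castle_workflow_api.json") as f:   # placeholder file name
    workflow = json.load(f)

# Edit the prompt text of a node before queueing; "6" is a hypothetical node id --
# open the JSON to find the id of your positive-prompt text node.
workflow["6"]["inputs"]["text"] = "A majestic castle on a hilltop overlooking a lake, golden sunset light"

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)   # the image lands in ComfyUI's output folder as usual
```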

Using ComfyUI effectively: It may be useful to split your workflow into sections:

  • Text Conditioning Section – where prompts are encoded (positive and negative).
  • UNet and Sampler Section – where the model generates the latent image.
  • VAE Decode Section – to convert latent to actual image.
  • Post-processing Section – if any (like upscaling nodes or saving).

ComfyUI lets you do model merging as well by using a Model Merge node, but that’s beyond basic usage. You can merge two models by loading them and connecting them to a merge node with a ratio, producing a new model output. This is an advanced topic; you might use A1111’s Checkpoint Merger for simplicity if you’re not comfortable with nodes for merging.

4.2 Special Considerations in ComfyUI for Each Model

  • SD1.5 in ComfyUI: Very straightforward – just one text encoder node (CLIPTextEncode) for positive, one for negative, and the UNet model. Make sure to use the correct VAE if needed (there’s a VAE loader node or the Checkpoint node might output VAE too).
  • SDXL in ComfyUI: Use the dedicated SDXL nodes if available. By now, there are SDXL Conditioning nodes that handle the dual encoders. For example, a node might be called “SDXL Prompt Encoder” that takes a text and outputs a two-part conditioning suitable for SDXL base. If not, you can manually use two CLIPTextEncode nodes. Typically:
    • One CLIPTextEncode (with OpenCLIP ViT-G model selected) for the main prompt.
    • Another CLIPTextEncode (with CLIP ViT-L model selected) for the second prompt.
    • Both outputs are then merged into the SDXL Diffusion model node (which might be a specialized SDXL sampler node).
    • Many community workflows simplify this, so try to use those to avoid dealing with raw internals.
    • Don’t forget the Refiner: after the base generation, to use the refiner, ComfyUI might have an SDXL Refiner node where you feed in the base image and prompt again. There’s a node called SDXL Refiner Prompt Encoder in some node libraries which preps the conditioning for the refiner. Make sure the SDXL refiner model file is loaded in a second checkpoint loader.
  • FLUX in ComfyUI: As mentioned, the key node is CLIPTextEncodeFlux (see the ComfyUI Wiki). This node expects:
    • A CLIP model (it likely loads internally if using the combined FP8 flux model, or you might have a CLIP model file).
    • It might need you to load a T5 model checkpoint too (depending on implementation – if using the official FLUX FP8 single file, it includes everything; if using older method, you had to load a T5 encoder as well).
    • The node will output a “conditioning” that goes into a specialized FLUX KSampler (or maybe it can plug into a normal KSampler if the UNet is loaded).
    • If you use a ready-made FLUX workflow from a guide, follow its instructions. Likely, you just update the prompt and hit generate.
    • FLUX might be slower on the first run as it loads the large T5 model (which can be several GB in RAM). Be patient.
    • In FLUX ComfyUI workflows, you’ll also see maybe a DualCLIPLoader node or similar that loads the two text models. Or a combined Checkpoint loader that outputs the UNet and also feeds into CLIPTextEncodeFlux.
  • ControlNet in ComfyUI: ComfyUI uses ControlNet nodes that you place in the graph. You load a ControlNet model and connect the conditioning. If you’re comfortable, you can insert a ControlNet for SD1.5 or SDXL. For FLUX, it’s trickier but apparently possible through FLUX ControlNet nodes (the ComfyUI Wiki lists a “FLUX.1 ControlNet” tutorial).
    • Likely approach: For SDXL, use the SDXL ControlNet models by adding a ControlNet injector in the pipeline.
    • For SD1.5, plenty of nodes exist (like “ControlNet Apply” node which takes the control conditioning and the UNet).
    • Covering ComfyUI ControlNet step-by-step is too detailed for here; check the ComfyUI Wiki’s ControlNet tutorials (e.g. the Flux.1 ComfyUI Guide) if needed.
  • Merging Models in ComfyUI: If you want to experiment with blending two checkpoints, ComfyUI’s ModelMergeSimple node can mix them by a ratio – for example, to merge a style from one model into another. To do so:
    • Load Model A (output to A), Load Model B (output to B), feed both into ModelMerge node with weight (e.g. 0.3 meaning 30% of B into A) and then that output goes to your sampler as the UNet.
    • You might also have to merge the text encoders or ensure compatibility. Usually merging is done between same architecture models (e.g. two SD1.5 models). Merging SDXL with 1.5 is not straightforward due to different size.
    • For FLUX, merging doesn’t really apply unless another similar model exists to merge with.

4.3 Workflow Example in ComfyUI

Let’s walk through an example of generating an SDXL image in ComfyUI to consolidate understanding:

Suppose we want to generate a fantasy castle scene with SDXL and refine it.

  1. Load SDXL base and refiner workflow (for example, the ComfyUI Wiki’s SDXL workflow or one shared on Reddit).
  2. In the graph, find the node or section to input prompts. It might have a node labeled “Positive Prompt” and “Negative Prompt” (some workflows create a custom node group for convenience).
  3. Enter: Positive: “A majestic castle on a hilltop overlooking a lake, golden sunset light, high detail, concept art, matte painting, epic atmosphere”. Negative: “low quality, oversaturated, cartoonish, people” (we don’t want any characters or low quality).
  4. Ensure SDXL base checkpoint is loaded in the CheckpointLoader (e.g. select stable-diffusion-xl-base-1.0.safetensors). The refiner CheckpointLoader should have stable-diffusion-xl-refiner-1.0.safetensors.
  5. Check the resolution: set base generation at 1024×576 (a nice wide aspect for a castle landscape).
  6. Sampler node: use DPM++ 2M Karras, 30 steps, CFG 7 for base. The refiner sampler: maybe 15 steps, CFG 5 (you want a lighter touch on the refiner).
  7. Hit Execute. The base model generates the image. Then the refiner model will run (you’ll see the nodes processing sequentially if set up).
  8. Result: An image appears of a castle. If it’s too dark or not as expected, adjust prompt or settings and run again.
  9. If satisfied, save the image (ComfyUI usually auto-saves to the output folder with a generated filename). You can also connect a “Save Image” node to auto-save with a custom name if desired.

Now a FLUX example snippet for comparison:

Say we want FLUX to generate the same concept:

  1. Load a FLUX text2img workflow. (We’ll assume flux [dev] model loaded.)
  2. Find CLIPTextEncodeFlux node UI. Enter in t5xxl: “A majestic castle on a hilltop overlooking a lake at sunset. The scene is painted in epic fantasy style with golden light and dramatic clouds.” In clip_l: “castle, hill, lake, sunset, epic, fantasy”.
  3. Negative (depending on workflow, might be a separate node or part of the same): “low quality, blurry, people, text”.
  4. Ensure Flux model is loaded in UNet (the loader might have loaded a .safetensors for flux dev).
  5. Resolution: FLUX dev can do 768×432 or bigger; try 768×432 for speed.
  6. Sampler: use Euler or DPM++ SDE, ~25 steps, CFG ~7.
  7. Execute. FLUX will process (taking a bit to load T5). The output appears, likely very coherent with our description.
  8. Compare it with SDXL’s output. Perhaps FLUX gave even more vibrant detail or placed things slightly differently because of how it understood the prompt. If needed, refine the wording and re-run.
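For comparison outside ComfyUI, the same clip_l/t5xxl split can be expressed with the diffusers FluxPipeline, where (to the best of our understanding) prompt feeds the CLIP encoder and prompt_2 feeds T5; the model ID assumes you have access to the gated FLUX.1 [dev] weights, and the values are illustrative:

```python
# Diffusers-side sketch of the dual-prompt split the CLIPTextEncodeFlux node exposes in ComfyUI.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

tags = "cat, relaxing, windowsill, sunlight, detailed fur"
description = ("A cat is relaxing on a windowsill, sunlight streaming through "
               "the window and showing the rich detail of its fur.")

image = pipe(
    prompt=tags,           # short keyword list -> CLIP-L encoder
    prompt_2=description,  # full sentence -> T5-XXL encoder
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("cat.png")
```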

4.4 Troubleshooting ComfyUI Outputs

  • If nothing is appearing, check the Console (if running via command line, you’ll see logs) or the ComfyUI interface’s status. It might be a missing model error.
  • If you see red nodes, that indicates a missing connection or resource. Click them to read the error. For example, a red text encoder node might say it can’t find the model file for the encoder.
  • For performance: ComfyUI can use more VRAM because everything is explicit. If you run out of memory, consider a smaller resolution, memory-saving launch flags (e.g. --lowvram), or lower-precision weights (many workflows use float16 automatically).
  • You can right-click the canvas > Queue Prompt or use a queue if you want to batch multiple prompts in ComfyUI, but that’s advanced usage.

5. Advanced Techniques and Tips

Finally, let’s cover some advanced techniques that apply to all models and interfaces:

5.1 Using LoRAs for Styles or Subjects

We touched on LoRAs in A1111; in ComfyUI, it’s similar conceptually but via nodes. You can load multiple LoRAs at once to mix effects. A common use is adding a LoRA for a specific art style or a specific character face to an existing model.

  • In A1111: you can chain LoRA tags, e.g. <lora:charlieChaplin:0.8> <lora:pencilSketch:0.5> to apply two LoRAs.
  • In ComfyUI: you’d use multiple “Load LoRA” nodes feeding into the model (some ComfyUI LoRA nodes allow loading two at once, or you just stack them by feeding model -> lora1 -> lora2 -> sampler).
  • LoRAs can significantly change the output. If your base model is SDXL and you use a SDXL LoRA trained on a style, you might achieve things that base SDXL couldn’t do out-of-the-box. E.g., a LoRA for a specific anime style can make SDXL produce near-NAI-level anime art, even though SDXL wasn’t primarily trained on it.
  • Keep LoRA weights moderate. If a LoRA is trained well, 0.8 is often enough. If you only want a slight influence, 0.3-0.5.
  • Some LoRAs are “character LoRAs” which bring in a likeness of a person or fictional character. Use these ethically and note that high weights might force the face but distort the rest of composition.
  • FLUX LoRAs: Few exist publicly yet, but you can train LoRAs on FLUX too. According to the community, FLUX LoRAs tend to be very consistent and high-quality. Usage is similar: in ComfyUI, Forge, or other UIs that support FLUX, load the LoRA and apply it to the FLUX model. (A diffusers-style sketch of stacking LoRAs follows this list.)
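The sketch below shows the LoRA-stacking idea in code using diffusers with its PEFT backend; the LoRA files, adapter names, and weights are placeholders:

```python
# Sketch of stacking two LoRAs (requires a recent diffusers with the PEFT backend).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

pipe.load_lora_weights("loras/pencil_sketch.safetensors", adapter_name="sketch")
pipe.load_lora_weights("loras/my_character.safetensors", adapter_name="character")

# Roughly the code equivalent of <lora:pencilSketch:0.5> <lora:myCharacter:0.8> in A1111.
pipe.set_adapters(["sketch", "character"], adapter_weights=[0.5, 0.8])

image = pipe("portrait of the character, pencil sketch style, clean line art").images[0]
```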

5.2 Embeddings (Textual Inversion)

Textual Inversion embeddings (like we mentioned “EasyNegative”) are custom “words” that the model learns. Many 1.5 embeddings are out there for celebrity faces, styles, etc. They are used by just including the token in prompt, after you load them in the UI.

  • In A1111, put the file in embeddings folder. The name of the file (minus extension) is the token. Use it in prompt like a word.
  • In ComfyUI, you can reference an embedding directly in a CLIPTextEncode node’s text using the embedding:filename syntax, or use a dedicated embedding-loading node if your node set provides one.
  • SDXL embeddings: Must be trained specifically on SDXL (embedding from 1.5 won’t directly translate).
  • FLUX embeddings: Likely not applicable due to different text encoder (one could train a pseudo-embedding for CLIP part, but T5 makes it tricky).
  • Use embeddings sparingly – e.g. a style embedding might conflict with your other tokens. They’re great for specific things though (like invoking a particular artist style that wasn’t in training data).
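If you work outside the UIs, loading a textual inversion embedding looks roughly like this in diffusers; the paths and checkpoint ID are placeholders, and note that diffusers does not parse A1111's (word:1.3) weight syntax by default:

```python
# Sketch of loading a negative embedding in code (SD 1.5 pipeline).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # placeholder checkpoint
).to("cuda")

# The file's learned vectors become usable under the chosen token name.
pipe.load_textual_inversion("embeddings/EasyNegative.safetensors", token="EasyNegative")

image = pipe(
    prompt="portrait photo of a woman, soft lighting, 50mm lens, film grain",
    negative_prompt="EasyNegative, low quality, watermark",
).images[0]
```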

5.3 Model Merging and Checkpoint Mixes

If you want the flexibility of one model plus style of another, you can try merging:

  • In AUTOMATIC1111: go to the Checkpoint Merger tab. Select Primary model A, Secondary model B, set a ratio (0.2 means 20% of B into A). Choose a merge method (Weighted sum is fine for basic). Click Merge and save a new checkpoint. Then use that new checkpoint for generation. Many popular models are actually merges of several others to balance strengths.
  • In ComfyUI: as mentioned, use a Merge node in the graph. For example, take SD1.5 and a custom model and merge them halfway to reduce some overfitting.
  • Merging can produce great hybrids but is trial and error. Also, merging SDXL with SD1.5 is generally not feasible (architectures differ). Merging SDXL with another SDXL-based or merging FLUX with something else is also not standard. Stick to merging within model families.
  • A simpler alternative: use LoRA merges or just prompt techniques to mimic another model’s style (less hassle than managing new merged files).
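Under the hood, a weighted-sum merge is just a per-tensor blend of two state dicts. The sketch below illustrates that for two same-architecture checkpoints; the file names and 0.2 ratio are placeholders, and real merger tools handle more edge cases:

```python
# Bare-bones weighted-sum merge of two same-architecture checkpoints (e.g. two SD 1.5 models).
import torch
from safetensors.torch import load_file, save_file

alpha = 0.2  # 20% of model B blended into model A
a = load_file("modelA.safetensors")
b = load_file("modelB.safetensors")

merged = {}
for key, tensor_a in a.items():
    if key in b and b[key].shape == tensor_a.shape and tensor_a.is_floating_point():
        merged[key] = (1.0 - alpha) * tensor_a.float() + alpha * b[key].float()
    else:
        merged[key] = tensor_a  # keep A's weights where the models don't line up

save_file(
    {k: (v.to(torch.float16) if v.is_floating_point() else v) for k, v in merged.items()},
    "merged_model.safetensors",
)
```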

5.4 Inpainting and Outpainting

Both A1111 and ComfyUI can do inpainting (editing part of an image):

  • A1111: Switch to the Inpaint tab. Upload an image or send result to inpaint. Mask the area you want to change (draw a mask). Then prompt for what you want in that area. Use the inpainting model if available (SD1.5 has a special inpainting model, SDXL as well).
    • E.g. you generated a portrait but the hands are weird – you can mask the hand and prompt “a well-formed hand”.
  • ComfyUI: You can set up a node workflow that takes an init image and mask, uses a KSampler in image-to-image mode with mask.
  • FLUX inpainting: FLUX has a [fill] variant for filling in areas (Flux Fill). If using ComfyUI, you might load the Flux Fill model for inpainting tasks.
  • Outpainting: expanding beyond the original image. In A1111, you can use the Canvas or simply create a larger blank image, paste your image in one area, mask the new areas, and inpaint those. Or use the separate Infinite canvas tools. ComfyUI can do similar with outpainting node setups (like using larger noise and an overlay).
  • For outpainting, ensure your prompt includes the style of the original so it matches.
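For completeness, here is a hedged sketch of masked inpainting in code, mirroring the “fix the hand” example above; the image, mask, and model ID are placeholders:

```python
# Sketch of masked inpainting with diffusers (white mask areas are regenerated, black areas kept).
import torch
from PIL import Image
from diffusers import AutoPipelineForInpainting

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("portrait.png").convert("RGB")
mask = Image.open("hand_mask.png").convert("L")   # mask painted over the broken hand

image = pipe(
    prompt="a well-formed hand, natural pose, detailed skin",
    image=init_image,
    mask_image=mask,
    strength=0.9,
).images[0]
image.save("portrait_fixed.png")
```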

5.5 Prompt Iteration and Evolution

No prompt is perfect first try. A recommended approach:

  1. Start simple – just subject and one style element. Generate a few.
  2. Add details incrementally. See how each affects output. This way you learn each model’s behavior.
  3. Use Prompt weighting (for SD1.5/SDXL) or rephrasing (for FLUX) to push the image closer to what you imagine.
  4. If the image is almost there except one thing (say the background is too busy), you can try adding busy background to negative or explicitly say “simple background” in positive.
  5. Keep notes of prompts that work well. You can save prompt text along with images for reference.

5.6 Community Resources

Leverage the community:

  • Check out galleries on sites like CivitAI or Lexica for prompts that achieved certain looks.
  • There are subreddits and Discord communities where people share prompt tips (r/StableDiffusion, r/ComfyUI, etc.).
  • The models themselves often have documentation or example prompts. For SDXL, StabilityAI’s GitHub had sample prompts. For FLUX, the getimg.ai blog and others have guides.

5.7 Safety and Ethical Use

A gentle reminder: these models can generate virtually anything – be mindful of not producing disallowed or harmful content. Both A1111 and ComfyUI rely on the user to use them responsibly. Follow the usage guidelines of the model (some models have certain terms that they avoid due to training). Also, credit artists if you heavily use their style, and avoid explicitly copying living artists’ works verbatim in prompts if it’s against their wishes.


Conclusion and Next Steps

You now have a comprehensive overview of how to get the best results when prompting Stable Diffusion 1.5, SDXL, and FLUX.1, across two powerful interfaces (AUTOMATIC1111 and ComfyUI). To recap:

  • Adapt your prompting style to the model: use tag-based prompts and weight syntax for SD1.5, descriptive sentences (possibly dual prompts) for SDXL, and natural language with clear wording for FLUX.
  • Leverage positive and negative prompts to guide the model, and adjust CFG/steps to balance fidelity vs. creativity.
  • Use the right tools in the UI: in A1111, the straightforward txt2img/inpaint panels with options for samplers and CFG; in ComfyUI, build or load workflows that unlock advanced capabilities (refiners, multiple ControlNets, etc.).
  • Take advantage of features like LoRAs to inject styles, embeddings for fine control, and ControlNet to impose structure or follow references.
  • Experiment and iterate: Each model has strengths – e.g., FLUX for text and complex scenes, SDXL for high-res detail, 1.5 for variety with countless community tweaks. Don’t be afraid to try the same prompt idea in all three and see which you prefer, then refine from there.

We included visual examples to illustrate differences and a structured breakdown for clarity. Use this guide as a reference as you create – perhaps keep it open while you work in one window and the UI in another.

Happy prompting, and may your imagination come to life in vibrant detail!
