A Coding Information to Excessive-High quality Picture Technology, Management, and Enhancing Utilizing HuggingFace Diffusers

By admin2010

February 21, 2026

3

On this tutorial, we design a sensible image-generation workflow utilizing the Diffusers library. We begin by stabilizing the setting, then generate high-quality pictures from textual content prompts utilizing Steady Diffusion with an optimized scheduler. We speed up inference with a LoRA-based latent consistency method, information composition with ControlNet underneath edge conditioning, and eventually carry out localized edits through inpainting. Additionally, we deal with real-world methods that steadiness picture high quality, velocity, and controllability.

!pip -q uninstall -y pillow Pillow || true
!pip -q set up --upgrade --force-reinstall "pillow<12.0"
!pip -q set up --upgrade diffusers transformers speed up safetensors huggingface_hub opencv-python


import os, math, random
import torch
import numpy as np
import cv2
from PIL import Picture, ImageDraw, ImageFilter
from diffusers import (
   StableDiffusionPipeline,
   StableDiffusionInpaintPipeline,
   ControlNetModel,
   StableDiffusionControlNetPipeline,
   UniPCMultistepScheduler,
)

We put together a clear and appropriate runtime by resolving dependency conflicts and putting in all required libraries. We guarantee picture processing works reliably by pinning the right Pillow model and loading the Diffusers ecosystem. We additionally import all core modules wanted for technology, management, and inpainting workflows.

def seed_everything(seed=42):
   random.seed(seed)
   np.random.seed(seed)
   torch.manual_seed(seed)
   torch.cuda.manual_seed_all(seed)


def to_grid(pictures, cols=2, bg=255):
   if isinstance(pictures, Picture.Picture):
       pictures = [images]
   w, h = pictures[0].measurement
   rows = math.ceil(len(pictures) / cols)
   grid = Picture.new("RGB", (cols*w, rows*h), (bg, bg, bg))
   for i, im in enumerate(pictures):
       grid.paste(im, ((i % cols)*w, (i // cols)*h))
   return grid


gadget = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if gadget == "cuda" else torch.float32
print("gadget:", gadget, "| dtype:", dtype)

We outline utility features to make sure reproducibility and to arrange visible outputs effectively. We set world random seeds so our generations stay constant throughout runs. We additionally detect the out there {hardware} and configure precision to optimize efficiency on the GPU or CPU.

seed_everything(7)
BASE_MODEL = "runwayml/stable-diffusion-v1-5"


pipe = StableDiffusionPipeline.from_pretrained(
   BASE_MODEL,
   torch_dtype=dtype,
   safety_checker=None,
).to(gadget)


pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)


if gadget == "cuda":
   pipe.enable_attention_slicing()
   pipe.enable_vae_slicing()


immediate = "a cinematic picture of a futuristic avenue market at nightfall, ultra-detailed, 35mm, volumetric lighting"
negative_prompt = "blurry, low high quality, deformed, watermark, textual content"


img_text = pipe(
   immediate=immediate,
   negative_prompt=negative_prompt,
   num_inference_steps=25,
   guidance_scale=6.5,
   width=768,
   peak=512,
).pictures[0]

We initialize the bottom Steady Diffusion pipeline and swap to a extra environment friendly UniPC scheduler. We generate a high-quality picture instantly from a textual content immediate utilizing fastidiously chosen steering and backbone settings. This establishes a robust baseline for subsequent enhancements in velocity and management.

LCM_LORA = "latent-consistency/lcm-lora-sdv1-5"
pipe.load_lora_weights(LCM_LORA)


strive:
   pipe.fuse_lora()
   lora_fused = True
besides Exception as e:
   lora_fused = False
   print("LoRA fuse skipped:", e)


fast_prompt = "a clear product picture of a minimal smartwatch on a reflective floor, studio lighting"
fast_images = []
for steps in [4, 6, 8]:
   fast_images.append(
       pipe(
           immediate=fast_prompt,
           negative_prompt=negative_prompt,
           num_inference_steps=steps,
           guidance_scale=1.5,
           width=768,
           peak=512,
       ).pictures[0]
   )


grid_fast = to_grid(fast_images, cols=3)
print("LoRA fused:", lora_fused)


W, H = 768, 512
structure = Picture.new("RGB", (W, H), "white")
draw = ImageDraw.Draw(structure)
draw.rectangle([40, 80, 340, 460], define="black", width=6)
draw.ellipse([430, 110, 720, 400], define="black", width=6)
draw.line([0, 420, W, 420], fill="black", width=5)


edges = cv2.Canny(np.array(structure), 80, 160)
edges = np.stack([edges]*3, axis=-1)
canny_image = Picture.fromarray(edges)


CONTROLNET = "lllyasviel/sd-controlnet-canny"
controlnet = ControlNetModel.from_pretrained(
   CONTROLNET,
   torch_dtype=dtype,
).to(gadget)


cn_pipe = StableDiffusionControlNetPipeline.from_pretrained(
   BASE_MODEL,
   controlnet=controlnet,
   torch_dtype=dtype,
   safety_checker=None,
).to(gadget)


cn_pipe.scheduler = UniPCMultistepScheduler.from_config(cn_pipe.scheduler.config)


if gadget == "cuda":
   cn_pipe.enable_attention_slicing()
   cn_pipe.enable_vae_slicing()


cn_prompt = "a contemporary cafe inside, architectural render, smooth daylight, excessive element"
img_controlnet = cn_pipe(
   immediate=cn_prompt,
   negative_prompt=negative_prompt,
   picture=canny_image,
   num_inference_steps=25,
   guidance_scale=6.5,
   controlnet_conditioning_scale=1.0,
).pictures[0]

We speed up inference by loading and fusing a LoRA adapter and reveal quick sampling with only a few diffusion steps. We then assemble a structural conditioning picture and apply ControlNet to information the structure of the generated scene. This enables us to protect composition whereas nonetheless benefiting from artistic textual content steering.

masks = Picture.new("L", img_controlnet.measurement, 0)
mask_draw = ImageDraw.Draw(masks)
mask_draw.rectangle([60, 90, 320, 170], fill=255)
masks = masks.filter(ImageFilter.GaussianBlur(2))


inpaint_pipe = StableDiffusionInpaintPipeline.from_pretrained(
   BASE_MODEL,
   torch_dtype=dtype,
   safety_checker=None,
).to(gadget)


inpaint_pipe.scheduler = UniPCMultistepScheduler.from_config(inpaint_pipe.scheduler.config)


if gadget == "cuda":
   inpaint_pipe.enable_attention_slicing()
   inpaint_pipe.enable_vae_slicing()


inpaint_prompt = "a glowing neon signal that claims 'CAFÉ', cyberpunk fashion, practical lighting"


img_inpaint = inpaint_pipe(
   immediate=inpaint_prompt,
   negative_prompt=negative_prompt,
   picture=img_controlnet,
   mask_image=masks,
   num_inference_steps=30,
   guidance_scale=7.0,
).pictures[0]


os.makedirs("outputs", exist_ok=True)
img_text.save("outputs/text2img.png")
grid_fast.save("outputs/lora_fast_grid.png")
structure.save("outputs/structure.png")
canny_image.save("outputs/canny.png")
img_controlnet.save("outputs/controlnet.png")
masks.save("outputs/masks.png")
img_inpaint.save("outputs/inpaint.png")


print("Saved outputs:", sorted(os.listdir("outputs")))
print("Carried out.")

We create a masks to isolate a selected area and apply inpainting to change solely that a part of the picture. We refine the chosen space utilizing a focused immediate whereas maintaining the remaining intact. Lastly, we save all intermediate and ultimate outputs to disk for inspection and reuse.

In conclusion, we demonstrated how a single Diffusers pipeline can evolve into a versatile, production-ready picture technology system. We defined learn how to transfer from pure text-to-image technology to quick sampling, structural management, and focused picture modifying with out altering frameworks or tooling. This tutorial highlights how we are able to mix schedulers, LoRA adapters, ControlNet, and inpainting to create controllable and environment friendly generative pipelines which might be simple to increase for extra superior artistic or utilized use circumstances.

Try the Full Codes right here. Additionally, be happy to observe us on Twitter and don’t overlook to affix our 100k+ ML SubReddit and Subscribe to our Publication. Wait! are you on telegram? now you’ll be able to be a part of us on telegram as properly.

A Coding Information to Excessive-High quality Picture Technology, Management, and Enhancing Utilizing HuggingFace Diffusers

Budgets, Throttling & Mannequin Tiering

Microsoft has a brand new plan to show what’s actual and what’s AI on-line

Google AI Releases Gemini 3.1 Professional with 1 Million Token Context and 77.1 % ARC-AGI-2 Reasoning for AI Brokers

LEAVE A REPLY Cancel reply

Most Popular

Small traders, or shrimps, are shopping for BTC. However it’s the whales who preserve rallies going.

Crypto’s native M2 cash provide is falling and killing liquidity

How did we get to ICE?

Billionaire Stanley Druckenmiller Pours $290,836,000 Into Two Property, Exits Publicity To Three Main US Banks

Recent Comments

ABOUT US

POPULAR POSTS

Small traders, or shrimps, are shopping for BTC. However it’s the whales who preserve rallies going.

Crypto’s native M2 cash provide is falling and killing liquidity

How did we get to ICE?

POPULAR CATEGORY