A programmatic automation suite for running ComfyUI with Flux and WAN models on GPU hosts: reproducible workflows, config-driven batch generation, and LoRA management for production-scale image and video asset creation.
Generative image workflows are often fragile, manual, and difficult to scale. This project provides a programmatic automation layer over ComfyUI, delivering reproducible workflows, batch generation, and LoRA management that turns a UI-driven system into a scalable production pipeline suitable for enterprise creative operations.
The generative AI market demands enterprise-grade tooling that bridges the gap between creative experimentation and production deployment. Most generative AI tools require manual interaction for each output, making them unsuitable for campaigns requiring hundreds or thousands of consistent assets. By abstracting ComfyUI's node-graph complexity behind a configuration-driven API, we enable creative teams to scale from single images to massive batch runs while maintaining brand consistency, quality standards, and full reproducibility across every generated asset. The platform supports the complete spectrum of modern generative models including Stable Diffusion XL, Flux Dev, Flux Schnell, and WAN video generation models, each accessible through the same unified programmatic interface.
Every workflow is serialized as a JSON artifact containing the complete node graph, model references, sampler configuration, seed values, and LoRA weight assignments. This serialization enables version control of generation configurations, deterministic reproduction of any previously generated output, and A/B testing of prompt strategies with measurable quality comparisons. For organizations operating in regulated industries where provenance documentation is mandatory, the metadata chain provides an auditable record connecting every generated asset to its exact generation parameters.
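For illustration, a loaded artifact might carry fields along these lines; the field names below are hypothetical placeholders rather than the project's actual schema.

```python
# Hypothetical shape of a loaded workflow artifact; the project's actual
# field names and layout may differ.
artifact = {
    "workflow_id": "campaign-042/asset-0137",
    "model": {"checkpoint": "flux1-dev.safetensors", "sha256": "<hash>"},
    "loras": [{"name": "brand-style-v3.safetensors", "strength": 0.8}],
    "sampler": {"name": "euler", "steps": 28, "cfg": 3.5, "seed": 411223344},
    "prompt": {"positive": "studio product shot, soft light", "negative": ""},
    "graph": {},  # full ComfyUI node graph in API format (omitted here)
}
```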
Our pipeline supports the latest generation of open-source diffusion models, each optimized for different production requirements and abstracted behind a unified configuration interface that handles architectural differences transparently.
The platform provides unified workflow generation across multiple model architectures with fundamentally different internal designs. Flux Dev leverages a transformer-based diffusion architecture (DiT) with dual text encoder conditioning from both CLIP and T5-XXL, delivering superior text rendering, spatial coherence, and prompt adherence compared to prior UNet-based models. The workflow builder constructs the appropriate node graph for Flux's unique architecture automatically, connecting the DualCLIPLoader, UNETLoader, and VAEDecode nodes in the correct topology with proper conditioning chain routing. Flux Schnell offers four-step generation through a distilled model variant, enabling rapid iteration cycles where creative teams can explore prompt variations at ten times the speed of standard inference without sacrificing compositional accuracy.
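To make the abstraction concrete, the sketch below shows the kind of API-format graph the builder emits for a Flux Dev pass. It uses real ComfyUI node class names but deliberately simplifies the wiring (no guidance or LoRA loader nodes, the same text encoding reused for both conditioning inputs), and the exact input fields can vary across ComfyUI versions; treat it as illustrative rather than the project's actual builder output.

```python
def build_flux_graph(prompt: str, seed: int, steps: int = 20,
                     width: int = 1024, height: int = 1024) -> dict:
    """Minimal sketch of a Flux Dev prompt graph in ComfyUI API format.

    Input field names follow common Flux workflow templates and are
    assumptions here; the full builder also wires LoRA and guidance nodes.
    """
    return {
        "1": {"class_type": "UNETLoader",
              "inputs": {"unet_name": "flux1-dev.safetensors",
                         "weight_dtype": "default"}},
        "2": {"class_type": "DualCLIPLoader",
              "inputs": {"clip_name1": "t5xxl_fp16.safetensors",
                         "clip_name2": "clip_l.safetensors",
                         "type": "flux"}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": prompt, "clip": ["2", 0]}},
        "4": {"class_type": "VAELoader",
              "inputs": {"vae_name": "ae.safetensors"}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "6": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["3", 0],
                         "negative": ["3", 0], "latent_image": ["5", 0],
                         "seed": seed, "steps": steps, "cfg": 1.0,
                         "sampler_name": "euler", "scheduler": "simple",
                         "denoise": 1.0}},
        "7": {"class_type": "VAEDecode",
              "inputs": {"samples": ["6", 0], "vae": ["4", 0]}},
        "8": {"class_type": "SaveImage",
              "inputs": {"images": ["7", 0], "filename_prefix": "flux"}},
    }
```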
Stable Diffusion XL remains available for workflows requiring the extensive ecosystem of community-trained LoRA adapters, ControlNet pose models, and IP-Adapter image conditioning. Our abstraction layer handles the differences in tokenizer configuration, VAE selection, resolution requirements, and sampling schedules transparently, so operators switch between model families by changing a single configuration parameter in the YAML manifest. The system validates model-sampler compatibility before queuing, preventing wasted GPU cycles on misconfigured workflows that would fail silently or produce degraded output.
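The pre-queue check can be pictured as a lookup against a compatibility table; the table and function below are illustrative, not the project's actual validation rules.

```python
# Illustrative compatibility table; the real rules live in the project's
# validation layer and are more detailed.
SUPPORTED_SAMPLERS = {
    "flux-dev": {"euler", "deis", "dpmpp_2m"},
    "flux-schnell": {"euler"},
    "sdxl": {"euler", "euler_ancestral", "dpmpp_2m", "dpmpp_sde"},
}

def validate_workflow(model_family: str, sampler: str) -> None:
    """Fail fast before queuing so misconfigured jobs never reach the GPU."""
    allowed = SUPPORTED_SAMPLERS.get(model_family)
    if allowed is None:
        raise ValueError(f"Unknown model family: {model_family!r}")
    if sampler not in allowed:
        raise ValueError(
            f"Sampler {sampler!r} is not validated for {model_family!r}; "
            f"allowed: {sorted(allowed)}"
        )
```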
The WAN video pipeline extends image automation into motion content production. Starting from a text prompt, a reference image, or a combination of both, the system generates frame sequences using video-optimized diffusion models that maintain temporal coherence across the duration of the clip. The pipeline manages the complete frame lifecycle: initial keyframe generation, inter-frame interpolation using dedicated motion models, temporal smoothing to eliminate flicker artifacts, and final assembly into standard MP4 or WebM container formats with configurable codec settings.
Frame interpolation models increase the effective frame rate from the sparse keyframes produced by the diffusion pass, producing smooth motion at 24 or 30 frames per second from an initial set of as few as eight to sixteen generated keyframes. The interpolation model is itself configurable, allowing selection between RIFE and other optical-flow-based interpolators depending on the motion characteristics of the content. This end-to-end pipeline enables marketing teams to produce short-form video content for social media, product demonstrations, and advertising without traditional video production workflows, stock footage licensing, or dedicated motion graphics personnel.
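The relationship between keyframe count, target frame rate, and interpolation factor can be sketched as follows; the dataclass fields and defaults are illustrative placeholders, not the pipeline's actual job schema.

```python
from dataclasses import dataclass

@dataclass
class VideoJob:
    prompt: str
    keyframes: int = 12          # sparse keyframes from the diffusion pass
    target_fps: int = 24
    duration_s: float = 3.0
    interpolator: str = "rife"   # or another flow-based interpolator

def interpolation_factor(job: VideoJob) -> int:
    """How many frames the interpolator must synthesize per keyframe gap."""
    target_frames = int(job.target_fps * job.duration_s)
    # e.g. 12 keyframes -> 72 frames at 24 fps over 3 s: factor of 6
    return max(1, round(target_frames / job.keyframes))
```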
The ComfyUI API manager handles server health polling, queue depth monitoring, prompt submission via the WebSocket and REST APIs, and generation status tracking with automatic retry on transient failures. The manager maintains a persistent connection pool and implements exponential backoff for server overload conditions, ensuring reliable operation even when running multi-hour batch campaigns that generate thousands of images sequentially.
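A condensed sketch of the submission path with exponential backoff is shown below. The /prompt endpoint and its prompt_id response are standard ComfyUI REST behavior; the retry constants and function shape are illustrative rather than the manager's actual implementation.

```python
import asyncio
import aiohttp

async def submit_prompt(session: aiohttp.ClientSession, base_url: str,
                        graph: dict, max_retries: int = 5) -> str:
    """POST a workflow graph to ComfyUI's /prompt endpoint with retries.

    Backoff constants are illustrative; the real manager also tracks queue
    depth and server health before submitting.
    """
    delay = 1.0
    for attempt in range(max_retries):
        try:
            async with session.post(f"{base_url}/prompt",
                                    json={"prompt": graph}) as resp:
                resp.raise_for_status()
                data = await resp.json()
                return data["prompt_id"]
        except aiohttp.ClientError:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff on transient failures
```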
Config-driven batch generation reads YAML manifests defining prompt templates, model selections, sampler parameters, and output specifications. Variable substitution enables a single campaign file to define hundreds of prompt variations across style, subject, and composition dimensions. Output organization, metadata tagging, and progress checkpointing are handled automatically, enabling overnight generation runs that resume cleanly after interruption.
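For illustration, a campaign manifest in this style might look like the following; the keys are representative rather than the project's exact schema, and the YAML is shown as the text the engine would parse.

```python
import yaml  # PyYAML; the engine's actual loader may differ

# Representative campaign manifest; key names are illustrative.
CAMPAIGN_YAML = """
model: flux-dev
loras:
  - {name: brand-style-v3, strength: 0.8}
sampler: {name: euler, steps: 28, cfg: 3.5}
output: {width: 1024, height: 1024, dir: outputs/spring-campaign}
prompt_template: "{subject}, {style}, studio lighting, product photography"
variables:
  subject: [running shoe, backpack, water bottle]
  style: [minimalist, vibrant, monochrome]
"""

campaign = yaml.safe_load(CAMPAIGN_YAML)
```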
Built-in support for LoRA adapter hot-swapping, checkpoint versioning, VAE encoder selection, and CLIP text encoder variant management. LoRA download scripts handle CivitAI and HuggingFace model retrieval with hash verification. The config file declares model dependencies per workflow, and the system validates that all required weights are available locally before queuing any generation, preventing mid-batch failures from missing model files.
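The hash-verification step for downloaded weights can be sketched as below; the function name and chunk size are placeholders, and the expected hash would come from the per-workflow dependency declaration.

```python
import hashlib
from pathlib import Path

def verify_weights(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded LoRA or checkpoint file against its declared hash."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest().lower() == expected_sha256.lower()
```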
End-to-end video production from text or image prompts through keyframe generation, temporal smoothing, frame interpolation, and final container assembly. The pipeline manages VRAM allocation across the generation and interpolation stages, ensuring that video workflows run successfully on consumer-grade GPUs by serializing memory-intensive operations rather than attempting parallel execution.
Every generation run produces a saved workflow JSON artifact containing the complete node graph, all parameter values, and model hashes. These artifacts are version-controlled alongside the automation code, enabling deterministic reproduction of any historical output. Compliance teams can trace any published asset back to its exact generation configuration, satisfying audit requirements for AI-generated content provenance.
Operational scripts handle ComfyUI server lifecycle management including startup, graceful shutdown, model preloading, and VRAM cache clearing between workflow switches. Additional scripts manage LoRA weight downloads from CivitAI with metadata extraction, model directory organization by architecture family, and disk space monitoring to prevent generation failures from storage exhaustion.
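As one example of these operational checks, a disk-space guard might look like the following; the threshold is arbitrary here and the function name is a placeholder.

```python
import shutil

def enough_disk_space(output_dir: str, min_free_gb: float = 20.0) -> bool:
    """Refuse to start a batch if the output volume is nearly full."""
    free_bytes = shutil.disk_usage(output_dir).free
    return free_bytes / 1e9 >= min_free_gb
```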
From batch image campaigns to AI-generated video, every workflow is driven by YAML configuration files and Python orchestration rather than manual UI interaction, enabling CI/CD integration and headless production operation.
The batch engine reads a YAML configuration file containing prompt templates, model selections, sampler parameters, and output specifications. Each configuration defines a generation campaign as a declarative manifest: the model checkpoint and LoRA stack, the sampler algorithm and step count, the classifier-free guidance scale, the output resolution and aspect ratio, and a set of prompt templates with variable placeholders. The engine iterates through all combinations of prompt variables, substituting values for each generation pass. A campaign requiring five hundred product images across ten style variations and five aspect ratios is defined in a single YAML file and executed without supervision. Progress tracking writes checkpoint files after each successful generation, enabling interrupted runs to resume from the last completed output rather than restarting from the beginning.
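The expansion of prompt variables into concrete jobs amounts to a Cartesian product over the declared dimensions; below is a hedged sketch reusing the illustrative manifest keys from above, with a minimal resume helper.

```python
import itertools
import json
from pathlib import Path

def expand_jobs(campaign: dict):
    """Yield one fully substituted prompt per combination of variables."""
    variables = campaign["variables"]
    keys = list(variables)
    for combo in itertools.product(*(variables[k] for k in keys)):
        yield campaign["prompt_template"].format(**dict(zip(keys, combo)))

def resume_index(checkpoint_file: Path) -> int:
    """Return the index of the last completed job, or -1 for a fresh run."""
    if checkpoint_file.exists():
        return json.loads(checkpoint_file.read_text()).get("last_done", -1)
    return -1
```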
The ComfyUI API client is built on aiohttp for non-blocking I/O, enabling the orchestration layer to monitor generation progress, manage the output download queue, and handle error recovery concurrently. WebSocket connections provide real-time progress events during generation, allowing the batch engine to display accurate completion estimates and detect stalled generations that need to be requeued. The async architecture also enables parallel operation across multiple ComfyUI server instances when horizontal scaling is required for high-volume campaigns, distributing work across GPU nodes while maintaining centralized progress tracking and output aggregation.
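Fan-out across several ComfyUI instances can be pictured as an asyncio queue feeding one worker per server; the worker loop, payload shape, and error handling below are illustrative, not the orchestration layer's actual code.

```python
import asyncio
import aiohttp

async def worker(base_url: str, jobs: asyncio.Queue) -> None:
    """Drain jobs against one ComfyUI server. The real workers also stream
    WebSocket progress events and requeue stalled generations."""
    async with aiohttp.ClientSession() as session:
        while True:
            graph = await jobs.get()
            try:
                async with session.post(f"{base_url}/prompt",
                                        json={"prompt": graph}) as resp:
                    resp.raise_for_status()
            except aiohttp.ClientError as exc:
                print(f"{base_url}: submission failed: {exc}")
            finally:
                jobs.task_done()

async def run_campaign(graphs: list[dict], servers: list[str]) -> None:
    jobs: asyncio.Queue = asyncio.Queue()
    for graph in graphs:
        jobs.put_nowait(graph)
    workers = [asyncio.create_task(worker(url, jobs)) for url in servers]
    await jobs.join()          # wait until every submission is processed
    for task in workers:
        task.cancel()          # workers loop forever; stop them explicitly
```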
Programmatic workflow generation with full ComfyUI API control eliminates manual UI interaction and enables CI/CD integration for creative pipelines. The abstraction handles the complexity of constructing valid node graphs for different model architectures, so operators define what they want generated rather than how the nodes should be wired. This separation of intent from implementation is the key architectural insight that makes the platform accessible to non-technical creative directors.
Saved workflow JSON artifacts with complete metadata chains — checkpoint hashes, sampler settings, seed values, prompt text, and LoRA weight assignments — ensure every generation can be exactly replicated at any future date on any compatible hardware. This reproducibility is a hard requirement for regulated industries and a significant differentiator against competitors whose generation processes are inherently non-deterministic.
A unified interface across Flux, SDXL, and WAN video models with LoRA hot-swapping and per-workflow configuration means the platform adapts to new model releases without architectural changes. As the open-source generative model ecosystem evolves rapidly, this flexibility ensures the automation layer remains relevant and capable regardless of which model family dominates the next generation of creative tooling.
Running generation on local GPU hardware with RTX 3090 and RTX 4060 Ti cards eliminates per-image API fees that accumulate rapidly at production scale. At a thousand images per day, cloud generation services cost ten to fifty times more than amortized local hardware. The platform's local-first architecture turns creative asset generation from a variable operating expense into a fixed capital investment with predictable, declining unit economics.
Measurable improvements in creative production throughput, cost efficiency, and brand consistency across real-world batch generation campaigns.
Production speed vs manual ComfyUI operation
Per-image API fees with local GPU execution
Reproducible outputs with saved workflow artifacts
Model architectures supported from single config
Core ComfyUI API client and Flux workflow builders
Model, server, and campaign YAML configurations
Operational scripts and saved workflow JSON artifacts
Learn how we can automate your generative content workflows and scale your creative production pipeline.