Generate AI Videos on Your Own GPU — WAN 2.1 + ComfyUI, Pre-Configured
AI video generation models like WAN 2.1 are open-source and free to run. The actual barrier is setup — downloading 14-billion-parameter model weights, installing ComfyUI, configuring custom nodes, resolving dependency conflicts, and ensuring the correct workflow templates are in place. On a fresh VM, this takes hours.
We built two pre-configured VM images that skip the entire process. Every model, every custom node, and every workflow template is already installed. You deploy the VM, open your browser, load a template, and generate video.
This guide covers both images: one for text-to-video, and one that adds image-to-video on top of it.
Two Images, Two Capabilities
| Image Name | What It Does | Models Included |
|---|---|---|
| Barrack ComfyUI - WAN T2V 14B | Text-to-video generation | WAN 2.1 T2V-14B (fp8), UMT5-XXL text encoder, WAN VAE |
| Barrack ComfyUI - WAN T2V-I2V | Text-to-video + image-to-video generation | Everything above, plus WAN 2.1 I2V-14B models |
If you only need text-to-video, deploy the T2V 14B image. If you also want to generate video from a reference image, deploy the T2V-I2V image.
Both images ship with:
- ComfyUI — node-based visual interface for building generation workflows, serving on port 8188
- Gradio interface — simpler alternative web interface, serving on port 7860
- ComfyUI-WanVideoWrapper — custom node package for WAN 2.1 integration
- ComfyUI-VideoHelperSuite — custom node for video output and combining
- Pre-built workflow templates — ready to load and use immediately
- Ubuntu 22.04 with NVIDIA drivers and CUDA pre-configured
All services start automatically on boot.
Compatible GPUs
Both images are available on the following GPUs:
| GPU | VRAM |
|---|---|
| RTX A6000 | 48 GB |
| L40 | 48 GB |
| A100 PCIe | 80 GB |
| H100 PCIe | 80 GB |
| H100 PCIe NVLink | 80 GB |
The WAN 2.1 T2V-14B model in fp8 quantization fits within 48 GB of VRAM. Higher-end GPUs like the A100 and H100 provide additional headroom and faster generation. Generation time for a standard clip is approximately 7–10 minutes, depending on resolution, frame count, and the GPU selected.
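A back-of-envelope calculation shows why fp8 quantization is what makes the 48 GB cards viable. This counts model weights only; the UMT5-XXL text encoder, VAE, and activations add overhead on top:

```python
# Rough VRAM footprint of the 14B-parameter checkpoint:
# fp8 stores 1 byte per parameter, fp16 stores 2 bytes per parameter.
params = 14e9
fp8_gb = params * 1 / 1024**3   # ≈ 13 GB — fits in 48 GB with room to spare
fp16_gb = params * 2 / 1024**3  # ≈ 26 GB — weights alone eat over half of 48 GB
print(f"fp8 weights ≈ {fp8_gb:.0f} GB, fp16 weights ≈ {fp16_gb:.0f} GB")
```

In practice the remaining VRAM goes to the text encoder, VAE, and intermediate activations, which is why the 80 GB cards offer more headroom.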
Step 1 — Create an Account
Go to barrack.ai/signup. Register with email or Google OAuth.
Complete your billing profile at My Account: full name, billing address, postal code, and country. Your billing currency is assigned based on your country.
Purchase credits. Minimum deposit: $5.00 (USD), €5.00 (EUR), or ₹100.00 (INR).
Step 2 — Deploy the VM
- Go to barrack.ai/dashboard
- Select your GPU — both images are compatible with RTX A6000, L40, A100 PCIe, H100 PCIe, and H100 PCIe NVLink
- Set GPU count to 1
- In the OS Image dropdown, select either:
- Barrack ComfyUI - WAN T2V 14B (text-to-video only)
- Barrack ComfyUI - WAN T2V-I2V (text-to-video + image-to-video)
- Create or select an SSH key
- Click Deploy
The VM enters provisioning within seconds. Wait until the status shows Active.
Prefer API deployment? See the API deployment documentation.
Step 3 — Find Your IP Address
- Go to barrack.ai/dashboard
- Click the dropdown at the top of the page
- Select your instance
- Click Details
- Your public IP address is displayed there
A public IP is automatically enabled for both images.
Step 4 — Open ComfyUI
Open your browser and navigate to:
http://YOUR_PUBLIC_IP:8188
This loads the ComfyUI interface. You will see an empty canvas — this is expected. ComfyUI requires you to load a workflow template before you can generate anything.
Alternative interface: A Gradio-based interface is also available at
http://YOUR_PUBLIC_IP:7860.
Step 5 — Load a Workflow Template
Workflow templates are pre-installed on the VM. To load one:
- Click Workflow in the top menu
- Click Browse templates or Load
- Select a template based on what you want to generate
Available Templates for T2V (Text-to-Video)
- `wanvideo_T2V_example_02` — standard text-to-video generation
- `wanvideo_long_T2V_example_01` — extended-duration text-to-video
Additional Templates for T2V-I2V (Image-to-Video)
If you deployed the T2V-I2V image, you also have access to image-to-video templates that accept a reference image as input and generate video from it.
Step 6 — Generate a Video
After loading a template:
- Locate the text prompt node in the workflow
- Enter your prompt describing the video you want to generate
- Click Queue Prompt or the Run button
- Wait for generation to complete — approximately 7–10 minutes for a standard clip
- The output video appears in the output node and is saved to the VM's filesystem
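If you prefer to queue generations from a script rather than the browser, ComfyUI also exposes an HTTP API on port 8188. The sketch below builds and sends a request to ComfyUI's `/prompt` endpoint; it assumes you have exported your workflow from the UI in API format (Save (API Format)) and loaded it as a dict, and `YOUR_PUBLIC_IP` is a placeholder for the address from Step 3:

```python
import json
import urllib.request

# Placeholder — replace with the public IP from Step 3
COMFYUI_URL = "http://YOUR_PUBLIC_IP:8188"

def build_request(workflow: dict) -> urllib.request.Request:
    """Build a POST request for ComfyUI's /prompt endpoint.

    `workflow` is the node graph exported from the UI in API format."""
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"{COMFYUI_URL}/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )

def queue_prompt(workflow: dict) -> dict:
    """Send the workflow to ComfyUI and return the queue response."""
    with urllib.request.urlopen(build_request(workflow)) as resp:
        return json.loads(resp.read())
```

Edit the text-prompt node's value inside the exported workflow dict before queueing it to script batches of generations with different prompts.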
Example Prompt
A cat sitting on a windowsill watching rain fall outside, soft lighting, cinematic, 4K
The model generates video at up to 720p resolution with smooth motion and temporal coherence.
SSH Access
Connect to the VM via SSH if you need terminal access:
ssh ubuntu@YOUR_PUBLIC_IP
Generated videos are saved in the ComfyUI output directory:
ls ~/ComfyUI/output/
You can download generated files using scp:
scp ubuntu@YOUR_PUBLIC_IP:~/ComfyUI/output/your_video.mp4 ./
When to Use This
- Content creation — generate short-form video clips for social media, ads, or product demos
- Prototyping — test visual concepts before investing in production shoots
- Private generation — no content moderation filters, no data uploaded to third-party services
- Cost control — per-minute billing, no per-generation fees, no monthly subscriptions
- Custom workflows — ComfyUI's node system lets you build and modify generation pipelines
WAN 2.1 vs Closed Alternatives
WAN 2.1 is open-source (Apache 2.0) and runs entirely on your infrastructure. Closed alternatives like Sora 2, Veo 3.1, and Seedance 2.0 require subscriptions, impose content restrictions, and process your prompts on external servers. With WAN 2.1 on your own VM, your prompts and outputs stay on your machine.
Resources
- WAN 2.1 — Hugging Face
- ComfyUI — GitHub
- ComfyUI-WanVideoWrapper — GitHub
- Barrack AI Deployment Guide
- Barrack AI API Documentation
Frequently Asked Questions
What is WAN 2.1?
WAN 2.1 is an open-source AI video generation model developed by Alibaba. The T2V-14B variant has 14 billion parameters and generates video from text prompts. The I2V variant generates video from a reference image combined with a text prompt. Both are released under the Apache 2.0 license.
What is the difference between the T2V 14B and T2V-I2V images?
The T2V 14B image supports text-to-video generation only — you type a text prompt and the model generates a video clip. The T2V-I2V image includes everything in the T2V image plus image-to-video models, allowing you to provide a reference image as input and generate video based on it.
Which GPUs can run WAN 2.1 video generation?
Both images are available on RTX A6000 (48 GB), L40 (48 GB), A100 PCIe (80 GB), H100 PCIe (80 GB), and H100 PCIe NVLink (80 GB). The fp8-quantized model fits within 48 GB of VRAM; faster GPUs such as the H100 also reduce generation time.
How long does video generation take?
Approximately 7–10 minutes per clip on an RTX A6000, depending on resolution and frame count. Faster GPUs such as the H100 reduce generation time.
Do I need to install anything after deploying the VM?
No. All models are pre-downloaded, ComfyUI is configured, custom nodes are installed, and workflow templates are ready to load. All services start automatically on boot.
Does my data leave the VM?
No. All generation runs locally on your GPU. No prompts, images, or output videos are sent to any external service.
What resolution and duration can I generate?
WAN 2.1 T2V-14B generates video at up to 720p resolution. Duration depends on the workflow template — standard templates produce short clips, and the long T2V template supports extended sequences.
How is this different from Sora, Veo, or Seedance?
Sora 2, Veo 3.1, and Seedance 2.0 are closed-source services that require subscriptions, impose content moderation, and process your data on external servers. WAN 2.1 is open-source (Apache 2.0), runs entirely on your VM, has no content restrictions, and incurs no per-generation fees.
What does it cost?
Barrack AI uses per-minute billing with no contracts. You pay only for the time your VM is running. There are no per-generation fees and no monthly subscriptions. H100 PCIe starts at $1.99/hr.
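As a rough illustration of what per-minute billing means for a single clip (assuming the $1.99/hr H100 rate quoted above and a ~10-minute generation — both figures from this page, actual times vary):

```python
# Estimated cost of one generated clip under per-minute billing
rate_per_hour = 1.99   # H100 PCIe hourly rate from this page
minutes = 10           # upper end of the 7–10 minute generation estimate
cost = rate_per_hour / 60 * minutes
print(f"≈ ${cost:.2f} per clip")  # ≈ $0.33
```

Remember that billing covers the time the VM is running, not just active generation, so stop the instance when you are done.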
How do I deploy via API instead of the dashboard?
Barrack AI provides a full deployment API. See the API documentation for programmatic instance creation, management, and termination.
Last updated: February 23, 2026
Barrack AI provides GPU cloud instances for AI workloads — per-minute billing, no contracts. Learn more →
