This guide is for testing and generating video from text prompts using Wan2.1-T2V-1.3B (the small model) on a Mac. These instructions won't work on Windows as written; if necessary, you could try pasting them into ChatGPT and asking it to convert them into Windows-specific steps.
See recommended text2video models here, or all possible models here.
- The Llama 405B model requires at least 2TB of disk space, probably 2TB of RAM, and 80GB of VRAM to run, and that's just for text output (not even video)!
- In comparison, Wan2.1 only needs roughly 32-64GB of VRAM, as far as I know.
If you don't already have it, install iTerm2 on your Mac. That's a terminal app; every command below is run from the terminal.
Next, install Homebrew via the terminal. Ideally read how to do it on the Homebrew site, but here is the command they suggest:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Homebrew makes it easy to install packages on a Mac from the terminal; I'm including this for completeness. Skip this step if you already have it.
Install conda, which we'll use to manage Python environments and packages:
brew install --cask miniconda
echo 'export PATH="/opt/homebrew/Caskroom/miniconda/base/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
(If your default shell is zsh, which it is on recent versions of macOS, use ~/.zshrc in place of ~/.bash_profile in both commands.)
HuggingFace is the main place open source AI models are stored/shared on the web.
First, install the huggingface CLI:
brew install huggingface-cli
Then create an account at https://huggingface.co, visit https://huggingface.co/settings/tokens, and create an access token (select the "Write" type so it can do everything for now).
Copy the token and place it in a local file called .env
(this should be kept secret, like a password):
HF_TOKEN=your-token-value
Saving it there just makes it easy to find again later without exposing it.
Next, log in to the Hugging Face CLI:
huggingface-cli login
When prompted, paste in your token, then press n to decline saving it as a git credential:
Enter your token (input will not be visible):
Add token as git credential? (Y/n) n
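Alternatively, once the Python environment below is set up (diffusers pulls in the huggingface_hub library), you can log in programmatically instead of interactively. This is a minimal sketch, assuming the .env file above sits in the directory you run it from:

import os
from huggingface_hub import login

# Read HF_TOKEN=... from the local .env file created above.
with open(".env") as f:
    for line in f:
        if line.strip().startswith("HF_TOKEN="):
            os.environ["HF_TOKEN"] = line.strip().split("=", 1)[1]

# Stores the token locally, just like the interactive CLI login, without adding a git credential.
login(token=os.environ["HF_TOKEN"], add_to_git_credential=False)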
Next we'll install everything.
conda create -n wan python=3.10
conda activate wan
conda install -c conda-forge numpy pytorch torchvision torchaudio diffusers imageio
pip install accelerate
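Optionally, as a quick sanity check that PyTorch can see Apple's Metal (MPS) backend, save this snippet to a throwaway file (the name check_mps.py is just a suggestion) and run it with python inside the wan environment:

import torch

# True means PyTorch can use the MPS (Apple GPU) backend; False means the
# generation script below will only run on the CPU, much more slowly.
print(torch.backends.mps.is_available())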
Now clone the git repo:
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
pip install -r requirements.txt
Download the Wan 2.1 model with huggingface-cli (it is many GB, so it will take a while to download, possibly over an hour):
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./wan-t2v-1.3b
This will save the model weights and configuration files to ./wan-t2v-1.3b.
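If you'd rather script the download, a roughly equivalent Python alternative is huggingface_hub's snapshot_download; this sketch uses the same repo ID and target directory as the command above:

from huggingface_hub import snapshot_download

# Fetches the whole Wan-AI/Wan2.1-T2V-1.3B repo into ./wan-t2v-1.3b,
# equivalent to the huggingface-cli download command above.
snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B", local_dir="./wan-t2v-1.3b")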
Save the following Python code to a file called generate.py (modify it if you'd like):
import sys
import torch
import imageio
from diffusers import DiffusionPipeline


def main():
    if len(sys.argv) < 3:
        print("Usage: python generate.py '<prompt>' <output_file>")
        sys.exit(1)

    prompt = sys.argv[1]
    output_file = sys.argv[2]

    # Load the model weights downloaded above. Use "cpu" instead of "mps" if MPS fails.
    pipe = DiffusionPipeline.from_pretrained(
        "./wan-t2v-1.3b",
        torch_dtype=torch.float16,
        variant="fp16"
    ).to("mps")

    print(f"Generating video for prompt: {prompt}")
    video = pipe(prompt, num_inference_steps=20).frames[0]  # diffusers video pipelines return frames via .frames

    # Write the frames out as an .mp4 (imageio may need the imageio-ffmpeg backend for this).
    imageio.mimsave(output_file, [frame for frame in video], fps=8)
    print(f"Saved video to {output_file}")


if __name__ == "__main__":
    main()
Then finally, run the python code:
python generate.py "A dragon flying over a snowy mountain" dragon.mp4
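If imageio complains that it can't write the .mp4 (it generally needs the imageio-ffmpeg backend, which you can add with pip install imageio-ffmpeg), one alternative is diffusers' own export helper. A sketch, assuming video holds the generated frames exactly as in generate.py:

from diffusers.utils import export_to_video

# Drop-in replacement for the imageio.mimsave(...) line in generate.py.
export_to_video(video, output_file, fps=8)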
If you run into memory issues, you can:
- Lower num_inference_steps (e.g. from 20 to 10)
- Change .to("mps") to .to("cpu")
- Try the diffusers memory-saving option in the sketch below
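Diffusers also has a built-in memory saver worth trying before falling back to the CPU. This is a sketch assuming the pipe object from generate.py above; enable_attention_slicing is a standard diffusers pipeline method:

# In generate.py, right after building the pipeline with DiffusionPipeline.from_pretrained(...):
pipe.enable_attention_slicing()  # computes attention in smaller chunks: a bit slower, but lower peak memory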