This guide is for testing and generating video from text prompts using Wan2.1-T2V-1.3B (the small model) on a Mac. These instructions won't work on Windows as written; if necessary, you could try pasting them into ChatGPT and asking it to convert them into Windows-specific steps.
See recommended text2video models here, or all possible models here.
- The Llama 405B model requires at least 2TB of disk space, probably 2TB of RAM, and 80GB of VRAM to run, and that's just for text output (not even video)!
- In comparison, Wan2.1 only needs roughly 32-64GB of VRAM, as far as I know.
If you don't already have it, install iTerm2 on your Mac. That's a terminal app; every command below is run from the terminal.
Next, install Homebrew via the terminal. Ideally read how to do it on the Homebrew site, but here is the command they suggest:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Homebrew makes it easy to install packages on a Mac from the terminal; I'm including this for completeness. Skip this step if you already have it.
Install conda, which we'll use to manage Python environments and packages:
brew install --cask miniconda
echo 'export PATH="/opt/homebrew/Caskroom/miniconda/base/bin:$PATH"' >> ~/.bash_profile
source ~/.bash_profile
(If your default shell is zsh, which it is on recent versions of macOS, use ~/.zshrc in place of ~/.bash_profile in both commands.)
HuggingFace is the main place open source AI models are stored/shared on the web.
First, install the huggingface CLI:
brew install huggingface-cli
Then create an account at https://huggingface.co, visit https://huggingface.co/settings/tokens, and create an access token (select the "Write" type so it can do everything for now).
Copy the token and place it in a local file called .env
(this should be kept secret, like a password):
HF_TOKEN=your-token-value
Saving it there just makes it easy to find again later without exposing it.
Next, log in to the Hugging Face CLI:
huggingface-cli login
When prompted, paste in your token, then press n to decline saving it as a git credential:
Enter your token (input will not be visible):
Add token as git credential? (Y/n) n
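Alternatively, once the Python environment below is set up (diffusers pulls in the huggingface_hub library), you can log in programmatically instead of interactively. This is a minimal sketch, assuming the .env file above sits in the directory you run it from:

import os
from huggingface_hub import login

# Read HF_TOKEN=... from the local .env file created above.
with open(".env") as f:
    for line in f:
        if line.strip().startswith("HF_TOKEN="):
            os.environ["HF_TOKEN"] = line.strip().split("=", 1)[1]

# Stores the token locally, just like the interactive CLI login, without adding a git credential.
login(token=os.environ["HF_TOKEN"], add_to_git_credential=False)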
Next we'll install everything.
conda create -n wan python=3.10
conda activate wan
conda install -c conda-forge numpy pytorch torchvision torchaudio diffusers imageio
pip install accelerate
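Optionally, as a quick sanity check that PyTorch can see Apple's Metal (MPS) backend, save this snippet to a throwaway file (the name check_mps.py is just a suggestion) and run it with python inside the wan environment:

import torch

# True means PyTorch can use the MPS (Apple GPU) backend; False means the
# generation script below will only run on the CPU, much more slowly.
print(torch.backends.mps.is_available())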
Now clone the git repo:
git clone https://github.com/Wan-Video/Wan2.1.git
cd Wan2.1
pip install -r requirements.txt
Download the Wan 2.1 model with huggingface-cli (it is many GB, so it will take a while to download, possibly over an hour):
huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir ./wan-t2v-1.3b
This will save the model weights and configuration files to ./wan-t2v-1.3b.
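If you'd rather script the download, a roughly equivalent Python alternative is huggingface_hub's snapshot_download; this sketch uses the same repo ID and target directory as the command above:

from huggingface_hub import snapshot_download

# Fetches the whole Wan-AI/Wan2.1-T2V-1.3B repo into ./wan-t2v-1.3b,
# equivalent to the huggingface-cli download command above.
snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B", local_dir="./wan-t2v-1.3b")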
Save the following Python code to a file called generate.py (modify it if you'd like):
import sys
import torch
import imageio
from diffusers import DiffusionPipeline


def main():
    if len(sys.argv) < 3:
        print("Usage: python generate.py '<prompt>' <output_file>")
        sys.exit(1)

    prompt = sys.argv[1]
    output_file = sys.argv[2]

    # Load the model weights downloaded above. Use "cpu" instead of "mps" if MPS fails.
    pipe = DiffusionPipeline.from_pretrained(
        "./wan-t2v-1.3b",
        torch_dtype=torch.float16,
        variant="fp16"
    ).to("mps")

    print(f"Generating video for prompt: {prompt}")
    video = pipe(prompt, num_inference_steps=20).frames[0]  # diffusers video pipelines return frames via .frames

    # Write the frames out as an .mp4 (imageio may need the imageio-ffmpeg backend for this).
    imageio.mimsave(output_file, [frame for frame in video], fps=8)
    print(f"Saved video to {output_file}")


if __name__ == "__main__":
    main()
Then finally, run the python code:
python generate.py "A dragon flying over a snowy mountain" dragon.mp4
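If imageio complains that it can't write the .mp4 (it generally needs the imageio-ffmpeg backend, which you can add with pip install imageio-ffmpeg), one alternative is diffusers' own export helper. A sketch, assuming video holds the generated frames exactly as in generate.py:

from diffusers.utils import export_to_video

# Drop-in replacement for the imageio.mimsave(...) line in generate.py.
export_to_video(video, output_file, fps=8)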
If you run into memory issues, you can:
- Lower num_inference_steps (e.g. from 20 to 10)
- Change .to("mps") to .to("cpu")
- Try the diffusers memory-saving option in the sketch below
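Diffusers also has a built-in memory saver worth trying before falling back to the CPU. This is a sketch assuming the pipe object from generate.py above; enable_attention_slicing is a standard diffusers pipeline method:

# In generate.py, right after building the pipeline with DiffusionPipeline.from_pretrained(...):
pipe.enable_attention_slicing()  # computes attention in smaller chunks: a bit slower, but lower peak memory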