@antarr · Created July 2, 2025
# Llama 3 System Requirements Tables


## Llama 3.3 Requirements

| Variant Name | VRAM Requirement | Recommended Configuration | Best Use Case |
| --- | --- | --- | --- |
| 70b | 43GB | Mac Studio (M2 Ultra 128GB) | General-purpose inference |
| 70b-instruct-fp16 | 141GB | Mac Studio Cluster (2x M2 Ultra 192GB) | High-precision fine-tuning and training |
| 70b-instruct-q2_K | 26GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight inference with reduced precision |
| 70b-instruct-q3_K_M | 34GB | Mac Studio (M1/M2 Ultra 64GB) | Balanced performance and efficiency |
| 70b-instruct-q3_K_S | 31GB | Mac Studio (M1/M2 Ultra 64GB) | Lower memory, faster inference tasks |
| 70b-instruct-q4_0 | 40GB | Mac Studio (M1/M2 Ultra 64GB) | High-speed, mid-precision inference |
| 70b-instruct-q4_1 | 44GB | Mac Studio (M2 Ultra 128GB) | Precision-critical inference tasks |
| 70b-instruct-q4_K_M | 43GB | Mac Studio (M2 Ultra 128GB) | Optimized for larger models with precision |
| 70b-instruct-q4_K_S | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Standard performance inference tasks |
| 70b-instruct-q5_0 | 49GB | Mac Studio (M2 Ultra 128GB) | High-efficiency inference tasks |
| 70b-instruct-q5_1 | 53GB | Mac Studio (M2 Ultra 128GB) | Complex inference and light training |
| 70b-instruct-q5_K_M | 50GB | Mac Studio (M2 Ultra 128GB) | Memory-intensive inference tasks |
| 70b-instruct-q6_K | 58GB | Mac Studio (M2 Ultra 128GB) | Large-scale precision and training |
| 70b-instruct-q8_0 | 75GB | Mac Studio (M2 Ultra 128GB) | Heavy-duty inference and fine-tuning |
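
The table can be read mechanically: pick the most precise variant whose VRAM figure fits your memory budget, leaving headroom for the OS and KV cache. A minimal sketch of that selection logic, using a small excerpt of the table above as data (the names `largest_fit` and `LLAMA_33_70B`, and the 8GB headroom figure, are illustrative assumptions, not from the source):

```python
# Excerpt of the Llama 3.3 70B table above: (variant, VRAM requirement in GB).
LLAMA_33_70B = [
    ("70b-instruct-q2_K", 26),
    ("70b-instruct-q3_K_M", 34),
    ("70b-instruct-q4_K_M", 43),
    ("70b-instruct-q5_K_M", 50),
    ("70b-instruct-q6_K", 58),
    ("70b-instruct-q8_0", 75),
    ("70b-instruct-fp16", 141),
]

def largest_fit(variants, budget_gb, headroom_gb=8.0):
    """Return the most precise variant that fits after reserving headroom
    for the OS and KV cache (the 8GB default is a guess, not a measured figure)."""
    usable = budget_gb - headroom_gb
    fitting = [(name, gb) for name, gb in variants if gb <= usable]
    return max(fitting, key=lambda item: item[1]) if fitting else None

print(largest_fit(LLAMA_33_70B, budget_gb=64))  # -> ('70b-instruct-q5_K_M', 50)
```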

## Llama 3.2 Requirements

| Variant Name | VRAM Requirement | Recommended Configuration | Best Use Case |
| --- | --- | --- | --- |
| 1b | 1.3GB | Any M-series (8GB+) | Lightweight inference tasks |
| 3b | 2.0GB | Any M-series (8GB+) | General-purpose inference |
| 1b-instruct-fp16 | 2.5GB | Any M-series (8GB+) | Fine-tuning and precision-critical tasks |
| 1b-instruct-q2_K | 581MB | Any M-series (8GB+) | Reduced precision, memory-efficient inference |
| 1b-instruct-q3_K_L | 733MB | Any M-series (8GB+) | Efficient inference with balanced precision |
| 1b-instruct-q3_K_M | 691MB | Any M-series (8GB+) | Smaller, balanced precision tasks |
| 1b-instruct-q3_K_S | 642MB | Any M-series (8GB+) | Lower memory, lightweight inference |
| 1b-instruct-q4_0 | 771MB | Any M-series (8GB+) | Mid-precision inference tasks |
| 1b-instruct-q4_1 | 832MB | Any M-series (8GB+) | Precision-critical small models |
| 1b-instruct-q4_K_M | 808MB | Any M-series (8GB+) | Balanced, memory-optimized tasks |
| 1b-instruct-q4_K_S | 776MB | Any M-series (8GB+) | Lightweight inference with precision |
| 1b-instruct-q5_0 | 893MB | Any M-series (8GB+) | Higher-efficiency inference tasks |
| 1b-instruct-q5_1 | 953MB | Any M-series (8GB+) | Small models with complex inference |
| 1b-instruct-q5_K_M | 912MB | Any M-series (8GB+) | Memory-optimized, efficient inference |
| 1b-instruct-q5_K_S | 893MB | Any M-series (8GB+) | Low memory, efficient inference |
| 1b-instruct-q6_K | 1.0GB | Any M-series (8GB+) | Medium memory, balanced inference |
| 1b-instruct-q8_0 | 1.3GB | Any M-series (8GB+) | Standard inference for small models |
| 3b-instruct-fp16 | 6.4GB | Any M-series (8GB+) | Fine-tuning and precision-critical tasks |
| 3b-instruct-q2_K | 1.4GB | Any M-series (8GB+) | Reduced precision, lightweight inference |
| 3b-instruct-q3_K_L | 1.8GB | Any M-series (8GB+) | Balanced precision inference tasks |
| 3b-instruct-q3_K_M | 1.7GB | Any M-series (8GB+) | Efficient, memory-optimized inference |
| 3b-instruct-q3_K_S | 1.5GB | Any M-series (8GB+) | Lightweight, small batch inference |
| 3b-instruct-q4_0 | 1.9GB | Any M-series (8GB+) | Mid-precision general inference |
| 3b-instruct-q4_1 | 2.1GB | Any M-series (8GB+) | Higher precision, small tasks |
| 3b-instruct-q4_K_M | 2.0GB | Any M-series (8GB+) | Memory-optimized small models |
| 3b-instruct-q4_K_S | 1.9GB | Any M-series (8GB+) | Mid-memory general inference |
| 3b-instruct-q5_0 | 2.3GB | Any M-series (8GB+) | High-efficiency inference tasks |
| 3b-instruct-q5_1 | 2.4GB | Any M-series (8GB+) | Fine-tuned, higher complexity tasks |
| 3b-instruct-q5_K_M | 2.3GB | Any M-series (8GB+) | Efficient inference with optimization |
| 3b-instruct-q5_K_S | 2.3GB | Any M-series (8GB+) | High efficiency, balanced memory tasks |
| 3b-instruct-q6_K | 2.6GB | Any M-series (8GB+) | Balanced precision for small tasks |
| 3b-instruct-q8_0 | 3.4GB | Any M-series (8GB+) | High-memory inference and tasks |
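
On Apple Silicon the GPU has no separate VRAM: it draws from the same unified memory pool as the rest of the system, so the budget for the sketch above is simply the machine's total RAM. One way to read it on macOS (`hw.memsize` is a standard sysctl key reporting physical memory in bytes; the helper name `total_ram_gb` is illustrative):

```python
import subprocess

def total_ram_gb() -> float:
    """Total physical memory on macOS in GiB; sysctl hw.memsize reports bytes."""
    out = subprocess.check_output(["sysctl", "-n", "hw.memsize"])
    return int(out) / 2**30

# Feed the result into the largest_fit() sketch above, e.g.:
# largest_fit(LLAMA_33_70B, budget_gb=total_ram_gb())
```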

## Llama 3.1 Requirements

| Variant Name | VRAM Requirement | Recommended Configuration | Best Use Case |
| --- | --- | --- | --- |
| 8b | 4.9GB | Any M-series (8GB+) | General-purpose inference |
| 70b | 43GB | Mac Studio (M2 Ultra 128GB) | Large-scale inference |
| 405b | 243GB | Mac Studio Cluster (4x M2 Ultra 192GB) | Large-scale model training |
| 405b-instruct-fp16 | 812GB | Mac Studio Cluster (11x M2 Ultra 192GB) | Precision-critical, fine-tuning tasks |
| 405b-instruct-q2_K | 149GB | Mac Studio Cluster (2x M2 Ultra 192GB) | Memory-optimized inference |
| 405b-instruct-q3_K_L | 213GB | Mac Studio Cluster (3x M2 Ultra 192GB) | Balanced precision for large-scale tasks |
| 405b-instruct-q3_K_M | 195GB | Mac Studio Cluster (3x M2 Ultra 192GB) | High-efficiency large-scale inference |
| 405b-instruct-q3_K_S | 175GB | Mac Studio Cluster (2x M2 Ultra 192GB) | Efficient inference with lower precision |
| 405b-instruct-q4_0 | 229GB | Mac Studio Cluster (3x M2 Ultra 192GB) | Mid-precision for large models |
| 405b-instruct-q4_1 | 254GB | Mac Studio Cluster (4x M2 Ultra 192GB) | High-precision inference |
| 405b-instruct-q4_K_M | 243GB | Mac Studio Cluster (4x M2 Ultra 192GB) | Optimized precision for large models |
| 405b-instruct-q4_K_S | 231GB | Mac Studio Cluster (3x M2 Ultra 192GB) | Balanced memory with precision inference |
| 405b-instruct-q5_0 | 279GB | Mac Studio Cluster (4x M2 Ultra 192GB) | High-efficiency large-scale tasks |
| 405b-instruct-q5_1 | 305GB | Mac Studio Cluster (4x M2 Ultra 192GB) | Complex inference and fine-tuning |
| 405b-instruct-q5_K_M | 287GB | Mac Studio Cluster (4x M2 Ultra 192GB) | Memory-intensive training and inference |
| 405b-instruct-q5_K_S | 279GB | Mac Studio Cluster (4x M2 Ultra 192GB) | Efficient training with lower memory |
| 405b-instruct-q6_K | 333GB | Mac Studio Cluster (5x M2 Ultra 192GB) | High-performance training for large models |
| 405b-instruct-q8_0 | 431GB | Mac Studio Cluster (6x M2 Ultra 192GB) | Heavy-duty, precision-critical training |
| 70b-instruct-fp16 | 141GB | Mac Studio Cluster (2x M2 Ultra 192GB) | Fine-tuning and high-precision inference |
| 70b-instruct-q2_K | 26GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight inference |
| 70b-instruct-q3_K_L | 37GB | Mac Studio (M1/M2 Ultra 64GB) | Balanced precision inference |
| 70b-instruct-q3_K_M | 34GB | Mac Studio (M1/M2 Ultra 64GB) | Efficient inference with memory savings |
| 70b-instruct-q3_K_S | 31GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight, low-memory inference |
| 70b-instruct-q4_0 | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Mid-precision general inference |
| 70b-instruct-q4_K_M | 43GB | Mac Studio (M2 Ultra 128GB) | Precision-critical large models |
| 70b-instruct-q4_K_S | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Memory-optimized mid-scale inference |
| 70b-instruct-q5_0 | 49GB | Mac Studio (M2 Ultra 128GB) | Efficient high-memory tasks |
| 70b-instruct-q5_1 | 53GB | Mac Studio (M2 Ultra 128GB) | Complex inference tasks |
| 70b-instruct-q5_K_M | 50GB | Mac Studio (M2 Ultra 128GB) | Memory-efficient inference |
| 70b-instruct-q5_K_S | 49GB | Mac Studio (M2 Ultra 128GB) | Efficient, large-scale inference |
| 70b-instruct-q6_K | 58GB | Mac Studio (M2 Ultra 128GB) | High-efficiency precision tasks |
| 70b-instruct-q8_0 | 75GB | Mac Studio (M2 Ultra 128GB) | Heavy-duty, large-scale inference |
| 8b-instruct-fp16 | 16GB | M-series Pro (16GB+) | Fine-tuning tasks |
| 8b-instruct-q2_K | 3.2GB | Any M-series (8GB+) | Lightweight precision tasks |
| 8b-instruct-q3_K_L | 4.3GB | Any M-series (8GB+) | Balanced precision and memory tasks |
| 8b-instruct-q3_K_M | 4.0GB | Any M-series (8GB+) | Efficient small-scale inference |
| 8b-instruct-q3_K_S | 3.7GB | Any M-series (8GB+) | Lightweight low-memory inference |
| 8b-instruct-q4_0 | 4.7GB | Any M-series (8GB+) | Mid-scale inference |
| 8b-instruct-q4_1 | 5.1GB | Any M-series (8GB+) | Precision-critical small models |
| 8b-instruct-q4_K_M | 4.9GB | Any M-series (8GB+) | Balanced memory with precision inference |
| 8b-instruct-q4_K_S | 4.7GB | Any M-series (8GB+) | Mid-precision small-scale inference |
| 8b-instruct-q5_0 | 5.6GB | Any M-series (8GB+) | Efficient mid-scale inference tasks |
| 8b-instruct-q5_1 | 6.1GB | Any M-series (8GB+) | Complex, small-scale inference |
| 8b-instruct-q6_K | 6.6GB | Any M-series (8GB+) | Balanced precision and memory tasks |
| 8b-instruct-q8_0 | 8.5GB | M-series (16GB+) | Large-scale, memory-intensive inference |
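
Most of the VRAM figures above track a simple rule of thumb: parameter count times bytes per weight. fp16 stores 2 bytes per weight (70.6B parameters ≈ 141GB, 405B ≈ 812GB), while q8_0 and q4_0 land near 8.5 and 4.5 bits per weight once quantization metadata is included. A quick sanity check against the table (the bits-per-weight values are approximations and the function name is illustrative):

```python
# Approximate bits per weight for common quantization formats (assumed values).
BITS_PER_WEIGHT = {"fp16": 16.0, "q8_0": 8.5, "q5_0": 5.5, "q4_0": 4.5}

def approx_size_gb(params_billion: float, quant: str) -> float:
    """Rough weight footprint: params x (bits per weight / 8) bytes each,
    ignoring KV cache and runtime overhead."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

print(round(approx_size_gb(70.6, "fp16")))  # ~141, matching the 70b fp16 row
print(round(approx_size_gb(70.6, "q4_0")))  # ~40
print(round(approx_size_gb(405, "q8_0")))   # ~430; the table lists 431GB
```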

## Llama 3 Requirements

| Variant Name | VRAM Requirement | Recommended Configuration | Best Use Case |
| --- | --- | --- | --- |
| 8b | 4.7GB | Any M-series (8GB+) | General-purpose inference |
| 70b | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Large-scale inference |
| 70b-instruct | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Instruction-tuned inference tasks |
| 70b-instruct-fp16 | 141GB | Mac Studio Cluster (2x M2 Ultra 192GB) | Precision-critical, fine-tuning tasks |
| 70b-instruct-q2_K | 26GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight inference |
| 70b-instruct-q3_K_L | 37GB | Mac Studio (M1/M2 Ultra 64GB) | Balanced precision inference |
| 70b-instruct-q3_K_M | 34GB | Mac Studio (M1/M2 Ultra 64GB) | Efficient inference with memory savings |
| 70b-instruct-q3_K_S | 31GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight, low-memory inference |
| 70b-instruct-q4_0 | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Mid-precision general inference |
| 70b-instruct-q4_1 | 44GB | Mac Studio (M2 Ultra 128GB) | High-precision inference tasks |
| 70b-instruct-q4_K_M | 43GB | Mac Studio (M2 Ultra 128GB) | Optimized for larger models with precision |
| 70b-instruct-q4_K_S | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Memory-optimized mid-scale inference |
| 70b-instruct-q5_0 | 49GB | Mac Studio (M2 Ultra 128GB) | High-efficiency inference tasks |
| 70b-instruct-q5_1 | 53GB | Mac Studio (M2 Ultra 128GB) | Complex inference tasks |
| 70b-instruct-q5_K_M | 50GB | Mac Studio (M2 Ultra 128GB) | Memory-efficient inference |
| 70b-instruct-q5_K_S | 49GB | Mac Studio (M2 Ultra 128GB) | Efficient, large-scale inference |
| 70b-instruct-q6_K | 58GB | Mac Studio (M2 Ultra 128GB) | High-efficiency precision tasks |
| 70b-instruct-q8_0 | 75GB | Mac Studio (M2 Ultra 128GB) | Heavy-duty, large-scale inference |
| 8b-instruct-fp16 | 16GB | M-series Pro (16GB+) | Fine-tuning tasks |
| 8b-instruct-q2_K | 3.2GB | Any M-series (8GB+) | Lightweight precision tasks |
| 8b-instruct-q3_K_L | 4.3GB | Any M-series (8GB+) | Balanced precision and memory tasks |
| 8b-instruct-q3_K_M | 4.0GB | Any M-series (8GB+) | Efficient small-scale inference |
| 8b-instruct-q3_K_S | 3.7GB | Any M-series (8GB+) | Lightweight low-memory inference |
| 8b-instruct-q4_0 | 4.7GB | Any M-series (8GB+) | Mid-scale inference |
| 8b-instruct-q4_1 | 5.1GB | Any M-series (8GB+) | Precision-critical small models |
| 8b-instruct-q4_K_M | 4.9GB | Any M-series (8GB+) | Balanced memory with precision inference |
| 8b-instruct-q4_K_S | 4.7GB | Any M-series (8GB+) | Mid-precision small-scale inference |
| 8b-instruct-q5_0 | 5.6GB | Any M-series (8GB+) | Efficient mid-scale inference tasks |
| 8b-instruct-q5_1 | 6.1GB | Any M-series (8GB+) | Complex, small-scale inference |
| 8b-instruct-q6_K | 6.6GB | Any M-series (8GB+) | Balanced precision and memory tasks |
| 8b-instruct-q8_0 | 8.5GB | M-series (16GB+) | Large-scale, memory-intensive inference |
| 70b-text | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Text-specific large-scale inference |
| 70b-text-fp16 | 141GB | Mac Studio Cluster (2x M2 Ultra 192GB) | Text fine-tuning with high precision |
| 70b-text-q2_K | 26GB | Mac Studio (M1/M2 Ultra 64GB) | Text inference with reduced precision |
| 70b-text-q3_K_L | 37GB | Mac Studio (M1/M2 Ultra 64GB) | Balanced text inference |
| 70b-text-q3_K_M | 34GB | Mac Studio (M1/M2 Ultra 64GB) | Efficient text inference |
| 70b-text-q3_K_S | 31GB | Mac Studio (M1/M2 Ultra 64GB) | Lightweight, low-memory text tasks |
| 70b-text-q4_0 | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Text inference with mid-precision |
| 70b-text-q4_1 | 44GB | Mac Studio (M2 Ultra 128GB) | Precision-critical text tasks |
| 70b-text-q4_K_M | 43GB | Mac Studio (M2 Ultra 128GB) | Memory-efficient text inference |
| 70b-text-q4_K_S | 40GB | Mac Studio (M1/M2 Ultra 64GB) | Optimized text inference |
| 70b-text-q5_0 | 49GB | Mac Studio (M2 Ultra 128GB) | Efficient text inference |
| 70b-text-q5_1 | 53GB | Mac Studio (M2 Ultra 128GB) | Complex text-specific inference tasks |
| 70b-text-q6_K | 58GB | Mac Studio (M2 Ultra 128GB) | High-efficiency text tasks |
| 70b-text-q8_0 | 75GB | Mac Studio (M2 Ultra 128GB) | Heavy-duty, precision text inference |
| 8b-text | 4.7GB | Any M-series (8GB+) | Text-specific general-purpose inference |
| instruct | 4.7GB | Any M-series (8GB+) | Instruction-tuned general-purpose inference |
| text | 4.7GB | Any M-series (8GB+) | General-purpose text tasks |