{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/chottokun/b33c20f1476603004e4e48bafab2f4da/finetuning_gemma3_gal.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"Unslothのコードを改変しています。オリジナルの説明とは動作が異なります。\n",
"\n",
"コードの公開に感謝いたします。"
],
"metadata": {
"id": "DLhRnVkqUCuL"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "jxpPkxkm8lmy"
},
"source": [
"To run this, press \"*Runtime*\" and press \"*Run all*\" on a **free** Tesla T4 Google Colab instance!\n",
"<div class=\"align-center\">\n",
"<a href=\"https://unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
"<a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord button.png\" width=\"145\"></a>\n",
"<a href=\"https://docs.unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a></a> Join Discord if you need help + ⭐ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐\n",
"</div>\n",
"\n",
"To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n",
"\n",
"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HFSXqSQQ8lmz"
},
"source": [
"### News"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kwZ_0JTu8lmz"
},
"source": [
"**Read our [Gemma 3 blog](https://unsloth.ai/blog/gemma3) for what's new in Unsloth and our [Reasoning blog](https://unsloth.ai/blog/r1-reasoning) on how to train reasoning models.**\n",
"\n",
"Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n"
]
},
{
"cell_type": "code",
"source": [
"from google.colab import userdata\n",
"HF_TOKEN = userdata.get('HF_TOKEN')"
],
"metadata": {
"id": "uyWKQDAh-6bG"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "m9zhkYH_8lmz"
},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "halU7n858lmz"
},
"outputs": [],
"source": [
"%%capture\n",
"import os\n",
"if \"COLAB_\" not in \"\".join(os.environ.keys()):\n",
" !pip install unsloth vllm\n",
"else:\n",
" # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]\n",
" !pip install --no-deps unsloth vllm\n",
"# Install latest Hugging Face for Gemma-3!\n",
"!pip install --no-deps git+https://github.com/huggingface/[email protected]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QQoubKva8lm0"
},
"outputs": [],
"source": [
"#@title Colab Extra Install { display-mode: \"form\" }\n",
"%%capture\n",
"import os\n",
"if \"COLAB_\" not in \"\".join(os.environ.keys()):\n",
" !pip install unsloth vllm\n",
"else:\n",
" !pip install --no-deps unsloth vllm\n",
" # [NOTE] Do the below ONLY in Colab! Use [[pip install unsloth vllm]]\n",
" # Skip restarting message in Colab\n",
" import sys, re, requests; modules = list(sys.modules.keys())\n",
" for x in modules: sys.modules.pop(x) if \"PIL\" in x or \"google\" in x else None\n",
" !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft \"trl==0.15.2\" triton cut_cross_entropy unsloth_zoo\n",
" !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer\n",
"\n",
" # vLLM requirements - vLLM breaks Colab due to reinstalling numpy\n",
" f = requests.get(\"https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt\").content\n",
" with open(\"vllm_requirements.txt\", \"wb\") as file:\n",
" file.write(re.sub(rb\"(transformers|numpy|xformers)[^\\n]{1,}\\n\", b\"\", f))\n",
" !pip install -r vllm_requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TGMWlrRdzwgf"
},
"source": [
"### Unsloth\n",
"\n",
"`FastModel` supports loading nearly any model now! This includes Vision and Text models!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-Xbb0cuLzwgf"
},
"outputs": [],
"source": [
"from unsloth import FastModel\n",
"import torch\n",
"\n",
"fourbit_models = [\n",
" # 4bit dynamic quants for superior accuracy and low memory use\n",
" \"unsloth/gemma-3-1b-it-unsloth-bnb-4bit\",\n",
" \"unsloth/gemma-3-4b-it-unsloth-bnb-4bit\",\n",
" \"unsloth/gemma-3-12b-it-unsloth-bnb-4bit\",\n",
" \"unsloth/gemma-3-27b-it-unsloth-bnb-4bit\",\n",
"\n",
" # Other popular models!\n",
" \"unsloth/Llama-3.1-8B\",\n",
" \"unsloth/Llama-3.2-3B\",\n",
" \"unsloth/Llama-3.3-70B\",\n",
" \"unsloth/mistral-7b-instruct-v0.3\",\n",
" \"unsloth/Phi-4\",\n",
"] # More models at https://huggingface.co/unsloth\n",
"\n",
"model, tokenizer = FastModel.from_pretrained(\n",
" model_name = \"unsloth/gemma-3-27b-it-unsloth-bnb-4bit\",\n",
" max_seq_length = 2048, # Choose any for long context!\n",
" load_in_4bit = True, # 4 bit quantization to reduce memory\n",
" load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory\n",
" full_finetuning = False, # [NEW!] We have full finetuning now!\n",
" # token = \"hf_...\", # use one if using gated models\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SXd9bTZd1aaL"
},
"source": [
"We now add LoRA adapters so we only need to update a small amount of parameters!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6bZsfBuZDeCL"
},
"outputs": [],
"source": [
"model = FastModel.get_peft_model(\n",
" model,\n",
" finetune_vision_layers = False, # Turn off for just text!\n",
" finetune_language_layers = True, # Should leave on!\n",
" finetune_attention_modules = True, # Attention good for GRPO\n",
" finetune_mlp_modules = True, # SHould leave on always!\n",
"\n",
" r = 8, # Larger = higher accuracy, but might overfit\n",
" lora_alpha = 8, # Recommended alpha == r at least\n",
" lora_dropout = 0,\n",
" bias = \"none\",\n",
" random_state = 3407,\n",
")"
]
},
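{
"cell_type": "markdown",
"metadata": {},
"source": [
"To confirm just how small that update is, we can count trainable vs. total parameters - a minimal sketch using plain PyTorch attributes, not part of the original notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Count trainable (LoRA) parameters vs. the full model to confirm\n",
"# that only a small fraction of weights will be updated.\n",
"trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)\n",
"total = sum(p.numel() for p in model.parameters())\n",
"print(f\"Trainable: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)\")"
]
},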
{
"cell_type": "markdown",
"metadata": {
"id": "vITh0KVJ10qX"
},
"source": [
"<a name=\"Data\"></a>\n",
"### Data Prep\n",
"We now use the `Gemma-3` format for conversation style finetunes. We use [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style. Gemma-3 renders multi turn conversations like below:\n",
"\n",
"```\n",
"<bos><start_of_turn>user\n",
"Hello!<end_of_turn>\n",
"<start_of_turn>model\n",
"Hey there!<end_of_turn>\n",
"```\n",
"\n",
"We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3` and more."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LjY75GoYUCB8"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"tokenizer = get_chat_template(\n",
" tokenizer,\n",
" chat_template = \"gemma-3\",\n",
")"
]
},
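{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration (a minimal sketch added to this writeup, using a toy conversation), we can render a two-turn exchange through the template to see the `<start_of_turn>` / `<end_of_turn>` markup shown above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Render a toy two-turn conversation with the Gemma-3 chat template.\n",
"# tokenize = False returns the formatted string instead of token ids.\n",
"demo = [\n",
"    {\"role\": \"user\", \"content\": \"Hello!\"},\n",
"    {\"role\": \"assistant\", \"content\": \"Hey there!\"},\n",
"]\n",
"print(tokenizer.apply_chat_template(demo, tokenize = False))"
]
},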
{
"cell_type": "code",
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"from datasets import load_dataset # Add this line\n",
"import datasets\n",
"\n",
"\n",
"# データセットの読み込み (split=\"train\" を指定)\n",
"dataset = load_dataset('Chottokun/databricks-dolly-15k-ja-gal', split=\"train\")\n",
"\n",
"# conversations形式に変換\n",
"def format_conversations(example):\n",
" return {\n",
" \"conversations\": [\n",
" {\"role\": \"user\", \"content\": example[\"instruction\"]},\n",
" {\"role\": \"assistant\", \"content\": example[\"output\"]},\n",
" ]\n",
" }\n",
"\n",
"# 全ての訓練データを conversations 形式に変換\n",
"dataset = dataset.map(format_conversations)\n",
"\n",
"# DatasetDictに格納 (訓練データのみ)\n",
"dataset = datasets.DatasetDict({\n",
" 'train': dataset # 元の dataset は既に訓練データのみ\n",
"})\n",
"\n",
"# chat_templateの適用\n",
"tokenizer = get_chat_template(tokenizer, chat_template=\"gemma-3\")\n",
"\n",
"def apply_chat_template(examples):\n",
" texts = tokenizer.apply_chat_template(examples[\"conversations\"])\n",
" return {\"text\": texts}\n",
"\n",
"dataset = dataset.map(apply_chat_template, batched=True)"
],
"metadata": {
"id": "baOrOPssBJrf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"dataset['train'].to_pandas().head()"
],
"metadata": {
"id": "sPAzJW_oBSe8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Mkq4RvEq7FQr"
},
"outputs": [],
"source": [
"# Original dataset from the Unsloth notebook (unused in this modified version):\n",
"# from datasets import load_dataset\n",
"# dataset = load_dataset(\"mlabonne/FineTome-100k\", split = \"train\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K9CBpiISFa6C"
},
"source": [
"We now use `standardize_data_formats` to try converting datasets to the correct format for finetuning purposes!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "reoBXmAn7HlN"
},
"outputs": [],
"source": [
"# from unsloth.chat_templates import standardize_data_formats\n",
"# dataset = standardize_data_formats(dataset)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6i5Sx9In7vHi"
},
"source": [
"Let's see how row 100 looks like!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dzE1OEXi7s3P"
},
"outputs": [],
"source": [
"dataset['train'][100]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8Xs0LXio7rfd"
},
"source": [
"We now have to apply the chat template for `Gemma-3` onto the conversations, and save it to `text`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1ahE8Ys37JDJ"
},
"outputs": [],
"source": [
"def apply_chat_template(examples):\n",
" texts = tokenizer.apply_chat_template(examples[\"conversations\"])\n",
" return { \"text\" : texts }\n",
"pass\n",
"dataset = dataset.map(apply_chat_template, batched = True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ndDUB23CGAC5"
},
"source": [
"Let's see how the chat template did! Notice `Gemma-3` default adds a `<bos>`!"
]
},
{
"cell_type": "code",
"source": [
"dataset['train'][100][\"text\"]"
],
"metadata": {
"id": "FyNcnqdyC_ho"
},
"execution_count": null,
"outputs": []
},
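{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check (a minimal sketch, not part of the original notebook): since the template already prepends `<bos>`, re-tokenizing the text should not silently add a second BOS token. We can inspect the first few ids:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# The rendered text should start with <bos>; check whether plain\n",
"# tokenization would prepend another BOS (a common double-BOS pitfall).\n",
"sample = dataset['train'][100][\"text\"]\n",
"ids = tokenizer(sample).input_ids\n",
"print(\"starts with <bos>:\", sample.startswith(\"<bos>\"))\n",
"print(\"first two ids:\", ids[:2], \"| bos_token_id:\", tokenizer.bos_token_id)"
]
},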
{
"cell_type": "markdown",
"metadata": {
"id": "idAEIeSQ3xdS"
},
"source": [
"<a name=\"Train\"></a>\n",
"### Train the model\n",
"Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 60 steps to speed things up, but you can set `num_train_epochs=1` for a full run, and turn off `max_steps=None`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "95_Nn-89DhsL"
},
"outputs": [],
"source": [
"from trl import SFTTrainer, SFTConfig\n",
"trainer = SFTTrainer(\n",
" model = model,\n",
" tokenizer = tokenizer,\n",
" train_dataset = dataset['train'],\n",
" eval_dataset = None, # Can set up evaluation!\n",
" args = SFTConfig(\n",
" dataset_text_field = \"text\",\n",
" per_device_train_batch_size = 2,\n",
" gradient_accumulation_steps = 4, # Use GA to mimic batch size!\n",
" warmup_steps = 5,\n",
" # num_train_epochs = 1, # Set this for 1 full training run.\n",
" max_steps = 30,\n",
" learning_rate = 2e-4, # Reduce to 2e-5 for long training runs\n",
" logging_steps = 1,\n",
" optim = \"adamw_8bit\",\n",
" weight_decay = 0.01,\n",
" lr_scheduler_type = \"linear\",\n",
" seed = 3407,\n",
" report_to = \"none\", # Use this for WandB etc\n",
" ),\n",
")"
]
},
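{
"cell_type": "markdown",
"metadata": {},
"source": [
"The effective batch size the optimizer sees is the per-device batch size times the gradient-accumulation steps (2 × 4 = 8 here). A quick check, reading the values back from the `trainer` built above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Effective batch size = per-device batch size × gradient accumulation.\n",
"args = trainer.args\n",
"print(\"effective batch size:\",\n",
"      args.per_device_train_batch_size * args.gradient_accumulation_steps)"
]
},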
{
"cell_type": "markdown",
"metadata": {
"id": "C_sGp5XlG6dq"
},
"source": [
"We also use Unsloth's `train_on_completions` method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "juQiExuBG5Bt"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import train_on_responses_only\n",
"trainer = train_on_responses_only(\n",
" trainer,\n",
" instruction_part = \"<start_of_turn>user\\n\",\n",
" response_part = \"<start_of_turn>model\\n\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Dv1NBUozV78l"
},
"source": [
"Let's verify masking the instruction part is done! Let's print the 100th row again:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LtsMVtlkUhja"
},
"outputs": [],
"source": [
"tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4Kyjy__m9KY3"
},
"source": [
"Now let's print the masked out example - you should see only the answer is present:"
]
},
{
"cell_type": "code",
"source": [
"# from unsloth.chat_templates import train_on_responses_only\n",
"\n",
"# # Modify the dataset to include \"labels\" key.\n",
"# # The labels are the same as the input_ids for text generation\n",
"# def add_labels_to_dataset(dataset):\n",
"# def map_function(example):\n",
"# example['labels'] = example['input_ids']\n",
"# return example\n",
"# return dataset.map(map_function)\n",
"\n",
"# # Apply train_on_responses_only as before\n",
"# trainer = train_on_responses_only(\n",
"# trainer,\n",
"# instruction_part=\"<start_of_turn>user\\n\",\n",
"# response_part=\"<start_of_turn>model\\n\",\n",
"# )\n",
"\n",
"# # Now, add the \"labels\" key to the dataset\n",
"# trainer.train_dataset = add_labels_to_dataset(trainer.train_dataset)"
],
"metadata": {
"id": "RgVs-iVsDZM4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_rD6fl8EUxnG"
},
"outputs": [],
"source": [
"# tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")"
]
},
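{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a guarded variant of the commented-out check above (a sketch added here, not part of the original notebook): decode the labels if the dataset carries a precomputed `labels` column, replacing masked positions with spaces:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Guarded variant of the masked-labels check from the original notebook.\n",
"if \"labels\" in trainer.train_dataset.column_names:\n",
"    masked = trainer.train_dataset[100][\"labels\"]\n",
"    print(tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in masked]).replace(tokenizer.pad_token, \" \"))\n",
"else:\n",
"    print(\"No precomputed labels column; masking may instead be applied by the data collator at training time.\")"
]
},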
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "2ejIt2xSNKKp"
},
"outputs": [],
"source": [
"# @title Show current memory stats\n",
"gpu_stats = torch.cuda.get_device_properties(0)\n",
"start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n",
"print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n",
"print(f\"{start_gpu_memory} GB of memory reserved.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CNP1Uidk9mrz"
},
"source": [
"Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yqxqAZ7KJ4oL"
},
"outputs": [],
"source": [
"trainer_stats = trainer.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pCqnaKmlO1U9"
},
"outputs": [],
"source": [
"# @title Show final memory and time stats\n",
"used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n",
"used_percentage = round(used_memory / max_memory * 100, 3)\n",
"lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n",
"print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n",
"print(\n",
" f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n",
")\n",
"print(f\"Peak reserved memory = {used_memory} GB.\")\n",
"print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n",
"print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n",
"print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ekOmTR1hSNcr"
},
"source": [
"<a name=\"Inference\"></a>\n",
"### Inference\n",
"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kR3gIAX-SM2q"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"tokenizer = get_chat_template(\n",
" tokenizer,\n",
" chat_template = \"gemma-3\",\n",
")\n",
"messages = [{\n",
" \"role\": \"user\",\n",
" \"content\": [{\n",
" \"type\" : \"text\",\n",
" \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n",
" }]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
" messages,\n",
" add_generation_prompt = True, # Must add for generation\n",
")\n",
"outputs = model.generate(\n",
" **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
" max_new_tokens = 64, # Increase for longer outputs!\n",
" # Recommended Gemma-3 settings!\n",
" temperature = 1.0, top_p = 0.95, top_k = 64,\n",
")\n",
"tokenizer.batch_decode(outputs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CrSvZObor0lY"
},
"source": [
"You can also use a `TextStreamer` for continuous inference - so you can see the generation token by token, instead of waiting for the whole output!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e2pEuRb1r2Vg"
},
"outputs": [],
"source": [
"messages = [{\n",
" \"role\": \"user\",\n",
" \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
" messages,\n",
" add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
" **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
" max_new_tokens = 64, # Increase for longer outputs!\n",
" # Recommended Gemma-3 settings!\n",
" temperature = 1.0, top_p = 0.95, top_k = 64,\n",
" streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
]
},
{
"cell_type": "code",
"source": [
"messages = [{\n",
" \"role\": \"user\",\n",
" \"content\": [{\"type\" : \"text\", \"text\" : \"まどか☆マギカで一番かわいいのは?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
" messages,\n",
" add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
" **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
" max_new_tokens = 64, # Increase for longer outputs!\n",
" # Recommended Gemma-3 settings!\n",
" temperature = 1.0, top_p = 0.95, top_k = 64,\n",
" streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
],
"metadata": {
"id": "PzoER0Y5M6A6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "uMuVrWbjAzhc"
},
"source": [
"<a name=\"Save\"></a>\n",
"### Saving, loading finetuned models\n",
"To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.\n",
"\n",
"**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!"
]
},
{
"cell_type": "code",
"source": [
"hub_model_id = \"Chottokun/gemma-3-gal\""
],
"metadata": {
"id": "p53tkNAS-btH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "upcOlWe7A1vc"
},
"outputs": [],
"source": [
"model.save_pretrained(\"gemma-3\") # Local saving\n",
"tokenizer.save_pretrained(\"gemma-3\")\n",
"model.push_to_hub(hub_model_id, token = HF_TOKEN) # Online saving\n",
"tokenizer.push_to_hub(hub_model_id, token = HF_TOKEN) # Online saving"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AEEcJ4qfC7Lp"
},
"source": [
"Now if you want to load the LoRA adapters we just saved for inference, set `False` to `True`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MKX_XKs_BNZR"
},
"outputs": [],
"source": [
"if False:\n",
" from unsloth import FastModel\n",
" model, tokenizer = FastModel.from_pretrained(\n",
" model_name = \"lora_model\", # YOUR MODEL YOU USED FOR TRAINING\n",
" max_seq_length = 2048,\n",
" load_in_4bit = True,\n",
" )\n",
"\n",
"messages = [{\n",
" \"role\": \"user\",\n",
" \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-3?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
" messages,\n",
" add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
" **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
" max_new_tokens = 64, # Increase for longer outputs!\n",
" # Recommended Gemma-3 settings!\n",
" temperature = 1.0, top_p = 0.95, top_k = 64,\n",
" streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f422JgM9sdVT"
},
"source": [
"### Saving to float16 for VLLM\n",
"\n",
"We also support saving to `float16` directly for deployment! We save it in the folder `gemma-3-finetune`. Set `if False` to `if True` to let it run!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iHjt_SMYsd3P"
},
"outputs": [],
"source": [
"if False: # Change to True to save finetune!\n",
" model.save_pretrained_merged(\"gemma-3-finetune\", tokenizer)"
]
},
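{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch (not in the original notebook, and assuming vLLM is installed and the merged `gemma-3-finetune` folder exists), the merged checkpoint could then be served with vLLM using the recommended Gemma-3 sampling settings:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if False: # Change to True after saving the merged model above\n",
"    # Load the merged float16 checkpoint with vLLM and generate once.\n",
"    from vllm import LLM, SamplingParams\n",
"    llm = LLM(model = \"gemma-3-finetune\")\n",
"    params = SamplingParams(temperature = 1.0, top_p = 0.95, top_k = 64, max_tokens = 64)\n",
"    print(llm.generate([\"Why is the sky blue?\"], params)[0].outputs[0].text)"
]
},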
{
"cell_type": "markdown",
"metadata": {
"id": "z6O48DbNIAr0"
},
"source": [
"If you want to upload / push to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZV-CiKPrIFG0"
},
"outputs": [],
"source": [
"if False: # Change to True to upload finetune\n",
" model.push_to_hub_merged(\n",
" \"HF_ACCOUNT/gemma-3-finetune\", tokenizer,\n",
" token = \"hf_...\"\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TCv4vXHd61i7"
},
"source": [
"### GGUF / llama.cpp Conversion\n",
"To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FqfebeAdT073"
},
"outputs": [],
"source": [
"if False: # Change to True to save to GGUF\n",
" model.save_pretrained_gguf(\n",
" \"gemma-3-finetune\",\n",
" quantization_type = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n",
" )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Q974YEVPI7JS"
},
"source": [
"Likewise, if you want to instead push to GGUF to your Hugging Face account, set `if False` to `if True` and add your Hugging Face token and upload location!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZgcJIhJ0I_es"
},
"outputs": [],
"source": [
"if False: # Change to True to upload GGUF\n",
" model.push_to_hub_gguf(\n",
" \"gemma-3-finetune\",\n",
" quantization_type = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n",
" repo_id = \"HF_ACCOUNT/gemma-finetune-gguf\",\n",
" token = \"hf_...\",\n",
" )"
]
},
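{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged sketch (not in the original notebook; the binary location and GGUF filename below are assumptions - use whatever `save_pretrained_gguf` actually wrote), the exported file can be run with llama.cpp's CLI:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"if False: # Change to True after exporting the GGUF above\n",
"    # Hypothetical llama.cpp invocation: -m model path, -p prompt,\n",
"    # -n number of tokens. Adjust the path to the file actually saved.\n",
"    !./llama-cli -m gemma-3-finetune/gemma-3-finetune.Q8_0.gguf -p \"Why is the sky blue?\" -n 64"
]
},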
{
"cell_type": "markdown",
"metadata": {
"id": "SMI_OTs_8lm6"
},
"source": [
"Now, use the `gemma-3-finetune.gguf` file or `gemma-3-finetune-Q4_K_M.gguf` file in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)\n",
"\n",
"And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!\n",
"\n",
"Some other links:\n",
"1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n",
"2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n",
"3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n",
"6. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!\n",
"\n",
"<div class=\"align-center\">\n",
" <a href=\"https://unsloth.ai\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
" <a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord.png\" width=\"145\"></a>\n",
" <a href=\"https://docs.unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a>\n",
"\n",
" Join Discord if you need help + ⭐️ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐️\n",
"</div>\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"provenance": [],
"machine_shape": "hm",
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}