finetuning_gemma3_gal.ipynb
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/chottokun/b33c20f1476603004e4e48bafab2f4da/finetuning_gemma3_gal.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"This notebook modifies Unsloth's code, so its behavior differs from the original instructions.\n",
"\n",
"Many thanks to the Unsloth team for making the code public."
],
"metadata": {
"id": "DLhRnVkqUCuL"
}
},
{
"cell_type": "markdown",
"metadata": {
"id": "jxpPkxkm8lmy"
},
"source": [
"To run this, press \"*Runtime*\" and press \"*Run all*\"! Note that unlike the original Unsloth notebook, this adaptation loads the 27B Gemma-3 model, which likely needs more memory than a free Tesla T4 offers (the Colab settings here target an A100).\n",
"<div class=\"align-center\">\n",
"<a href=\"https://unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
"<a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord button.png\" width=\"145\"></a>\n",
"<a href=\"https://docs.unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a> Join Discord if you need help + ⭐ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐\n",
"</div>\n",
"\n",
"To install Unsloth on your own computer, follow the installation instructions on our Github page [here](https://docs.unsloth.ai/get-started/installing-+-updating).\n",
"\n",
"You will learn how to do [data prep](#Data), how to [train](#Train), how to [run the model](#Inference), & [how to save it](#Save)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HFSXqSQQ8lmz"
},
"source": [
"### News"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "kwZ_0JTu8lmz"
},
"source": [
"**Read our [Gemma 3 blog](https://unsloth.ai/blog/gemma3) for what's new in Unsloth and our [Reasoning blog](https://unsloth.ai/blog/r1-reasoning) on how to train reasoning models.**\n",
"\n",
"Visit our docs for all our [model uploads](https://docs.unsloth.ai/get-started/all-our-models) and [notebooks](https://docs.unsloth.ai/get-started/unsloth-notebooks).\n"
]
},
{
"cell_type": "code",
"source": [
"from google.colab import userdata\n",
"HF_TOKEN = userdata.get('HF_TOKEN')"
],
"metadata": {
"id": "uyWKQDAh-6bG"
},
"execution_count": null,
"outputs": []
},
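{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you are not on Colab, or have not registered `HF_TOKEN` under Colab's *Secrets* tab, the cell above will fail. A minimal fallback sketch, assuming the token is available via the environment or can be pasted interactively:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fallback sketch: read the Hugging Face token from the environment, or\n",
"# prompt for it with the standard-library getpass.\n",
"import os\n",
"from getpass import getpass\n",
"\n",
"if \"HF_TOKEN\" not in globals() or HF_TOKEN is None:\n",
"    HF_TOKEN = os.environ.get(\"HF_TOKEN\") or getpass(\"Hugging Face token: \")"
]
},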
{
"cell_type": "markdown",
"metadata": {
"id": "m9zhkYH_8lmz"
},
"source": [
"### Installation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "halU7n858lmz"
},
"outputs": [],
"source": [
"%%capture\n",
"import os\n",
"if \"COLAB_\" not in \"\".join(os.environ.keys()):\n",
"    !pip install unsloth vllm\n",
"else:\n",
"    # [NOTE] --no-deps is for Colab only! Elsewhere use [[pip install unsloth vllm]]\n",
"    !pip install --no-deps unsloth vllm\n",
"# Install latest Hugging Face for Gemma-3!\n",
"!pip install --no-deps git+https://github.com/huggingface/[email protected]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QQoubKva8lm0"
},
"outputs": [],
"source": [
"#@title Colab Extra Install { display-mode: \"form\" }\n",
"%%capture\n",
"import os\n",
"if \"COLAB_\" not in \"\".join(os.environ.keys()):\n",
"    !pip install unsloth vllm\n",
"else:\n",
"    !pip install --no-deps unsloth vllm\n",
"    # [NOTE] --no-deps is for Colab only! Elsewhere use [[pip install unsloth vllm]]\n",
"    # Skip restarting message in Colab\n",
"    import sys, re, requests; modules = list(sys.modules.keys())\n",
"    for x in modules: sys.modules.pop(x) if \"PIL\" in x or \"google\" in x else None\n",
"    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft \"trl==0.15.2\" triton cut_cross_entropy unsloth_zoo\n",
"    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer\n",
"\n",
"    # vLLM requirements - vLLM breaks Colab due to reinstalling numpy\n",
"    f = requests.get(\"https://raw.githubusercontent.com/vllm-project/vllm/refs/heads/main/requirements/common.txt\").content\n",
"    with open(\"vllm_requirements.txt\", \"wb\") as file:\n",
"        file.write(re.sub(rb\"(transformers|numpy|xformers)[^\\n]{1,}\\n\", b\"\", f))\n",
"    !pip install -r vllm_requirements.txt"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TGMWlrRdzwgf"
},
"source": [
"### Unsloth\n",
"\n",
"`FastModel` supports loading nearly any model now! This includes Vision and Text models!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-Xbb0cuLzwgf"
},
"outputs": [],
"source": [
"from unsloth import FastModel\n",
"import torch\n",
"\n",
"fourbit_models = [\n",
"    # 4bit dynamic quants for superior accuracy and low memory use\n",
"    \"unsloth/gemma-3-1b-it-unsloth-bnb-4bit\",\n",
"    \"unsloth/gemma-3-4b-it-unsloth-bnb-4bit\",\n",
"    \"unsloth/gemma-3-12b-it-unsloth-bnb-4bit\",\n",
"    \"unsloth/gemma-3-27b-it-unsloth-bnb-4bit\",\n",
"\n",
"    # Other popular models!\n",
"    \"unsloth/Llama-3.1-8B\",\n",
"    \"unsloth/Llama-3.2-3B\",\n",
"    \"unsloth/Llama-3.3-70B\",\n",
"    \"unsloth/mistral-7b-instruct-v0.3\",\n",
"    \"unsloth/Phi-4\",\n",
"] # More models at https://huggingface.co/unsloth\n",
"\n",
"model, tokenizer = FastModel.from_pretrained(\n",
"    model_name = \"unsloth/gemma-3-27b-it-unsloth-bnb-4bit\",\n",
"    max_seq_length = 2048, # Choose any for long context!\n",
"    load_in_4bit = True, # 4 bit quantization to reduce memory\n",
"    load_in_8bit = False, # [NEW!] A bit more accurate, uses 2x memory\n",
"    full_finetuning = False, # [NEW!] We have full finetuning now!\n",
"    # token = \"hf_...\", # use one if using gated models\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SXd9bTZd1aaL"
},
"source": [
"We now add LoRA adapters so we only need to update a small number of parameters!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "6bZsfBuZDeCL"
},
"outputs": [],
"source": [
"model = FastModel.get_peft_model(\n",
"    model,\n",
"    finetune_vision_layers = False, # Turn off for just text!\n",
"    finetune_language_layers = True, # Should leave on!\n",
"    finetune_attention_modules = True, # Attention good for GRPO\n",
"    finetune_mlp_modules = True, # Should leave on always!\n",
"\n",
"    r = 8, # Larger = higher accuracy, but might overfit\n",
"    lora_alpha = 8, # Recommended alpha == r at least\n",
"    lora_dropout = 0,\n",
"    bias = \"none\",\n",
"    random_state = 3407,\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vITh0KVJ10qX"
},
"source": [
"<a name=\"Data\"></a>\n",
"### Data Prep\n",
"We now use the `Gemma-3` format for conversation-style finetunes. The original notebook uses [Maxime Labonne's FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k) dataset in ShareGPT style; this adaptation instead finetunes on `Chottokun/databricks-dolly-15k-ja-gal`, a Japanese gal-style instruction dataset. Gemma-3 renders multi-turn conversations like this:\n",
"\n",
"```\n",
"<bos><start_of_turn>user\n",
"Hello!<end_of_turn>\n",
"<start_of_turn>model\n",
"Hey there!<end_of_turn>\n",
"```\n",
"\n",
"We use our `get_chat_template` function to get the correct chat template. We support `zephyr, chatml, mistral, llama, alpaca, vicuna, vicuna_old, phi3, llama3, phi4, qwen2.5, gemma3` and more."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LjY75GoYUCB8"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"tokenizer = get_chat_template(\n",
"    tokenizer,\n",
"    chat_template = \"gemma-3\",\n",
")"
]
},
{
"cell_type": "code",
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"from datasets import load_dataset\n",
"import datasets\n",
"\n",
"# Load the dataset (specify split=\"train\")\n",
"dataset = load_dataset('Chottokun/databricks-dolly-15k-ja-gal', split=\"train\")\n",
"\n",
"# Convert each row into the conversations format\n",
"def format_conversations(example):\n",
"    return {\n",
"        \"conversations\": [\n",
"            {\"role\": \"user\", \"content\": example[\"instruction\"]},\n",
"            {\"role\": \"assistant\", \"content\": example[\"output\"]},\n",
"        ]\n",
"    }\n",
"\n",
"# Convert all of the training data into the conversations format\n",
"dataset = dataset.map(format_conversations)\n",
"\n",
"# Wrap in a DatasetDict (train split only)\n",
"dataset = datasets.DatasetDict({\n",
"    'train': dataset # the original dataset is already train-only\n",
"})\n",
"\n",
"# Apply the chat template\n",
"tokenizer = get_chat_template(tokenizer, chat_template=\"gemma-3\")\n",
"\n",
"def apply_chat_template(examples):\n",
"    # tokenize=False returns strings, which SFTTrainer's text field expects\n",
"    texts = tokenizer.apply_chat_template(examples[\"conversations\"], tokenize=False)\n",
"    return {\"text\": texts}\n",
"\n",
"dataset = dataset.map(apply_chat_template, batched=True)"
],
"metadata": {
"id": "baOrOPssBJrf"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"dataset['train'].to_pandas().head()"
],
"metadata": {
"id": "sPAzJW_oBSe8"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Mkq4RvEq7FQr"
},
"outputs": [],
"source": [
"# from datasets import load_dataset\n",
"# dataset = load_dataset(\"mlabonne/FineTome-100k\", split = \"train\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K9CBpiISFa6C"
},
"source": [
"We could also use `standardize_data_formats` to convert datasets into the correct format for finetuning. Our custom dataset is already in role/content form, so the cell below stays commented out; a usage sketch follows it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "reoBXmAn7HlN"
},
"outputs": [],
"source": [
"# from unsloth.chat_templates import standardize_data_formats\n",
"# dataset = standardize_data_formats(dataset)"
]
},
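{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that conversion on the ShareGPT-style FineTome-100k dataset mentioned above; it is illustrative only, since our dataset is already in role/content form:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: normalize a ShareGPT-style dataset ({\"from\": ..., \"value\": ...})\n",
"# into the role/content conversations format Unsloth expects.\n",
"from unsloth.chat_templates import standardize_data_formats\n",
"from datasets import load_dataset\n",
"\n",
"sharegpt_dataset = load_dataset(\"mlabonne/FineTome-100k\", split=\"train\")\n",
"sharegpt_dataset = standardize_data_formats(sharegpt_dataset)\n",
"print(sharegpt_dataset[0][\"conversations\"][0])"
]
},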
{
"cell_type": "markdown",
"metadata": {
"id": "6i5Sx9In7vHi"
},
"source": [
"Let's see what row 100 looks like!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "dzE1OEXi7s3P"
},
"outputs": [],
"source": [
"dataset['train'][100]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8Xs0LXio7rfd"
},
"source": [
"We now have to apply the chat template for `Gemma-3` onto the conversations, and save it to `text`. (Our data-prep cell above already did this, so rerunning it here just reproduces the same `text` column.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1ahE8Ys37JDJ"
},
"outputs": [],
"source": [
"def apply_chat_template(examples):\n",
"    # tokenize=False returns strings, which SFTTrainer's text field expects\n",
"    texts = tokenizer.apply_chat_template(examples[\"conversations\"], tokenize = False)\n",
"    return { \"text\" : texts }\n",
"\n",
"dataset = dataset.map(apply_chat_template, batched = True)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ndDUB23CGAC5"
},
"source": [
"Let's see how the chat template did! Notice that the `Gemma-3` template adds a `<bos>` by default!"
]
},
{
"cell_type": "code",
"source": [
"dataset['train'][100][\"text\"]"
],
"metadata": {
"id": "FyNcnqdyC_ho"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "idAEIeSQ3xdS"
},
"source": [
"<a name=\"Train\"></a>\n",
"### Train the model\n",
"Now let's use Huggingface TRL's `SFTTrainer`! More docs here: [TRL SFT docs](https://huggingface.co/docs/trl/sft_trainer). We do 30 steps to speed things up, but you can set `num_train_epochs = 1` for a full run and remove `max_steps`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "95_Nn-89DhsL"
},
"outputs": [],
"source": [
"from trl import SFTTrainer, SFTConfig\n",
"trainer = SFTTrainer(\n",
"    model = model,\n",
"    tokenizer = tokenizer,\n",
"    train_dataset = dataset['train'],\n",
"    eval_dataset = None, # Can set up evaluation!\n",
"    args = SFTConfig(\n",
"        dataset_text_field = \"text\",\n",
"        per_device_train_batch_size = 2,\n",
"        gradient_accumulation_steps = 4, # Use GA to mimic batch size!\n",
"        warmup_steps = 5,\n",
"        # num_train_epochs = 1, # Set this for 1 full training run.\n",
"        max_steps = 30,\n",
"        learning_rate = 2e-4, # Reduce to 2e-5 for long training runs\n",
"        logging_steps = 1,\n",
"        optim = \"adamw_8bit\",\n",
"        weight_decay = 0.01,\n",
"        lr_scheduler_type = \"linear\",\n",
"        seed = 3407,\n",
"        report_to = \"none\", # Use this for WandB etc\n",
"    ),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C_sGp5XlG6dq"
},
"source": [
"We also use Unsloth's `train_on_responses_only` method to train only on the assistant outputs and ignore the loss on the user's inputs. This helps increase the accuracy of finetunes!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "juQiExuBG5Bt"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import train_on_responses_only\n",
"trainer = train_on_responses_only(\n",
"    trainer,\n",
"    instruction_part = \"<start_of_turn>user\\n\",\n",
"    response_part = \"<start_of_turn>model\\n\",\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Dv1NBUozV78l"
},
"source": [
"Let's verify that masking of the instruction part was applied! First, print the 100th row's full input:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LtsMVtlkUhja"
},
"outputs": [],
"source": [
"tokenizer.decode(trainer.train_dataset[100][\"input_ids\"])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4Kyjy__m9KY3"
},
"source": [
"Now let's print the masked-out example - you should see only the answer present. The two cells below were disabled in this adaptation and are kept commented out; a working sketch follows them:"
]
},
{
"cell_type": "code",
"source": [
"# from unsloth.chat_templates import train_on_responses_only\n",
"\n",
"# # Modify the dataset to include a \"labels\" key.\n",
"# # The labels are the same as the input_ids for text generation\n",
"# def add_labels_to_dataset(dataset):\n",
"#     def map_function(example):\n",
"#         example['labels'] = example['input_ids']\n",
"#         return example\n",
"#     return dataset.map(map_function)\n",
"\n",
"# # Apply train_on_responses_only as before\n",
"# trainer = train_on_responses_only(\n",
"#     trainer,\n",
"#     instruction_part=\"<start_of_turn>user\\n\",\n",
"#     response_part=\"<start_of_turn>model\\n\",\n",
"# )\n",
"\n",
"# # Now, add the \"labels\" key to the dataset\n",
"# trainer.train_dataset = add_labels_to_dataset(trainer.train_dataset)"
],
"metadata": {
"id": "RgVs-iVsDZM4"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_rD6fl8EUxnG"
},
"outputs": [],
"source": [
"# tokenizer.decode([tokenizer.pad_token_id if x == -100 else x for x in trainer.train_dataset[100][\"labels\"]]).replace(tokenizer.pad_token, \" \")"
]
},
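{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch that tries to decode the masked labels; it assumes `train_on_responses_only` added a `labels` column to the dataset, and says so if it did not:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the masked labels, assuming train_on_responses_only added\n",
"# a \"labels\" column. If it did not (which may be why the cells above were\n",
"# commented out), masking is applied by the data collator at train time.\n",
"row = trainer.train_dataset[100]\n",
"if \"labels\" in row:\n",
"    masked = [tokenizer.pad_token_id if x == -100 else x for x in row[\"labels\"]]\n",
"    print(tokenizer.decode(masked).replace(tokenizer.pad_token, \" \"))\n",
"else:\n",
"    print(\"No 'labels' column; masking happens in the data collator at train time.\")"
]
},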
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"cellView": "form",
"id": "2ejIt2xSNKKp"
},
"outputs": [],
"source": [
"# @title Show current memory stats\n",
"gpu_stats = torch.cuda.get_device_properties(0)\n",
"start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)\n",
"print(f\"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.\")\n",
"print(f\"{start_gpu_memory} GB of memory reserved.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CNP1Uidk9mrz"
},
"source": [
"Let's train the model! To resume a training run, set `trainer.train(resume_from_checkpoint = True)`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yqxqAZ7KJ4oL"
},
"outputs": [],
"source": [
"trainer_stats = trainer.train()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pCqnaKmlO1U9"
},
"outputs": [],
"source": [
"# @title Show final memory and time stats\n",
"used_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)\n",
"used_memory_for_lora = round(used_memory - start_gpu_memory, 3)\n",
"used_percentage = round(used_memory / max_memory * 100, 3)\n",
"lora_percentage = round(used_memory_for_lora / max_memory * 100, 3)\n",
"print(f\"{trainer_stats.metrics['train_runtime']} seconds used for training.\")\n",
"print(\n",
"    f\"{round(trainer_stats.metrics['train_runtime']/60, 2)} minutes used for training.\"\n",
")\n",
"print(f\"Peak reserved memory = {used_memory} GB.\")\n",
"print(f\"Peak reserved memory for training = {used_memory_for_lora} GB.\")\n",
"print(f\"Peak reserved memory % of max memory = {used_percentage} %.\")\n",
"print(f\"Peak reserved memory for training % of max memory = {lora_percentage} %.\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ekOmTR1hSNcr"
},
"source": [
"<a name=\"Inference\"></a>\n",
"### Inference\n",
"Let's run the model via Unsloth native inference! According to the `Gemma-3` team, the recommended settings for inference are `temperature = 1.0, top_p = 0.95, top_k = 64`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "kR3gIAX-SM2q"
},
"outputs": [],
"source": [
"from unsloth.chat_templates import get_chat_template\n",
"tokenizer = get_chat_template(\n",
"    tokenizer,\n",
"    chat_template = \"gemma-3\",\n",
")\n",
"messages = [{\n",
"    \"role\": \"user\",\n",
"    \"content\": [{\n",
"        \"type\" : \"text\",\n",
"        \"text\" : \"Continue the sequence: 1, 1, 2, 3, 5, 8,\",\n",
"    }]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
"    messages,\n",
"    add_generation_prompt = True, # Must add for generation\n",
")\n",
"outputs = model.generate(\n",
"    **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
"    max_new_tokens = 64, # Increase for longer outputs!\n",
"    # Recommended Gemma-3 settings!\n",
"    temperature = 1.0, top_p = 0.95, top_k = 64,\n",
")\n",
"tokenizer.batch_decode(outputs)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CrSvZObor0lY"
},
"source": [
"You can also use a `TextStreamer` for continuous inference, so you can see the generation token by token instead of waiting for the whole output!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "e2pEuRb1r2Vg"
},
"outputs": [],
"source": [
"messages = [{\n",
"    \"role\": \"user\",\n",
"    \"content\": [{\"type\" : \"text\", \"text\" : \"Why is the sky blue?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
"    messages,\n",
"    add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
"    **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
"    max_new_tokens = 64, # Increase for longer outputs!\n",
"    # Recommended Gemma-3 settings!\n",
"    temperature = 1.0, top_p = 0.95, top_k = 64,\n",
"    streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
]
},
{
"cell_type": "code",
"source": [
"# A Japanese prompt (\"Who is the cutest in Madoka Magica?\") to exercise the Japanese gal-style finetune\n",
"messages = [{\n",
"    \"role\": \"user\",\n",
"    \"content\": [{\"type\" : \"text\", \"text\" : \"まどか☆マギカで一番かわいいのは?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
"    messages,\n",
"    add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
"    **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
"    max_new_tokens = 64, # Increase for longer outputs!\n",
"    # Recommended Gemma-3 settings!\n",
"    temperature = 1.0, top_p = 0.95, top_k = 64,\n",
"    streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
],
"metadata": {
"id": "PzoER0Y5M6A6"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "uMuVrWbjAzhc"
},
"source": [
"<a name=\"Save\"></a>\n",
"### Saving, loading finetuned models\n",
"To save the final model as LoRA adapters, either use Huggingface's `push_to_hub` for an online save or `save_pretrained` for a local save.\n",
"\n",
"**[NOTE]** This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!"
]
},
{
"cell_type": "code",
"source": [
"hub_model_id = \"Chottokun/gemma-3-gal\""
],
"metadata": {
"id": "p53tkNAS-btH"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "upcOlWe7A1vc"
},
"outputs": [],
"source": [
"model.save_pretrained(\"gemma-3\") # Local saving\n",
"tokenizer.save_pretrained(\"gemma-3\")\n",
"model.push_to_hub(hub_model_id, token = HF_TOKEN) # Online saving\n",
"tokenizer.push_to_hub(hub_model_id, token = HF_TOKEN) # Online saving"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "AEEcJ4qfC7Lp"
},
"source": [
"Now if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "MKX_XKs_BNZR"
},
"outputs": [],
"source": [
"if False:\n",
"    from unsloth import FastModel\n",
"    model, tokenizer = FastModel.from_pretrained(\n",
"        model_name = \"lora_model\", # YOUR MODEL YOU USED FOR TRAINING\n",
"        max_seq_length = 2048,\n",
"        load_in_4bit = True,\n",
"    )\n",
"\n",
"messages = [{\n",
"    \"role\": \"user\",\n",
"    \"content\": [{\"type\" : \"text\", \"text\" : \"What is Gemma-3?\",}]\n",
"}]\n",
"text = tokenizer.apply_chat_template(\n",
"    messages,\n",
"    add_generation_prompt = True, # Must add for generation\n",
")\n",
"\n",
"from transformers import TextStreamer\n",
"_ = model.generate(\n",
"    **tokenizer([text], return_tensors = \"pt\").to(\"cuda\"),\n",
"    max_new_tokens = 64, # Increase for longer outputs!\n",
"    # Recommended Gemma-3 settings!\n",
"    temperature = 1.0, top_p = 0.95, top_k = 64,\n",
"    streamer = TextStreamer(tokenizer, skip_prompt = True),\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "f422JgM9sdVT"
},
"source": [
"### Saving to float16 for VLLM\n",
"\n",
"We also support saving to `float16` directly for deployment! We save it in the folder `gemma-3-finetune`. Change `if False` to `if True` to let it run!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iHjt_SMYsd3P"
},
"outputs": [],
"source": [
"if False: # Change to True to save finetune!\n",
"    model.save_pretrained_merged(\"gemma-3-finetune\", tokenizer)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "z6O48DbNIAr0"
},
"source": [
"If you want to upload / push to your Hugging Face account, change `if False` to `if True` and add your Hugging Face token and upload location!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZV-CiKPrIFG0"
},
"outputs": [],
"source": [
"if False: # Change to True to upload finetune\n",
"    model.push_to_hub_merged(\n",
"        \"HF_ACCOUNT/gemma-3-finetune\", tokenizer,\n",
"        token = \"hf_...\"\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TCv4vXHd61i7"
},
"source": [
"### GGUF / llama.cpp Conversion\n",
"To save to `GGUF` / `llama.cpp`, we support it natively now for all models! For now, you can convert easily to `Q8_0, F16 or BF16` precision. `Q4_K_M` for 4bit will come later!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FqfebeAdT073"
},
"outputs": [],
"source": [
"if False: # Change to True to save to GGUF\n",
"    model.save_pretrained_gguf(\n",
"        \"gemma-3-finetune\",\n",
"        quantization_type = \"Q8_0\", # For now only Q8_0, BF16, F16 supported\n",
"    )"
]
},
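{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once exported, the GGUF can be smoke-tested with llama.cpp. A minimal sketch, assuming `llama-cli` is built and on your `PATH`, and that the export produced a file named like `gemma-3-finetune.Q8_0.gguf` (check the actual filename in the output folder):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: run the exported GGUF with llama.cpp's CLI.\n",
"# The binary location and the exact output filename are assumptions.\n",
"!llama-cli -m gemma-3-finetune.Q8_0.gguf -p \"Why is the sky blue?\" -n 64"
]
},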
{
"cell_type": "markdown",
"metadata": {
"id": "Q974YEVPI7JS"
},
"source": [
"Likewise, if you want to push the GGUF to your Hugging Face account instead, change `if False` to `if True` and add your Hugging Face token and upload location!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZgcJIhJ0I_es"
},
"outputs": [],
"source": [
"if False: # Change to True to upload GGUF\n",
"    model.push_to_hub_gguf(\n",
"        \"gemma-3-finetune\",\n",
"        quantization_type = \"Q8_0\", # Only Q8_0, BF16, F16 supported\n",
"        repo_id = \"HF_ACCOUNT/gemma-finetune-gguf\",\n",
"        token = \"hf_...\",\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "SMI_OTs_8lm6"
},
"source": [
"Now, use the resulting GGUF file (for example `gemma-3-finetune.Q8_0.gguf`) in llama.cpp or a UI based system like Jan or Open WebUI. You can install Jan [here](https://github.com/janhq/jan) and Open WebUI [here](https://github.com/open-webui/open-webui)\n",
"\n",
"And we're done! If you have any questions on Unsloth, we have a [Discord](https://discord.gg/unsloth) channel! If you find any bugs, want to keep up to date with the latest LLM news, or need help joining projects, feel free to join our Discord!\n",
"\n",
"Some other links:\n",
"1. Train your own reasoning model - Llama GRPO notebook [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-GRPO.ipynb)\n",
"2. Saving finetunes to Ollama. [Free notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3_(8B)-Ollama.ipynb)\n",
"3. Llama 3.2 Vision finetuning - Radiography use case. [Free Colab](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.2_(11B)-Vision.ipynb)\n",
"4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our [documentation](https://docs.unsloth.ai/get-started/unsloth-notebooks)!\n",
"\n",
"<div class=\"align-center\">\n",
"  <a href=\"https://unsloth.ai\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png\" width=\"115\"></a>\n",
"  <a href=\"https://discord.gg/unsloth\"><img src=\"https://github.com/unslothai/unsloth/raw/main/images/Discord.png\" width=\"145\"></a>\n",
"  <a href=\"https://docs.unsloth.ai/\"><img src=\"https://github.com/unslothai/unsloth/blob/main/images/documentation%20green%20button.png?raw=true\" width=\"125\"></a>\n",
"\n",
"  Join Discord if you need help + ⭐️ <i>Star us on <a href=\"https://github.com/unslothai/unsloth\">Github</a> </i> ⭐️\n",
"</div>\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"provenance": [],
"machine_shape": "hm",
"include_colab_link": true
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}