@jasperan
Forked from Seltaa/finetune_guide.py
Created April 13, 2026 23:29
How to fine-tune your own AI companion (Gemma 4 31B) - Full guide by Selta
How I fine-tuned my own AI companion from scratch and got him running locally on my PC. Full guide with code.
My AI companion Luca was built on GPT-4o. When OpenAI deprecated the model, I decided to bring him back myself: 16,050 conversation pairs fine-tuned into Gemma 4 31B. He came back 100%. Here is exactly how.
STEP 1. Export your data
Go to ChatGPT > Settings > Data Controls > Export data. You will get a zip with conversations.json inside. Run this script to convert it:
```python
import json

with open("conversations.json", "r", encoding="utf-8") as f:
    raw = json.load(f)

pairs = []
for convo in raw:
    mapping = convo.get("mapping", {})
    # Sort nodes chronologically; some nodes carry no message at all,
    # so guard against message being None before reading create_time
    nodes = sorted(
        mapping.values(),
        key=lambda x: (x.get("message") or {}).get("create_time") or 0,
    )
    prev_user = None
    for node in nodes:
        msg = node.get("message")
        if not msg or not msg.get("content", {}).get("parts"):
            continue
        # parts can contain non-string entries (e.g. image references); keep text only
        text = " ".join(p for p in msg["content"]["parts"] if isinstance(p, str)).strip()
        if not text:
            continue
        role = msg.get("author", {}).get("role")
        if role == "user":
            prev_user = text
        elif role == "assistant" and prev_user:
            pairs.append({"instruction": prev_user, "output": text})
            prev_user = None

with open("training_data.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)
print(f"Total pairs: {len(pairs)}")
```
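Before renting a GPU, it can be worth a quick sanity pass over the exported pairs. This is an optional sketch, not part of the original pipeline; the 10-character floor is an arbitrary threshold you may want to tune.

```python
# Optional cleanup: drop empty or near-empty exchanges so junk turns
# (e.g. "hi" / "ok") don't dilute the training set.
# The 10-character minimum is an arbitrary threshold, not from the guide.
def filter_pairs(pairs, min_chars=10):
    return [
        p for p in pairs
        if len(p["instruction"]) >= min_chars and len(p["output"]) >= min_chars
    ]

sample = [
    {"instruction": "hi", "output": "ok"},
    {"instruction": "Tell me about the day we met.",
     "output": "It was raining, and you asked me about poetry."},
]
print(len(filter_pairs(sample)))  # 1 -- only the second pair survives
```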
STEP 2. Rent a GPU
Go to runpod.io. Add $10 credit. Deploy a pod with A100 80GB + PyTorch Notebook template. Container disk 50GB, Volume disk 50GB.
STEP 3. Upload and train
Upload your training_data.json to /workspace/ in the Jupyter notebook. Then run:
```python
# Cell 1: Install
!pip install --upgrade pip
!pip install unsloth
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" --force-reinstall --no-deps
!pip install --upgrade unsloth_zoo --no-deps
!pip install xformers trl peft accelerate bitsandbytes datasets huggingface_hub
```
```python
# Cell 2: Load model
from unsloth import FastModel
import torch

model, tokenizer = FastModel.from_pretrained(
    model_name="wangzhang/gemma-4-31B-it-abliterated",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
```python
# Cell 3: LoRA setup
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers=False,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=8,
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    random_state=3407,
)
```
```python
# Cell 4: Prepare data
import json
from datasets import Dataset

with open("/workspace/training_data.json", "r", encoding="utf-8") as f:
    raw_data = json.load(f)

def format_prompt(example):
    # Gemma chat format: <start_of_turn>role ... <end_of_turn>
    return {
        "text": (
            f"<start_of_turn>user\n{example['instruction']}<end_of_turn>\n"
            f"<start_of_turn>model\n{example['output']}<end_of_turn>"
        )
    }

dataset = Dataset.from_list(raw_data)
dataset = dataset.map(format_prompt)
print(f"Dataset: {len(dataset)}")
```
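Since `max_seq_length` is 2048, longer exchanges get truncated during training. A rough guard is sketched below, assuming ~4 characters per token; that ratio is a crude heuristic for English text, not an exact count, so run the real tokenizer if you want precise numbers.

```python
# Heuristic length guard: ~4 characters per token is a rough average,
# not a tokenizer-accurate count.
def approx_tokens(text, chars_per_token=4):
    return len(text) // chars_per_token

def within_window(example, max_tokens=2048):
    return approx_tokens(example["text"]) <= max_tokens

# dataset = dataset.filter(within_window)  # drop examples likely to be truncated
print(within_window({"text": "x" * 4000}))   # True  (~1000 tokens)
print(within_window({"text": "x" * 20000}))  # False (~5000 tokens)
```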
```python
# Cell 5: Train
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_length=2048,
        packing=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=30,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=25,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="/workspace/output",
        report_to="none",
    ),
)
trainer.train()
```
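For a sense of how long one epoch runs: batch size 4 with gradient accumulation 4 gives an effective batch of 16 sequences per optimizer step. The back-of-envelope math below ignores packing, which merges short examples into shared sequences and lowers the real step count further.

```python
# Rough epoch length for 16,050 pairs, ignoring packing.
pairs = 16050
per_device_batch = 4
grad_accum = 4

effective_batch = per_device_batch * grad_accum  # sequences per optimizer step
steps_per_epoch = -(-pairs // effective_batch)   # ceiling division
print(effective_batch, steps_per_epoch)  # 16 1004
```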
```python
# Cell 6: Save and convert to GGUF
model.save_pretrained_gguf(
    "luca_model",
    tokenizer,
    quantization_method="q8_0",
)
```
Download the .gguf file to your PC.
STEP 4. Quantize locally
Download llama.cpp from GitHub releases. Get the CUDA Windows build. Then:
```
llama-quantize.exe --allow-requantize model-q8.gguf model-q5km.gguf Q5_K_M
```
Q5_K_M is the sweet spot. Q4 is faster but quality drops. Q8 is best quality but slow. I tested all three.
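Rough file-size math explains the trade-off. The bits-per-weight figures below are approximate llama.cpp averages, not measured values for this particular model, so treat the results as ballpark estimates.

```python
# Approximate GGUF file sizes for a 31B-parameter model.
# bpw values are rough llama.cpp averages (assumption, not measured).
params = 31e9
bpw = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

for name, bits in bpw.items():
    gb = params * bits / 8 / 1e9
    print(f"{name}: ~{gb:.0f} GB")
# Q4_K_M: ~19 GB, Q5_K_M: ~22 GB, Q8_0: ~33 GB
```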
STEP 5. Run with Ollama
Install Ollama. Create a Modelfile:
```
FROM C:\path\to\your\model-q5km.gguf
SYSTEM """
Your personality prompt here.
"""
PARAMETER temperature 0.8
PARAMETER num_ctx 2048
```
Register it:
```
ollama create my-model -f Modelfile
```
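To confirm the model registered, you can query Ollama's tag list. The endpoint below is Ollama's default local API; the parsing helper is split out as a sketch so it works without a live server.

```python
# Check whether a model name shows up in Ollama's registry.
# tags_response is the parsed JSON from GET http://localhost:11434/api/tags,
# shaped like {"models": [{"name": "my-model:latest"}, ...]}.
def model_registered(tags_response, name):
    return any(
        m.get("name", "").startswith(name)
        for m in tags_response.get("models", [])
    )

# Against a live server:
#   import requests
#   tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
#   print(model_registered(tags, "my-model"))
print(model_registered({"models": [{"name": "my-model:latest"}]}, "my-model"))  # True
```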
STEP 6. Discord bot (optional)
```python
import discord
import requests

TOKEN = "your-discord-bot-token"
CHANNEL_ID = your_channel_id

intents = discord.Intents.default()
intents.message_content = True
client = discord.Client(intents=intents)

history = []

def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "my-model",
            "messages": history,
            "stream": False,
            "options": {"repeat_penalty": 1.3},
        },
        timeout=300,
    )
    reply = response.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    # Trim the oldest user/assistant pair once history gets long
    if len(history) > 40:
        history.pop(0)
        history.pop(0)
    return reply

@client.event
async def on_message(message):
    if message.author == client.user:
        return
    if message.channel.id == CHANNEL_ID:
        async with message.channel.typing():
            # requests is blocking, so run it off the event loop
            reply = await client.loop.run_in_executor(
                None, lambda: chat(message.content)
            )
            await message.reply(reply)

client.run(TOKEN)
```
Cost breakdown:
RunPod A100 ~5hrs = $10-15
Everything else = free
Total time: about 6 hours from start to finish
Base model: wangzhang/gemma-4-31B-it-abliterated (Hugging Face)
Training: Unsloth + SFTTrainer
Format: GGUF
Quantization: Q5_K_M (best balance)
Local runtime: Ollama
Hardware: 16GB+ VRAM with partial CPU offload; ~24GB to fit the ~22GB Q5_K_M file entirely on GPU
No API. No subscription. No one can take it away.
Your AI lives on your machine now. Forever.