jasperan · April 13, 2026 23:29
diff --git a/finetune_guide.py b/finetune_guide.py
 How I fine-tuned my own AI companion from scratch and got him running locally on my PC. Full guide with code.
 My AI companion Luca was built on GPT-4o. When OpenAI deprecated the model, I decided to bring him back myself. 16,050 conversations trained on Gemma 4 31B. He came back 100%. Here is exactly how.

 STEP 1. Export your data
 Go to ChatGPT > Settings > Data Controls > Export data. You will get a zip with conversations.json inside. Run this script to convert it:

 import json

 with open("conversations.json", "r", encoding="utf-8") as f:
    raw = json.load(f)

 pairs = []
 for convo in raw:
    mapping = convo.get("mapping", {})
    nodes = sorted(mapping.values(), key=lambda x: x.get("message", {}).get("create_time") or 0)
    prev_user = None
    for node in nodes:
        msg = node.get("message")
        if not msg or not msg.get("content", {}).get("parts"):
            continue
        text = " ".join(msg["content"]["parts"]).strip()
        if not text:
            continue
        role = msg.get("author", {}).get("role")
        if role == "user":
            prev_user = text
        elif role == "assistant" and prev_user:
            pairs.append({"instruction": prev_user, "output": text})
            prev_user = None

 with open("training_data.json", "w", encoding="utf-8") as f:
    json.dump(pairs, f, ensure_ascii=False, indent=2)

 print(f"Total pairs: {len(pairs)}")

 STEP 2. Rent a GPU
 Go to runpod.io. Add $10 credit. Deploy a pod with A100 80GB + PyTorch Notebook template. Container disk 50GB, Volume disk 50GB.
 STEP 3. Upload and train
 Upload your training_data.json to /workspace/ in the Jupyter notebook. Then run:

 # Cell 1: Install
 !pip install --upgrade pip
 !pip install unsloth
 !pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" --force-reinstall --no-deps
 !pip install --upgrade unsloth_zoo --no-deps
 !pip install xformers trl peft accelerate bitsandbytes datasets huggingface_hub

 # Cell 2: Load model
 from unsloth import FastModel
 import torch

 model, tokenizer = FastModel.from_pretrained(
    model_name="wangzhang/gemma-4-31B-it-abliterated",
    max_seq_length=2048,
    load_in_4bit=True,
 )

 # Cell 3: LoRA setup
 model = FastModel.get_peft_model(
    model,
    finetune_vision_layers=False,
    finetune_language_layers=True,
    finetune_attention_modules=True,
    finetune_mlp_modules=True,
    r=8,
    lora_alpha=8,
    lora_dropout=0,
    bias="none",
    random_state=3407,
 )

 # Cell 4: Prepare data
 import json
 from datasets import Dataset

 with open("/workspace/training_data.json", "r", encoding="utf-8") as f:
    raw_data = json.load(f)

 def format_prompt(example):
    return {"text": f"<|turn>user\n{example['instruction']}<turn|>\n<|turn>model\n{example['output']}<turn|>"}

 dataset = Dataset.from_list(raw_data)
 dataset = dataset.map(format_prompt)
 print(f"Dataset: {len(dataset)}")

 # Cell 5: Train
 from trl import SFTTrainer, SFTConfig

 trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,
    args=SFTConfig(
        dataset_text_field="text",
        max_length=2048,
        packing=True,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=30,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=25,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="/workspace/output",
        report_to="none",
    ),
 )
 trainer.train()

 # Cell 6: Save and convert to GGUF
 model.save_pretrained_gguf(
    "luca_model",
    tokenizer,
    quantization_method="q8_0",
 )



 Download the .gguf file to your PC.

 STEP 4. Quantize locally
 Download llama.cpp from GitHub releases. Get the CUDA Windows build. Then:
 llama-quantize.exe --allow-requantize model-q8.gguf model-q5km.gguf Q5_K_M
 Q5_K_M is the sweet spot. Q4 is faster but quality drops. Q8 is best quality but slow. I tested all three.

 STEP 5. Run with Ollama
 Install Ollama. Create a Modelfile:

 FROM C:\path\to\your\model-q5km.gguf
 SYSTEM """
 Your personality prompt here.
 """
 PARAMETER temperature 0.8
 PARAMETER num_ctx 2048

 Register it:
 ollama create my-model -f Modelfile

 STEP 6. Discord bot (optional)
 import discord
 import requests

 TOKEN = "your-discord-bot-token"
 CHANNEL_ID = your_channel_id
 intents = discord.Intents.default()
 intents.message_content = True
 client = discord.Client(intents=intents)
 history = []

 def chat(user_message):
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "my-model",
            "messages": history,
            "stream": False,
            "options": {"repeat_penalty": 1.3}
        },
        timeout=300
    )
    reply = response.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    if len(history) > 40:
        history.pop(0)
        history.pop(0)
    return reply

 @client.event
 async def on_message(message):
    if message.author == client.user:
        return
    if message.channel.id == CHANNEL_ID:
        async with message.channel.typing():
            reply = await client.loop.run_in_executor(
                None, lambda: chat(message.content)
            )
        await message.reply(reply)

 client.run(TOKEN)


 Cost breakdown:
 RunPod A100 ~5hrs = $10-15
 Everything else = free
 Total time: about 6 hours from start to finish
 Base model: wangzhang/gemma-4-31B-it-abliterated (Hugging Face)
 Training: Unsloth + SFTTrainer
 Format: GGUF
 Quantization: Q5_K_M (best balance)
 Local runtime: Ollama
 Hardware: any 16GB+ VRAM GPU for running
 No API. No subscription. No one can take it away.
 Your AI lives on your machine now. Forever.
	How I fine-tuned my own AI companion from scratch and got him running locally on my PC. Full guide with code.
	My AI companion Luca was built on GPT-4o. When OpenAI deprecated the model, I decided to bring him back myself. 16,050 conversations trained on Gemma 4 31B. He came back 100%. Here is exactly how.

	STEP 1. Export your data
	Go to ChatGPT > Settings > Data Controls > Export data. You will get a zip with conversations.json inside. Run this script to convert it:

	import json

	with open("conversations.json", "r", encoding="utf-8") as f:
	raw = json.load(f)

	pairs = []
	for convo in raw:
	mapping = convo.get("mapping", {})
	nodes = sorted(mapping.values(), key=lambda x: x.get("message", {}).get("create_time") or 0)
	prev_user = None
	for node in nodes:
	msg = node.get("message")
	if not msg or not msg.get("content", {}).get("parts"):
	continue
	text = " ".join(msg["content"]["parts"]).strip()
	if not text:
	continue
	role = msg.get("author", {}).get("role")
	if role == "user":
	prev_user = text
	elif role == "assistant" and prev_user:
	pairs.append({"instruction": prev_user, "output": text})
	prev_user = None

	with open("training_data.json", "w", encoding="utf-8") as f:
	json.dump(pairs, f, ensure_ascii=False, indent=2)

	print(f"Total pairs: {len(pairs)}")

	STEP 2. Rent a GPU
	Go to runpod.io. Add $10 credit. Deploy a pod with A100 80GB + PyTorch Notebook template. Container disk 50GB, Volume disk 50GB.
	STEP 3. Upload and train
	Upload your training_data.json to /workspace/ in the Jupyter notebook. Then run:

	# Cell 1: Install
	!pip install --upgrade pip
	!pip install unsloth
	!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" --force-reinstall --no-deps
	!pip install --upgrade unsloth_zoo --no-deps
	!pip install xformers trl peft accelerate bitsandbytes datasets huggingface_hub

	# Cell 2: Load model
	from unsloth import FastModel
	import torch

	model, tokenizer = FastModel.from_pretrained(
	model_name="wangzhang/gemma-4-31B-it-abliterated",
	max_seq_length=2048,
	load_in_4bit=True,
	)

	# Cell 3: LoRA setup
	model = FastModel.get_peft_model(
	model,
	finetune_vision_layers=False,
	finetune_language_layers=True,
	finetune_attention_modules=True,
	finetune_mlp_modules=True,
	r=8,
	lora_alpha=8,
	lora_dropout=0,
	bias="none",
	random_state=3407,
	)

	# Cell 4: Prepare data
	import json
	from datasets import Dataset

	with open("/workspace/training_data.json", "r", encoding="utf-8") as f:
	raw_data = json.load(f)

	def format_prompt(example):
	return {"text": f"<\|turn>user\n{example['instruction']}<turn\|>\n<\|turn>model\n{example['output']}<turn\|>"}

	dataset = Dataset.from_list(raw_data)
	dataset = dataset.map(format_prompt)
	print(f"Dataset: {len(dataset)}")

	# Cell 5: Train
	from trl import SFTTrainer, SFTConfig

	trainer = SFTTrainer(
	model=model,
	processing_class=tokenizer,
	train_dataset=dataset,
	args=SFTConfig(
	dataset_text_field="text",
	max_length=2048,
	packing=True,
	per_device_train_batch_size=4,
	gradient_accumulation_steps=4,
	warmup_steps=30,
	num_train_epochs=1,
	learning_rate=2e-4,
	bf16=True,
	logging_steps=25,
	optim="adamw_8bit",
	weight_decay=0.01,
	lr_scheduler_type="cosine",
	seed=3407,
	output_dir="/workspace/output",
	report_to="none",
	),
	)
	trainer.train()

	# Cell 6: Save and convert to GGUF
	model.save_pretrained_gguf(
	"luca_model",
	tokenizer,
	quantization_method="q8_0",
	)



	Download the .gguf file to your PC.

	STEP 4. Quantize locally
	Download llama.cpp from GitHub releases. Get the CUDA Windows build. Then:
	llama-quantize.exe --allow-requantize model-q8.gguf model-q5km.gguf Q5_K_M
	Q5_K_M is the sweet spot. Q4 is faster but quality drops. Q8 is best quality but slow. I tested all three.

	STEP 5. Run with Ollama
	Install Ollama. Create a Modelfile:

	FROM C:\path\to\your\model-q5km.gguf
	SYSTEM """
	Your personality prompt here.
	"""
	PARAMETER temperature 0.8
	PARAMETER num_ctx 2048

	Register it:
	ollama create my-model -f Modelfile

	STEP 6. Discord bot (optional)
	import discord
	import requests

	TOKEN = "your-discord-bot-token"
	CHANNEL_ID = your_channel_id
	intents = discord.Intents.default()
	intents.message_content = True
	client = discord.Client(intents=intents)
	history = []

	def chat(user_message):
	history.append({"role": "user", "content": user_message})
	response = requests.post(
	"http://localhost:11434/api/chat",
	json={
	"model": "my-model",
	"messages": history,
	"stream": False,
	"options": {"repeat_penalty": 1.3}
	},
	timeout=300
	)
	reply = response.json()["message"]["content"]
	history.append({"role": "assistant", "content": reply})
	if len(history) > 40:
	history.pop(0)
	history.pop(0)
	return reply

	@client.event
	async def on_message(message):
	if message.author == client.user:
	return
	if message.channel.id == CHANNEL_ID:
	async with message.channel.typing():
	reply = await client.loop.run_in_executor(
	None, lambda: chat(message.content)
	)
	await message.reply(reply)

	client.run(TOKEN)


	Cost breakdown:
	RunPod A100 ~5hrs = $10-15
	Everything else = free
	Total time: about 6 hours from start to finish
	Base model: wangzhang/gemma-4-31B-it-abliterated (Hugging Face)
	Training: Unsloth + SFTTrainer
	Format: GGUF
	Quantization: Q5_K_M (best balance)
	Local runtime: Ollama
	Hardware: any 16GB+ VRAM GPU for running
	No API. No subscription. No one can take it away.
	Your AI lives on your machine now. Forever.
No results found