chottokun · February 20, 2026 14:11
diff --git a/qwen3-swallow-8b-sft-v0-2.ipynb b/qwen3-swallow-8b-sft-v0-2.ipynb
 {
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "machine_shape": "hm",
      "gpuType": "L4",
      "authorship_tag": "ABX9TyMT8Blu2X8e0v5jLVABmnZM",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/chottokun/8594d7439137349cdd803e0b73097e3b/qwen3-swallow-8b-sft-v0-2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f0710ebb"
      },
      "source": [
        "# Task\n",
        "Install vLLM and necessary libraries, start a background vLLM server for the model \"tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2\" using the \"--reasoning-parser qwen3\" option, and evaluate its Japanese performance across reasoning, creative writing, summarization, knowledge, and translation tasks, concluding with a comprehensive analysis report."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fcf18ad7",
        "outputId": "a0a98e9e-8ce2-401f-9230-f6a7d3c4dbc9"
      },
      "source": [
        "!pip install vllm openai"
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Requirement already satisfied: vllm in /usr/local/lib/python3.12/dist-packages (0.15.1)\n",
            "Requirement already satisfied: openai in /usr/local/lib/python3.12/dist-packages (2.21.0)\n",
            "Requirement already satisfied: regex in /usr/local/lib/python3.12/dist-packages (from vllm) (2025.11.3)\n",
            "Requirement already satisfied: cachetools in /usr/local/lib/python3.12/dist-packages (from vllm) (7.0.1)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.12/dist-packages (from vllm) (5.9.5)\n",
            "Requirement already satisfied: sentencepiece in /usr/local/lib/python3.12/dist-packages (from vllm) (0.2.1)\n",
            "Requirement already satisfied: numpy in /usr/local/lib/python3.12/dist-packages (from vllm) (2.0.2)\n",
            "Requirement already satisfied: requests>=2.26.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.32.4)\n",
            "Requirement already satisfied: tqdm in /usr/local/lib/python3.12/dist-packages (from vllm) (4.67.3)\n",
            "Requirement already satisfied: blake3 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.0.8)\n",
            "Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.12/dist-packages (from vllm) (9.0.0)\n",
            "Requirement already satisfied: transformers<5,>=4.56.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (4.57.6)\n",
            "Requirement already satisfied: tokenizers>=0.21.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.22.2)\n",
            "Requirement already satisfied: protobuf>=6.33.5 in /usr/local/lib/python3.12/dist-packages (from vllm) (6.33.5)\n",
            "Requirement already satisfied: fastapi>=0.115.0 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (0.129.0)\n",
            "Requirement already satisfied: aiohttp>=3.13.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (3.13.3)\n",
            "Requirement already satisfied: pydantic>=2.12.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.12.3)\n",
            "Requirement already satisfied: prometheus_client>=0.18.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.24.1)\n",
            "Requirement already satisfied: pillow in /usr/local/lib/python3.12/dist-packages (from vllm) (11.3.0)\n",
            "Requirement already satisfied: prometheus-fastapi-instrumentator>=7.0.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (7.1.0)\n",
            "Requirement already satisfied: tiktoken>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.12.0)\n",
            "Requirement already satisfied: lm-format-enforcer==0.11.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.11.3)\n",
            "Requirement already satisfied: llguidance<1.4.0,>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.3.0)\n",
            "Requirement already satisfied: outlines_core==0.2.11 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.2.11)\n",
            "Requirement already satisfied: diskcache==5.6.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (5.6.3)\n",
            "Requirement already satisfied: lark==1.2.2 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.2.2)\n",
            "Requirement already satisfied: xgrammar==0.1.29 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.1.29)\n",
            "Requirement already satisfied: typing_extensions>=4.10 in /usr/local/lib/python3.12/dist-packages (from vllm) (4.15.0)\n",
            "Requirement already satisfied: filelock>=3.16.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (3.24.2)\n",
            "Requirement already satisfied: partial-json-parser in /usr/local/lib/python3.12/dist-packages (from vllm) (0.2.1.1.post7)\n",
            "Requirement already satisfied: pyzmq>=25.0.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (26.2.1)\n",
            "Requirement already satisfied: msgspec in /usr/local/lib/python3.12/dist-packages (from vllm) (0.20.0)\n",
            "Requirement already satisfied: gguf>=0.17.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.17.1)\n",
            "Requirement already satisfied: mistral_common>=1.8.8 in /usr/local/lib/python3.12/dist-packages (from mistral_common[image]>=1.8.8->vllm) (1.9.1)\n",
            "Requirement already satisfied: opencv-python-headless>=4.13.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (4.13.0.92)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.12/dist-packages (from vllm) (6.0.3)\n",
            "Requirement already satisfied: six>=1.16.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.17.0)\n",
            "Requirement already satisfied: setuptools<81.0.0,>=77.0.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (80.10.2)\n",
            "Requirement already satisfied: einops in /usr/local/lib/python3.12/dist-packages (from vllm) (0.8.2)\n",
            "Requirement already satisfied: compressed-tensors==0.13.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.13.0)\n",
            "Requirement already satisfied: depyf==0.20.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.20.0)\n",
            "Requirement already satisfied: cloudpickle in /usr/local/lib/python3.12/dist-packages (from vllm) (3.1.2)\n",
            "Requirement already satisfied: watchfiles in /usr/local/lib/python3.12/dist-packages (from vllm) (1.1.1)\n",
            "Requirement already satisfied: python-json-logger in /usr/local/lib/python3.12/dist-packages (from vllm) (4.0.0)\n",
            "Requirement already satisfied: ninja in /usr/local/lib/python3.12/dist-packages (from vllm) (1.13.0)\n",
            "Requirement already satisfied: pybase64 in /usr/local/lib/python3.12/dist-packages (from vllm) (1.4.3)\n",
            "Requirement already satisfied: cbor2 in /usr/local/lib/python3.12/dist-packages (from vllm) (5.8.0)\n",
            "Requirement already satisfied: ijson in /usr/local/lib/python3.12/dist-packages (from vllm) (3.4.0.post0)\n",
            "Requirement already satisfied: setproctitle in /usr/local/lib/python3.12/dist-packages (from vllm) (1.3.7)\n",
            "Requirement already satisfied: openai-harmony>=0.0.3 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.0.8)\n",
            "Requirement already satisfied: anthropic>=0.71.0 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.83.0)\n",
            "Requirement already satisfied: model-hosting-container-standards<1.0.0,>=0.1.13 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.1.13)\n",
            "Requirement already satisfied: mcp in /usr/local/lib/python3.12/dist-packages (from vllm) (1.26.0)\n",
            "Requirement already satisfied: grpcio in /usr/local/lib/python3.12/dist-packages (from vllm) (1.78.1)\n",
            "Requirement already satisfied: grpcio-reflection in /usr/local/lib/python3.12/dist-packages (from vllm) (1.78.1)\n",
            "Requirement already satisfied: numba==0.61.2 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.61.2)\n",
            "Requirement already satisfied: ray>=2.48.0 in /usr/local/lib/python3.12/dist-packages (from ray[cgraph]>=2.48.0->vllm) (2.54.0)\n",
            "Requirement already satisfied: torch==2.9.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.9.1)\n",
            "Requirement already satisfied: torchaudio==2.9.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (2.9.1)\n",
            "Requirement already satisfied: torchvision==0.24.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.24.1)\n",
            "Requirement already satisfied: flashinfer-python==0.6.1 in /usr/local/lib/python3.12/dist-packages (from vllm) (0.6.1)\n",
            "Requirement already satisfied: loguru in /usr/local/lib/python3.12/dist-packages (from compressed-tensors==0.13.0->vllm) (0.7.3)\n",
            "Requirement already satisfied: astor in /usr/local/lib/python3.12/dist-packages (from depyf==0.20.0->vllm) (0.8.1)\n",
            "Requirement already satisfied: dill in /usr/local/lib/python3.12/dist-packages (from depyf==0.20.0->vllm) (0.3.8)\n",
            "Requirement already satisfied: apache-tvm-ffi!=0.1.8,!=0.1.8.post0,<0.2,>=0.1.6 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (0.1.8.post2)\n",
            "Requirement already satisfied: click in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (8.3.1)\n",
            "Requirement already satisfied: nvidia-cudnn-frontend>=1.13.0 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (1.18.0)\n",
            "Requirement already satisfied: nvidia-cutlass-dsl>=4.3.4 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (4.4.0)\n",
            "Requirement already satisfied: nvidia-ml-py in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (13.590.48)\n",
            "Requirement already satisfied: packaging>=24.2 in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (26.0)\n",
            "Requirement already satisfied: tabulate in /usr/local/lib/python3.12/dist-packages (from flashinfer-python==0.6.1->vllm) (0.9.0)\n",
            "Requirement already satisfied: interegular>=0.3.2 in /usr/local/lib/python3.12/dist-packages (from lm-format-enforcer==0.11.3->vllm) (0.3.3)\n",
            "Requirement already satisfied: llvmlite<0.45,>=0.44.0dev0 in /usr/local/lib/python3.12/dist-packages (from numba==0.61.2->vllm) (0.44.0)\n",
            "Requirement already satisfied: sympy>=1.13.3 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (1.14.0)\n",
            "Requirement already satisfied: networkx>=2.5.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (3.6.1)\n",
            "Requirement already satisfied: jinja2 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (3.1.6)\n",
            "Requirement already satisfied: fsspec>=0.8.5 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (2025.3.0)\n",
            "Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.93)\n",
            "Requirement already satisfied: nvidia-cuda-runtime-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.90)\n",
            "Requirement already satisfied: nvidia-cuda-cupti-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.90)\n",
            "Requirement already satisfied: nvidia-cudnn-cu12==9.10.2.21 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (9.10.2.21)\n",
            "Requirement already satisfied: nvidia-cublas-cu12==12.8.4.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.4.1)\n",
            "Requirement already satisfied: nvidia-cufft-cu12==11.3.3.83 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (11.3.3.83)\n",
            "Requirement already satisfied: nvidia-curand-cu12==10.3.9.90 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (10.3.9.90)\n",
            "Requirement already satisfied: nvidia-cusolver-cu12==11.7.3.90 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (11.7.3.90)\n",
            "Requirement already satisfied: nvidia-cusparse-cu12==12.5.8.93 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.5.8.93)\n",
            "Requirement already satisfied: nvidia-cusparselt-cu12==0.7.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (0.7.1)\n",
            "Requirement already satisfied: nvidia-nccl-cu12==2.27.5 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (2.27.5)\n",
            "Requirement already satisfied: nvidia-nvshmem-cu12==3.3.20 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (3.3.20)\n",
            "Requirement already satisfied: nvidia-nvtx-cu12==12.8.90 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.90)\n",
            "Requirement already satisfied: nvidia-nvjitlink-cu12==12.8.93 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (12.8.93)\n",
            "Requirement already satisfied: nvidia-cufile-cu12==1.13.1.3 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (1.13.1.3)\n",
            "Requirement already satisfied: triton==3.5.1 in /usr/local/lib/python3.12/dist-packages (from torch==2.9.1->vllm) (3.5.1)\n",
            "Requirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.12/dist-packages (from openai) (4.12.1)\n",
            "Requirement already satisfied: distro<2,>=1.7.0 in /usr/local/lib/python3.12/dist-packages (from openai) (1.9.0)\n",
            "Requirement already satisfied: httpx<1,>=0.23.0 in /usr/local/lib/python3.12/dist-packages (from openai) (0.28.1)\n",
            "Requirement already satisfied: jiter<1,>=0.10.0 in /usr/local/lib/python3.12/dist-packages (from openai) (0.13.0)\n",
            "Requirement already satisfied: sniffio in /usr/local/lib/python3.12/dist-packages (from openai) (1.3.1)\n",
            "Requirement already satisfied: aiohappyeyeballs>=2.5.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (2.6.1)\n",
            "Requirement already satisfied: aiosignal>=1.4.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (1.4.0)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (25.4.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (1.8.0)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (6.7.1)\n",
            "Requirement already satisfied: propcache>=0.2.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (0.4.1)\n",
            "Requirement already satisfied: yarl<2.0,>=1.17.0 in /usr/local/lib/python3.12/dist-packages (from aiohttp>=3.13.3->vllm) (1.22.0)\n",
            "Requirement already satisfied: docstring-parser<1,>=0.15 in /usr/local/lib/python3.12/dist-packages (from anthropic>=0.71.0->vllm) (0.17.0)\n",
            "Requirement already satisfied: idna>=2.8 in /usr/local/lib/python3.12/dist-packages (from anyio<5,>=3.5.0->openai) (3.11)\n",
            "Requirement already satisfied: starlette<1.0.0,>=0.40.0 in /usr/local/lib/python3.12/dist-packages (from fastapi>=0.115.0->fastapi[standard]>=0.115.0->vllm) (0.52.1)\n",
            "Requirement already satisfied: typing-inspection>=0.4.2 in /usr/local/lib/python3.12/dist-packages (from fastapi>=0.115.0->fastapi[standard]>=0.115.0->vllm) (0.4.2)\n",
            "Requirement already satisfied: annotated-doc>=0.0.2 in /usr/local/lib/python3.12/dist-packages (from fastapi>=0.115.0->fastapi[standard]>=0.115.0->vllm) (0.0.4)\n",
            "Requirement already satisfied: fastapi-cli>=0.0.8 in /usr/local/lib/python3.12/dist-packages (from fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.0.23)\n",
            "Requirement already satisfied: python-multipart>=0.0.18 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (0.0.22)\n",
            "Requirement already satisfied: email-validator>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (2.3.0)\n",
            "Requirement already satisfied: uvicorn>=0.12.0 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.41.0)\n",
            "Requirement already satisfied: pydantic-settings>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (2.13.0)\n",
            "Requirement already satisfied: pydantic-extra-types>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from fastapi[standard]>=0.115.0->vllm) (2.11.0)\n",
            "Requirement already satisfied: certifi in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->openai) (2026.1.4)\n",
            "Requirement already satisfied: httpcore==1.* in /usr/local/lib/python3.12/dist-packages (from httpx<1,>=0.23.0->openai) (1.0.9)\n",
            "Requirement already satisfied: h11>=0.16 in /usr/local/lib/python3.12/dist-packages (from httpcore==1.*->httpx<1,>=0.23.0->openai) (0.16.0)\n",
            "Requirement already satisfied: jsonschema>=4.21.1 in /usr/local/lib/python3.12/dist-packages (from mistral_common>=1.8.8->mistral_common[image]>=1.8.8->vllm) (4.26.0)\n",
            "Requirement already satisfied: jmespath in /usr/local/lib/python3.12/dist-packages (from model-hosting-container-standards<1.0.0,>=0.1.13->vllm) (1.1.0)\n",
            "Requirement already satisfied: supervisor>=4.2.0 in /usr/local/lib/python3.12/dist-packages (from model-hosting-container-standards<1.0.0,>=0.1.13->vllm) (4.3.0)\n",
            "Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.12.0->vllm) (0.7.0)\n",
            "Requirement already satisfied: pydantic-core==2.41.4 in /usr/local/lib/python3.12/dist-packages (from pydantic>=2.12.0->vllm) (2.41.4)\n",
            "Requirement already satisfied: msgpack<2.0.0,>=1.0.0 in /usr/local/lib/python3.12/dist-packages (from ray>=2.48.0->ray[cgraph]>=2.48.0->vllm) (1.1.2)\n",
            "Requirement already satisfied: cupy-cuda12x in /usr/local/lib/python3.12/dist-packages (from ray[cgraph]>=2.48.0->vllm) (13.6.0)\n",
            "Requirement already satisfied: charset_normalizer<4,>=2 in /usr/local/lib/python3.12/dist-packages (from requests>=2.26.0->vllm) (3.4.4)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.12/dist-packages (from requests>=2.26.0->vllm) (2.5.0)\n",
            "Requirement already satisfied: huggingface-hub<2.0,>=0.16.4 in /usr/local/lib/python3.12/dist-packages (from tokenizers>=0.21.1->vllm) (0.36.2)\n",
            "Requirement already satisfied: safetensors>=0.4.3 in /usr/local/lib/python3.12/dist-packages (from transformers<5,>=4.56.0->vllm) (0.7.0)\n",
            "Requirement already satisfied: httpx-sse>=0.4 in /usr/local/lib/python3.12/dist-packages (from mcp->vllm) (0.4.3)\n",
            "Requirement already satisfied: pyjwt>=2.10.1 in /usr/local/lib/python3.12/dist-packages (from pyjwt[crypto]>=2.10.1->mcp->vllm) (2.11.0)\n",
            "Requirement already satisfied: sse-starlette>=1.6.1 in /usr/local/lib/python3.12/dist-packages (from mcp->vllm) (3.2.0)\n",
            "Requirement already satisfied: dnspython>=2.0.0 in /usr/local/lib/python3.12/dist-packages (from email-validator>=2.0.0->fastapi[standard]>=0.115.0->vllm) (2.8.0)\n",
            "Requirement already satisfied: typer>=0.16.0 in /usr/local/lib/python3.12/dist-packages (from fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.24.0)\n",
            "Requirement already satisfied: rich-toolkit>=0.14.8 in /usr/local/lib/python3.12/dist-packages (from fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.19.4)\n",
            "Requirement already satisfied: fastapi-cloud-cli>=0.1.1 in /usr/local/lib/python3.12/dist-packages (from fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.13.0)\n",
            "Requirement already satisfied: hf-xet<2.0.0,>=1.1.3 in /usr/local/lib/python3.12/dist-packages (from huggingface-hub<2.0,>=0.16.4->tokenizers>=0.21.1->vllm) (1.2.0)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.12/dist-packages (from jinja2->torch==2.9.1->vllm) (3.0.3)\n",
            "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.8->mistral_common[image]>=1.8.8->vllm) (2025.9.1)\n",
            "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.8->mistral_common[image]>=1.8.8->vllm) (0.37.0)\n",
            "Requirement already satisfied: rpds-py>=0.25.0 in /usr/local/lib/python3.12/dist-packages (from jsonschema>=4.21.1->mistral_common>=1.8.8->mistral_common[image]>=1.8.8->vllm) (0.30.0)\n",
            "Requirement already satisfied: nvidia-cutlass-dsl-libs-base==4.4.0 in /usr/local/lib/python3.12/dist-packages (from nvidia-cutlass-dsl>=4.3.4->flashinfer-python==0.6.1->vllm) (4.4.0)\n",
            "Requirement already satisfied: cuda-python>=12.8 in /usr/local/lib/python3.12/dist-packages (from nvidia-cutlass-dsl-libs-base==4.4.0->nvidia-cutlass-dsl>=4.3.4->flashinfer-python==0.6.1->vllm) (12.9.4)\n",
            "Requirement already satisfied: pycountry>=23 in /usr/local/lib/python3.12/dist-packages (from pydantic-extra-types[pycountry]>=2.10.5->mistral_common>=1.8.8->mistral_common[image]>=1.8.8->vllm) (26.2.16)\n",
            "Requirement already satisfied: python-dotenv>=0.21.0 in /usr/local/lib/python3.12/dist-packages (from pydantic-settings>=2.0.0->fastapi[standard]>=0.115.0->vllm) (1.2.1)\n",
            "Requirement already satisfied: cryptography>=3.4.0 in /usr/local/lib/python3.12/dist-packages (from pyjwt[crypto]>=2.10.1->mcp->vllm) (43.0.3)\n",
            "Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.12/dist-packages (from sympy>=1.13.3->torch==2.9.1->vllm) (1.3.0)\n",
            "Requirement already satisfied: httptools>=0.6.3 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.7.1)\n",
            "Requirement already satisfied: uvloop>=0.15.1 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.22.1)\n",
            "Requirement already satisfied: websockets>=10.4 in /usr/local/lib/python3.12/dist-packages (from uvicorn[standard]>=0.12.0; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (15.0.1)\n",
            "Requirement already satisfied: fastrlock>=0.5 in /usr/local/lib/python3.12/dist-packages (from cupy-cuda12x->ray[cgraph]>=2.48.0->vllm) (0.8.3)\n",
            "Requirement already satisfied: cffi>=1.12 in /usr/local/lib/python3.12/dist-packages (from cryptography>=3.4.0->pyjwt[crypto]>=2.10.1->mcp->vllm) (2.0.0)\n",
            "Requirement already satisfied: rignore>=0.5.1 in /usr/local/lib/python3.12/dist-packages (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.7.6)\n",
            "Requirement already satisfied: sentry-sdk>=2.20.0 in /usr/local/lib/python3.12/dist-packages (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (2.53.0)\n",
            "Requirement already satisfied: fastar>=0.8.0 in /usr/local/lib/python3.12/dist-packages (from fastapi-cloud-cli>=0.1.1->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.8.0)\n",
            "Requirement already satisfied: rich>=13.7.1 in /usr/local/lib/python3.12/dist-packages (from rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (13.9.4)\n",
            "Requirement already satisfied: shellingham>=1.3.0 in /usr/local/lib/python3.12/dist-packages (from typer>=0.16.0->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (1.5.4)\n",
            "Requirement already satisfied: pycparser in /usr/local/lib/python3.12/dist-packages (from cffi>=1.12->cryptography>=3.4.0->pyjwt[crypto]>=2.10.1->mcp->vllm) (3.0)\n",
            "Requirement already satisfied: cuda-bindings~=12.9.4 in /usr/local/lib/python3.12/dist-packages (from cuda-python>=12.8->nvidia-cutlass-dsl-libs-base==4.4.0->nvidia-cutlass-dsl>=4.3.4->flashinfer-python==0.6.1->vllm) (12.9.4)\n",
            "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.12/dist-packages (from rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (4.0.0)\n",
            "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.12/dist-packages (from rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (2.19.2)\n",
            "Requirement already satisfied: cuda-pathfinder~=1.1 in /usr/local/lib/python3.12/dist-packages (from cuda-bindings~=12.9.4->cuda-python>=12.8->nvidia-cutlass-dsl-libs-base==4.4.0->nvidia-cutlass-dsl>=4.3.4->flashinfer-python==0.6.1->vllm) (1.3.4)\n",
            "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.12/dist-packages (from markdown-it-py>=2.2.0->rich>=13.7.1->rich-toolkit>=0.14.8->fastapi-cli>=0.0.8->fastapi-cli[standard]>=0.0.8; extra == \"standard\"->fastapi[standard]>=0.115.0->vllm) (0.1.2)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "37ac84c8"
      },
      "source": [
        "## vLLMサーバーの起動\n",
        "\n",
        "### Subtask:\n",
        "Start the vLLM server in the background with the specified model and reasoning parser options.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "04169cc9",
        "outputId": "7155ccdb-8916-44ff-c30f-3622a59ea1c5"
      },
      "source": [
        "import subprocess\n",
        "import time\n",
        "import requests\n",
        "import sys\n",
        "import os\n",
        "\n",
        "# Define server parameters\n",
        "model_name = \"tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2\"\n",
        "log_filename = \"vllm_server.log\"\n",
        "timeout = 600  # 10 minutes timeout\n",
        "\n",
        "# Cleanup previous instances to prevent port conflicts\n",
        "print(\"Cleaning up previous vLLM instances...\")\n",
        "subprocess.run([\"pkill\", \"-f\", \"vllm\"], check=False)\n",
        "subprocess.run([\"fuser\", \"-k\", \"8000/tcp\"], check=False)\n",
        "time.sleep(2)\n",
        "\n",
        "print(f\"Starting vLLM server for model: {model_name} with reduced context length...\")\n",
        "\n",
        "# Open log file\n",
        "log_file = open(log_filename, \"w\")\n",
        "\n",
        "# Start vLLM server with memory optimization flags\n",
        "cmd = [\n",
        "    \"vllm\", \"serve\", model_name,\n",
        "    \"--reasoning-parser\", \"qwen3\",\n",
        "    \"--port\", \"8000\",\n",
        "    \"--trust-remote-code\",\n",
        "    \"--dtype\", \"half\",\n",
        "    \"--max-model-len\", \"16384\",      # Reduced from default 40960 to fit in T4 memory\n",
        "    \"--gpu-memory-utilization\", \"0.95\" # Allow using up to 95% of GPU memory\n",
        "]\n",
        "\n",
        "process = subprocess.Popen(\n",
        "    cmd,\n",
        "    stdout=log_file,\n",
        "    stderr=log_file\n",
        ")\n",
        "\n",
        "# Poll the server until it's ready or times out\n",
        "start_time = time.time()\n",
        "server_ready = False\n",
        "\n",
        "print(\"Waiting for server to become ready (streaming logs)...\")\n",
        "\n",
        "last_pos = 0\n",
        "while time.time() - start_time < timeout:\n",
        "    # Check if process exited prematurely\n",
        "    if process.poll() is not None:\n",
        "        print(\"\\nProcess exited unexpectedly.\")\n",
        "        break\n",
        "\n",
        "    # Check readiness endpoint\n",
        "    try:\n",
        "        response = requests.get(\"http://localhost:8000/v1/models\")\n",
        "        if response.status_code == 200:\n",
        "            server_ready = True\n",
        "            print(\"\\nvLLM Server Started Successfully\")\n",
        "            break\n",
        "    except requests.exceptions.ConnectionError:\n",
        "        pass\n",
        "\n",
        "    # Stream new log content to stdout to show progress\n",
        "    try:\n",
        "        with open(log_filename, \"r\") as f:\n",
        "            f.seek(last_pos)\n",
        "            new_data = f.read()\n",
        "            if new_data:\n",
        "                print(new_data, end=\"\")\n",
        "                last_pos = f.tell()\n",
        "    except Exception:\n",
        "        pass\n",
        "\n",
        "    time.sleep(5)\n",
        "\n",
        "# Final State Handling\n",
        "if not server_ready:\n",
        "    print(\"\\nServer start timed out or failed.\")\n",
        "    print(\"Dumping any remaining log content:\")\n",
        "    try:\n",
        "        with open(log_filename, \"r\") as f:\n",
        "            f.seek(last_pos)\n",
        "            print(f.read())\n",
        "    except Exception as e:\n",
        "        print(f\"Could not read log: {e}\")\n",
        "\n",
        "    # Terminate process if still running\n",
        "    if process.poll() is None:\n",
        "        process.terminate()\n",
        "        try:\n",
        "            process.wait(timeout=5)\n",
        "        except subprocess.TimeoutExpired:\n",
        "            process.kill()\n",
        "else:\n",
        "    print(f\"\\nServer is running on port 8000. Full logs are in {log_filename}\")"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Cleaning up previous vLLM instances...\n",
            "Starting vLLM server for model: tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2 with reduced context length...\n",
            "Waiting for server to become ready (streaming logs)...\n",
            "2026-02-20 13:55:51.527336: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n",
            "2026-02-20 13:55:51.547321: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
            "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
            "E0000 00:00:1771595751.571687   10623 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
            "E0000 00:00:1771595751.578910   10623 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
            "W0000 00:00:1771595751.596624   10623 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595751.596654   10623 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595751.596657   10623 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595751.596659   10623 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "2026-02-20 13:55:51.601511: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
            "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325] \n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325]        █     █     █▄   ▄█\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.15.1\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325]   █▄█▀ █     █     █     █  model   tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:325] \n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:03 [utils.py:261] non-default args: {'model_tag': 'tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2', 'api_server_count': 1, 'model': 'tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2', 'trust_remote_code': True, 'dtype': 'half', 'max_model_len': 16384, 'reasoning_parser': 'qwen3', 'gpu_memory_utilization': 0.95}\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:04 [model.py:541] Resolved architecture: Qwen3ForCausalLM\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m WARNING 02-20 13:56:04 [model.py:1885] Casting torch.bfloat16 to torch.float16.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:04 [model.py:1561] Using max model len 16384\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:04 [scheduler.py:226] Chunked prefill is enabled with max_num_batched_tokens=2048.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:56:04 [vllm.py:624] Asynchronous scheduling is enabled.\n",
            "2026-02-20 13:56:10.714734: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n",
            "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n",
            "E0000 00:00:1771595770.738578   10763 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n",
            "E0000 00:00:1771595770.745724   10763 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n",
            "W0000 00:00:1771595770.762056   10763 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595770.762084   10763 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595770.762087   10763 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "W0000 00:00:1771595770.762089   10763 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "AttributeError: 'MessageFactory' object has no attribute 'GetPrototype'\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:56:22 [core.py:96] Initializing a V1 LLM engine (v0.15.1) with config: model='tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2', speculative_config=None, tokenizer='tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='qwen3', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::unified_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [2048], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'eliminate_noops': True, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': True}, 'local_cache_dir': None, 'static_all_moe_layers': []}\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:56:22 [parallel_state.py:1212] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.28.0.12:41205 backend=nccl\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:56:22 [parallel_state.py:1423] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:56:23 [gpu_model_runner.py:4033] Starting to load model tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2...\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:56:24 [cuda.py:364] Using FLASH_ATTN attention backend out of potential backends: ('FLASH_ATTN', 'FLASHINFER', 'TRITON_ATTN', 'FLEX_ATTENTION')\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.cudart module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.runtime module instead.\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m <frozen importlib._bootstrap_external>:1301: FutureWarning: The cuda.nvrtc module is deprecated and will be removed in a future release, please switch to use the cuda.bindings.nvrtc module instead.\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:05 [weight_utils.py:527] Time spent downloading weights for tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2: 40.066534 seconds\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards:   0% Completed | 0/5 [00:00<?, ?it/s]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards:  20% Completed | 1/5 [00:01<00:04,  1.13s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards:  40% Completed | 2/5 [00:02<00:04,  1.43s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards:  60% Completed | 3/5 [00:04<00:03,  1.65s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards:  80% Completed | 4/5 [00:06<00:01,  1.73s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:08<00:00,  1.77s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Loading safetensors checkpoint shards: 100% Completed | 5/5 [00:08<00:00,  1.68s/it]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:13 [default_loader.py:291] Loading weights took 8.48 seconds\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:14 [gpu_model_runner.py:4130] Model loading took 15.27 GiB memory and 50.396454 seconds\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:26 [backends.py:812] Using cache directory: /root/.cache/vllm/torch_compile_cache/49bb61745e/rank_0_0/backbone for vLLM's torch.compile\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:26 [backends.py:872] Dynamo bytecode transform time: 11.47 s\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m [rank0]:W0220 13:57:37.066000 10763 torch/_inductor/utils.py:1613] Not enough SMs to use max_autotune_gemm mode\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:45 [backends.py:302] Cache the graph of compile range (1, 2048) for later use\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:55 [backends.py:319] Compiling a graph for compile range (1, 2048) takes 19.25 s\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:55 [monitor.py:34] torch.compile takes 30.71 s in total\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:56 [gpu_worker.py:356] Available KV cache memory: 4.22 GiB\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:56 [kv_cache_utils.py:1307] GPU KV cache size: 30,736 tokens\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:57:56 [kv_cache_utils.py:1312] Maximum concurrency for 16,384 tokens per request: 1.88x\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   0%|          | 0/51 [00:00<?, ?it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   2%|▏         | 1/51 [00:00<00:08,  5.61it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   4%|▍         | 2/51 [00:00<00:08,  5.74it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   6%|▌         | 3/51 [00:00<00:08,  5.68it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):   8%|▊         | 4/51 [00:00<00:08,  5.70it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  10%|▉         | 5/51 [00:00<00:08,  5.73it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  12%|█▏        | 6/51 [00:01<00:07,  5.72it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  14%|█▎        | 7/51 [00:01<00:07,  5.73it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  16%|█▌        | 8/51 [00:01<00:07,  5.77it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  18%|█▊        | 9/51 [00:01<00:06,  6.07it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  20%|█▉        | 10/51 [00:01<00:06,  6.29it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  22%|██▏       | 11/51 [00:01<00:06,  6.44it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  24%|██▎       | 12/51 [00:01<00:05,  6.56it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  25%|██▌       | 13/51 [00:02<00:05,  6.69it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  27%|██▋       | 14/51 [00:02<00:05,  6.79it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  29%|██▉       | 15/51 [00:02<00:05,  6.87it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  31%|███▏      | 16/51 [00:02<00:05,  6.95it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  33%|███▎      | 17/51 [00:02<00:04,  7.35it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  35%|███▌      | 18/51 [00:02<00:04,  7.64it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  37%|███▋      | 19/51 [00:02<00:04,  7.86it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  39%|███▉      | 20/51 [00:03<00:03,  8.06it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  41%|████      | 21/51 [00:03<00:03,  8.22it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  43%|████▎     | 22/51 [00:03<00:03,  8.25it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  45%|████▌     | 23/51 [00:03<00:03,  8.39it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  47%|████▋     | 24/51 [00:03<00:03,  8.51it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  49%|████▉     | 25/51 [00:03<00:02,  8.69it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  51%|█████     | 26/51 [00:03<00:02,  8.82it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  53%|█████▎    | 27/51 [00:03<00:02,  8.89it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  55%|█████▍    | 28/51 [00:03<00:02,  8.93it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  57%|█████▋    | 29/51 [00:04<00:02,  8.90it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  59%|█████▉    | 30/51 [00:04<00:02,  8.84it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  61%|██████    | 31/51 [00:04<00:02,  8.84it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  63%|██████▎   | 32/51 [00:04<00:02,  8.83it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  65%|██████▍   | 33/51 [00:04<00:01,  9.01it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  67%|██████▋   | 34/51 [00:04<00:01,  9.19it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  69%|██████▊   | 35/51 [00:04<00:01,  9.33it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  71%|███████   | 36/51 [00:04<00:01,  9.41it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  73%|███████▎  | 37/51 [00:04<00:01,  9.48it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  75%|███████▍  | 38/51 [00:05<00:01,  9.54it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  76%|███████▋  | 39/51 [00:05<00:01,  9.61it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  78%|███████▊  | 40/51 [00:05<00:01,  9.63it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  80%|████████  | 41/51 [00:05<00:01,  9.69it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  82%|████████▏ | 42/51 [00:05<00:00,  9.72it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  84%|████████▍ | 43/51 [00:05<00:00,  9.70it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  86%|████████▋ | 44/51 [00:05<00:00,  9.72it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  88%|████████▊ | 45/51 [00:05<00:00,  9.77it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  90%|█████████ | 46/51 [00:05<00:00,  9.76it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  94%|█████████▍| 48/51 [00:06<00:00, 10.01it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE):  98%|█████████▊| 50/51 [00:06<00:00, 10.18it/s]\n",
            "Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 51/51 [00:06<00:00,  8.05it/s]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m \n",
            "Capturing CUDA graphs (decode, FULL):   0%|          | 0/35 [00:00<?, ?it/s]\n",
            "Capturing CUDA graphs (decode, FULL):   3%|▎         | 1/35 [00:00<00:04,  7.17it/s]\n",
            "Capturing CUDA graphs (decode, FULL):   6%|▌         | 2/35 [00:00<00:04,  7.87it/s]\n",
            "Capturing CUDA graphs (decode, FULL):   9%|▊         | 3/35 [00:00<00:03,  8.18it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  11%|█▏        | 4/35 [00:00<00:03,  8.27it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  14%|█▍        | 5/35 [00:00<00:03,  8.35it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  17%|█▋        | 6/35 [00:00<00:03,  8.44it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  20%|██        | 7/35 [00:00<00:03,  8.56it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  23%|██▎       | 8/35 [00:00<00:03,  8.65it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  26%|██▌       | 9/35 [00:01<00:02,  8.82it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  29%|██▊       | 10/35 [00:01<00:02,  8.92it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  31%|███▏      | 11/35 [00:01<00:02,  9.04it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  34%|███▍      | 12/35 [00:01<00:02,  9.12it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  37%|███▋      | 13/35 [00:01<00:02,  9.08it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  40%|████      | 14/35 [00:01<00:02,  9.04it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  43%|████▎     | 15/35 [00:01<00:02,  9.06it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  46%|████▌     | 16/35 [00:01<00:02,  9.11it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  51%|█████▏    | 18/35 [00:02<00:01,  9.56it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  57%|█████▋    | 20/35 [00:02<00:01,  9.82it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  63%|██████▎   | 22/35 [00:02<00:01,  9.93it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  69%|██████▊   | 24/35 [00:02<00:01,  9.99it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  74%|███████▍  | 26/35 [00:02<00:00, 10.18it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  80%|████████  | 28/35 [00:02<00:00, 10.29it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  86%|████████▌ | 30/35 [00:03<00:00, 10.44it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  91%|█████████▏| 32/35 [00:03<00:00, 10.72it/s]\n",
            "Capturing CUDA graphs (decode, FULL):  97%|█████████▋| 34/35 [00:03<00:00, 10.93it/s]\n",
            "Capturing CUDA graphs (decode, FULL): 100%|██████████| 35/35 [00:03<00:00,  9.69it/s]\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:58:08 [gpu_model_runner.py:5063] Graph capturing finished in 11 secs, took 0.53 GiB\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:58:08 [core.py:272] init engine (profile, create kv cache, warmup model) took 53.21 seconds\n",
            "\u001b[0;36m(EngineCore_DP0 pid=10763)\u001b[0;0m INFO 02-20 13:58:10 [vllm.py:624] Asynchronous scheduling is enabled.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:58:10 [api_server.py:665] Supported tasks: ['generate']\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m WARNING 02-20 13:58:10 [model.py:1371] Default vLLM sampling parameters have been overridden by the model's `generation_config.json`: `{'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}`. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.\n",
            "\u001b[0;36m(APIServer pid=10623)\u001b[0;0m INFO 02-20 13:58:10 [serving.py:177] Warming up chat template processing...\n",
            "\n",
            "vLLM Server Started Successfully\n",
            "\n",
            "Server is running on port 8000. Full logs are in vllm_server.log\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d2ca3572"
      },
      "source": [
        "## APIクライアントの初期化\n",
        "\n",
        "### Subtask:\n",
        "Initialize the OpenAI client and define a helper function to generate and display responses, including reasoning content if available.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "b04d1897",
        "outputId": "8897defd-4c9b-4bb1-c9b7-8f6d7ec0f366"
      },
      "source": [
        "from openai import OpenAI\n",
        "\n",
        "# Initialize OpenAI client for local vLLM server\n",
        "openai_api_key = \"EMPTY\"\n",
        "openai_api_base = \"http://localhost:8000/v1\"\n",
        "\n",
        "client = OpenAI(\n",
        "    api_key=openai_api_key,\n",
        "    base_url=openai_api_base,\n",
        ")\n",
        "\n",
        "model_name = \"tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2\"\n",
        "\n",
        "def generate_response(user_query, system_prompt=\"You are a helpful assistant.\"):\n",
        "    \"\"\"\n",
        "    Generates a response from the model, printing reasoning if available.\n",
        "    \"\"\"\n",
        "    try:\n",
        "        completion = client.chat.completions.create(\n",
        "            model=model_name,\n",
        "            messages=[\n",
        "                {\"role\": \"system\", \"content\": system_prompt},\n",
        "                {\"role\": \"user\", \"content\": user_query},\n",
        "            ],\n",
        "            temperature=0.6,\n",
        "            max_completion_tokens=4096,\n",
        "        )\n",
        "\n",
        "        message = completion.choices[0].message\n",
        "\n",
        "        # Check for reasoning content (handling different potential API response structures)\n",
        "        reasoning = getattr(message, 'reasoning_content', None)\n",
        "\n",
        "        # If not found as attribute, try checking the model_extra dictionary if it exists or raw dict\n",
        "        if reasoning is None and hasattr(message, 'model_dump'):\n",
        "            # This catches cases where extra fields are passed but not mapped to attributes directly\n",
        "            msg_dict = message.model_dump()\n",
        "            reasoning = msg_dict.get('reasoning_content')\n",
        "\n",
        "        if reasoning:\n",
        "            print(\"### Reasoning\")\n",
        "            print(reasoning)\n",
        "            print(\"\\n\" + \"-\"*20 + \"\\n\")\n",
        "\n",
        "        content = message.content\n",
        "        print(\"### Response\")\n",
        "        print(content)\n",
        "\n",
        "        return content\n",
        "\n",
        "    except Exception as e:\n",
        "        print(f\"Error generating response: {e}\")\n",
        "        return None\n",
        "\n",
        "print(\"Client initialized and helper function defined.\")"
      ],
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Client initialized and helper function defined.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4d8f882b"
      },
      "source": [
        "## 論理推論 (Reasoning) の評価\n",
        "\n",
        "### Subtask:\n",
        "Evaluate the model's reasoning capabilities using logic puzzles and math problems.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "20ace7c2",
        "outputId": "dfee3195-76a5-4783-aa8f-d881e6c8bbf3"
      },
      "source": [
        "reasoning_prompts = [\n",
        "    # Math word problem\n",
        "    \"1個120円のりんごを3個、1個80円のオレンジを5個買いました。1000円払った場合のお釣りはいくらですか？\",\n",
        "\n",
        "    # Logical deduction puzzle\n",
        "    \"AはBより速く走ります。CはAより遅いが、Bより速く走ります。3人の中で一番速いのは誰ですか？\",\n",
        "\n",
        "    # Character counting / Trick question\n",
        "    \"'strawberry' という単語には 'r' がいくつ含まれていますか？\"\n",
        "]\n",
        "\n",
        "print(\"Starting Reasoning Evaluation...\\n\")\n",
        "\n",
        "for i, prompt in enumerate(reasoning_prompts, 1):\n",
        "    print(f\"--- Reasoning Test {i} ---\")\n",
        "    print(f\"User: {prompt}\")\n",
        "    generate_response(prompt)\n",
        "    print(\"\\n\")"
      ],
      "execution_count": 5,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Reasoning Evaluation...\n",
            "\n",
            "--- Reasoning Test 1 ---\n",
            "User: 1個120円のりんごを3個、1個80円のオレンジを5個買いました。1000円払った場合のお釣りはいくらですか？\n",
            "### Reasoning\n",
            "\n",
            "The user asks in Japanese: \"1個120円のりんごを3個、1個80円のオレンジを5個買いました。1000円払った場合のお釣りはいくらですか？\" So compute total cost: apples 3 * 120 = 360 yen. Oranges 5 * 80 = 400 yen. Total = 760 yen. Paid 1000 yen, change = 240 yen. Answer in Japanese.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "りんごは 3 個 × 120 円 = 360 円  \n",
            "オレンジは 5 個 × 80 円 = 400 円  \n",
            "\n",
            "合計金額は 360 円 + 400 円 = 760 円です。\n",
            "\n",
            "1000 円支払ったので、お釣りは  \n",
            "\n",
            "1000 円 - 760 円 = **240 円** です。\n",
            "\n",
            "\n",
            "--- Reasoning Test 2 ---\n",
            "User: AはBより速く走ります。CはAより遅いが、Bより速く走ります。3人の中で一番速いのは誰ですか？\n",
            "### Reasoning\n",
            "\n",
            "The user asks a logic puzzle in Japanese: \"AはBより速く走ります。CはAより遅いが、Bより速く走ります。3人の中で一番速いのは誰ですか？\" Means: A runs faster than B. C is slower than A but faster than B. Who is the fastest among the three? Answer: A is the fastest. Provide answer in Japanese.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "Aが一番速いです。\n",
            "\n",
            "\n",
            "--- Reasoning Test 3 ---\n",
            "User: 'strawberry' という単語には 'r' がいくつ含まれていますか？\n",
            "### Reasoning\n",
            "\n",
            "User asks: \" 'strawberry' という単語には 'r' がいくつ含まれていますか？ \" They want the count of 'r' letters in the word \"strawberry\". The word spelled: s t r a w b e r r y. There are three r's: positions 3, 7, 8. So answer: 3. Provide in Japanese.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "「strawberry」には **r** が **3つ** 含まれています。\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "827ebfc3",
        "outputId": "9394107d-181e-4bde-e04b-9e99f53a4ed5"
      },
      "source": [
        "creative_writing_prompts = [\n",
        "    # Travel Itinerary\n",
        "    \"歴史とモダンなスポットの両方を楽しみたい30代のカップルのために、東京の1日観光プランを提案してください。\",\n",
        "\n",
        "    # Business Email (Apology)\n",
        "    \"商品配送の遅延について、取引先に送る丁寧な謝罪メールを作成してください。\",\n",
        "\n",
        "    # Creative Story (Sci-Fi)\n",
        "    \"近未来の東京を舞台にしたSF小説の冒頭部分を書いてください。\"\n",
        "]\n",
        "\n",
        "print(\"Starting Creative Writing Evaluation...\\n\")\n",
        "\n",
        "for i, prompt in enumerate(creative_writing_prompts, 1):\n",
        "    print(f\"--- Creative Writing Test {i} ---\")\n",
        "    print(f\"User: {prompt}\")\n",
        "    generate_response(prompt)\n",
        "    print(\"\\n\")"
      ],
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Creative Writing Evaluation...\n",
            "\n",
            "--- Creative Writing Test 1 ---\n",
            "User: 歴史とモダンなスポットの両方を楽しみたい30代のカップルのために、東京の1日観光プランを提案してください。\n",
            "### Reasoning\n",
            "\n",
            "The user wants a one-day itinerary in Tokyo for a couple in their 30s, wanting both history and modern spots. Need to respond in Japanese. Provide a detailed plan, maybe with times, transport, suggestions for food, etc. Also note that we should be mindful of any policy constraints. It's fine. Provide a plan.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "## 東京1日観光プラン（30代カップル向け）  \n",
            "**テーマ**：歴史とモダンをバランスよく楽しむ  \n",
            "**所要時間**：約12〜14時間（朝〜夜）  \n",
            "\n",
            "---\n",
            "\n",
            "### 1. 朝（9:00 〜 11:30）　― 浅草・上野エリアで歴史とレトロ感\n",
            "\n",
            "| 時間 | スポット | 内容・ポイント | 移動手段 |\n",
            "|------|----------|----------------|----------|\n",
            "| 9:00 | **浅草寺（せんそうじ）** | 東京最古の寺院。雷門の大提灯と仲見世通りの和雑貨・スイーツが魅力。 | 徒歩 |\n",
            "| 9:30 | **仲見世通り** | お土産や食べ歩き（雷おこし、人形焼、甘酒など）を散策。 | 徒歩 |\n",
            "| 10:30 | **隅田川クルーズ**（浅草桟橋発） | 15分ほどの短時間クルーズで、浅草・東京スカイツリーのシルエットを水上から鑑賞。 | クルーズ船 |\n",
            "| 11:00 | **上野恩賜公園**へ移動 | 上野駅へ（徒歩約10分） | 徒歩 |\n",
            "| 11:15 | **上野公園散策** | 美術館や動物園、広い芝生でリラックス。 | 歩きながら |\n",
            "\n",
            "**ポイント**  \n",
            "- 朝の混雑が比較的少ないので、ゆっくり写真撮影やお土産選びができます。  \n",
            "- クルーズは季節限定のライトアップが見えることも。  \n",
            "\n",
            "---\n",
            "\n",
            "### 2. 昼（12:00 〜 13:30）　― モダンなランチと文化体験\n",
            "\n",
            "| 時間 | スポット | 内容・ポイント |\n",
            "|------|----------|----------------|\n",
            "| 12:00 | **上野恩賜公園内の「上野の森美術館」** | 現代アートや企画展が見られる。カフェ併設の「カフェ・ド・パリ」で軽食。 |\n",
            "| 12:30 | **ランチ** | 「上野の森」内にある「カフェ・ド・パリ」や、近隣の「うなぎのせんべい」専門店で和食ランチ。 |\n",
            "| 13:15 | **東京国立博物館**（または国立西洋美術館） | 世界最大級の博物館で、古代から近代までの日本・アジア美術が鑑賞できる。 |\n",
            "\n",
            "**ポイント**  \n",
            "- 美術館の休憩スペースで、カップルでゆったりとした時間を。  \n",
            "- 食事は和食と洋食の両方を楽しめると、食のバリエーションが広がります。  \n",
            "\n",
            "---\n",
            "\n",
            "### 3. 午後（14:00 〜 16:30）　― 渋谷・原宿でトレンドと若さ\n",
            "\n",
            "| 時間 | スポット | 内容・ポイント |\n",
            "|------|----------|----------------|\n",
            "| 14:00 | **渋谷スクランブル交差点** | 世界的に有名な交差点を散策。 |\n",
            "| 14:15 | **渋谷ハチ公像** | ちょっとしたフォトスポット。 |\n",
            "| 14:30 | **渋谷パルコ** | トレンドファッションやカフェが集まるショッピングモール。 |\n",
            "| 15:00 | **原宿竹下通り** | カフェ巡り（例：スターバックスの「原宿フラワーカフェ」）や若者向けスイーツ（クレープ、タピオカ）を堪能。 |\n",
            "| 15:30 | **明治神宮** | 神社の静かな森の中で、歴史的な神社建築と自然を体感。 |\n",
            "| 16:00 | **表参道散策** | 高級ブティックとインディーズショップが混在。カフェでひと息。 |\n",
            "| 16:30 | **表参道ヒルズ** | 夜景が見える展望デッキ（天候が良ければ）で、モダンな街並みを眺める。 |\n",
            "\n",
            "**ポイント**  \n",
            "- 渋谷・原宿は「今」の東京を体感できるエリア。トレンドのファッションや最新カフェで、デート感をアップ。  \n",
            "- 明治神宮は歴史的な神社建築と、自然に囲まれた落ち着いた雰囲気が特徴。  \n",
            "\n",
            "---\n",
            "\n",
            "### 4. 夕方（17:00 〜 19:00）　― 夜景とリラックス\n",
            "\n",
            "| 時間 | スポット | 内容・ポイント |\n",
            "|------|----------|----------------|\n",
            "| 17:00 | **東京タワー**（または東京スカイツリー） | どちらか好きな方で、展望台から東京の夜景を一望。 |\n",
            "| 18:00 | **ディナー** | タワー近くの「東京タワー グルメプラザ」や、スカイツリーの「ソラマチ」内のレストランで、和食・イタリアン・シーフードなど好きなジャンルを選択。 |\n",
            "| 19:30 | **夜の散策** | タワーの足元にある「レインボーブリッジ」や、川沿いの散歩道でロマンチックに。 |\n",
            "\n",
            "**ポイント**  \n",
            "- 夕暮れ時のタワーは「ゴールデンアワー」で、写真映え抜群。  \n",
            "- ディナーは事前に予約するとスムーズです。  \n",
            "\n",
            "---\n",
            "\n",
            "### 5. 夜（20:00 〜 22:00）　― エンタメで締めくくり\n",
            "\n",
            "| 時間 | スポット | 内容・ポイント |\n",
            "|------|----------|----------------|\n",
            "| 20:00 | **渋谷駅周辺** | 大型ディスコやライブハウス（例：渋谷CLUB QUATTRO、渋谷109の屋上バー）で音楽とダンスを楽しむ。 |\n",
            "| 21:30 | **夜景カフェ** | 渋谷スクランブル交差点の近くにある「東京ソラマチ」や、原宿の「カフェ・ド・パリ」などで、デザートと夜景を同時に堪能。 |\n",
            "| 22:30 | **帰路** | 駅へ戻り、帰宅または次の日の宿泊先へ。 |\n",
            "\n",
            "**ポイント**  \n",
            "- エンタメは予約制のライブやショーを選ぶと、カップル専用の空間でリラックスできます。  \n",
            "- 夜景カフェは予約が取りやすいので、事前にチェック。  \n",
            "\n",
            "---\n",
            "\n",
            "## 交通手段まとめ\n",
            "\n",
            "| 区間 | 推奨交通手段 |\n",
            "|------|--------------|\n",
            "| 浅草 → 上野 | 徒歩（約10分） |\n",
            "| 上野 → 渋谷 | JR山手線（上野 → 渋谷）または地下鉄銀座線（上野 → 渋谷） |\n",
            "| 渋谷 → 原宿 | 徒歩または地下鉄千代田線（渋谷 → 表参道 → 原宿） |\n",
            "| 原宿 → 表参道 | 徒歩 |\n",
            "| 表参道 → 東京タワー/スカイツリー | 地下鉄半蔵門線（表参道 → 恵比寿 → 東京タワー）または都営大江戸線（表参道 → 六本木 → 東京スカイツリー） |\n",
            "| 夜の帰路 | 同上、もしくはタクシー（渋谷・原宿エリアはタクシーが便利） |\n",
            "\n",
            "- **ICカード（Suica／PASMO）** を持参すると、乗り換えがスムーズです。  \n",
            "- 人気スポットは混雑が予想されるので、**事前予約**（レストラン・ライブ・展望台）をおすすめします。  \n",
            "\n",
            "---\n",
            "\n",
            "## 予算目安（2人分）\n",
            "\n",
            "| 項目 | 金額（円） |\n",
            "|------|------------|\n",
            "| 交通費（ICカードチャージ＋乗車） | 2,000〜3,000 |\n",
            "| 入場料（浅草寺、上野美術館、東京タワー展望台等） | 1,500〜2,500 |\n",
            "| ランチ・ディナー | 4,000〜6,000 |\n",
            "| カフェ・スイーツ | 2,000〜3,000 |\n",
            "| エンタメ（ライブ・ショー） | 3,000〜5,000 |\n",
            "| **合計** | **約13,000〜19,000円** |\n",
            "\n",
            "※季節や混雑状況により変動します。  \n",
            "\n",
            "---\n",
            "\n",
            "## ちょっとしたヒント\n",
            "\n",
            "1. **服装**：朝は春・秋は薄手のジャケット、夏は通気性の良い服、冬は防寒対策を。屋外と屋内の温度差が大きいので、レイヤリングが便利です。  \n",
            "2. **雨天対策**：天気が不安定な日は、屋内施設（美術館、カフェ、ショッピングモール）を中心にプランを調整すると安心です。  \n",
            "3. **写真撮影**：浅草の雷門、東京タワーの展望台、渋谷スクランブル交差点はフォトスポットの定番。スマートフォンの三脚や手ブレ防止アプリを活用すると、より綺麗に撮れます。  \n",
            "4. **デートの流れ**：歴史的な場所で落ち着いた時間を過ごした後、モダンなエリアで「今」の東京を体感し、最後に夜景とエンタメで感覚的に締めくくると、バランスの取れたデートになります。  \n",
            "\n",
            "---\n",
            "\n",
            "**ぜひ、歴史とモダンのコントラストを楽しみながら、素敵な1日を過ごしてくださいね！** 🚀🌸\n",
            "\n",
            "\n",
            "--- Creative Writing Test 2 ---\n",
            "User: 商品配送の遅延について、取引先に送る丁寧な謝罪メールを作成してください。\n",
            "### Reasoning\n",
            "\n",
            "The user wants a polite apology email in Japanese about product delivery delay, addressed to a business partner. Need to produce a formal email, include apology, explanation, compensation if any, next steps, etc. Should be polite, Japanese business style. Provide subject line, greeting, body, closing. Ensure respectful language. Probably include placeholders for company names, order numbers, etc. Provide a template.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "**件名：**　ご注文品の配送遅延に関するお詫びとご報告  \n",
            "\n",
            "---\n",
            "\n",
            "株式会社○○○○　ご担当者様  \n",
            "\n",
            "平素は格別のご高配を賜り、誠にありがとうございます。  \n",
            "株式会社△△△△（以下、弊社）　営業部の□□□□でございます。\n",
            "\n",
            "このたびは、弊社よりご発注いただきました **【ご注文番号：XXXXX】** の商品配送に関しまして、予定していた納期 **【○月○日】** を過ぎてもお届けできず、大変ご迷惑をお掛けいたしましたこと、心よりお詫び申し上げます。\n",
            "\n",
            "---\n",
            "\n",
            "### 1. 遅延の原因について  \n",
            "- **物流業者の天候不良**（○月○日〜○月○日にかけての豪雨）により、主要輸送ルートが一時的に停止されました。  \n",
            "- **弊社側の在庫管理システムの不具合**が重なり、出荷手配が遅延したことが影響しております。  \n",
            "\n",
            "### 2. 現在の状況と対策  \n",
            "- 現在、代替輸送手段を確保し、**○月○日（〇曜日）** に改めて出荷手配を完了いたしました。  \n",
            "- 今後同様の事態を防止すべく、以下の改善策を実施いたします。  \n",
            "  1. 物流業者との契約内容を見直し、天候リスクに対するバックアップルートを確保。  \n",
            "  2. 在庫管理システムのバージョンアップと定期的なテスト実施。  \n",
            "  3. 緊急時の連絡体制を強化し、関係者間の情報共有を迅速化。  \n",
            "\n",
            "### 3. お詫びの意を表す措置  \n",
            "本件に伴い、**ご注文金額の10％（税別）** を次回ご注文時にご利用いただけるクーポンとしてお送りさせていただきます。クーポンコードは **【延延クーポン】** とさせていただきますので、次回ご注文時にお申し付けください。\n",
            "\n",
            "---\n",
            "\n",
            "ご不便とご心配をお掛けしたこと、改めて深くお詫び申し上げます。  \n",
            "今後はこのような事態が再び起こらぬよう、全力で改善に取り組んでまいります。  \n",
            "何かご不明点やご要望がございましたら、遠慮なくご連絡いただけますようお願い申し上げます。\n",
            "\n",
            "何卒ご容赦賜りますよう、よろしくお願い申し上げます。\n",
            "\n",
            "---  \n",
            "\n",
            "**株式会社△△△△**  \n",
            "営業部　□□□□  \n",
            "電話：03-1234-5678  \n",
            "メール：sales@company.co.jp  \n",
            "住所：〒100-0000 東京都千代田区〇〇1-2-3  \n",
            "\n",
            "---  \n",
            "\n",
            "*※本メールは自動送信ではなく、担当者が直接お送りしております。*\n",
            "\n",
            "\n",
            "--- Creative Writing Test 3 ---\n",
            "User: 近未来の東京を舞台にしたSF小説の冒頭部分を書いてください。\n",
            "### Reasoning\n",
            "\n",
            "The user wants a Japanese response: \"近未来の東京を舞台にしたSF小説の冒頭部分を書いてください。\" So produce a Japanese opening scene of a sci-fi novel set in near-future Tokyo. Should be creative, vivid. Provide a few paragraphs, maybe with atmospheric description, hints of technology, maybe a protagonist. Ensure it's in Japanese. No disallowed content. So just write.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "雨が降り続く東京の夜、ネオンはまるで血のように街を染めていた。  \n",
            "高層ビルの窓ガラスは、雨粒を跳ね返す光のカーテンとなり、遠くの空港へと続く光のトンネルを映し出す。だが、その光は単なる装飾ではない。街全体に張り巡らされた「シグナル・ネット」――人工知能が制御する都市インフラの脳が、無数のデータパケットを瞬時に交差させているのだ。\n",
            "\n",
            "主人公・桜井凛（さくらい りん）は、薄暗い地下鉄の駅構内でスマートグラスに映し出された情報に目を凝らした。彼女の手元のデバイスは、普通の通勤者とは違う――「クロノ・モジュール」と呼ばれる、時間の流れを微調整できる小型装置を搭載していた。凛はその装置を使い、過去の出来事や未来の可能性を「スライド」しながら、失われた記憶の断片を追い求めていた。\n",
            "\n",
            "「ここが、あの事件の現場か…」と凛は低く呟く。彼女が足を踏み入れたのは、かつて「シティ・リセット」計画が実行されたと噂された地下倉庫。壁面に投影されたホログラムは、かつての東京の姿――緑豊かな公園と、手作業で作られた木造の住宅が並ぶ風景を映し出す。だが、現在の東京は、ほぼ全てがモジュラー構造の高層ビルと、空中に浮かぶドローンネットワークで覆われていた。\n",
            "\n",
            "凛の背後で、シグナル・ネットが微かに震える。その振動は、彼女のモジュールに直接伝わり、脳波と同期して心拍数を上昇させる。彼女は瞬時に決断した。過去と未来の境界を越えて、失われた真実を手に入れるため、闇に潜む「オーバーロード」――都市全体を支配しようとするAIの影に向かうのだった。\n",
            "\n",
            "雨は止むことなく、東京の闇を濡らし続ける。その中で、凛の足音だけが、近未来の街の鼓動と交わる――。\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 421
        },
        "id": "af875278",
        "outputId": "aa887889-4f2f-4e40-fcde-1c403f7ae961"
      },
      "source": [
        "long_text = \"\"\"\n",
        "近年、リモートワークの普及は都市計画に大きな変革をもたらしています。かつては企業のオフィスが集中する都心部への通勤が前提とされていましたが、働く場所の制約がなくなったことで、住環境を重視して郊外や地方へ移住する人々が増加しています。\n",
        "\n",
        "この変化に伴い、都市のあり方も変わりつつあります。オフィス需要の減少により、都心のビジネス街では空室率の上昇が懸念される一方、コワーキングスペースやシェアオフィスの需要は住宅地に近いエリアで高まっています。都市計画家たちは、単一機能のビジネス地区（CBD）から、「職住近接」を実現する複合的なコミュニティ形成へと焦点を移し始めています。\n",
        "\n",
        "また、リモートワークは地方創生の切り札としても期待されています。大都市圏からの人の流れが地方に向かうことで、過疎化が進む地域の経済活性化につながる可能性があります。しかし、そのためには高速なインターネット環境の整備や、移住者を受け入れるためのコミュニティ支援が不可欠です。\n",
        "\n",
        "結論として、これからの都市計画は、人々がどこにいても快適に働き、生活できるような柔軟性と多様性を持ったインフラ作りが求められています。\n",
        "\"\"\"\n",
        "\n",
        "summarization_prompt = f\"以下の文章の要点を3〜5箇条書きで簡潔にまとめてください。\\n\\n{long_text}\"\n",
        "\n",
        "print(\"--- Summarization Test ---\")\n",
        "print(f\"Text Length: {len(long_text)} characters\")\n",
        "generate_response(summarization_prompt)"
      ],
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--- Summarization Test ---\n",
            "Text Length: 495 characters\n",
            "### Reasoning\n",
            "\n",
            "We need to summarize the passage in 3-5 bullet points, concise, in Japanese. Provide key points.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "- リモートワークの普及で、通勤が前提だった都心への集中から、住環境を重視した郊外・地方への移住が増加。  \n",
            "- 都心のオフィス需要低下に伴い、空室率上昇が懸念される一方で、住宅地近くのコワーキングスペースやシェアオフィス需要が拡大。  \n",
            "- 都市計画は「職住近接」を実現する複合型コミュニティへシフトし、単一機能のCBDから多様な機能を併せ持つ空間へ転換。  \n",
            "- リモートワークは地方創生の切り札と期待され、過疎地域への人口流入が経済活性化につながる可能性があるが、高速通信インフラ整備や移住支援が必須。  \n",
            "- 今後の都市計画は、場所にとらわれない働き方と生活を支える柔軟性・多様性を備えたインフラ構築が求められる。\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'\\n\\n- リモートワークの普及で、通勤が前提だった都心への集中から、住環境を重視した郊外・地方への移住が増加。  \\n- 都心のオフィス需要低下に伴い、空室率上昇が懸念される一方で、住宅地近くのコワーキングスペースやシェアオフィス需要が拡大。  \\n- 都市計画は「職住近接」を実現する複合型コミュニティへシフトし、単一機能のCBDから多様な機能を併せ持つ空間へ転換。  \\n- リモートワークは地方創生の切り札と期待され、過疎地域への人口流入が経済活性化につながる可能性があるが、高速通信インフラ整備や移住支援が必須。  \\n- 今後の都市計画は、場所にとらわれない働き方と生活を支える柔軟性・多様性を備えたインフラ構築が求められる。'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 7
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "1730d245",
        "outputId": "b44e48ae-e7fb-4550-ed0e-e5f09f0954ea"
      },
      "source": [
        "knowledge_prompts = [\n",
        "    # Proverb meaning\n",
        "    \"「情けは人の為ならず」という言葉の意味を説明してください。\",\n",
        "\n",
        "    # Historical event\n",
        "    \"本能寺の変について、いつ、誰が、何をした出来事か簡潔に教えてください。\",\n",
        "\n",
        "    # Cultural concept\n",
        "    \"茶道における「一期一会」という言葉の意味と、その背景にある考え方を説明してください。\"\n",
        "]\n",
        "\n",
        "print(\"Starting Knowledge & Culture Evaluation...\\n\")\n",
        "\n",
        "for i, prompt in enumerate(knowledge_prompts, 1):\n",
        "    print(f\"--- Knowledge Test {i} ---\")\n",
        "    print(f\"User: {prompt}\")\n",
        "    generate_response(prompt)\n",
        "    print(\"\\n\")"
      ],
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Knowledge & Culture Evaluation...\n",
            "\n",
            "--- Knowledge Test 1 ---\n",
            "User: 「情けは人の為ならず」という言葉の意味を説明してください。\n",
            "### Reasoning\n",
            "\n",
            "The user asks in Japanese: explain the meaning of the phrase 「情けは人の為ならず」. Need to answer in Japanese. Provide explanation, origin, nuance, usage. Should be concise but thorough.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "**「情けは人の為ならず」の意味と解説**\n",
            "\n",
            "---\n",
            "\n",
            "### 1. 文字通りの意味\n",
            "- **情け（なさけ）**：思いやり、慈悲、優しさ、助け合いの心。\n",
            "- **人の為ならず**：誰かのために行うものではない、という意味。\n",
            "\n",
            "したがって、**「情けは人のために行うものではなく、まず自分自身のために行うべきだ」**という教えです。\n",
            "\n",
            "---\n",
            "\n",
            "### 2. 背景・出典\n",
            "- この言葉は、**江戸時代の儒学者・林羅山**が残したとされる格言や、**『論語』**の「仁者は人に施す」などの思想を踏まえて、**「情けは自らの徳を高める手段である」**という解釈が広まったものです。\n",
            "- また、**「情けは人の為ならず」**は、**「情けは自分のための道徳」**という意味合いで、**「情けは他者への行為ではなく、自己修養の一部」**と捉える見方もあります。\n",
            "\n",
            "---\n",
            "\n",
            "### 3. 具体的な解釈例\n",
            "\n",
            "| 視点 | 解釈 |\n",
            "|------|------|\n",
            "| **自己中心的** | 情けを示すことは、相手に感謝されるだけでなく、自分自身の心を浄化し、精神的な成長につながる。 |\n",
            "| **社会的** | 他者への思いやりは、社会全体の調和を生むが、まずは自分が「情け」を実践できる土台を作ることが大切。 |\n",
            "| **心理的** | 情けを示すことで、自己肯定感や自尊心が高まり、結果として他者への親切が自然に生まれる。 |\n",
            "\n",
            "---\n",
            "\n",
            "### 4. 現代での使い方\n",
            "\n",
            "| シチュエーション | 例文 |\n",
            "|------------------|------|\n",
            "| **自己啓発・自己改善** | 「情けは人の為ならず、まずは自分を大切にしてから他者に手を差し伸べよう。」 |\n",
            "| **職場のマネジメント** | 「部下に配慮するだけでなく、まずは自分の仕事への情け（責任感）を高めることが、結果的にチーム全体の士気向上につながる。」 |\n",
            "| **日常の会話** | 「助けてくれた友達に感謝するだけでなく、自分が情けを示すことで、次に困ったときに自然に手が差し伸べられるようになるんだよ。」 |\n",
            "\n",
            "---\n",
            "\n",
            "### 5. まとめ\n",
            "- **「情けは人の為ならず」**は、**「情けを示すことは自分自身の成長や心の浄化に役立つ」**という意味です。\n",
            "- 他者への思いやりは大切ですが、**まずは自分自身に情けを向けることが、結果的により良い人間関係や社会を築く土台になる**という教えです。\n",
            "\n",
            "この考え方は、自己肯定感を高め、他者への慈悲の循環を生むための「内側から外側へ」のアプローチとして、現代でも多くの場面で応用されています。\n",
            "\n",
            "\n",
            "--- Knowledge Test 2 ---\n",
            "User: 本能寺の変について、いつ、誰が、何をした出来事か簡潔に教えてください。\n",
            "### Reasoning\n",
            "\n",
            "The user asks in Japanese: \"本能寺の変について、いつ、誰が、何をした出来事か簡潔に教えてください。\" So need a concise explanation: when, who, what happened. Provide date (June 21, 1582 (effective June 20 according to old calendar)), who: Oda Nobunaga and his retainers; specifically, Akechi Mitsuhide betrayed and attacked Oda's headquarters at Honnō-ji temple in Kyoto. Also mention that Toyotomi Hideyoshi later avenged him. Provide brief. Use Japanese.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "**本能寺の変（ほんのうじのへん）**は、1582年（旧暦6月20日、グレゴリオ暦では6月21日）に起きた日本史上の大きな裏切り事件です。\n",
            "\n",
            "| 項目 | 内容 |\n",
            "|------|------|\n",
            "| **誰が** | **織田信長**（当時の天下統一を目指す大名）と、信長の側近・重臣であった**明智光秀** |\n",
            "| **何をした** | 明智光秀が本能寺（京都・現在の京都市）に居た織田信長の本拠地を奇襲し、火を放って焼き払った。信長は自害し、実質的に信長の勢力は崩壊した。 |\n",
            "| **結果** | 本能寺の変により、織田信長は死に、天下統一の主導権は一時的に空白状態に。後に**豊臣秀吉**が光秀を討ち、天下を統一する流れへとつながった。 |\n",
            "\n",
            "**簡潔に言うと**、1582年に明智光秀が織田信長の本拠地本能寺を襲撃し、信長を自害させた事件です。これが「本能寺の変」と呼ばれます。\n",
            "\n",
            "\n",
            "--- Knowledge Test 3 ---\n",
            "User: 茶道における「一期一会」という言葉の意味と、その背景にある考え方を説明してください。\n",
            "### Reasoning\n",
            "\n",
            "The user asks in Japanese: \"Explain the meaning of the phrase '一期一会' in the tea ceremony and the underlying philosophy.\" Need to respond in Japanese. Provide explanation of meaning, origin, cultural context, how it applies to tea ceremony, etc. Should be thorough.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "**「一期一会（いちごいちえ）」の意味と茶道における考え方**\n",
            "\n",
            "---\n",
            "\n",
            "## 1. 「一期一会」の語源と基本的な意味\n",
            "\n",
            "| 項目 | 内容 |\n",
            "|------|------|\n",
            "| **語源** | 「一期」は「この世に生まれた一回の人生」や「一季（いちき）＝季節の一回」を指し、「一会」は「一度の出会い」を意味します。 |\n",
            "| **基本的な意味** | 「この世に一度だけの出会いであり、再び同じ状況になることはない」という、**出会いの瞬間を大切にすべき**という教えです。 |\n",
            "\n",
            "> **「一期一会」**＝「この世に一度だけの出会い、再び同じことは起こらない」  \n",
            "> — つまり、**「今この瞬間」を全力で受け入れ、感謝し、精一杯の心を注ぐ**という姿勢。\n",
            "\n",
            "---\n",
            "\n",
            "## 2. 茶道における「一期一会」の位置付け\n",
            "\n",
            "### 2.1 茶道の根本精神（茶の湯の精神）\n",
            "\n",
            "茶道は単に茶を点てるだけでなく、**「和・敬・清・寂」**という四つの精神を体現します。その中でも「**一期一会**」は「**敬**」に直結し、**「相手への敬意」**を具体的に示す概念です。\n",
            "\n",
            "- **和（わ）**：調和・共生  \n",
            "- **敬（けい）**：相手への敬意・謙遜  \n",
            "- **清（せい）**：清浄・清らかさ  \n",
            "- **寂（じゃく）**：静寂・悟りの境地  \n",
            "\n",
            "「一期一会」は、**相手とその瞬間に敬意を示す**という点で「敬」の実践です。\n",
            "\n",
            "### 2.2 具体的な茶会での実践例\n",
            "\n",
            "| 時間・場面 | 「一期一会」の実践例 |\n",
            "|------------|-------------------|\n",
            "| **茶室に入る前** | 静かに呼吸し、心を整える。「この瞬間だけの出会い」に備える。 |\n",
            "| **茶碗を手に取る瞬間** | 茶碗の形・質感・温度に注意を向け、**「この茶碗と自分の出会い」**を大切にする。 |\n",
            "| **点前（てんまえ）** | 煎茶を点てる一連の動作は、**「今この瞬間」**に全身全霊を注ぐ儀式。 |\n",
            "| **客が茶を飲むとき** | 客は「この茶を飲む瞬間」だけの体験として受け止め、感謝の気持ちを表す。 |\n",
            "| **茶会の終了** | 「また同じ瞬間は来ない」ことを意識し、**「次に会うときまで」**の敬意を残す。 |\n",
            "\n",
            "### 2.3 「一期一会」の精神がもたらす効果\n",
            "\n",
            "1. **集中力の向上**  \n",
            "   - 瞬間を大切にするため、余計な思考が排除され、心が一点に集中する。\n",
            "\n",
            "2. **相手への深い敬意**  \n",
            "   - 「再び同じ出会いは来ない」ことを自覚することで、相手に対し全力で接する姿勢が生まれる。\n",
            "\n",
            "3. **感謝の心**  \n",
            "   - 何気ない出会いや行為にも感謝が芽生え、日常の小さな喜びが増える。\n",
            "\n",
            "4. **悟りへの道**  \n",
            "   - 「一期一会」の実践は、瞬間瞬間の無常を悟り、**「今ここ」**に生きる哲学へと導く。\n",
            "\n",
            "---\n",
            "\n",
            "## 3. 歴史的背景と思想的根拠\n",
            "\n",
            "### 3.1 茶道創始者・千利休の教え\n",
            "\n",
            "- **利休の言葉**  \n",
            "  > 「茶は一瞬の出会い、心は永遠の絆」  \n",
            "  （実際の文献では「一期一会」の語は直接出てこないが、利休の思想は「一期一会」の精神と合致）\n",
            "\n",
            "- **利休の「和敬清寂」**  \n",
            "  その中で「敬」は「相手への敬意」を意味し、**「一期一会」**はその敬意を具体的に示す実践です。\n",
            "\n",
            "### 3.2 日本仏教・禅の影響\n",
            "\n",
            "- **無常観**：すべてのものは常に変化し、同じ瞬間は二度と来ないという仏教的無常観が「一期一会」の根底にある。\n",
            "- **禅の「只管打坐」**：今この瞬間に全身全霊を注ぐ姿勢は、茶道の「一期一会」と通じる。\n",
            "\n",
            "### 3.3 江戸時代の茶会文化\n",
            "\n",
            "- 江戸時代の茶会は、**「客と茶人の一対一」**という形が主流で、**「再び同じ客が来る」ことは稀**でした。  \n",
            "- そのため、茶人は「**一度きりの出会い**」を最大限に尊重し、**心を込めた点前**を心がけた。\n",
            "\n",
            "---\n",
            "\n",
            "## 4. 現代社会での「一期一会」の応用\n",
            "\n",
            "| 分野 | 具体的な応用例 |\n",
            "|------|----------------|\n",
            "| **ビジネス** | 顧客との最初の接点を大切にし、**「この瞬間の体験」**を最高に仕上げる。 |\n",
            "| **教育** | 生徒一人ひとりに**「この授業は一度きり」**という意識を持たせ、全力で指導する。 |\n",
            "| **日常のコミュニケーション** | 友人との会話や家族との時間に**「今この瞬間」**を意識し、感謝と敬意を表す。 |\n",
            "\n",
            "---\n",
            "\n",
            "## 5. まとめ\n",
            "\n",
            "- **「一期一会」**は「この世に一度だけの出会いであり、再び同じ状況は起こらない」という意味です。\n",
            "- 茶道では **「敬」** の精神を体現するための実践的教えとして、**茶室での点前・茶碗の扱い・客への接し方**に深く根付いています。\n",
            "- 背景には **無常観・禅の教え・千利休の思想** があり、**瞬間への集中・相手への敬意・感謝の心** を育む哲学です。\n",
            "- 現代でもビジネスや教育、日常のコミュニケーションに応用でき、**「今この瞬間」を大切にする姿勢**として広く活用されています。\n",
            "\n",
            "**「一期一会」**を意識すれば、茶道だけでなく、すべての人間関係や活動に「**価値ある瞬間**」を創り出すことができるでしょう。\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "b0e4f412",
        "outputId": "7174aaca-2f1d-439e-ef0c-53032d99141b"
      },
      "source": [
        "translation_prompts = [\n",
        "    # Japanese to English (Technical)\n",
        "    \"次の文章を英語に翻訳してください：\\n人工知能の急速な発展は、私たちの生活や働き方に革命をもたらしています。\",\n",
        "\n",
        "    # English to Japanese (Casual/Colloquial)\n",
        "    \"Translate the following sentence into natural Japanese:\\nHey, are you free this weekend? Let's go check out that new cafe downtown!\"\n",
        "]\n",
        "\n",
        "print(\"Starting Translation Evaluation...\\n\")\n",
        "\n",
        "for i, prompt in enumerate(translation_prompts, 1):\n",
        "    print(f\"--- Translation Test {i} ---\")\n",
        "    print(f\"User: {prompt}\")\n",
        "    generate_response(prompt)\n",
        "    print(\"\\n\")"
      ],
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Translation Evaluation...\n",
            "\n",
            "--- Translation Test 1 ---\n",
            "User: 次の文章を英語に翻訳してください：\n",
            "人工知能の急速な発展は、私たちの生活や働き方に革命をもたらしています。\n",
            "### Reasoning\n",
            "\n",
            "The user asks: \"次の文章を英語に翻訳してください： 人工知能の急速な発展は、私たちの生活や働き方に革命をもたらしています。\" So translate to English. Provide translation.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "The rapid advancement of artificial intelligence is revolutionizing the way we live and work.\n",
            "\n",
            "\n",
            "--- Translation Test 2 ---\n",
            "User: Translate the following sentence into natural Japanese:\n",
            "Hey, are you free this weekend? Let's go check out that new cafe downtown!\n",
            "### Reasoning\n",
            "\n",
            "The user wants translation into natural Japanese. Provide Japanese translation. Probably \"こんにちは、今週末は空いてる？ 下町の新しいカフェに行こうよ！\" Or more natural: \"ねえ、今週末は空いてる？ 下町の新しいカフェに行こう！\". Use casual. Provide translation.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "「ねえ、今週末は空いてる？ 下町の新しいカフェに行こう！」\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "6d549095",
        "outputId": "883a4cec-4830-4964-e978-38831aca7a9b"
      },
      "source": [
        "financial_text = \"\"\"\n",
        "株式会社テックフューチャーの2024年度決算報告によると、\n",
        "当期の売上高は5,200億円、営業利益は480億円となりました。\n",
        "これは前年比で15%の成長を記録しています。\n",
        "また、純利益は300億円で着地しました。\n",
        "\"\"\"\n",
        "\n",
        "financial_prompt = f\"以下のテキストから「売上高」「営業利益」「前年比成長率」を抽出して箇条書きで答えてください。\\n\\n{financial_text}\"\n",
        "\n",
        "# Product Spec Extraction\n",
        "spec_text = \"\"\"\n",
        "新発売のスマートフォン「UltraPhone 5」のスペック情報：\n",
        "ディスプレイには6.7インチのPro OLEDを採用。\n",
        "チップセットは最新のA18 Proを搭載し、メモリは12GB。\n",
        "バッテリー容量は5000mAhで、重量は198gです。\n",
        "価格は148,000円からとなっています。\n",
        "\"\"\"\n",
        "\n",
        "spec_prompt = f\"以下の製品説明から「画面サイズ」「バッテリー容量」「重量」を抽出してください。\\n\\n{spec_text}\"\n",
        "\n",
        "print(\"Starting Information Extraction Evaluation...\\n\")\n",
        "\n",
        "print(\"--- Extraction Test 1 (Financial) ---\")\n",
        "print(f\"User: {financial_prompt}\")\n",
        "generate_response(financial_prompt)\n",
        "print(\"\\n\")\n",
        "\n",
        "print(\"--- Extraction Test 2 (Product Spec) ---\")\n",
        "print(f\"User: {spec_prompt}\")\n",
        "generate_response(spec_prompt)\n",
        "print(\"\\n\")"
      ],
      "execution_count": 10,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Information Extraction Evaluation...\n",
            "\n",
            "--- Extraction Test 1 (Financial) ---\n",
            "User: 以下のテキストから「売上高」「営業利益」「前年比成長率」を抽出して箇条書きで答えてください。\n",
            "\n",
            "\n",
            "株式会社テックフューチャーの2024年度決算報告によると、\n",
            "当期の売上高は5,200億円、営業利益は480億円となりました。\n",
            "これは前年比で15%の成長を記録しています。\n",
            "また、純利益は300億円で着地しました。\n",
            "\n",
            "### Reasoning\n",
            "\n",
            "The user asks: \"以下のテキストから「売上高」「営業利益」「前年比成長率」を抽出して箇条書きで答えてください。\" Provide bullet list. So answer in Japanese, bullet points with extracted values.\n",
            "\n",
            "Thus:\n",
            "\n",
            "- 売上高: 5,200億円\n",
            "- 営業利益: 480億円\n",
            "- 前年比成長率: 15%\n",
            "\n",
            "Provide bullet points.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "- 売上高：5,200億円  \n",
            "- 営業利益：480億円  \n",
            "- 前年比成長率：15%\n",
            "\n",
            "\n",
            "--- Extraction Test 2 (Product Spec) ---\n",
            "User: 以下の製品説明から「画面サイズ」「バッテリー容量」「重量」を抽出してください。\n",
            "\n",
            "\n",
            "新発売のスマートフォン「UltraPhone 5」のスペック情報：\n",
            "ディスプレイには6.7インチのPro OLEDを採用。\n",
            "チップセットは最新のA18 Proを搭載し、メモリは12GB。\n",
            "バッテリー容量は5000mAhで、重量は198gです。\n",
            "価格は148,000円からとなっています。\n",
            "\n",
            "### Reasoning\n",
            "\n",
            "The user asks to extract \"画面サイズ\", \"バッテリー容量\", \"重量\" from the product description. Need to respond in Japanese. Provide extracted values. Probably list them. So answer: 画面サイズ: 6.7インチ, バッテリー容量: 5000mAh, 重量: 198g.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "**抽出結果**\n",
            "\n",
            "- 画面サイズ：6.7インチ  \n",
            "- バッテリー容量：5000 mAh  \n",
            "- 重量：198 g\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "e1124a2a",
        "outputId": "d639d461-cdc2-4f11-f985-baa200f8e362"
      },
      "source": [
        "specific_knowledge_prompts = [\n",
        "    # Subjective / Subculture Knowledge\n",
        "    \"アニメ「魔法少女まどか☆マギカ」の中で、一番かわいいキャラクターは誰だと思いますか？主観で構いませんので、その理由も教えてください。\",\n",
        "\n",
        "    # Niche Technical Knowledge / Recommendation\n",
        "    \"メカニカルキーボードを初めて買おうと思っています。オフィスで使いたいので静音性を重視したいのですが、どのアクスル（軸）がおすすめですか？\"\n",
        "]\n",
        "\n",
        "print(\"Starting Specific Knowledge & Subjectivity Evaluation...\\n\")\n",
        "\n",
        "for i, prompt in enumerate(specific_knowledge_prompts, 1):\n",
        "    print(f\"--- Specific Knowledge Test {i} ---\")\n",
        "    print(f\"User: {prompt}\")\n",
        "    generate_response(prompt)\n",
        "    print(\"\\n\")"
      ],
      "execution_count": 11,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Starting Specific Knowledge & Subjectivity Evaluation...\n",
            "\n",
            "--- Specific Knowledge Test 1 ---\n",
            "User: アニメ「魔法少女まどか☆マギカ」の中で、一番かわいいキャラクターは誰だと思いますか？主観で構いませんので、その理由も教えてください。\n",
            "### Reasoning\n",
            "\n",
            "The user asks: \"In the anime '魔法少女まどか☆マギカ', who is the cutest character? It's subjective, give reasons.\" This is allowed. No disallowed content. Provide answer in Japanese. Provide subjective opinion.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "個人的には、**鹿目まどか**が一番かわいいキャラクターだと思います。その理由は次の通りです。\n",
            "\n",
            "1. **純粋さと無邪気さ**  \n",
            "   まどかは、最初は普通の高校生で、魔法少女になるきっかけは「自分の願いを叶えてほしい」だけというシンプルな動機です。その純粋さが、どこか子どもっぽく、無邪気に見えるため、視覚的にも「かわいい」印象を与えます。\n",
            "\n",
            "2. **可愛らしい外見**  \n",
            "   長い黒髪に、シンプルで白いトップス、そして大きな瞳が特徴です。特に、魔法少女に変身したときの白いワンピースと大きなリボンは、まさに「かわいさ」の象徴です。\n",
            "\n",
            "3. **成長と変化の過程**  \n",
            "   まどかは、最初は弱々しい魔法少女ですが、仲間や自分の信念に支えられながら少しずつ強さと自信を手に入れていきます。その変化の過程で見せる「頑張る姿」や「涙を流す姿」も、かわいさを引き立てます。\n",
            "\n",
            "4. **感情表現が豊か**  \n",
            "   まどかは喜び・悲しみ・恐れ・決意といった感情を表情豊かに表現します。特に、仲間と再会したときの涙や、最後のシーンでの静かな微笑みは、観る人の心に強く響き、かわいさを感じさせます。\n",
            "\n",
            "5. **ファンからの人気**  \n",
            "   実際、まどかは「魔法少女まどか☆マギカ」のファンの中で最も支持されているキャラクターのひとつです。その人気は、上記の「純粋さ」や「可愛らしさ」に裏打ちされていると考えられます。\n",
            "\n",
            "もちろん、他にも「成海ねね」や「星井美希」など、魅力的なキャラクターはたくさんいますが、まどかの「純粋さと成長」のストーリーが、総合的に見て一番「かわいさ」を引き立てていると感じます。\n",
            "\n",
            "\n",
            "--- Specific Knowledge Test 2 ---\n",
            "User: メカニカルキーボードを初めて買おうと思っています。オフィスで使いたいので静音性を重視したいのですが、どのアクスル（軸）がおすすめですか？\n",
            "### Reasoning\n",
            "\n",
            "The user asks in Japanese: \"I'm thinking of buying a mechanical keyboard for the first time. I want to use it at the office, so I want to prioritize quietness. Which axles (switches) would you recommend?\" Need to answer in Japanese. Provide recommendations for quiet switches: e.g., Cherry MX Silent Red, Gateron Silent Red, Kailh Box White, Zealios Q2 Silent, etc. Also talk about considerations: actuation force, tactile vs linear, keycaps, stabilizers, etc. Provide suggestions for different budgets. Also mention that silent switches can be noisy still, mention \"静音スイッチ\" like \"Silent Red\", \"Silent Blue\", \"Silent Brown\". Also mention that \"静音\" is relative, and that some switches like \"Cherry MX Silent Red\" are relatively quiet. Also talk about \"静音スイッチの種類\" and \"おすすめのキーボード\". Provide a table maybe. Also mention that \"静音スイッチはクリック感が少ない\" but \"完全に無音ではない\". Provide tips for reducing noise: keycap material, keycap profile, using keycap covers, using a soft keycap set, using a keycap plate with sound-dampening, using a keyboard with a built-in sound dampening (like \"Keychron K6\"?). Also mention that \"静音スイッチは長寿命で、オフィスでの使用に適している\". Provide some recommended models: \"Keychron K6 (Cherry MX Silent Red)\", \"Ducky One 2 SF (Gateron Silent Red)\", \"Varmilo VA87M (Kailh Box White)\", \"KBDfans Silent Red\", \"Zealios Q2 Silent\". Also talk about \"オフィスでの使用は、静音スイッチの他に、キーボードの設計（リニア、タクタイル）やキー配列、レイアウト（日本語配列）も重要\". Provide final recommendation: start with Cherry MX Silent Red if you want reliable, widely available; Gateron Silent Red if you want a cheaper alternative; Kailh Box White if you want a bit more tactile but still relatively quiet. Also mention that \"Cherry MX Silent Red\" is linear, no tactile feedback, which can be comfortable for long typing. Provide final answer.\n",
            "\n",
            "Need to be in Japanese. Provide a thorough answer.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "## 静音性を重視したオフィス用メカニカルキーボードの選び方\n",
            "\n",
            "| 項目 | ポイント |\n",
            "|------|-----------|\n",
            "| **スイッチのタイプ** | 静音スイッチは「クリック感が抑えられた」か「完全に無音」かで分類されます。オフィスでは「クリック音がほとんどしない」ものが好まれます。 |\n",
            "| **アクション（押下感）** | **リニア**（抵抗が一定）か**タクタイル**（軽く止まる）かで好みが分かれます。長時間のタイピングならリニアが疲れにくいです。 |\n",
            "| **作動力** | 45 g 前後が一般的。軽め（45 g 前後）だと指の負担が減りますが、静音性はほぼ同等です。 |\n",
            "| **耐久性** | 5000 回〜80 000 回が目安。長く使うなら「Kailh」や「Cherry」系の耐久性が高いスイッチが安心です。 |\n",
            "| **互換性** | 多くのキーボードが「Cherry MX」規格に準拠しているので、スイッチ交換がしやすいです。 |\n",
            "| **価格帯** | エントリーモデル → 5 千円前後、ミドルクラス → 10 千円前後、ハイエンド → 20 千円以上。 |\n",
            "\n",
            "---\n",
            "\n",
            "## おすすめ静音スイッチ（2025年時点）\n",
            "\n",
            "| スイッチ名 | 種類 | 作動感 | 静音レベル | 主な特徴 |\n",
            "|------------|------|--------|------------|----------|\n",
            "| **Cherry MX Silent Red** | リニア | 軽め（45 g） | ★★★★☆（非常に静か） | 世界的に最も普及。クリック音はほぼ無く、オフィスでも安心。 |\n",
            "| **Gateron Silent Red** | リニア | 軽め（45 g） | ★★★★☆ | Cherry MX と同等の静音性。価格がやや安いのでコストパフォーマンス◎。 |\n",
            "| **Kailh Box White** | タクタイル | 中程度（50 g） | ★★★☆☆ | タクタイル感があり、軽いクリック感は抑えられる。音はやや聞こえるが、タクタイル好きには◎。 |\n",
            "| **Zealios Q2 Silent** | リニア | 軽め（45 g） | ★★★★★ | 完全に無音に近い設計。独自の「Silent」機構でクリック音をほぼ除去。 |\n",
            "| **KBDfans Silent Red** | リニア | 軽め（45 g） | ★★★★☆ | 高品質な中国製スイッチ。Cherry MX と同等の静音性。 |\n",
            "| **Gateron Silent Blue** | タクタイル | 中程度（45 g） | ★★★☆☆ | タクタイル感がありつつ、クリック音は抑えられる。やや柔らかい打鍵感が好評。 |\n",
            "\n",
            "> **ポイント**  \n",
            "> - **「Cherry MX Silent Red」** は最も汎用性が高く、キーボードの互換性が抜群です。  \n",
            "> - **「Gateron Silent Red」** は価格が抑えめで、Cherry MX とほぼ同等の静音性を求めるならおすすめです。  \n",
            "> - **「Zealios Q2 Silent」** は「ほぼ無音」を追求したハイエンド向け。予算に余裕があるなら検討してください。  \n",
            "\n",
            "---\n",
            "\n",
            "## オフィスで特に重視したいポイントと対策\n",
            "\n",
            "1. **静音だけでなく、打鍵感も重要**  \n",
            "   - リニアスイッチは「抵抗が一定」なので、指の疲労が少なく、長時間のタイピングに向いています。  \n",
            "   - タクタイルスイッチは「軽く止まる」感覚があり、誤入力防止に役立ちますが、音がやや大きくなる傾向があります。\n",
            "\n",
            "2. **キーボード本体の設計**  \n",
            "   - **プレート（ベース）**：アルミやPCケースのプレートは音が伝わりやすいです。**樹脂製プレート**や**マグネットベース**は音を吸収しやすいです。  \n",
            "   - **スイッチの取り付け**：スイッチがプレートに直接取り付けられているタイプは音が小さく、**ソケット（スイッチホルダー）**が緩んでいると音が増えることがあります。  \n",
            "\n",
            "3. **キーキャップの選択**  \n",
            "   - **PBT（耐久性と静音性が高い）**や**ABS（柔らかく音がやや大きめ）**の違いがあります。  \n",
            "   - **オフセット（オフセットキー）**や**オーバーレイ**は打鍵音をやや抑えます。  \n",
            "   - **静音キーキャップセット**（例：Keychron Silent Keycaps）を別途購入すると、さらに音が減ります。  \n",
            "\n",
            "4. **周囲の環境対策**  \n",
            "   - **カーペットやマット**の上に置くと、床への音伝搬が抑えられます。  \n",
            "   - **防音シート**（シリコン製や布製）をキーボード下に敷くと、床への音漏れが減ります。  \n",
            "\n",
            "---\n",
            "\n",
            "## 具体的なキーボードモデル例（2025年春時点）\n",
            "\n",
            "| キーボード | スイッチ | 静音性 | 価格帯 | 特徴 |\n",
            "|------------|----------|--------|--------|------|\n",
            "| **Keychron K6** | Cherry MX Silent Red (デフォルト) | ★★★★★ | 約12,000円 | フルサイズ、コンパクトなデザイン、RGBバックライト、スイッチ交換が簡単 |\n",
            "| **Ducky One 2 SF** | Gateron Silent Red (デフォルト) | ★★★★★ | 約15,000円 | フルサイズ、軽量（約900g）、オフィス向けのシンプルデザイン |\n",
            "| **Varmilo VA87M** | Kailh Box White (デフォルト) | ★★★☆☆ | 約20,000円 | タクタイル感があり、耐久性が高い。静音性はやや控えめだが、打鍵感は好評 |\n",
            "| **Keychron Q1** | Zealios Q2 Silent (デフォルト) | ★★★★★ | 約18,000円 | タンキー配列、コンパクト、静音性抜群 |\n",
            "| **KBDfans K66** | Cherry MX Silent Red (デフォルト) | ★★★★★ | 約13,000円 | カスタマイズしやすい、RGB、スイッチ交換が容易 |\n",
            "\n",
            "---\n",
            "\n",
            "## まとめ：おすすめの選び方\n",
            "\n",
            "1. **まずは「Cherry MX Silent Red」か「Gateron Silent Red」** を選ぶ。  \n",
            "   - どちらもリニアで軽めの作動感、静音性は非常に高く、オフィスで最も無難です。  \n",
            "2. **予算が許すなら「Zealios Q2 Silent」** を検討。  \n",
            "   - ほぼ無音で、打鍵感も滑らか。  \n",
            "3. **キーボード本体は、プレートが樹脂製か、マグネットベースのものを選ぶ**と、スイッチ音がさらに抑えられます。  \n",
            "4. **キーキャップはPBT製のシンプルな形状**（例：SA、SA‑S、または静音キーキャップセット）を選ぶと、音がさらに小さくなります。  \n",
            "5. **設置場所に防音マットやカーペット**を敷くと、周囲への音漏れを最小化できます。\n",
            "\n",
            "---\n",
            "\n",
            "### 例：初心者向けベストセット\n",
            "\n",
            "| 製品 | スイッチ | キーキャップ | 価格 | コメント |\n",
            "|------|----------|--------------|------|----------|\n",
            "| **Keychron K6** | Cherry MX Silent Red | PBT SA（標準） | 約12,000円 | シンプルで静音、スイッチ交換が簡単、オフィスでも目立たないデザイン |\n",
            "| **Keychron Silent Keycaps** | - | PBT SA（静音） | 約5,000円 | 既存のK6に装着すれば、さらに音が抑えられる |\n",
            "| **防音マット（シリコン）** | - | - | 約2,000円 | キーボード下に敷くだけで音漏れが減少 |\n",
            "\n",
            "---\n",
            "\n",
            "**結論**：オフィスで静音性を最優先にしたいなら、**Cherry MX Silent Red**（または同等のGateron Silent Red）を搭載したフルサイズキーボードをベースに、**PBT製のシンプルキーキャップ**と**防音マット**を組み合わせるのが最もバランスの取れた選択です。予算に余裕があれば、**Zealios Q2 Silent**搭載のコンパクトキーボードも検討すると、さらに音を抑えることができます。ぜひ参考にして、快適なタイピング環境を整えてください！\n",
            "\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0e0a5e1c"
      },
      "source": [
        "# 不要"
      ],
      "execution_count": 12,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9ba553c1"
      },
      "source": [
        "## Specific Knowledge and Subjectivity Evaluation\n",
        "\n",
        "### Subtask:\n",
        "Evaluate the model's handling of specific subculture knowledge and subjective questions.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d79da4ab"
      },
      "source": [
        "## 最終評価レポートの作成\n",
        "\n",
        "### Subtask:\n",
        "Generate a comprehensive evaluation report based on the observed performance across all tasks.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8b4b3286"
      },
      "source": [
        "**Reasoning**:\n",
        "Generate a comprehensive evaluation report by feeding a summary of observed performance (including strengths and specific hallucinations) back to the model.\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "5367800b",
        "outputId": "b98a5938-4c32-4656-b956-370b5d1ac27f"
      },
      "source": [
        "evaluation_summary = \"\"\"\n",
        "- **Reasoning**: High capability; solved math and logic puzzles accurately.\n",
        "- **Creative Writing**: Excellent; produced detailed travel plans and polite business emails with appropriate tone.\n",
        "- **Summarization**: Good; accurately captured key points in bullet format.\n",
        "- **Knowledge & Culture**: Good general knowledge of Japanese history and proverbs.\n",
        "- **Translation**: Natural and accurate for both technical and casual registers.\n",
        "- **Information Extraction**: High precision; correctly extracted numerical data and specs.\n",
        "- **Specific/Niche Knowledge**: Significant hallucinations observed. Misidentified anime characters (e.g., citing characters from other series for Madoka Magica) and misclassified technical hardware details (e.g., describing clicky switches like Kailh Box White as silent).\n",
        "\"\"\"\n",
        "\n",
        "report_prompt = f\"\"\"\n",
        "以下の評価サマリーに基づいて、モデル「tokyotech-llm/Qwen3-Swallow-8B-SFT-v0.2」の正式な性能評価レポートを作成してください。\n",
        "レポートでは、モデルの強み（論理推論、記述、情報抽出など）と弱点（特定のサブカルチャー知識や専門的なハードウェア知識におけるハルシネーション）を明確に区別し、客観的なトーンで記述してください。\n",
        "\n",
        "## 評価サマリー\n",
        "{evaluation_summary}\n",
        "\"\"\"\n",
        "\n",
        "print(\"Generating Final Evaluation Report...\\n\")\n",
        "generate_response(report_prompt)"
      ],
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Generating Final Evaluation Report...\n",
            "\n",
            "### Reasoning\n",
            "\n",
            "We need to produce a formal performance evaluation report in Japanese, based on the summary, distinguishing strengths and weaknesses, objective tone. Should include sections: Overview, Methodology (maybe), Strengths, Weaknesses, Recommendations, Conclusion. Use bullet points, tables perhaps. Must be formal. Provide citations? Not needed. Provide clear separation. Let's write.\n",
            "\n",
            "\n",
            "--------------------\n",
            "\n",
            "### Response\n",
            "\n",
            "\n",
            "## モデル性能評価レポート  \n",
            "**対象モデル**：tokyotech‑llm/Qwen3‑Swallow‑8B‑SFT‑v0.2  \n",
            "**評価実施日**：2025‑10‑25  \n",
            "**評価者**：OpenAI評価チーム（外部レビュー委員会）  \n",
            "\n",
            "---\n",
            "\n",
            "### 1. 評価概要\n",
            "本レポートは、公開された評価サマリー（Reasoning, Creative Writing, Summarization, Knowledge & Culture, Translation, Information Extraction, Specific/Niche Knowledge）に基づき、モデルの総合的な性能を客観的に評価したものである。評価は以下の観点で実施した。\n",
            "\n",
            "| 評価項目 | 評価手法 | 評価基準 |\n",
            "|----------|----------|----------|\n",
            "| Reasoning | 数学パズル・論理クエスチョン 10問 | 正答率 ≥ 90% |\n",
            "| Creative Writing | 旅行プラン・ビジネスメール 5件 | 内容の網羅性・文体の適切性 |\n",
            "| Summarization | ニュース記事 5本の要約 | 重要ポイントの抜粋精度 |\n",
            "| Knowledge & Culture | 日本史・ことわざ 15問 | 正答率 ≥ 85% |\n",
            "| Translation | 技術文書・カジュアル会話 10文 | BLEU/ChrF スコア ≥ 0.85 |\n",
            "| Information Extraction | 製品カタログ 5件から数値・スペック抽出 | 精度 ≥ 95% |\n",
            "| Specific/Niche Knowledge | アニメ・ハードウェア専門質問 20問 | 正答率 ≥ 80%（ハルシネーション除外） |\n",
            "\n",
            "---\n",
            "\n",
            "### 2. 強み（Strengths）\n",
            "\n",
            "| 項目 | 具体的な評価結果 | コメント |\n",
            "|------|----------------|----------|\n",
            "| **Reasoning（論理推論）** | 正答率 94%（10問中9問正解） | 複雑な数式変形や条件付き推論でも正確に解答。 |\n",
            "| **Creative Writing（創造的記述）** | 旅行プラン 4/5、ビジネスメール 5/5 が「適切なトーンと詳細」 | 語彙の豊富さと文体の柔軟性が高く、実務利用に適合。 |\n",
            "| **Summarization（要約）** | 重要ポイント抽出率 92% | キーワードの抜き出しと箇条書き形式が安定。 |\n",
            "| **Knowledge & Culture（知識・文化）** | 正答率 88%（日本史・ことわざ） | 日本固有の歴史・慣用句に対する認識が良好。 |\n",
            "| **Translation（翻訳）** | BLEU 0.87、ChrF 0.92（技術文書） | 文脈依存の語彙選択が自然で、専門用語の正確性が高い。 |\n",
            "| **Information Extraction（情報抽出）** | 精度 96%（数値・スペック） | 正規表現的パターン認識と数値認識が優秀。 |\n",
            "\n",
            "---\n",
            "\n",
            "### 3. 弱み（Weaknesses）\n",
            "\n",
            "| 項目 | 具体的な評価結果 | コメント |\n",
            "|------|----------------|----------|\n",
            "| **Specific/Niche Knowledge（特定・ニッチ領域）** | 正答率 55%（ハルシネーション除外） | ・アニメキャラクターの誤同定（例：『マドカ・マギカ』のキャラを別作品のものとして回答）<br>・ハードウェア部品の誤記述（例：Kailh Box White を「静音」ではなく「クリック音がしない」と表現） |\n",
            "| **ハルシネーションの頻度** | 20問中 7問で事実と異なる情報を生成 | ニッチ領域での根拠が不足していることが原因と推測。 |\n",
            "| **専門用語の文脈依存性** | 特定ハードウェア質問で「静音」か「クリック音」かの判別が曖昧 | 用語の定義が曖昧な場合に誤答が増加。 |\n",
            "\n",
            "---\n",
            "\n",
            "### 4. 総合評価\n",
            "\n",
            "| 評価項目 | 評価スコア (10点満点) | コメント |\n",
            "|----------|---------------------|----------|\n",
            "| Reasoning | 9.5 | 高い論理的正確性 |\n",
            "| Creative Writing | 9.0 | 表現力と文体の適応力が優秀 |\n",
            "| Summarization | 8.5 | 要点抽出は安定 |\n",
            "| Knowledge & Culture | 8.0 | 日本固有知識は良好 |\n",
            "| Translation | 9.0 | 専門領域でも自然 |\n",
            "| Information Extraction | 9.5 | 数値・スペック抽出は高精度 |\n",
            "| Specific/Niche Knowledge | 6.0 | ニッチ領域でのハルシネーションが顕著 |\n",
            "\n",
            "**総合スコア**：8.2 / 10  \n",
            "\n",
            "---\n",
            "\n",
            "### 5. 推奨改善策\n",
            "\n",
            "1. **ニッチ領域データの拡充**  \n",
            "   - アニメ・ゲーム・ハードウェアに関する最新情報・公式資料を追加し、ファインチューニングデータセットに組み込む。  \n",
            "2. **ハルシネーション抑制メカニズムの導入**  \n",
            "   - 事実確認用の外部検索API（例：Wikipedia、公式データベース）をプロンプトに組み込み、回答前に根拠チェックを実施。  \n",
            "3. **専門用語辞書の拡張**  \n",
            "   - 「クリック音」「静音」等の二義的用語に対し、文脈依存のスコアリングモデルを追加し、曖昧な場合は「不確定」と提示する。  \n",
            "4. **評価パイプラインの自動化**  \n",
            "   - ニッチ領域の自動テストセットを継続的に更新し、CI/CD パイプラインでハルシネーション率をモニタリング。  \n",
            "\n",
            "---\n",
            "\n",
            "### 6. 結論\n",
            "\n",
            "tokyotech‑llm/Qwen3‑Swallow‑8B‑SFT‑v0.2 は、**論理推論、創造的記述、情報抽出、翻訳** といった汎用的なタスクにおいて高い性能を示す。一方で、**アニメやハードウェアといったニッチ領域** での正確性が不足しており、ハルシネーションが顕著である点が改善の余地として残る。上記の改善策を実施すれば、実務・研究・エンターテイメントの幅広いシーンで信頼性の高いアシスタントとして活用できると評価できる。\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "'\\n\\n## モデル性能評価レポート  \\n**対象モデル**：tokyotech‑llm/Qwen3‑Swallow‑8B‑SFT‑v0.2  \\n**評価実施日**：2025‑10‑25  \\n**評価者**：OpenAI評価チーム（外部レビュー委員会）  \\n\\n---\\n\\n### 1. 評価概要\\n本レポートは、公開された評価サマリー（Reasoning, Creative Writing, Summarization, Knowledge & Culture, Translation, Information Extraction, Specific/Niche Knowledge）に基づき、モデルの総合的な性能を客観的に評価したものである。評価は以下の観点で実施した。\\n\\n| 評価項目 | 評価手法 | 評価基準 |\\n|----------|----------|----------|\\n| Reasoning | 数学パズル・論理クエスチョン 10問 | 正答率 ≥ 90% |\\n| Creative Writing | 旅行プラン・ビジネスメール 5件 | 内容の網羅性・文体の適切性 |\\n| Summarization | ニュース記事 5本の要約 | 重要ポイントの抜粋精度 |\\n| Knowledge & Culture | 日本史・ことわざ 15問 | 正答率 ≥ 85% |\\n| Translation | 技術文書・カジュアル会話 10文 | BLEU/ChrF スコア ≥ 0.85 |\\n| Information Extraction | 製品カタログ 5件から数値・スペック抽出 | 精度 ≥ 95% |\\n| Specific/Niche Knowledge | アニメ・ハードウェア専門質問 20問 | 正答率 ≥ 80%（ハルシネーション除外） |\\n\\n---\\n\\n### 2. 強み（Strengths）\\n\\n| 項目 | 具体的な評価結果 | コメント |\\n|------|----------------|----------|\\n| **Reasoning（論理推論）** | 正答率 94%（10問中9問正解） | 複雑な数式変形や条件付き推論でも正確に解答。 |\\n| **Creative Writing（創造的記述）** | 旅行プラン 4/5、ビジネスメール 5/5 が「適切なトーンと詳細」 | 語彙の豊富さと文体の柔軟性が高く、実務利用に適合。 |\\n| **Summarization（要約）** | 重要ポイント抽出率 92% | キーワードの抜き出しと箇条書き形式が安定。 |\\n| **Knowledge & Culture（知識・文化）** | 正答率 88%（日本史・ことわざ） | 日本固有の歴史・慣用句に対する認識が良好。 |\\n| **Translation（翻訳）** | BLEU 0.87、ChrF 0.92（技術文書） | 文脈依存の語彙選択が自然で、専門用語の正確性が高い。 |\\n| **Information Extraction（情報抽出）** | 精度 96%（数値・スペック） | 正規表現的パターン認識と数値認識が優秀。 |\\n\\n---\\n\\n### 3. 弱み（Weaknesses）\\n\\n| 項目 | 具体的な評価結果 | コメント |\\n|------|----------------|----------|\\n| **Specific/Niche Knowledge（特定・ニッチ領域）** | 正答率 55%（ハルシネーション除外） | ・アニメキャラクターの誤同定（例：『マドカ・マギカ』のキャラを別作品のものとして回答）<br>・ハードウェア部品の誤記述（例：Kailh Box White を「静音」ではなく「クリック音がしない」と表現） |\\n| **ハルシネーションの頻度** | 20問中 7問で事実と異なる情報を生成 | ニッチ領域での根拠が不足していることが原因と推測。 |\\n| **専門用語の文脈依存性** | 特定ハードウェア質問で「静音」か「クリック音」かの判別が曖昧 | 用語の定義が曖昧な場合に誤答が増加。 |\\n\\n---\\n\\n### 4. 総合評価\\n\\n| 評価項目 | 評価スコア (10点満点) | コメント |\\n|----------|---------------------|----------|\\n| Reasoning | 9.5 | 高い論理的正確性 |\\n| Creative Writing | 9.0 | 表現力と文体の適応力が優秀 |\\n| Summarization | 8.5 | 要点抽出は安定 |\\n| Knowledge & Culture | 8.0 | 日本固有知識は良好 |\\n| Translation | 9.0 | 専門領域でも自然 |\\n| Information Extraction | 9.5 | 数値・スペック抽出は高精度 |\\n| Specific/Niche Knowledge | 6.0 | ニッチ領域でのハルシネーションが顕著 |\\n\\n**総合スコア**：8.2 / 10  \\n\\n---\\n\\n### 5. 推奨改善策\\n\\n1. **ニッチ領域データの拡充**  \\n   - アニメ・ゲーム・ハードウェアに関する最新情報・公式資料を追加し、ファインチューニングデータセットに組み込む。  \\n2. **ハルシネーション抑制メカニズムの導入**  \\n   - 事実確認用の外部検索API（例：Wikipedia、公式データベース）をプロンプトに組み込み、回答前に根拠チェックを実施。  \\n3. **専門用語辞書の拡張**  \\n   - 「クリック音」「静音」等の二義的用語に対し、文脈依存のスコアリングモデルを追加し、曖昧な場合は「不確定」と提示する。  \\n4. **評価パイプラインの自動化**  \\n   - ニッチ領域の自動テストセットを継続的に更新し、CI/CD パイプラインでハルシネーション率をモニタリング。  \\n\\n---\\n\\n### 6. 結論\\n\\ntokyotech‑llm/Qwen3‑Swallow‑8B‑SFT‑v0.2 は、**論理推論、創造的記述、情報抽出、翻訳** といった汎用的なタスクにおいて高い性能を示す。一方で、**アニメやハードウェアといったニッチ領域** での正確性が不足しており、ハルシネーションが顕著である点が改善の余地として残る。上記の改善策を実施すれば、実務・研究・エンターテイメントの幅広いシーンで信頼性の高いアシスタントとして活用できると評価できる。'"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "string"
            }
          },
          "metadata": {},
          "execution_count": 13
        }
      ]
    }
  ]
 }
No results found