Philip p208p2002

@p208p2002
p208p2002 / fix_chatglm_tokenizer.ipynb
Last active February 22, 2024 01:09
fix_chatglm_tokenizer.ipynb
# Converting GPT2 BPE-Tokenizer tokens to UTF-8
# The conversion only targets tokens that are not in the vocabulary and are represented as bytes (e.g. Chinese characters)
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2")
word = "台"
tokens = tokenizer.convert_ids_to_tokens(tokenizer(word,add_special_tokens=False)["input_ids"])
print("tokens:",tokens)
# Convert to UTF-8
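# A hedged sketch of the byte-token -> UTF-8 step referred to above. It assumes the
# slow GPT2Tokenizer (which exposes byte_decoder); the variable names are
# illustrative, not the original gist's code.
from transformers import GPT2Tokenizer
slow_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
byte_tokens = slow_tokenizer.tokenize(word)
# Each character of a byte-level BPE token maps back to one raw byte; join them and decode as UTF-8
raw_bytes = bytes(slow_tokenizer.byte_decoder[c] for t in byte_tokens for c in t)
print("utf-8:", raw_bytes.decode("utf-8"))  # 台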
# https://huggingface.co/docs/transformers/perplexity
from typing import Any
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
class PPL():
    def __init__(self, model_id="gpt2") -> None:
        self.model = AutoModelForCausalLM.from_pretrained(model_id)
        self.tokenizer = AutoTokenizer.from_pretrained(model_id)
        self.device = 'cpu'
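    # A hedged sketch of the scoring step the truncated class presumably continues
    # with, following the HF perplexity guide linked above; the method name and
    # exact formulation are assumptions, not the original code.
    def __call__(self, text: str) -> float:
        enc = self.tokenizer(text, return_tensors="pt").to(self.device)
        input_ids = enc["input_ids"]
        with torch.no_grad():
            # labels=input_ids lets the model shift them internally and return
            # the mean next-token cross-entropy loss
            out = self.model(input_ids, labels=input_ids)
        # perplexity = exp(mean negative log-likelihood)
        return torch.exp(out.loss).item()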
<s>[INST] <<SYS>>你是一位中文母語使用者,你只能用中文對話<</SYS>>hello [/INST] *你好* (nǐ hǎo) </s>
<s>[INST] 你是誰 [/INST] *我是 líng* (wǒ shì líng) - I am Chinese. </s>
<s>[INST] 說個笑話來聽聽 [/INST] *笑* (xì) - Sure, here's a Chinese joke for you </s>
# $ pip install deepspeed>=0.9.3
# $ deepspeed deepspeed_inference.py
import os
import deepspeed
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
local_rank = int(os.getenv("LOCAL_RANK", "0"))
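# A hedged sketch of how this truncated launcher script typically continues, based
# on the public DeepSpeed inference API; the model id, dtype, and generation
# settings below are assumptions, not the original gist.
world_size = int(os.getenv("WORLD_SIZE", "1"))
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
# Wrap the model in the DeepSpeed inference engine
# mp_size: number of GPUs to shard across (newer releases call this tensor_parallel)
ds_model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = ds_model.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))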
@p208p2002
p208p2002 / Python3 文字與unicode互轉.md
Created June 10, 2022 01:58
#blog #python3 #uni2word #word2uni

Python3: converting between characters and Unicode

word2unicode

Converting a character to Unicode is straightforward: just use ord(x).

import re
def word2unicode(x):
    uni = hex(ord(x))                     # code point as a hex string, e.g. '0x5b57'
    uni = re.sub("^0x", "", uni).upper()  # drop the '0x' prefix and uppercase it
    return uni
word2unicode("字") # 5B57
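The gist's tags also mention the reverse direction (uni2word). A minimal sketch of it, with a function name chosen to mirror the one above rather than taken from the gist:

def unicode2word(uni):
    # chr() turns an integer code point back into the character
    return chr(int(uni, 16))
unicode2word("5B57") # 字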
@p208p2002
p208p2002 / Flask與Pytorch模型部署.md
Last active March 18, 2022 03:33
#blog #flask #pytorch #cuda-out-of-memory

Deploying PyTorch models with Flask

When a PyTorch model is deployed with multi-threading (or multiple concurrent requests), each thread automatically requests new VRAM and does not release it after use, so after the API has been running for a while it often hits cuda out of memory and the server crashes. On top of that, threads competing for resources can also make the program unstable.

There are a few ways to address this:

  1. Disable Flask's multi-threading (see the sketch below)
  2. Make every thread wait for the other threads to finish before it uses the model (i.e. serialize access to the model)
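
A minimal sketch of approach 1, turning off the development server's threading; the app itself is an illustrative stand-in, not the original post's code:

from flask import Flask
app = Flask(__name__)

if __name__ == "__main__":
    # threaded=False makes the dev server handle requests one at a time,
    # so only one inference runs (and allocates VRAM) at any moment
    app.run(threaded=False)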

When multi-threading is not disabled
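
In that case approach 2 applies: serialize access to the model with a lock so only one request runs inference at a time. A minimal sketch, assuming a simple /predict route and a placeholder model (neither comes from the original post):

import threading
import torch
from flask import Flask, request, jsonify

app = Flask(__name__)
model_lock = threading.Lock()
model = torch.nn.Linear(8, 2).cuda()  # placeholder for the real model (assumes a CUDA device)

@app.route("/predict", methods=["POST"])
def predict():
    x = torch.tensor(request.json["inputs"], dtype=torch.float32).cuda()
    # only one request runs inference at a time, so no extra VRAM is requested
    with model_lock:
        with torch.no_grad():
            y = model(x)
    return jsonify(y.cpu().tolist())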