Hayato Tsukagoshi hppRC
import random
import uuid
from pathlib import Path
import click
from vllm import LLM, SamplingParams
from vllm.outputs import RequestOutput
import datasets as ds
from src.data.common import normalize_text
import datasets as ds
from konoha import SentenceTokenizer
def title2text():
    dataset: ds.Dataset = ds.load_dataset("globis-university/aozorabunko-clean", split="train", num_proc=16)
    def process(x: dict[str, list]):
        anc_list, pos_list = [], []
from transformers import PreTrainedTokenizer
from vllm import LLM, SamplingParams
from vllm.outputs import RequestOutput
import datasets as ds
def build_input_text(text: str, tokenizer: PreTrainedTokenizer) -> str:
    text = text.strip()
hppRC / print_params.py
Created December 15, 2023 03:27
A utility that displays model parameter counts in an easy-to-read form
def format_param_with_unit(num_params: int) -> str:
    if num_params >= 1000 * 1000 * 1000:
        unit = "B"
        num_params /= 1000 * 1000 * 1000
    elif num_params >= 1000 * 1000:
        unit = "M"
        num_params /= 1000 * 1000
    elif num_params >= 1000:
        unit = "K"
        num_params /= 1000
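The preview of print_params.py is cut off after the "K" branch. A minimal sketch of how such a formatter might be completed follows. The fallback for counts below 1000 and the one-decimal output format are assumptions for illustration; neither appears in the gist.

```python
def format_param_with_unit(num_params: int) -> str:
    """Format a parameter count with a K/M/B suffix (base-1000 units)."""
    value = float(num_params)
    if value >= 1000 * 1000 * 1000:
        unit = "B"
        value /= 1000 * 1000 * 1000
    elif value >= 1000 * 1000:
        unit = "M"
        value /= 1000 * 1000
    elif value >= 1000:
        unit = "K"
        value /= 1000
    else:
        # Assumption: counts below 1000 are shown as plain integers.
        return str(num_params)
    # Assumption: one decimal place; the original format string is not shown.
    return f"{value:.1f}{unit}"


if __name__ == "__main__":
    print(format_param_with_unit(110_000_000))  # 110.0M
```

Working on a separate float copy keeps the `int` argument untouched, so the small-count branch can still return the exact integer.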
hppRC / install-jumanppv2.sh
Created January 4, 2023 07:07
Installation steps for Juman++ V2
ORIGIN_DIR=$(pwd)
JUMANPP_DIR="$HOME/.local/share/jumanpp"
mkdir -p "$JUMANPP_DIR"
cd "$JUMANPP_DIR"
curl -LO https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz
tar -xf jumanpp-2.0.0-rc3.tar.xz
cd jumanpp-2.0.0-rc3
import hoge.a
print(__file__)
from tqdm import tqdm
for i in tqdm(list(range(100)), position=0):
    for batch in tqdm(list(range(10000000)), position=1):
        pass
N_TRIALS=10
STORAGE=sqlite:///example.db
STUDY_NAME=`optuna create-study --storage $STORAGE`
DISTRIBUTIONS=`cat distributions.json`
for _ in `seq 1 $N_TRIALS`; do
    trial=`optuna ask \
        --storage $STORAGE \
        --study-name $STUDY_NAME \
hppRC / 📊 Weekly development breakdown
Last active October 29, 2020 00:05
Weekly development breakdown🔥
Python      3 hrs 45 mins  ██████████▌░░░░░░░░░░  50.0%
Markdown    2 hrs 38 mins  ███████▍░░░░░░░░░░░░░  35.1%
TypeScript  37 mins        █▋░░░░░░░░░░░░░░░░░░░   8.2%
JSON        19 mins        ▉░░░░░░░░░░░░░░░░░░░░   4.2%
JavaScript  5 mins         ▎░░░░░░░░░░░░░░░░░░░░   1.3%
{"lastUpload":"2020-04-04T07:45:57.757Z","extensionVersion":"v3.4.3"}