CultriX CultriX-Github

  • Netherlands
CultriX-Github / Tally-Multi-Vote Dataset Generation.py
Last active January 27, 2025 22:13
Tally-Multi-Vote Dataset Generation.
import os
import requests
import random
import logging
import re
import time
import json
import matplotlib
matplotlib.use('Agg') # Set the backend to 'Agg' before importing pyplot
import matplotlib.pyplot as plt
#!/bin/bash
# Functions
install_basic_packages() {
    echo "Installing basic packages..."
    # Refresh the package index, then install the tools; abort on any failure.
    apt update -y && apt install -y screen nano git git-lfs speedometer htop libaio-dev || {
        echo "Failed to install basic packages" >&2
        exit 1
    }
}
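| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Llama-3.2-3B | 25.76 | Error: File does not exist | 39.22 | 34.61 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.87 | ± | 2.55 |
| | | acc_norm | 23.23 | ± | 2.65 |
| agieval_logiqa_en | 0 | acc | 23.96 | ± | 1.67 |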
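| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Llama-3.2-3B-DPO | 27.06 | Error: File does not exist | 58.93 | 34.96 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 18.90 | ± | 2.46 |
| | | acc_norm | 20.87 | ± | 2.55 |
| agieval_logiqa_en | 0 | acc | 26.11 | ± | 1.72 |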
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Llama3-8B-function-calling-uncensored-dareties | 39.15 | Error: File does not exist | 54.99 | 42.52 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 24.41 | ± | 2.70 |
| | | acc_norm | 23.23 | ± | 2.65 |
| agieval_logiqa_en | 0 | acc | 34.56 | ± | 1.87 |
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Llama3-8B-function-calling-dpo-slerp | 39.52 | Error: File does not exist | 56.01 | 42.8 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 25.98 | ± | 2.76 |
| | | acc_norm | 23.62 | ± | 2.67 |
| agieval_logiqa_en | 0 | acc | 38.25 | ± | 1.91 |
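| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Hermes-3-Llama-3.1-8B | 41.51 | Error: File does not exist | 58.61 | 43.08 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 26.38 | ± | 2.77 |
| | | acc_norm | 25.20 | ± | 2.73 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± | 1.91 |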
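| Model | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|
| Llama3-8B-DPO | 41.87 | Error: File does not exist | 71.38 | 44.5 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 40.71 | ± | 1.93 |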
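| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Phi-3-mini-4k-instruct | 44.44 | 71.88 | 57.77 | 41.9 | 54 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 29.13 | ± | 2.86 |
| | | acc_norm | 28.74 | ± | 2.85 |
| agieval_logiqa_en | 0 | acc | 42.86 | ± | 1.94 |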
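| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| CultMerge-7B-v1 | 45.2 | 77.1 | 78.22 | 49.87 | 62.6 |

AGIEval

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.17 | ± | 2.80 |
| | | acc_norm | 25.59 | ± | 2.74 |
| agieval_logiqa_en | 0 | acc | 39.48 | ± | 1.92 |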
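
The summary rows above only carry an Average when all four suite scores are present; rows where GPT4All reads "Error: File does not exist" have none. The sketch below assumes the Average is the plain arithmetic mean of the AGIEval, GPT4All, TruthfulQA, and Bigbench scores. That assumption is consistent with the two rows that report an Average, but the listing does not confirm it. The function name `average_score` and the use of `None` for a missing score are illustrative choices, not part of the original gists.

```python
from typing import Optional, Sequence


def average_score(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Return the mean of the suite scores, or None if any score is missing.

    Assumption: the Average column is an unweighted mean of the four suites.
    """
    if not scores or any(s is None for s in scores):
        return None
    return round(sum(scores) / len(scores), 2)


# Scores copied from the summary rows, in the order
# AGIEval, GPT4All, TruthfulQA, Bigbench.
phi3 = average_score([44.44, 71.88, 57.77, 41.9])
cultmerge = average_score([45.2, 77.1, 78.22, 49.87])

# A row whose GPT4All run failed: the missing score is represented as None
# (hypothetical encoding), so no average is produced.
llama_3b = average_score([25.76, None, 39.22, 34.61])

print(phi3, cultmerge, llama_3b)
```

Under this assumption the two complete rows come out within rounding of the reported 54 and 62.6, while the incomplete row yields no average rather than a misleading three-suite mean.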