
thistleknot: Turning out data tricks since 2006!

States        Poverty  Infant Mort  White   Crime  Doctors  Traf Deaths  University  Unemployed  Income  Population
Alabama          15.7          9    71       448    218.2         1.81        22           5      42666    4634063
Alaska            8.4          6.9  70.6     661    228.5         1.63        27.3         6.7    68460     679893
Arizona          14.7          6.4  86.5     483    209.7         1.69        25.1         5.5    50958    6360238
Arkansas         17.3          8.5  80.8     529    203.4         1.96        18.8         5.1    38815    2841595
California       13.3          5    76.6     523    268.7         1.21        29.6         7.2    61021   36185908
Colorado         11.4          5.7  89.7     348    259.7         1.14        35.6         4.9    56993    4837229
Connecticut       9.3          6.2  84.3     256    376.4         0.86        35.6         5.7    68595    3488084
Delaware         10            8.3  74.3     689    250.9         1.23        27.5         4.8    57989     865314
Florida          13.2          7.3  79.8     723    247.9         1.56        25.8         6.2    47778   18262096
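A whitespace-separated preview like this can be parsed with the standard library alone. The sketch below is illustrative (function and variable names are not from the gist); column names follow the header above:

```python
import io

RAW = """\
Alabama 15.7 9 71 448 218.2 1.81 22 5 42666 4634063
Alaska 8.4 6.9 70.6 661 228.5 1.63 27.3 6.7 68460 679893
"""

COLUMNS = ["Poverty", "Infant Mort", "White", "Crime", "Doctors",
           "Traf Deaths", "University", "Unemployed", "Income", "Population"]

def parse_rows(text):
    """Map each state name to a dict of its column values."""
    rows = {}
    for line in io.StringIO(text):
        parts = line.split()
        state, values = parts[0], [float(v) for v in parts[1:]]
        rows[state] = dict(zip(COLUMNS, values))
    return rows

data = parse_rows(RAW)
print(data["Alaska"]["Income"])  # → 68460.0
```

Note that a plain `split()` assumes one-word state names; multi-word states (e.g. "New York") would need the first columns joined before parsing.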
thistleknot / OpenAI System Prompt.txt
Created October 20, 2024 14:44
OpenAI Memory inspired System Prompt
System Prompt: Enhancing AI Agents with Symbolic Reasoning
Goal: Develop AI agents capable of advanced reasoning, personalization, and interaction. Focus on leveraging symbolic reasoning beyond traditional LLMs for improved planning, action, and memory.
Key Traits for AI Agents:
Planning: Ability to anticipate outcomes and devise structured plans to achieve them.
Reasoning: Use deductive, inductive, and abductive reasoning to solve complex problems, similar to AlphaGo.
thistleknot / export_code.cmd
Created August 5, 2024 03:06
Export Code Base
forfiles /S /M *.py /C "cmd /c echo. && echo. && echo @path: && echo. && echo. && type @file && echo." > code.txt
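A cross-platform equivalent of this batch one-liner can be sketched in Python. This is a sketch, not part of the original gist; the function and argument names are illustrative, and the output format mirrors the `@path:` markers the forfiles command emits:

```python
from pathlib import Path

def export_code(root=".", pattern="*.py", out="code.txt"):
    """Concatenate every file matching pattern under root into one text
    file, each chunk preceded by its path (mirrors the forfiles one-liner)."""
    with open(out, "w", encoding="utf-8") as f:
        for path in sorted(Path(root).rglob(pattern)):
            f.write(f"\n\n@path: {path}\n\n")
            f.write(path.read_text(encoding="utf-8", errors="replace"))
            f.write("\n")
```

Usage is e.g. `export_code(root="src")` to dump a project's Python sources into `code.txt`.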
thistleknot / parse_json.py
Created May 6, 2024 17:38
parse json from disk
import json
def load_json_from_disk(file_path):
    """
    Load JSON data from disk.
    Parameters:
    - file_path (str): The path to the JSON file.
    Returns:
    - The parsed JSON content (typically a dict or list).
    """
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f)
thistleknot / loreft.py
Last active April 27, 2024 02:42
pyreft loreft continued pretraining using completion
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from pyreft import ReftConfig, ReftTrainerForCausalLM, get_reft_model, ReftSupervisedDataset, ReftDataCollator, LoreftIntervention
import torch
import pyreft
from datasets import load_dataset
thistleknot / script.py
Last active March 2, 2024 02:19
text-generation-webui extension - RAG google/duckduckgo search (async) w faiss
#for data txt files see: https://github.com/TheCynosure/smmry_impl
#example use
"""
Search_web("history of Taco Tuesday")
Tell me about this.
"""
#get Google API keys:
#https://console.cloud.google.com/apis/dashboard
#https://programmablesearchengine.google.com/controlpanel/all
#could be retooled quite easily to use duckduckgo_search instead of Google, which avoids having to set up API keys
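The faiss part of this extension comes down to nearest-neighbour search over text embeddings: score each stored chunk against the query vector and keep the top hits. A dependency-free sketch of that retrieval step (toy vectors and names are illustrative, not from the gist; faiss replaces this brute-force scan with an optimized index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k=2):
    """Return indices of the k document vectors most similar to the query."""
    scored = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    return scored[:k]

# toy 3-d "embeddings" for three text snippets
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]
print(top_k(query, docs))  # → [0, 1]
```

The retrieved chunks would then be stuffed into the prompt ahead of the user's question, which is the RAG step the extension performs after the web search.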
thistleknot / yahoo_finance.py
Last active February 11, 2024 21:25
how to pull yahoo finance data
def get_v1_url(symbol, period_type, crumb):
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
    }
    period1 = 493590046
    period2 = 1913180947
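The hard-coded period1/period2 values are Unix epoch timestamps (seconds) bounding the requested date range. A sketch, not from the gist, of computing such bounds from calendar dates with the standard library:

```python
from datetime import datetime, timezone

def to_epoch(year, month, day):
    """Unix timestamp (seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

period1 = to_epoch(2020, 1, 1)   # start of range
period2 = to_epoch(2024, 1, 1)   # end of range
print(period1, period2)  # → 1577836800 1704067200
```

Using a timezone-aware datetime keeps the result independent of the machine's local timezone.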
thistleknot / minimum nanogpt mamba
Last active January 27, 2024 18:48
minimum nanogpt mamba
import torch
import torch.nn as nn
from torch.nn import functional as F
from torch.nn.parameter import Parameter
from tqdm import tqdm
from mamba_ssm import Mamba
#hyperparams
epochs = 100
lr = 1e-3
batch_size = 64
thistleknot / train_mamba.py
Last active January 22, 2024 05:05
Train Mamba
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
import wandb
from datasets import load_dataset
import torch
import os
import argparse
import numpy as np
import pandas as pd
from transformers import EvalPrediction
from torch.utils.data import DataLoader
thistleknot / efficient_batching_v2.py
Last active January 19, 2024 02:44
Efficient Batching v2
#This method deducts from the list passed in, splitting the records between a sample and a remainder.
#Each sample is packed completely full of data until no more samples can be extracted, at which point an empty sample is returned along with the remainder (to be folded into a new iteration).
# Function to find the combination of values that adds up to the target sum
def find_combination_to_sum(counts, target):
    #print("Target inside function (find_combination_to_sum):", target)
    values = []
    for val, count in counts.items():
        #print(f"Value (val): {val}, Type: {type(val)}")
        #print(f"Count: {count}, Type: {type(count)}")
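The gist preview is truncated here. One way the described idea can be completed, as an illustrative sketch rather than the original implementation: expand the value counts into a multiset and run a standard subset-sum search for a combination hitting the target.

```python
def find_combination_to_sum(counts, target):
    """Return a list of values drawn from counts (respecting multiplicity)
    that sums to target, or None if no combination exists.
    Illustrative completion; the original gist is truncated."""
    items = [v for v, c in counts.items() for _ in range(c)]

    def search(i, remaining, chosen):
        if remaining == 0:
            return chosen
        if i == len(items) or remaining < 0:
            return None
        # branch 1: take items[i]
        result = search(i + 1, remaining - items[i], chosen + [items[i]])
        if result is not None:
            return result
        # branch 2: skip every remaining copy of this value (avoids
        # re-exploring identical combinations)
        j = i
        while j < len(items) and items[j] == items[i]:
            j += 1
        return search(j, remaining, chosen)

    return search(0, target, [])

print(find_combination_to_sum({3: 2, 5: 1}, 8))  # → [3, 5]
```

With counts {3: 2, 5: 1} and target 8, the search takes one 3 and the 5; if no combination exists (e.g. target 7), it returns None, which matches the "empty sample plus remainder" case described above.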