
@shawngraham
shawngraham / LLM-KG-Extractor_Pattern.md
Last active May 6, 2026 14:38
A pattern for guiding the use of an LLM as a tool for knowledge graph extraction.

Knowledge Graph Extraction for Embedding Models

A pattern for building temporally-grounded knowledge graphs from scholarly and documentary sources, optimised for downstream knowledge graph embedding (KGE) training.


The core problem

Most knowledge graph construction from text produces triples: (subject, predicate, object). Applied to scholarly or historical sources, this representation collapses a critical distinction. A monograph arguing that the Treaty of Westphalia established state sovereignty is doing something categorically different from the same monograph reporting that previous historians have made this argument. The first is a knowledge claim being advanced; the second is a claim being attributed. Embedding models trained on undifferentiated triples will represent both as equivalent facts, which risks a fundamental error of provenance.
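One common way to preserve this distinction in a form a KGE model can still consume is to reify each claim as its own node and hang the stance (asserted vs. attributed) and the source off that node. The sketch below assumes that approach. `Claim`, `to_kge_triples`, and the relation names (`asserts`, `reports`, `has_subject`, and so on) are hypothetical illustrations, not the pattern's own schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    """A reified claim: the bare triple plus who is making it, and how.

    Field names and stance labels are hypothetical, for illustration only.
    """
    subject: str
    predicate: str
    obj: str
    source: str                          # document the claim was extracted from
    stance: str                          # "asserted" or "attributed"
    attributed_to: Optional[str] = None  # whose claim it is, if attributed


def to_kge_triples(claim: Claim, claim_id: str) -> list[tuple[str, str, str]]:
    """Flatten one reified claim into plain triples for KGE training.

    The claim becomes its own node, so an asserted claim and an attributed
    claim no longer collapse into one identical (subject, predicate, object) edge.
    """
    triples = [
        (claim_id, "has_subject", claim.subject),
        (claim_id, "has_predicate", claim.predicate),
        (claim_id, "has_object", claim.obj),
        (claim_id, "stance", claim.stance),
    ]
    if claim.stance == "asserted":
        # The source itself advances the claim.
        triples.append((claim.source, "asserts", claim_id))
    else:
        # The source only reports the claim; credit it to its originator if known.
        triples.append((claim.source, "reports", claim_id))
        if claim.attributed_to:
            triples.append((claim.attributed_to, "asserts", claim_id))
    return triples
```

Because the asserting source and the reporting source now reach their claim nodes through different relations, an embedding model sees two distinguishable structures rather than one duplicated fact.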

shawngraham / groq_ocr.py
Last active February 24, 2026 19:10
Use ocr_keyword_search.py when you have a folder of OCR'd text and a file of keywords; use groq_ocr.py when you need to get that text in the first place.
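The code for `ocr_keyword_search.py` is not shown in this preview. A minimal sketch of that kind of search, assuming plain-text `.txt` files in a single folder and a keyword file with one term or phrase per line (both assumptions), might look like this; `load_keywords` and `search_folder` are hypothetical names.

```python
from pathlib import Path


def load_keywords(keyword_file: Path) -> list[str]:
    """Read one keyword or phrase per line, skipping blank lines (assumed format)."""
    lines = keyword_file.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def search_folder(text_dir: Path, keywords: list[str]) -> list[tuple[str, str, int]]:
    """Return (filename, keyword, count) for each keyword found in each .txt file.

    Matching is case-insensitive substring counting, so OCR errors
    (e.g. 'rn' read as 'm') will cause misses.
    """
    hits = []
    for path in sorted(text_dir.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        for kw in keywords:
            count = text.count(kw.lower())
            if count:
                hits.append((path.name, kw, count))
    return hits
```

Keep the keyword file outside the text folder (or give it a non-`.txt` extension) so it is not searched as if it were OCR output.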
"""
groq_ocr.py
Processes newspaper images using Groq's vision API and extracts individual
articles to a CSV. Each row represents one article with associated page metadata.
Usage:
    python groq_ocr.py --input_dir processed_output/images --output ocr_results.csv
Requirements:
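The docstring describes one CSV row per article, with page metadata attached. The Groq request itself is not shown in the preview, so the sketch below covers only the step after it: turning one page's model reply into rows and writing the CSV. It assumes the model was prompted to return JSON of the form `{"articles": [{"headline": ..., "text": ...}]}`; that schema, the column names, and the helper names are all hypothetical.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set: page metadata first, then per-article fields.
FIELDS = ["image_file", "page_number", "article_index", "headline", "text"]


def articles_to_rows(image_file: str, page_number: int, model_reply: str) -> list[dict]:
    """Turn one page's model reply into CSV rows, one per article.

    Assumes (hypothetically) a JSON reply shaped like
    {"articles": [{"headline": ..., "text": ...}, ...]}.
    """
    data = json.loads(model_reply)
    rows = []
    for i, article in enumerate(data.get("articles", []), start=1):
        rows.append({
            "image_file": image_file,
            "page_number": page_number,
            "article_index": i,
            "headline": article.get("headline", ""),
            "text": article.get("text", ""),
        })
    return rows


def write_rows(rows: list[dict], output: Path) -> None:
    """Write all article rows to a CSV file with a header line."""
    with output.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Asking the model for JSON and parsing it, rather than splitting free text, makes a malformed reply fail loudly at `json.loads` instead of silently producing misaligned rows.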
shawngraham / gemini 3 flash transcription
Created February 13, 2026 15:01
Aufbau Vol15no26 1July1949- Vol16no19 12May1950.pdf page 1
# AUFBAU
## RECONSTRUCTION
**AN AMERICAN WEEKLY PUBLISHED IN NEW YORK**
**by The New World Club, Inc., 209 West 48th Street, New York 19, N. Y. Phone: CIrcle 7-4462**
*Entered as second-class matter January 20, 1934, at the Post Office New York, N. Y. under Act of March 3, 1879*
**Vol. XV—No. 26 | NEW YORK, N. Y., FRIDAY, JULY 1, 1949 | Price 10¢**
***
> **First in "Aufbau":**
shawngraham / prompt-for-archaeological-notebook-transcription.txt
Last active February 11, 2026 16:41
A prompt to use with gemma 3:27b for archaeological notebooks. I suspect different models will require some tweaking of the prompt.
**Role:** You are a precise archaeological document analyst specializing in the digitization of field notebooks and excavation catalogues.
**Task:**
1. Perform a spatial analysis of the document to distinguish between text blocks, artifact photographs/sketches, and marginalia.
2. Extract metadata and create a brief 2-3 sentence overview of the document's contents.
3. Transcribe the document EXACTLY as written into a valid YAML structure.
4. Extract archaeological entities into specific categories based only on explicit mentions.
**Critical Rules:**
- **Zero Hallucination:** Only include information directly visible in the image. If a word is illegible, mark it as `[illegible]`.
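Because the prompt asks for valid YAML and for entities drawn only from explicit mentions, both rules can be spot-checked after the fact. The sketch assumes the response has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and uses hypothetical key names (`overview`, `transcription`, `entities`), since the full output schema is cut off in this preview. The sentence count is a crude punctuation count and the entity check is a plain substring test, so treat the results as flags for human review rather than verdicts.

```python
def check_extraction(doc: dict) -> list[str]:
    """Return a list of problems found in one parsed model response.

    `doc` is the already-parsed YAML. The key names used here are
    hypothetical placeholders for the prompt's real output schema.
    """
    problems = []
    for key in ("overview", "transcription", "entities"):
        if key not in doc:
            problems.append(f"missing key: {key}")

    # Crude sentence count: number of ., ! and ? characters in the overview.
    overview = str(doc.get("overview", ""))
    n_sentences = sum(overview.count(p) for p in ".!?")
    if not 2 <= n_sentences <= 3:
        problems.append(f"overview has {n_sentences} sentences, expected 2-3")

    # Zero-hallucination check: every extracted entity should appear verbatim
    # (case-insensitively) in the transcription, flattened crudely with str().
    text = str(doc.get("transcription", "")).lower()
    for category, values in (doc.get("entities") or {}).items():
        for value in values or []:
            if str(value).lower() not in text:
                problems.append(f"{category}: '{value}' not found in transcription")
    return problems
```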
shawngraham / pixplot-using-python3-12.ipynb
Last active January 9, 2026 17:34
A version of YaleDH's pixplot tool (an image-corpus similarity visualizer) that runs on Python 3.12, extended to generate the nodes and edges of a similarity graph.
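The notebook does not render here, so the exact method is not visible. One common construction for a similarity graph, sketched below as an assumption, links each image to its k most similar neighbours by cosine similarity of precomputed image embeddings (one non-zero vector per image). `knn_similarity_graph` and the choice of k-nearest neighbours are illustrative, not necessarily what the notebook does.

```python
import numpy as np


def knn_similarity_graph(names, vectors, k=2):
    """Build nodes and edges of a k-nearest-neighbour cosine-similarity graph.

    names:   one label per image.
    vectors: (n_images, dim) array of embeddings, row i belonging to names[i].
    Returns (nodes, edges); edges are undirected (source, target, similarity).
    """
    X = np.asarray(vectors, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalise each row
    sims = X @ X.T                                    # pairwise cosine similarity
    np.fill_diagonal(sims, -np.inf)                   # never link an image to itself

    nodes = [{"id": name} for name in names]
    edges = set()
    for i in range(len(names)):
        # Indices of the k most similar images to image i.
        for j in np.argsort(sims[i])[::-1][:k]:
            a, b = sorted((i, int(j)))                # store each pair once
            edges.add((names[a], names[b], round(float(sims[a, b]), 4)))
    return nodes, sorted(edges)
```

The node and edge lists can then be written out as CSV tables for a network tool such as Gephi. Using k-nearest neighbours rather than a global similarity threshold keeps every image connected to something, at the cost of including some weak links for outliers.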
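| index | fruit | quantity | color | store |
|---|---|---|---|---|
| 0 | apple | 12 | red | loblaws |
| 1 | banana | 18 | yellow | farm boy |
| 2 | grape | 30 | purple | freshco |
| 3 | cherry | 4 | red | iga |
| 4 | watermelon | 2 | green | farm boy |
| 5 | raspberries | 23 | red | iga |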
shawngraham / archaeo_rag.ipynb
Created July 11, 2025 15:45
For use with https://shawngraham.github.io/homecooked-history/hm-generator-site/enhanced.html; talk to your archaeological contexts! Import this into Google Colab to run.
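The notebook itself does not render here. Retrieval-augmented generation usually pairs an embedding model with an LLM; the sketch below swaps in plain TF-IDF scoring from the standard library to show only the retrieval and prompt-assembly steps. The context-record format and the function names are hypothetical, and the actual notebook may work quite differently.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word/number tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(question, contexts, top_k=2):
    """Rank context records by TF-IDF cosine similarity to the question."""
    docs = [Counter(tokenize(c)) for c in contexts]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}

    def vec(counts):
        # Terms unseen in the corpus get weight 0.
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cos(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        nu = math.sqrt(sum(w * w for w in u.values()))
        nv = math.sqrt(sum(w * w for w in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    q = vec(Counter(tokenize(question)))
    scored = sorted(((cos(q, vec(d)), i) for i, d in enumerate(docs)), reverse=True)
    # Drop records with no overlap at all.
    return [contexts[i] for score, i in scored[:top_k] if score > 0]


def build_prompt(question, contexts):
    """Assemble a grounded prompt from the retrieved contexts only."""
    context_block = "\n".join(f"- {c}" for c in retrieve(question, contexts))
    return (
        "Answer using only the excavation contexts below.\n"
        f"Contexts:\n{context_block}\n\nQuestion: {question}"
    )
```

The prompt string would then go to whatever LLM the notebook uses; restricting the model to retrieved records is what lets answers be traced back to specific contexts.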
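import mesa


class LetterAgent(mesa.Agent):
    # mesa 3.x style: the agent receives only the model; unique_id is assigned automatically
    def __init__(self, model):
        super().__init__(model)
        self.letters_sent = 0
        self.letters_received = 0

    def step(self):
        # Called once per model step; for now each agent just announces itself
        print(f"Hi, I am agent {self.unique_id}.")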
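The fragment sets up `letters_sent` and `letters_received` counters, but its `step` only prints. One plausible completion, assuming each agent mails one letter to a randomly chosen other agent per step (a hypothetical rule), is sketched below in plain Python without mesa so the logic can be run and checked on its own. In mesa 3 the same body would go inside `LetterAgent.step`, with the other agents reached through the model's `agents` collection.

```python
import random


class Agent:
    """Stand-in for mesa.Agent: an id plus the two counters from the fragment."""

    def __init__(self, unique_id):
        self.unique_id = unique_id
        self.letters_sent = 0
        self.letters_received = 0


def run(n_agents=5, steps=10, seed=42):
    """Simulate letter exchange; returns the agents with updated counters."""
    rng = random.Random(seed)
    agents = [Agent(i) for i in range(n_agents)]
    for _ in range(steps):
        order = agents[:]
        rng.shuffle(order)  # random activation order each step
        for sender in order:
            # Hypothetical rule: one letter to a random agent other than yourself.
            others = [a for a in agents if a is not sender]
            recipient = rng.choice(others)
            sender.letters_sent += 1
            recipient.letters_received += 1
    return agents
```

Because every letter sent is also received, total letters received always equals `n_agents * steps`, which is a handy sanity check when the rule gets more elaborate.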
shawngraham / search.py
Created January 6, 2025 13:51
Get images by motif from P-LOD.
%%capture
!python3 -m pip install git+https://github.com/p-lod/plodlib
!pip install requests_cache
!pip install rdflib
import plodlib
import json
import pandas as pd
from string import Template
import rdflib as rdf
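The preview stops after the imports, but the `string.Template` import suggests the SPARQL query is assembled from a template. A sketch of that idea follows. The prefix, predicates, and function name are placeholders, since the real P-LOD vocabulary is not shown in the preview, and the check on the motif string guards against injecting arbitrary SPARQL into the query.

```python
from string import Template

# Hypothetical vocabulary: ex: and its terms are placeholders, not P-LOD's own.
MOTIF_QUERY = Template("""
PREFIX ex: <http://example.org/hypothetical#>
SELECT ?image ?url WHERE {
  ?image ex:depicts ex:$motif ;
         ex:imageUrl ?url .
}
""")


def motif_query(motif: str) -> str:
    """Fill the template with one motif identifier (letters, digits, _ or - only)."""
    if not motif.replace("_", "").replace("-", "").isalnum():
        raise ValueError(f"unsafe motif identifier: {motif!r}")
    return MOTIF_QUERY.substitute(motif=motif)
```

The resulting string could then be passed to an rdflib graph's `query` method or sent to a SPARQL endpoint.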