@barseghyanartur
Last active April 4, 2026 00:28
blog: licence-normaliser

licence-normaliser: Taming licence chaos in Python

Date: 2026-04-04 14:00
category:Tech
tags:python, licence
summary:Tired of wrestling with messy license strings like "MIT License" vs. "The MIT License" or cryptic URLs? licence-normaliser is a lightweight Python tool that tames the chaos. It maps inconsistent metadata into a clean, machine-readable hierarchy (Family → License → Version), turning strings like CC BY-NC-ND 4.0 into a tidy cc-by-nc-nd-4.0 automatically. Whether you're scraping repos or managing compliance, it handles SPDX codes, prose, and even complex Creative Commons variants with ease—giving you a single source of truth for your legal metadata.
image:https://raw.githubusercontent.com/barseghyanartur/licence-normaliser/main/docs/_static/licence_normaliser_logo.webp

Hey, ever tried cleaning up messy license strings — like CC BY-NC-ND 4.0 or MIT License — and getting them into something tidy and machine-readable? That's exactly what licence-normaliser does, and honestly, it's a lifesaver for anyone dealing with open-source compliance or metadata.

licence-normaliser

Check out the repo here: licence-normaliser. It's a lightweight Python library that turns chaos into order using a neat three-level system: family → licence → version. Think cc → cc-by-nc-nd → cc-by-nc-nd-4.0.
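To picture how the three levels hang together, here's a toy model. It is only a sketch of the idea, not the library's actual classes: the names Family, Licence and Version are made up for illustration, and each level simply points at its parent.

```python
from dataclasses import dataclass


# Hypothetical classes for illustration; not licence-normaliser's real types.
@dataclass(frozen=True)
class Family:
    key: str  # e.g. "cc"


@dataclass(frozen=True)
class Licence:
    key: str  # e.g. "cc-by-nc-nd"
    family: Family


@dataclass(frozen=True)
class Version:
    key: str  # e.g. "cc-by-nc-nd-4.0"
    licence: Licence


# Build the chain bottom-up: family, then licence, then version.
cc = Family("cc")
by_nc_nd = Licence("cc-by-nc-nd", cc)
v4 = Version("cc-by-nc-nd-4.0", by_nc_nd)

# Walking upwards from the most specific level to the broadest one.
print(v4.key)
print(v4.licence.key)
print(v4.licence.family.key)
```

The point of the shape: once you hold the most specific level, the broader ones are a plain attribute lookup away, so grouping by family or by licence needs no extra string parsing.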

Here's a quick demo — imagine you're scraping papers or repos and licenses come in every flavor:

from licence_normaliser import normalise_licence

result = normalise_licence("CC BY-NC-ND 4.0")
print(result.key)           # → cc-by-nc-nd-4.0
print(result.licence)       # → cc-by-nc-nd
print(result.licence.family)  # → cc

Super clean, right? It handles SPDX codes (Apache-2.0), full URLs, even sloppy prose like "creative commons attribution non-commercial no derivatives". And for Creative Commons fans — yes, it knows all the variants, including the weird IGO ones.

Look at these badges to see what it normalises:

Creative Commons license badges showing the normalized family and variant icons

Or the compatibility chart if you're remixing stuff:

License compatibility chart with checkmarks and crosses for remixing different licenses

What makes it robust? Everything's file-driven — aliases, patterns, URLs live in JSON, so you add new synonyms without touching code. Want strictness? Pass strict=True and it'll raise an error if it can't match. Debugging? Use --explain or trace=True to see the whole resolution path.
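If "file-driven" sounds abstract, here's a rough sketch of the general pattern. Everything in it is an assumption made for illustration: the alias table, the toy_normalise function, the "unknown" fallback and the ValueError are all invented here and say nothing about how licence-normaliser is really built. The JSON is inlined as a string so the snippet runs on its own; in a real file-driven setup it would sit in a data file.

```python
import json

# Hypothetical alias data; a file-driven design would load this from a JSON file.
ALIASES_JSON = """
{
  "mit": "mit",
  "mit license": "mit",
  "the mit license": "mit",
  "apache-2.0": "apache-2.0",
  "cc by-nc-nd 4.0": "cc-by-nc-nd-4.0"
}
"""

ALIASES = json.loads(ALIASES_JSON)


def toy_normalise(raw, strict=False):
    """Return (key, steps): the matched key plus a list of resolution steps.

    Illustrative only. The steps list plays the role of a trace, and the
    "unknown" fallback and ValueError are assumptions of this sketch.
    """
    steps = []
    # Lower-case and collapse whitespace so trivial variants compare equal.
    cleaned = " ".join(raw.lower().split())
    steps.append(f"cleaned input: {cleaned!r}")

    key = ALIASES.get(cleaned)
    if key is None:
        steps.append("no alias matched")
        if strict:
            raise ValueError(f"unrecognised licence: {raw!r}")
        key = "unknown"
    else:
        steps.append(f"alias matched: {key!r}")
    return key, steps


key, steps = toy_normalise("  The MIT   License ")
print(key)
for step in steps:
    print(" -", step)
```

Adding a new synonym here means adding one line to the JSON and leaving the function alone, which is the whole appeal of keeping the data out of the code.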

Install's dead simple:

pip install licence-normaliser

(or uv pip install if you're fancy).

CLI's handy too:

licence-normaliser normalise "MIT"          # → mit
licence-normaliser batch "Apache-2.0" "CC BY 4.0"

It's got only two stars right now (probably because it's niche), but if you're building anything with license detection (think ScanCode integration, repo crawlers, academic tools), this quietly solves a headache. Its data gets refreshed via CLI pulls from SPDX, OSI, and Creative Commons, so there's no manual hassle.

Bottom line: if licenses are your mess, normalise them. This tool just works.
