@etuardu
Created August 22, 2024 17:27
WikiCommons Chinese characters decomposition data downloader (tsv)
#!/bin/bash
# Download the raw Chinese characters decomposition data
# from WikiCommons as a tsv.
# Usage:
# ./get_wikicommons_decomposition.sh > data.tsv
# awk -F'\t' '{ print $2 }' data.tsv # print the second tab-separated field
# Depends on:
# - curl
# - jq
# - awk
raw_url="https://api.wikimedia.org/core/v1/commons/page/Commons:Chinese_characters_decomposition"
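# -f: fail on HTTP errors instead of passing an error body on to jq;
# -sS: hide the progress meter but still report errors
curl -fsS "$raw_url" |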
jq -r '.source' |
awk '
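/<pre>/ { flag=1; next }  # opening tag: start printing from the next line
/<\/pre>/ { flag=0 }      # closing tag: stop printing
flag                      # print the current line while flag is set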
'
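
The awk filter above can be tried offline, without calling the API. The following is a minimal sketch, assuming a made-up input: the function name `extract_pre` and the sample lines are hypothetical and are not real Commons data. It shows that only the lines between `<pre>` and `</pre>` survive, with the tag lines themselves dropped.

```shell
#!/bin/bash
# Same awk filter as in the script, wrapped in a function.
# The name extract_pre is hypothetical, chosen for this illustration.
extract_pre() {
  awk '
    /<pre>/ { flag=1; next }  # opening tag: start printing from the next line
    /<\/pre>/ { flag=0 }      # closing tag: stop printing
    flag                      # print the current line while flag is set
  '
}

# Made-up sample input (not real Commons data): two tab-separated rows
# wrapped in a <pre> block, with unrelated text before and after.
sample_output=$(
  printf '%s\n' 'intro text' '<pre>' $'a\t1' $'b\t2' '</pre>' 'footer' |
    extract_pre
)

printf '%s\n' "$sample_output"
```

If the page ever contained more than one `<pre>` block, this filter would print the contents of all of them one after another, since the flag is simply switched on and off at each tag.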