@msanders
Last active December 27, 2024 05:27
Script to update a Kiwix catalog automatically using aria2c

Minimal script to update a Kiwix catalog. It resumes partial downloads automatically, so it can safely be scheduled and left unattended.
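The script decides what to download purely from file names. A catalog metalink URL such as the hypothetical `https://download.kiwix.org/zim/ifixit/ifixit_en_all_2024-05.zim.meta4` maps to `ifixit_en_all_2024-05.zim`, which is fetched only if that file is missing. Once the download finishes, older files with the same `ifixit_en_all` prefix are deleted. The following minimal sketch isolates that naming logic. It mirrors the script's `basename` call and `${filename%_????-??.zim}` expansion, and the helper names are hypothetical, not part of the script:

```shell
# Print the local ZIM file name for a metalink URL (drops the path and .meta4).
zim_filename() {
    basename "$1" .meta4
}

# Print the book prefix: the file name without its _YYYY-MM.zim date suffix.
zim_prefix() {
    printf "%s\n" "${1%_????-??.zim}"
}

# List files in directory $1 sharing the prefix of file name $2, i.e. the
# copies the script would remove as out of date after downloading $2.
stale_zims() {
    prefix="$(zim_prefix "$2")"
    for path in "$1/$prefix"_*.zim; do
        # An unmatched glob stays literal, so stop if nothing exists.
        [ -e "$path" ] || break
        printf "%s\n" "$path"
    done
}
```

The cleanup glob is `<prefix>_*.zim`, so a configured book whose name is itself a prefix of another configured book's name would also match the longer book's files. The names in the default `ZIM_NAMES` do not overlap this way.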

Installation

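# curl --fail --proto "=https" --tlsv1.2 "https://gist.githubusercontent.com/msanders/903923ce83040054a09907b376b89c7d/raw/kiwix_catalog_update" -o /usr/local/bin/kiwix_catalog_update
# chmod 755 /usr/local/bin/kiwix_catalog_update

Mode 755 lets non-root accounts, such as the kiwix user in the systemd example below, execute the root-owned script.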

Requirements

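  • aria2c
  • curl
  • kiwix-tools (provides kiwix-manage)
  • ripgrep (rg)
  • xq (sibprogrammer/xq, which provides the -x XPath option the script uses)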

Debian

# apt install aria2 curl kiwix-tools ripgrep xq
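To confirm that the commands the script calls are on PATH before the first run, a small POSIX sh check can help. This is a sketch; `check_requirements` is a hypothetical helper, not part of the script, and it looks for kiwix-manage because that is the kiwix-tools command the script actually runs:

```shell
# Print each named command that is not on PATH.
# Returns 0 if all are present, 1 if any are missing.
check_requirements() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf "missing: %s\n" "$cmd"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_requirements aria2c curl kiwix-manage rg xq` prints one line per missing command. It only confirms that a command named xq exists, not that it is the XPath-capable variant.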

systemd

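Example systemd configuration for Debian that starts the update at 1 AM every Tuesday through Thursday and stops it at 7 AM. A download interrupted by the stop timer resumes on the next run.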

/etc/systemd/system/kiwix-catalog-update.service
[Unit]
Description=Kiwix Catalog Update Service
PartOf=kiwix.service

[Service]
User=kiwix
Group=kiwix
Type=oneshot
ExecStart=/usr/local/bin/kiwix_catalog_update -q

[Install]
WantedBy=multi-user.target

/etc/systemd/system/kiwix-catalog-update.timer
[Unit]
Description=Kiwix Catalog Update Timer

[Timer]
OnCalendar=Tue..Thu 1:00

[Install]
WantedBy=timers.target

/etc/systemd/system/kiwix-catalog-update-stop.service
[Unit]
Description=Kiwix Catalog Update Stop Service

[Service]
# Runs as root: by default an unprivileged user cannot stop a system service.
Type=oneshot
ExecStart=/usr/bin/systemctl stop kiwix-catalog-update.service

[Install]
WantedBy=multi-user.target

/etc/systemd/system/kiwix-catalog-update-stop.timer
[Unit]
Description=Kiwix Catalog Update Stop Timer

[Timer]
OnCalendar=*-*-* 7:00:00

[Install]
WantedBy=timers.target

Write the above files and then run:

# systemctl daemon-reload
# systemctl enable --now kiwix-catalog-update.timer
# systemctl enable --now kiwix-catalog-update-stop.timer

License

This is made available under the terms of the MIT license. For a copy, see https://opensource.org/licenses/MIT.

#!/bin/sh
# Minimal script to update a Kiwix catalog.
# https://kiwix.org
# https://gist.github.com/msanders/903923ce83040054a09907b376b89c7d
# License: MIT
set -o errexit -o nounset
CATALOG_URL="https://library.kiwix.org/catalog/v2/entries?count=-1"
# Adjust these paths to match your library layout.
KIWIX_BASE_DIR="/media/usb/Kiwix"
LIBRARY_PATH="$KIWIX_BASE_DIR/library/library.xml"
ZIM_DIR="$KIWIX_BASE_DIR/content"
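# ZIM names to fetch on each update, one per line; each is matched as a
# fixed string against the download URLs listed in the catalog.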
ZIM_NAMES="
gutenberg/gutenberg_en_all
ifixit/ifixit_en_all
wikipedia/wikipedia_en_all_maxi
"
main() (
    echo "Fetching upstream Kiwix catalog..."
    # --fail: on an HTTP error, pass nothing to xq instead of an error page.
    metalinks="$(curl --fail --compressed --progress-bar --proto "=https" --tlsv1.2 "$CATALOG_URL" |
        xq -x '/feed/entry/link[@type="application/x-zim"]/@href')"
    # Keep the metalinks whose URL contains an entry of ZIM_NAMES (fixed-string
    # match). The here-document body and EOF must stay unindented (or use tabs).
    matched_metalinks="$(printf "%s\n" "$metalinks" | rg -F -f /dev/fd/3 3<<-EOF
$(printf "%s" "$ZIM_NAMES" | tail -n +2)
EOF
    )"
    aria_progress_dir="$ZIM_DIR/.aria2c"
    mkdir -p "$aria_progress_dir"
    # Use "%s\n": `read` drops a final line that has no trailing newline.
    printf "%s\n" "$matched_metalinks" | while read -r url; do
        [ -n "$url" ] || continue
        filename="$(basename "$url" .meta4)"
        filepath="$ZIM_DIR/$filename"
        if [ ! -f "$filepath" ]; then
            echo "$filename missing; downloading..."
            (
                # Download into the progress directory so that an interrupted
                # download is resumed (--continue) on the next run.
                cd "$aria_progress_dir"
                aria2c "$url" "$@" \
                    --continue \
                    --follow-metalink=mem \
                    --metalink-preferred-protocol=https \
                    --retry-wait=5
                # Strip the _YYYY-MM.zim date suffix to find older versions.
                prefix="${filename%_????-??.zim}"
                for existingpath in "$ZIM_DIR/$prefix"_*.zim; do
                    # See https://mywiki.wooledge.org/glob#Portability
                    [ -e "$existingpath" ] || break
                    printf "Removing out-of-date '%s'.\n" "$existingpath"
                    rm "$existingpath"
                done
                mv "$filename" "$filepath"
                printf "Added '%s'\n" "$filepath"
            )
        fi
    done
    rm -rf "$aria_progress_dir"
    echo "Adding downloaded ZIM files to Kiwix library..."
    mkdir -p "$(dirname "$LIBRARY_PATH")"
    kiwix-manage "$LIBRARY_PATH" add "$ZIM_DIR/"*.zim
    echo "Done."
)
# Pass script arguments (e.g. -q from the systemd unit) through to aria2c.
main "$@"
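The catalog filter in `main` keeps a metalink URL when it contains any `ZIM_NAMES` entry as a literal substring (`rg -F`). The POSIX sh sketch below reproduces that selection without ripgrep by swapping in a `case` statement with a quoted, and therefore literal, pattern. `filter_urls` is a hypothetical helper, not part of the script. The `|| [ -n "$url" ]` guard also keeps a final line that lacks a trailing newline, which a plain `while read` loop silently drops.

```shell
# Read URLs (one per line) on stdin and print each one that contains any of
# the names given as arguments, compared as fixed substrings.
filter_urls() {
    while IFS= read -r url || [ -n "$url" ]; do
        for name in "$@"; do
            case "$url" in
                # Quoting "$name" makes glob characters in it literal.
                *"$name"*)
                    printf "%s\n" "$url"
                    break
                    ;;
            esac
        done
    done
}
```

For example, piping the hypothetical URLs `https://example.org/zim/ifixit/ifixit_en_all_2024-05.zim.meta4` and `https://example.org/zim/other/other_en_all_2024-05.zim.meta4` into `filter_urls ifixit/ifixit_en_all` prints only the first one. Each URL is printed at most once, even when several names match it.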