@rdyv
Last active December 29, 2024 19:18
Download NRK TV subtitles and optionally translate them into English
#!/bin/bash
# Check if a show code is provided as an argument
if [[ -z "$1" ]]; then
    echo "Usage: $0 <SHOW_CODE>"
    echo "Example: 'DVFJ65100124' is the show code in https://tv.nrk.no/serie/der-ingen-skulle-tru-at-nokon-kunne-bu/sesong/23/episode/DVFJ65100124"
    exit 1
fi
# User-provided show code
SHOW_CODE="$1"
# Base URL for fetching the manifest
MANIFEST_URL="https://psapi.nrk.no/playback/manifest/program/$SHOW_CODE"
# Output file for combined subtitles
OUTPUT_FILE="subtitles_$SHOW_CODE.txt"
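# Fetch the manifest once; -f makes curl fail on HTTP errors instead of saving an error page
MANIFEST=$(curl -fsS "$MANIFEST_URL") || { echo "Failed to fetch manifest: $MANIFEST_URL"; exit 1; }
# Prefer the first 'ttv' subtitle track; '// empty' yields an empty string instead of the literal "null"
WEBVTT_URL=$(jq -r '[.playable.subtitles[]? | select(.type == "ttv") | .webVtt][0] // empty' <<< "$MANIFEST")
# If no 'ttv' type is found, fall back to the first available subtitle
if [[ -z "$WEBVTT_URL" ]]; then
    WEBVTT_URL=$(jq -r '.playable.subtitles[0].webVtt // empty' <<< "$MANIFEST")
fi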
# Check if a valid webVtt URL was found (jq -r prints the literal "null" for a missing field)
if [[ -z "$WEBVTT_URL" || "$WEBVTT_URL" == "null" ]]; then
    echo "No subtitles found for the show code: $SHOW_CODE"
    exit 1
fi
echo "Using subtitles from: $WEBVTT_URL"
# Download the subtitles and save them to the output file (-f: fail on HTTP errors)
if curl -fsS "$WEBVTT_URL" -o "$OUTPUT_FILE"; then
    echo "Subtitles saved to: $OUTPUT_FILE"
else
    echo "Failed to download subtitles from: $WEBVTT_URL"
    exit 1
fi
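
The saved file is still raw WebVTT (header, cue timings, inline tags) even though it has a `.txt` name. If only the spoken text is wanted, something like the following could strip the rest. This is a rough sketch, not part of the script above: `vtt_to_text` is a hypothetical helper name, and it assumes cue identifiers are purely numeric, so a subtitle line consisting only of digits would be dropped as well.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): print only the cue
# text of a WebVTT file, dropping the header, timing lines, and blank lines.
vtt_to_text() {
    # Strip carriage returns first, since the file may use CRLF line endings.
    tr -d '\r' < "$1" | awk '
        /^WEBVTT/  { next }   # file header
        /-->/      { next }   # cue timing line, e.g. 00:00:01.000 --> 00:00:03.000
        /^[0-9]+$/ { next }   # numeric cue identifiers (assumption, see above)
        /^$/       { next }   # blank separators between cues
        {
            gsub(/<[^>]*>/, "")        # remove inline tags such as <i>...</i>
            if ($0 != "") print
        }
    '
}

# Example usage (file name follows the OUTPUT_FILE pattern of the script above):
#   vtt_to_text subtitles_DVFJ65100124.txt > plain_DVFJ65100124.txt
```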
rdyv commented Dec 29, 2024

To convert these subtitles into an Anki-flashcard-ready set of unique words using the Anthropic API:

#!/bin/bash

# Check if the subtitle file and Anthropic API key are provided
if [[ -z "$1" ]]; then
    echo "Usage: $0 <SUBTITLE_FILE>" && exit 1
fi

if [[ -z "$ANTHROPIC_API_KEY" ]]; then
    echo "Please set your Anthropic API key in the environment variable ANTHROPIC_API_KEY." && exit 1
fi

MODEL_NAME="claude-3-5-sonnet-20241022"
SUBTITLE_FILE="$1"
TRANSLATED_FILE="translated_$(basename "$SUBTITLE_FILE")"
BATCH_SIZE=100

# Keep only lines that start with a letter (drops timestamps, cue numbers, and blank lines),
# then drop the "WEBVTT" file header, which also starts with a letter
SUBTITLE_CONTENT=$(grep '^[a-zA-ZåæøÅÆØ]' "$SUBTITLE_FILE" | grep -v '^WEBVTT')
if [[ -z "$SUBTITLE_CONTENT" ]]; then
    echo "No valid subtitle content found in the file." && exit 1
fi

# Convert the filtered content to an array, one subtitle line per element
IFS=$'\n' read -d '' -r -a lines <<< "$SUBTITLE_CONTENT"
total_lines=${#lines[@]}

# Split the subtitle content into chunks and translate each chunk
for ((i=0; i<total_lines; i+=BATCH_SIZE)); do
    # Clamp the end of the range so the last batch does not report lines past the total
    batch_end=$(( i + BATCH_SIZE < total_lines ? i + BATCH_SIZE : total_lines ))
    echo "Processing lines $((i+1))-$batch_end of $total_lines"
    # Slice up to BATCH_SIZE lines starting at index i
    batched_lines=("${lines[@]:i:BATCH_SIZE}")

    PROMPT=$(cat <<EOF
Extract the unique Norwegian words or sentences from the subtitle text below and provide their English translations for an A2-level language learner. The output must be formatted as:
norsk << english

Put each entry on its own line and include no other assistant text, so that the output can be fed directly into another system. Subtitle content:
$(printf '%s\n' "${batched_lines[@]}")
EOF
)
    REQUEST_BODY=$(jq -n --arg model "$MODEL_NAME" --arg prompt "$PROMPT" \
        '{"model": $model, "max_tokens": 5000, "messages": [{ "role": "user", "content": $prompt }]}')

    echo "Submitting a chunk of subtitle file to Anthropic API for translation..."
    RESPONSE=$(curl -s -XPOST https://api.anthropic.com/v1/messages \
        -H "x-api-key: $ANTHROPIC_API_KEY" -H "anthropic-version: 2023-06-01" \
        -H "Content-Type: application/json" -d "$REQUEST_BODY")

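    if echo "$RESPONSE" | jq -e .error >/dev/null; then
        # Print the label separately: prefixing it to the JSON would make jq fail to parse it
        echo "Error from Anthropic API:"
        echo "$RESPONSE" | jq .error
        exit 1
    fi
    TRANSLATION=$(echo "$RESPONSE" | jq -r '.content[0].text // empty')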

    # Append the translation to the output file
    echo "$TRANSLATION" >> "$TRANSLATED_FILE"
done

echo "Translations saved to $TRANSLATED_FILE"
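
Each batch is translated independently, so the same word can show up in more than one batch of the translated file. Anki can import tab-separated text files, so one possible post-processing step is to deduplicate across batches and convert the `norsk << english` lines into tab-separated pairs. A minimal sketch, assuming the model followed the requested format; `to_anki_tsv` is a hypothetical helper name, and lines that do not match the format are silently skipped.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): convert
# "norsk << english" lines into tab-separated pairs, keeping only the
# first occurrence of each Norwegian entry.
to_anki_tsv() {
    awk -F ' << ' '
        NF == 2 {
            # Compare case-insensitively; note that some awk implementations
            # only lowercase ASCII letters, so "Å" and "å" may stay distinct.
            key = tolower($1)
            if (!(key in seen)) {
                seen[key] = 1
                printf "%s\t%s\n", $1, $2
            }
        }
    ' "$1"
}

# Example usage (file name follows the TRANSLATED_FILE pattern of the script above):
#   to_anki_tsv translated_subtitles_DVFJ65100124.txt > anki_DVFJ65100124.tsv
```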
