@vijinho
Last active January 7, 2025 09:31
A bash script that checks a folder for media (video/audio) files and transcribes them using 'whisper' (working) or 'whisper-cli' (TBD later), skipping any file that already has a matching .srt file
#!/bin/bash
# NOTE this is actually using whisper-wrapper.sh: https://gist.github.com/vijinho/ad9bd80c990a803efef7f96ae3e7ee98
# Function to display usage information
usage() {
    echo "Usage: $0 [-m | --media-directory <directory>] [-a | --transcription-args \"<arguments>\"]"
    echo "Defaults: -m $(pwd)"
    exit 1
}
# Set default values
MEDIA_DIR=$(pwd)
TRANSCRIPTION_ARGS=""
# Check if whisper or whisper-cli script is available in the system path
if command -v whisper &> /dev/null; then
    TRANSCRIBER=$(command -v whisper)
elif command -v whisper-cli &> /dev/null; then
    TRANSCRIBER=$(command -v whisper-cli)
else
    echo "The 'whisper' or 'whisper-cli' command was not found or is not executable. Please ensure it's installed and accessible."
    exit 1
fi
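# hard-coded for script https://gist.github.com/vijinho/ad9bd80c990a803efef7f96ae3e7ee98
# (this overrides the transcriber detected above)
cd /home/vijay/src/whisper.cpp || { echo "Cannot change to /home/vijay/src/whisper.cpp"; exit 1; }
TRANSCRIBER="$HOME/bin/whisper-wrapper.sh"
WAVFILE_TMP="whisper_$(date +%s).wav"
TRANSCRIPTION_ARGS="-o /tmp/$WAVFILE_TMP -i"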
# Check that the transcriber script exists and is executable
if [[ ! -x "$TRANSCRIBER" ]]; then
    echo "The '$TRANSCRIBER' script was not found or is not executable."
    exit 1
fi
# Parse command-line options
while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -m|--media-directory)
            MEDIA_DIR="$2"
            shift
            ;;
        -a|--transcription-args)
            TRANSCRIPTION_ARGS="$2"
            shift
            ;;
        *)
            echo "Unknown option: $key"
            usage
            ;;
    esac
    shift
done
# Find media files (video and audio) last modified more than 10 minutes ago, to skip any still being downloaded, and store them in an array
mapfile -t video_files < <(find "$MEDIA_DIR" -type f -mmin +10 \( -iname "*.mp4" -o -iname "*.mov" -o -iname "*.m4v" -o -iname "*.avi" -o -iname "*.3gp" -o -iname "*.webm" -o -iname "*.mp3" -o -iname "*.aac" -o -iname "*.ogg" -o -iname "*.wav" \))
# Find srt files and store them in an array
mapfile -t srt_files < <(find "$MEDIA_DIR" -type f -iname "*.srt")
# Create associative arrays to store basenames of video and srt files
declare -A video_basenames
declare -A srt_basenames
# Populate the associative arrays with basenames (ignoring extension)
for file in "${video_files[@]}"; do
    base="${file%.*}"
    video_basenames["$base"]=1
done
for file in "${srt_files[@]}"; do
    base="${file%.*}"
    srt_basenames["$base"]=1
done
# Transcribe each media file whose basename has no matching .srt file
for base in "${!video_basenames[@]}"; do
    if [[ -z "${srt_basenames[$base]}" ]]; then
        for file in "${video_files[@]}"; do
            # Compare exact basenames; a substring match would also hit e.g. "clip10.mp4" for "clip1"
            if [[ "${file%.*}" == "$base" ]]; then
                # Let eval expand the quoted variables itself so that filenames
                # containing quotes, spaces or '$' reach the transcriber intact
                eval "\"\$TRANSCRIBER\" $TRANSCRIPTION_ARGS \"\$file\""
                break
            fi
        done
    fi
done
echo "Script completed successfully."
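
The "media file without a matching .srt" check that the script performs with two associative arrays can also be sketched as a single pass over the files. This is only an illustration, not the author's method: `list_untranscribed` is a hypothetical helper name, the extension list is shortened, the `-mmin +10` age filter is left out, and the sibling check looks for a lowercase `.srt` only (the script itself matches `.srt` case-insensitively).

```bash
#!/bin/bash
# Sketch only: list_untranscribed is a hypothetical helper, not part of the gist.

# Print every media file under "$1" that has no .srt file with the same basename.
list_untranscribed() {
    local dir="$1" file
    # -print0 with read -d '' keeps filenames containing spaces or newlines intact.
    while IFS= read -r -d '' file; do
        # Strip the last extension and look for a sibling subtitle file.
        [[ -e "${file%.*}.srt" ]] || printf '%s\n' "$file"
    done < <(find "$dir" -type f \( -iname '*.mp4' -o -iname '*.mov' \
        -o -iname '*.mp3' -o -iname '*.wav' \) -print0)
}
```

Calling `list_untranscribed "$MEDIA_DIR"` prints one path per line, which could then be handed to the transcriber one file at a time.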