@vijinho
Last active January 8, 2025 12:48
Wrapper for whisper.cpp (https://github.com/ggerganov/whisper.cpp) that generates transcriptions from a given input media file
#!/bin/bash
# Default values
MODEL="$HOME/src/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin"
INPUT_FILE=""
# Temporary WAV path; fall back to $TMPDIR, then /tmp, when $TEMP is unset
OUTPUT_FILE="${TEMP:-${TMPDIR:-/tmp}}/whisper-$(date "+%Y%m%d%H%M%S").wav"
WHISPER_ARGS="-m $MODEL -l en -t 12 -pp -pc -otxt -ovtt -osrt"
# Function to display usage information
usage() {
echo "Usage: $0 -i <input_file> [-o <output_file>] [-w <whisper-args>]"
echo " -i, --input-file The path to the input media file"
echo " -o, --output-file Optional path to the temporary output audio file (should have .wav extension, defaults to a timestamped whisper-YYYYmmddHHMMSS.wav in the temp directory)"
echo " -w, --whisper-args Arguments for whisper-cli (defaults to '$WHISPER_ARGS')"
}
# Parse named arguments
while [[ "$#" -gt 0 ]]; do
case $1 in
-i|--input-file) INPUT_FILE="$2"; shift ;;
-o|--output-file) OUTPUT_FILE="$2"; shift ;;
-w|--whisper-args) WHISPER_ARGS="$2"; shift ;;
*) echo "Unknown parameter passed: $1"; usage; exit 1 ;;
esac
shift
done
# Check if the input file is provided
if [ -z "$INPUT_FILE" ]; then
echo "Error: Input file is required."
usage
exit 1
fi
# Check if FFmpeg is installed
if ! command -v ffmpeg &> /dev/null; then
echo "Error: FFmpeg is not installed. Please install it first."
exit 1
fi
# Check if the input file exists
if [ ! -f "$INPUT_FILE" ]; then
echo "Error: Input file '$INPUT_FILE' does not exist."
exit 1
fi
# Ensure the output file has a .wav extension
if [[ "$OUTPUT_FILE" != *.wav ]]; then
OUTPUT_FILE="${OUTPUT_FILE}.wav"
echo "Note: Output file extension changed to .wav ($OUTPUT_FILE)"
fi
# Check if the output file already exists and delete it if so
if [ -f "$OUTPUT_FILE" ]; then
echo "Warning: Output file '$OUTPUT_FILE' already exists. Deleting the existing file..."
rm -f "$OUTPUT_FILE"
fi
# Convert the audio file using FFmpeg
echo "Converting '$INPUT_FILE' to '$OUTPUT_FILE'..."
ffmpeg -i "$INPUT_FILE" -ar 16000 -ac 1 -c:a pcm_s16le "$OUTPUT_FILE"
# Check if the conversion was successful
if [ $? -eq 0 ]; then
echo "Conversion successful."
else
echo "Error: FFmpeg encountered an issue during conversion."
exit 1
fi
# Execute whisper-cli on the output WAV file
echo "Executing whisper-cli with arguments '$WHISPER_ARGS' on '$OUTPUT_FILE'..."
# $WHISPER_ARGS is intentionally unquoted so it splits into separate arguments
"$HOME/src/whisper.cpp/build/bin/whisper-cli" $WHISPER_ARGS -of "${INPUT_FILE%.*}" "$OUTPUT_FILE"
# Check if whisper-cli execution was successful
if [ $? -eq 0 ]; then
echo "whisper-cli execution successful."
rm "$OUTPUT_FILE"
else
echo "Error: whisper-cli encountered an issue during execution."
exit 1
fi
vijinho commented Jan 4, 2025

Examples

Example 1: Basic Usage with Default Settings

./whisper-wrapper.sh -i /path/to/input_video.mp4

Explanation:

  • This command specifies an input file (input_video.mp4) and uses the default temporary WAV path, a timestamped whisper-YYYYmmddHHMMSS.wav file.
  • The script converts input_video.mp4 to WAV format and then runs whisper-cli with the default arguments (-m <model> -l en -t 12 -pp -pc -otxt -ovtt -osrt), writing the .txt, .vtt, and .srt transcripts next to the input file.
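
To make the output naming concrete, here is a minimal sketch of how the script's -of "${INPUT_FILE%.*}" expansion turns an input path into the transcript base name. The helper name transcript_base is hypothetical and not part of the script, and the listed extensions assume the default -otxt -ovtt -osrt flags.

```shell
#!/bin/bash
# Hypothetical helper (not in the script): mirrors the -of "${INPUT_FILE%.*}" expansion.
transcript_base() {
    local input_file="$1"
    # Remove the shortest trailing ".suffix", which is normally the file extension.
    printf '%s\n' "${input_file%.*}"
}

# List the transcript paths expected for the default -otxt -ovtt -osrt flags.
base="$(transcript_base "/path/to/input_video.mp4")"
for ext in txt vtt srt; do
    echo "${base}.${ext}"
done
```

One caveat of this expansion: for a file with no extension inside a directory whose name contains a dot (for example /data.d/clip), everything after that dot is stripped, so the input file should have an extension.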

Example 2: Custom Whisper-CLI Arguments

./whisper-wrapper.sh -i /path/to/input_video.mp4 -w "-t 16 -pp -pc -l fr"

Explanation:

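  • This command specifies an input file (input_video.mp4) and custom arguments for whisper-cli (-t 16 -pp -pc -l fr).
  • The script will convert input_video.mp4 to WAV format and then use the specified whisper-cli arguments to transcribe it, with the language set to French (-l fr).
  • Note that -w replaces the default arguments entirely, including -m and the -otxt -ovtt -osrt output flags. Add them to the -w string if you still want the script's model and transcript files.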
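
Because a custom -w string replaces the defaults, one possible adjustment is to prepend the default model only when the caller did not supply one. The sketch below is an assumption about how that could look, not the script's current behavior; build_whisper_args and WHISPER_ARGV are hypothetical names, and MODEL is copied from the script.

```shell
#!/bin/bash
# Default model path, copied from the script.
MODEL="$HOME/src/whisper.cpp/models/ggml-large-v3-turbo-q5_0.bin"

# Hypothetical helper: fills the global array WHISPER_ARGV from a user string,
# prepending "-m $MODEL" when the user passed neither -m nor --model.
build_whisper_args() {
    local user_args="$1" arg has_model=0
    local -a parts
    # Split on whitespace, much like the script's unquoted $WHISPER_ARGS.
    read -r -a parts <<< "$user_args"
    for arg in "${parts[@]}"; do
        case "$arg" in
            -m|--model) has_model=1 ;;
        esac
    done
    WHISPER_ARGV=()
    if [ "$has_model" -eq 0 ]; then
        WHISPER_ARGV+=(-m "$MODEL")
    fi
    WHISPER_ARGV+=("${parts[@]}")
}

build_whisper_args "-t 16 -pp -pc -l fr"
printf '%s\n' "${WHISPER_ARGV[@]}"
```

The array could then be passed to whisper-cli as "${WHISPER_ARGV[@]}", which keeps a model path containing spaces intact.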

Script Explanation

This script is designed to automate the process of converting a media file into an audio WAV format and then transcribing that audio using the whisper-cli tool. Here's a step-by-step summary:

  1. Usage Information: The script starts by defining a usage function that explains how to use the script, including which arguments are required and optional.

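  2. Argument Parsing: It uses a while loop with a case statement (not getopts) to parse named arguments (-i, -o, -w, or their long forms --input-file, --output-file, --whisper-args) for the input file, output file, and whisper-cli arguments respectively. The input file is required; if -o or -w is omitted, defaults apply (e.g., the default output file is a timestamped whisper-YYYYmmddHHMMSS.wav in the temp directory).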

  3. Input Validation: The script checks if FFmpeg is installed, verifies that the input file exists, ensures the output file has a .wav extension, and removes any existing file with the same name.

  4. Audio Conversion: Using FFmpeg, the script converts the input media file into a 16kHz mono PCM WAV format, which is suitable for transcription by whisper-cli.

  5. Transcription: The script then runs whisper-cli with the specified arguments to transcribe the audio file. The output format and other settings can be customized through the -w argument.

  6. Final Steps: After transcription, the script checks if the whisper-cli execution was successful and removes the temporary WAV file used for transcription if everything goes well.
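
As an illustration of steps 2 and 3, the sketch below parses the same flags with a while/case loop and normalizes the output extension. It is a variant under stated assumptions rather than the script's exact code: parse_args and ensure_wav_ext are hypothetical function names, and the check for a flag with a missing value is an addition the script does not have.

```shell
#!/bin/bash
# Hypothetical helper: append .wav unless the path already ends with it.
ensure_wav_ext() {
    case "$1" in
        *.wav) printf '%s\n' "$1" ;;
        *)     printf '%s\n' "$1.wav" ;;
    esac
}

# Hypothetical parser: same flags as the script, but it fails when a flag has no value.
parse_args() {
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -i|--input-file|-o|--output-file|-w|--whisper-args)
                if [ "$#" -lt 2 ]; then
                    echo "Error: $1 requires a value." >&2
                    return 1
                fi
                case "$1" in
                    -i|--input-file)   INPUT_FILE="$2" ;;
                    -o|--output-file)  OUTPUT_FILE="$2" ;;
                    -w|--whisper-args) WHISPER_ARGS="$2" ;;
                esac
                shift 2
                ;;
            *)
                echo "Unknown parameter passed: $1" >&2
                return 1
                ;;
        esac
    done
}

# Example: parse two flags, then normalize the output path.
parse_args -i talk.mp4 -o /tmp/talk && ensure_wav_ext "$OUTPUT_FILE"
```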

In essence, this script streamlines the process of converting any media file into a format suitable for transcription and then transcribing it using a specified tool, with user options to customize various aspects of both steps.
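
For the cleanup described in the final step, one alternative is to register the removal with trap so the temporary WAV is deleted on every exit path. This is a sketch of a different design, not what the script does: the script deliberately removes the WAV only after a successful whisper-cli run, which leaves the file available for debugging after a failure. TMP_DIR, TMP_WAV, and cleanup are hypothetical names.

```shell
#!/bin/bash
# Hypothetical variant of the cleanup step: a private temp directory that is
# removed whenever the script exits, whether it succeeded or failed.
TMP_DIR="$(mktemp -d)"
TMP_WAV="$TMP_DIR/whisper.wav"

cleanup() {
    rm -rf "$TMP_DIR"
}
trap cleanup EXIT

echo "Temporary WAV would be written to $TMP_WAV"
```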

Submitted to the project: ggml-org/whisper.cpp#2703
