@yoursdearboy
Created December 22, 2023 14:22
Video with speech to text

File conversion

We use the ffmpeg command, which can be installed on Ubuntu / Debian with apt-get install ffmpeg.

The following commands can be combined into one, though I prefer to keep each step separate.

  1. Extract the audio track from the video (mp4 with aac audio) without re-encoding.
ffmpeg -i video.mp4 -vn -acodec copy video.aac
  2. Convert aac to wav.
ffmpeg -i video.aac audio.wav
  3. Split the wav into parts 60 seconds long.
ffmpeg -i audio.wav -f segment -segment_time 60 -c copy part%03d.wav
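
If you would rather drive the three steps from a script, here is a minimal Python sketch. It only wraps the same ffmpeg invocations in subprocess calls. The function names build_commands and run_all and the segment_seconds parameter are my own, made up for illustration. The parts are named part%03d.wav so that they match the file names the transcription code expects.

```python
import subprocess


def build_commands(video="video.mp4", segment_seconds=60):
    """Return the three ffmpeg invocations as argument lists (hypothetical helper)."""
    return [
        # 1. Copy the AAC audio stream out of the mp4 without re-encoding.
        ["ffmpeg", "-i", video, "-vn", "-acodec", "copy", "video.aac"],
        # 2. Decode the AAC stream to WAV.
        ["ffmpeg", "-i", "video.aac", "audio.wav"],
        # 3. Cut the WAV into fixed-length parts: part000.wav, part001.wav, ...
        ["ffmpeg", "-i", "audio.wav", "-f", "segment",
         "-segment_time", str(segment_seconds), "-c", "copy", "part%03d.wav"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(build_commands("video.mp4")) runs the whole conversion, assuming ffmpeg is on the PATH. Passing argument lists rather than a shell string avoids quoting problems with unusual file names.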

Speech to text

For transcription we'll use the pretrained model jonatasgrosman/wav2vec2-xls-r-1b-russian.

Install the HuggingSound package and start the Python interpreter.

pip install huggingsound
python
from huggingsound import SpeechRecognitionModel

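# Index of the last part produced by the split step (part000.wav ... part165.wav).
n = 165
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-russian")
# Build the list of part file names in order.
audio_paths = ["part%03d.wav" % i for i in range(n + 1)]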

transcriptions = model.transcribe(audio_paths)
transcriptions
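
To avoid hard-coding n and to get a single readable text out of the result, something like the sketch below may help. It assumes that model.transcribe returns a list of dicts with a "transcription" key, one per audio file, as the HuggingSound README describes. The helper names find_parts and join_transcriptions are hypothetical.

```python
from pathlib import Path


def find_parts(directory=".", pattern="part*.wav"):
    """List segment files sorted by name; zero-padded numbers sort in order."""
    return [str(p) for p in sorted(Path(directory).glob(pattern))]


def join_transcriptions(audio_paths, transcriptions):
    """Pair every file name with its text, one tab-separated line per part."""
    lines = []
    for path, result in zip(audio_paths, transcriptions):
        # Assumed result format: {"transcription": "...", ...}; may be empty.
        text = result.get("transcription") or ""
        lines.append(f"{Path(path).name}\t{text}")
    return "\n".join(lines)
```

With these, audio_paths = find_parts() replaces the manual list, and print(join_transcriptions(audio_paths, transcriptions)) shows which part each piece of text came from.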