File conversion

The steps below use the ffmpeg command-line tool, which can be installed on Ubuntu / Debian with apt-get install ffmpeg.

The following commands can be combined into one, though I prefer to keep each step separate.

Extract the audio track from the video (mp4 with aac audio) without re-encoding it.

ffmpeg -i video.mp4 -vn -acodec copy video.aac

Convert aac to wav.

ffmpeg -i video.aac audio.wav

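Split the wav into parts 60 seconds long. The parts keep the .wav extension, since -c copy does not re-encode and the transcription step below reads part%03d.wav.

ffmpeg -i audio.wav -f segment -segment_time 60 -c copy part%03d.wav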
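
For those who would rather combine the three steps, here is a minimal sketch that builds a single ffmpeg call from Python. The helper names build_split_command and split_video are made up for this example, and the explicit -c:a pcm_s16le (16-bit PCM) is my assumption about the wanted wav encoding; it is not taken from the commands above.

```python
import subprocess


def build_split_command(video, segment_seconds=60, pattern="part%03d.wav"):
    """Build one ffmpeg call that drops the video stream and writes wav parts."""
    return [
        "ffmpeg",
        "-i", video,
        "-vn",                                  # ignore the video stream
        "-c:a", "pcm_s16le",                    # assumption: 16-bit PCM audio in the wav parts
        "-f", "segment",                        # write a sequence of output files
        "-segment_time", str(segment_seconds),  # target length of each part, in seconds
        pattern,                                # part000.wav, part001.wav, ...
    ]


def split_video(video, segment_seconds=60, pattern="part%03d.wav"):
    """Run the command; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_split_command(video, segment_seconds, pattern), check=True)
```

Calling split_video("video.mp4") should leave the parts in the current directory. Keeping the command in a list avoids shell quoting problems with file names that contain spaces.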

Speech to text

For transcription we'll use the pretrained model jonatasgrosman/wav2vec2-xls-r-1b-russian.

Install the HuggingSound package and start the Python interpreter.

pip install huggingsound
python

from huggingsound import SpeechRecognitionModel

n = 165  # index of the last part file (part000.wav ... part165.wav); adjust to your recording
model = SpeechRecognitionModel("jonatasgrosman/wav2vec2-xls-r-1b-russian")
audio_paths = ["part%03d.wav" % i for i in range(n + 1)]

transcriptions = model.transcribe(audio_paths)
transcriptions
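
The hard-coded n only fits one particular recording. Here is a small sketch that finds the parts on disk instead and joins the recognized text. The helper names are made up for this example, and it assumes each result is a dict with a "transcription" key, as shown in the HuggingSound README.

```python
import glob
import os


def find_parts(directory=".", pattern="part*.wav"):
    """Return the part files in name order; zero-padded names sort correctly."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


def join_transcriptions(transcriptions):
    """Concatenate the recognized text, one part per line."""
    # Assumption: every result is a dict with a "transcription" key.
    return "\n".join(t["transcription"] for t in transcriptions)


def transcribe_directory(directory=".",
                         model_name="jonatasgrosman/wav2vec2-xls-r-1b-russian"):
    """Transcribe every part in a directory and return the combined text."""
    # Imported here so the two helpers above work without huggingsound installed.
    from huggingsound import SpeechRecognitionModel

    model = SpeechRecognitionModel(model_name)
    return join_transcriptions(model.transcribe(find_parts(directory)))
```

print(transcribe_directory()) would then print the text for all parts in the current directory. Words cut at a 60-second boundary may come out garbled on both sides of the cut.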
