Captioning with speech to text
Automatically caption your content, in real time or offline, by transcribing the audio of films, videos, live events, and more. Display the resulting text on a screen to provide an accessible experience.
Common use cases:
- Captioning for video content such as films, live television, sports matches
- Transcribing audio-only content like podcasts or phone conversations
Technologies used:
- Speech SDK
- Speech to text
- Phrase list
Continuous Integration and Deployment
Captioning
➜ cd SpeechText
➜ ffmpeg -i .\Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp4 .\Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3
➜ git clone --depth 1 https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
➜ cd .\cognitive-services-speech-sdk\scenarios\python\console\captioning\
# Check that GStreamer is installed and on PATH; the Speech SDK needs it to decode compressed input such as MP3
➜ $env:SPEECH_KEY='<YOUR_SPEECH_RESOURCE_KEY>'
➜ $env:SPEECH_REGION='<YOUR_SPEECH_RESOURCE_REGION>'
➜ py -m venv venv
➜ .\venv\Scripts\Activate.ps1
➜ py -m pip install azure-cognitiveservices-speech
# Run from the SpeechText directory (where the repository was cloned and the .mp3 lives); the script path below is relative to it
➜ py ./cognitive-services-speech-sdk/scenarios/python/console/captioning/captioning.py --input Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format mp3 --output caption.output.txt --language en-US --srt --threshold 5 --delay 0 --offline --maxLineLength 60 --lines 1 --profanity mask
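To make the effect of --maxLineLength 60 and --lines 1 concrete, here is a minimal sketch of how recognized text could be cut into captions. It is not the sample's own splitting logic; the function name split_caption and the use of textwrap are assumptions made for illustration only.

```python
import textwrap
from typing import List


def split_caption(text: str, max_line_length: int = 60, lines: int = 1) -> List[str]:
    """Split text into captions of at most `lines` lines,
    each line at most `max_line_length` characters (hypothetical helper)."""
    # Wrap on word boundaries so no line exceeds the limit.
    wrapped = textwrap.wrap(text, width=max_line_length)
    # Group consecutive wrapped lines into captions of `lines` lines each.
    return ["\n".join(wrapped[i:i + lines]) for i in range(0, len(wrapped), lines)]


if __name__ == "__main__":
    sample = "word " * 30
    for caption in split_caption(sample, max_line_length=60, lines=1):
        print(repr(caption))
```

With --lines 1 each caption is a single line; raising lines to 2 would group two wrapped lines per caption.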
# cognitive-services-speech-sdk/scenarios/csharp/dotnetcore/captioning
➜ dotnet run --project .\captioning\captioning.csproj --input Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format mp3 --output caption.output.txt --language en-US --srt --threshold 5 --delay 0 --offline --maxLineLength 60 --lines 1 --profanity mask
# Speech CLI (spx)
➜ spx recognize --file Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format any --output srt file BPDevs-Codespaces-Coffee-Code-Tech.srt --output each file - @output.each.detailed --property SpeechServiceResponse_StablePartialResultThreshold=5 --profanity masked
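The commands above all write SRT captions. As a rough sketch of what one SRT entry looks like, the snippet below builds an entry from a start and end offset. It assumes offsets are given in 100-nanosecond ticks, which is how the Speech SDK reports result offsets and durations; the helper names ticks_to_srt_time and srt_entry are hypothetical and are not part of the SDK or the CLI.

```python
TICKS_PER_MILLISECOND = 10_000  # 1 tick = 100 ns (assumed input unit)


def ticks_to_srt_time(ticks: int) -> str:
    """Convert an offset in 100 ns ticks to an SRT timestamp HH:MM:SS,mmm."""
    total_ms = ticks // TICKS_PER_MILLISECOND
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"


def srt_entry(index: int, start_ticks: int, end_ticks: int, text: str) -> str:
    """Format one SRT entry: sequence number, time range, caption text."""
    start = ticks_to_srt_time(start_ticks)
    end = ticks_to_srt_time(end_ticks)
    return f"{index}\n{start} --> {end}\n{text}\n"


if __name__ == "__main__":
    # A caption shown from 1.5 s to 4.0 s.
    print(srt_entry(1, 15_000_000, 40_000_000, "Welcome to the meetup."))
```

Entries in a .srt file are separated by a blank line, so joining the strings returned by srt_entry with "\n" yields a complete file.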