@oleksis
Created November 25, 2023 09:48
Captioning with speech to text with Azure AI - Speech service

Azure AI - Speech service

Captioning with speech to text

Automatically caption your content, in real time or offline, by transcribing the audio of films, videos, live events, and more. Display the resulting text on a screen to provide an accessible experience.

Common use cases:

  • Captioning for video content such as films, live television, sports matches
  • Transcribing audio-only content like podcasts or phone conversations

Technologies used:

  • Speech SDK
  • Speech to text
  • Phrase list

Software Development Lifecycle of yt-dlg

Continuous Integration and Deployment

Speech to text

Captioning

➜ cd SpeechText
➜ ffmpeg -i .\Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp4 .\Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3
➜ git clone --depth 1 https://github.com/Azure-Samples/cognitive-services-speech-sdk.git
➜ cd .\cognitive-services-speech-sdk\scenarios\python\console\captioning\
# Check GStreamer configuration
➜ $env:SPEECH_KEY='<YOUR_SPEECH_RESOURCE_KEY>'
➜ $env:SPEECH_REGION='<YOUR_SPEECH_RESOURCE_REGION>'
➜ py -m venv venv
➜ .\venv\Scripts\Activate.ps1
➜ py -m pip install azure-cognitiveservices-speech
➜ py ./cognitive-services-speech-sdk/scenarios/python/console/captioning/captioning.py --input Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format mp3 --output caption.output.txt --language en-US --srt --threshold 5 --delay 0 --offline --maxLineLength 60 --lines 1 --profanity mask
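
The `--srt`, `--maxLineLength 60` and `--lines 1` options above ask for SubRip output with a single line of at most 60 characters per caption. As a rough illustration of that output shape, here is a stdlib-only Python sketch. It is not the sample's actual implementation: `srt_timestamp` and `to_srt_blocks` are hypothetical helper names, and splitting the time span evenly across the wrapped lines is an assumption made only for this example.

```python
import textwrap
from datetime import timedelta


def srt_timestamp(t: timedelta) -> str:
    """Format a timedelta as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = t // timedelta(milliseconds=1)  # integer milliseconds
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{ms:03}"


def to_srt_blocks(text, start, end, max_line_length=60, lines=1, first_index=1):
    """Wrap text and emit one SRT block per group of `lines` wrapped lines.

    Assumption for this sketch: the [start, end] span is divided evenly
    between the resulting blocks.
    """
    wrapped = textwrap.wrap(text, width=max_line_length)
    chunks = [wrapped[i:i + lines] for i in range(0, len(wrapped), lines)]
    if not chunks:
        return []
    step = (end - start) / len(chunks)
    blocks = []
    for n, chunk in enumerate(chunks):
        begin = start + n * step
        finish = start + (n + 1) * step
        header = f"{srt_timestamp(begin)} --> {srt_timestamp(finish)}"
        blocks.append("\n".join([str(first_index + n), header, *chunk]))
    return blocks


if __name__ == "__main__":
    demo = to_srt_blocks(
        "Automatically caption your content by transcribing the audio "
        "of films, videos, live events, and more.",
        start=timedelta(seconds=0),
        end=timedelta(seconds=6),
    )
    print("\n\n".join(demo))
```

Each block is an index line, a timing line, and the caption text; blocks are separated by a blank line in the final file.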

# cognitive-services-speech-sdk/scenarios/csharp/dotnetcore/captioning
➜ dotnet run --project .\captioning\captioning.csproj --input Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format mp3 --output caption.output.txt --language en-US --srt --threshold 5 --delay 0 --offline --maxLineLength 60 --lines 1 --profanity mask

# spx
➜ spx recognize --file Black_Python_Devs_Meetup_01_-_Oleksis_Preso_1.mp3 --format any --output srt file BPDevs-Codespaces-Coffee-Code-Tech.srt --output each file - @output.each.detailed --property SpeechServiceResponse_StablePartialResultThreshold=5 --profanity masked
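
The commands above should each leave a SubRip file behind (`caption.output.txt` or the `.srt` file). A small stdlib-only Python sketch for sanity-checking such a file follows. `parse_srt` and `longest_line` are hypothetical helpers, and the parser assumes the usual SRT layout: an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, one or more text lines, then a blank line.

```python
import re
from datetime import timedelta

# Timing line of a standard SRT cue, e.g. "00:00:01,000 --> 00:00:03,500".
TIME_RE = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2,}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(content):
    """Return a list of (index, start, end, text) tuples from SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue  # skip empty or incomplete blocks
        match = TIME_RE.fullmatch(lines[1].strip())
        if match is None:
            raise ValueError(f"bad timing line: {lines[1]!r}")
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        cues.append((int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


def longest_line(cues):
    """Length of the longest caption line, to compare against --maxLineLength."""
    return max(
        (len(line) for _, _, _, text in cues for line in text.splitlines()),
        default=0,
    )


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:02,500\nhello world\n\n"
        "2\n00:00:02,500 --> 00:00:05,000\nsecond caption\n"
    )
    cues = parse_srt(sample)
    print(len(cues), "cues, longest line:", longest_line(cues))
```

To check a real file, read it with something like `pathlib.Path("caption.output.txt").read_text(encoding="utf-8-sig")` (the `utf-8-sig` codec tolerates a byte order mark if one is present) and pass the result to `parse_srt`.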
