@wingsryder
Created February 18, 2025 11:13
Process a Vimeo transcription (.vtt) file and output its text as a single paragraph
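# Read a downloaded Vimeo .vtt caption file and join its caption text into one paragraph.
# Download https://player.vimeo.com/texttrack/211225928.vtt?token=XXXXXX to a local file first,
# then process it with the code below.
import re

file_path = "Techabuse.vtt"

# Read the file ("utf-8-sig" also drops a leading byte-order mark, if present)
with open(file_path, "r", encoding="utf-8-sig") as file:
    lines = file.readlines()

# Timing lines look like "00:00:01.000 --> 00:00:04.000"; the hours field is optional
timing_line = re.compile(r"^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")

# Keep only caption text: skip blank lines, the WEBVTT header,
# numeric cue identifiers and timing lines
text_lines = []
for line in lines:
    stripped = line.strip()
    if not stripped or stripped.startswith("WEBVTT"):
        continue
    if stripped.isdigit() or timing_line.match(stripped):
        continue
    text_lines.append(stripped)

# Join the remaining lines into a single paragraph and save it to a new file
filtered_text = " ".join(text_lines)
formatted_output_path = "TechAbuse_Webinar.txt"
with open(formatted_output_path, "w", encoding="utf-8") as file:
    file.write(filtered_text)
print(f"Saved paragraph to {formatted_output_path}")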
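The script above works line by line. Below is a minimal sketch of a block-based alternative, assuming the file follows the W3C WebVTT layout (cues separated by blank lines, each with an optional identifier line and a "-->" timing line) and may contain NOTE/STYLE/REGION blocks, inline tags such as <v Speaker> or <i>, and HTML entities such as &amp;. Whether Vimeo's text tracks actually contain that markup is not confirmed here; the helper name vtt_to_paragraph is hypothetical.

```python
import html
import re

# Inline WebVTT markup such as <v Speaker>, <i>, </c> or <00:00:05.000> timestamps
TAG = re.compile(r"<[^>]*>")
# Blocks that carry no caption text (assumed set, per the WebVTT layout)
NON_CUE_PREFIXES = ("WEBVTT", "NOTE", "STYLE", "REGION")


def vtt_to_paragraph(vtt_text: str) -> str:
    """Hypothetical helper: return the caption text of a WebVTT document as one paragraph."""
    # Drop a byte-order mark, normalise line endings, split on blank lines
    normalized = vtt_text.lstrip("\ufeff").replace("\r\n", "\n").replace("\r", "\n")
    blocks = re.split(r"\n\s*\n", normalized.strip())
    pieces = []
    for block in blocks:
        lines = block.split("\n")
        # Skip the header and any NOTE/STYLE/REGION block
        if lines[0].strip().startswith(NON_CUE_PREFIXES):
            continue
        # A cue is an optional identifier line, a timing line containing "-->", then text
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue
        for line in lines[timing_index + 1:]:
            # Remove inline tags, then decode entities like &amp;
            text = html.unescape(TAG.sub("", line)).strip()
            if text:
                pieces.append(text)
    return " ".join(pieces)
```

Because cues are located by their timing line rather than by a leading digit, a caption such as "2024 was a big year" is kept, and cue identifiers of any form are skipped. To use it on the downloaded file, read the whole file into a string and pass it in, e.g. vtt_to_paragraph(open("Techabuse.vtt", encoding="utf-8").read()).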