Transcribe mp3 to SRT (with Whisper)

install

pip install streamlit openai

run

streamlit run trans.py

This GPT processes SRT or text files with timecodes, specifically from YouTube videos, to create an Edit Decision List (EDL) with summarized highlight ideas. The goal is to scan through timestamps, generate summaries of the segments, and structure the content in an EDL format that is suitable for post-production use. It can extract the most relevant information from each clip and format it into a cohesive timeline, including clip details, highlights, and timestamps, ensuring an organized editing flow.
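As context, here is a minimal sketch of how one might drive the same workflow through the API directly, assuming the OpenAI Python client (v1) and a hypothetical model choice; the gist does not show the actual GPT's configuration:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EDL_RULES = "..."  # paste the EDL generation rules below as the system prompt

def srt_to_edl(srt_text: str) -> str:
    """Ask a chat model to summarize timecoded segments and emit an EDL."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; the gist does not name a model
        messages=[
            {"role": "system", "content": EDL_RULES},
            {"role": "user", "content": srt_text},
        ],
    )
    return response.choices[0].message.content
```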

EDL (Edit Decision List) Generation Rules

Overview: An Edit Decision List (EDL) is used to describe how video clips are edited and compiled into a final sequence. It specifies which portions of a source video are included in the edit, their start and end times, and where they appear in the final timeline.

Each entry in an EDL includes:

- Clip Number: identifies the sequence of clips in the final edit.
- Source In and Out Points: the start and end time of the clip in the source video.
- Record In and Out Points: where that clip will be placed in the final edited video.
- Clip Name: the name of the original video file being used.
- Comments: additional notes on what the clip contains.

Format example:

001  AX       V     C        00:00:00:00 00:03:31:06 01:00:00:00 01:03:31:06  
* FROM CLIP NAME: alexdang.mp4
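The event line follows the CMX3600 convention: event number, reel ("AX"), track ("V" for video), cut type ("C"), then the source in/out and record in/out timecodes. As a minimal sketch, a line in that shape can be produced like this (`edl_event` is a hypothetical helper inferred from the example, not part of the gist):

```python
# Sketch: format one CMX3600-style EDL event plus its comment lines.
def edl_event(num: int, src_in: str, src_out: str,
              rec_in: str, rec_out: str, clip_name: str,
              comment: str | None = None) -> str:
    lines = [
        f"{num:03d}  AX       V     C        {src_in} {src_out} {rec_in} {rec_out}",
        f"* FROM CLIP NAME: {clip_name}",
    ]
    if comment:
        lines.append(f"* COMMENT: {comment}")
    return "\n".join(lines)

print(edl_event(1, "00:00:00:00", "00:03:31:06",
                "01:00:00:00", "01:03:31:06", "alexdang.mp4"))
```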

Steps to Generate an EDL:

1. Identify source timecodes: use source timecodes (from subtitles or markers) to define the "in" and "out" points in the original video. Example: the useful segment of the video begins at 00:00:00:00 and ends at 00:03:31:06 in the source file.

2. Determine record timecodes: the record timecodes are where the selected clip will appear in the final timeline. The first clip starts at the record timeline's start point, and later clips continue sequentially. Example: in the EDL below, the first clip's record time starts at 01:00:00:00 (record timelines conventionally begin at hour 01) and runs to 01:03:31:06; a sketch of the frame arithmetic follows after step 3.

3. Cut and stitch video segments: you can cut out unwanted portions of the video and stitch the remaining parts together. For example, you might keep the beginning and the end but cut out the middle. To do this, identify the source timecodes of the segments to be removed, then continue the rest of the video after the cut.

Key EDL components:

- Source In/Out: the timecodes from the original clip (where a portion begins and ends).
- Record In/Out: the timecodes indicating where that portion will appear in the final sequence.
- Clip Name: the filename of the original video.
- Comments: optional, but useful for describing the content of the clip.

Example workflow for creating an EDL:

- Task: cut and stitch two parts of a 12-minute interview while removing the middle portion.
- Source video: alexdang.mp4 (12 minutes long).
- Objective: use the first 3 minutes 31 seconds and the last 4 minutes 56 seconds, omitting the middle section.
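The record timecodes can be derived mechanically from the source durations. Below is a minimal sketch of that frame arithmetic, assuming a fixed 25 fps frame rate (an assumption; the gist only gives HH:MM:SS:FF timecodes without naming the rate):

```python
# Sketch: SMPTE-style timecode <-> absolute frame count at an assumed rate.
FPS = 25  # assumption: the gist does not state the frame rate

def tc_to_frames(tc: str) -> int:
    h, m, s, f = (int(x) for x in tc.split(":"))
    return ((h * 60 + m) * 60 + s) * FPS + f

def frames_to_tc(frames: int) -> str:
    f = frames % FPS
    s = frames // FPS
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}:{f:02d}"

# Record out = record in + (source out - source in):
duration = tc_to_frames("00:03:31:06") - tc_to_frames("00:00:00:00")
print(frames_to_tc(tc_to_frames("01:00:00:00") + duration))  # 01:03:31:06
```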

EDL Breakdown:

TITLE: Timeline 2
FCM: NON-DROP FRAME

001  AX       V     C        00:00:00:00 00:03:31:06 01:00:00:00 01:03:31:06  
* FROM CLIP NAME: alexdang.mp4
* COMMENT: Opening section of the video, full introduction.

002  AX       V     C        00:07:09:11 00:12:05:11 01:03:31:06 01:08:27:06  
* FROM CLIP NAME: alexdang.mp4
* COMMENT: Final discussion, conclusion, and closing statements.

First segment:

- Source: 00:00:00:00 to 00:03:31:06 in alexdang.mp4.
- Record: the segment starts at 01:00:00:00 and ends at 01:03:31:06 in the final timeline.
- This is the first 3 minutes and 31 seconds of the video, placed at the start of the final cut.

Second segment:

- Source: 00:07:09:11 to 00:12:05:11 in the original video.
- Record: it starts at 01:03:31:06 and ends at 01:08:27:06 in the final cut.
- This segment is the last portion of the video (from the 7th to the 12th minute), stitched after the first segment with the middle part removed.
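As a sanity check, the second segment's record out point follows from the same frame arithmetic, reusing the helpers sketched earlier:

```python
# Record out = record in + source duration of the second segment.
dur = tc_to_frames("00:12:05:11") - tc_to_frames("00:07:09:11")
print(frames_to_tc(tc_to_frames("01:03:31:06") + dur))  # 01:08:27:06
```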

## Summary of Rules

- Identify important source segments: use timecodes from the source file (e.g., from subtitles or markers).
- Define start and end points: for each segment, define the start (in) and end (out) points in both the source and the final edited timeline.
- Stitch the timeline: after cutting, place each segment sequentially in the final timeline, ensuring they flow smoothly from one to the next.
- Document the clip information: each segment in the EDL should include the clip name and optional comments for context.
- Handle gaps: if a portion of the source video is not needed, simply omit that time range from the EDL.
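Putting the rules together, here is a minimal end-to-end sketch (not part of the gist) that turns a list of kept source segments into the EDL shown above, reusing the hypothetical tc_to_frames, frames_to_tc, and edl_event helpers from the earlier sketches:

```python
# Sketch: build a CMX3600-style EDL from (source in, source out, comment)
# segments, laying them down sequentially on a record timeline at hour 01.
def build_edl(title: str, clip: str,
              segments: list[tuple[str, str, str]]) -> str:
    out = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    rec = tc_to_frames("01:00:00:00")  # conventional record start
    for i, (src_in, src_out, comment) in enumerate(segments, start=1):
        dur = tc_to_frames(src_out) - tc_to_frames(src_in)
        out.append(edl_event(i, src_in, src_out,
                             frames_to_tc(rec), frames_to_tc(rec + dur),
                             clip, comment))
        out.append("")
        rec += dur  # the next segment starts where this one ends
    return "\n".join(out)

print(build_edl("Timeline 2", "alexdang.mp4", [
    ("00:00:00:00", "00:03:31:06", "Opening section of the video, full introduction."),
    ("00:07:09:11", "00:12:05:11", "Final discussion, conclusion, and closing statements."),
]))
```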

trans.py:

import streamlit as st
from openai import OpenAI

client = OpenAI(
    api_key="sk-proj-_**********"  # add your key
)

# Streamlit interface for file upload
st.title("MP3 to Whisper Transcription")
uploaded_file = st.file_uploader("Upload an audio file in .mp3 format")

# Response format selection options
response_format = st.radio(
    "Choose response format:",
    ["json", "text", "srt", "verbose_json", "vtt"],
    index=3  # default to verbose_json
)

# Slider to select temperature
temperature = st.slider(
    "Select temperature (from 0 to 1):",
    min_value=0.0,
    max_value=1.0,
    value=0.5,  # default value
    step=0.1
)

# Choice of timestamp granularity (word/segment)
timestamp_granularities = st.multiselect(
    "Choose timestamp granularity:",
    ["word", "segment"],
    default=["segment"] if response_format == "verbose_json" else []
)

# If a file is uploaded, start processing
if uploaded_file is not None:
    st.audio(uploaded_file, format="audio/mp3")
    with st.spinner("Transcribing audio..."):
        kwargs = dict(
            model="whisper-1",
            file=uploaded_file,
            response_format=response_format,  # selected response format
            temperature=temperature,          # selected temperature
        )
        # The API only accepts timestamp_granularities with verbose_json
        if response_format == "verbose_json" and timestamp_granularities:
            kwargs["timestamp_granularities"] = timestamp_granularities
        transcript = client.audio.transcriptions.create(**kwargs)
    if response_format in ["json", "verbose_json"]:
        st.json(transcript.model_dump())  # display the JSON response
    else:
        # For text, srt, and vtt the API returns a plain string
        st.code(transcript, language="text")
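A note on the hard-coded key: a common alternative is to read it from the environment instead (assuming the OPENAI_API_KEY environment variable is set; the v1 client also picks it up automatically when constructed with no arguments):

```python
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
```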