Summarize YouTube Videos Without Subtitles with DeepSeek R1
In a previous article, we used a simple Python script to download YouTube video subtitles and analyze/summarize them with AI. However, that only works for videos that have subtitles provided by their creators or captions auto-generated by YouTube. What about videos with neither?
This scenario is a bit more complicated, but there are solutions—namely, using speech recognition tools. There are numerous STT (speech-to-text) tools available on GitHub. After trying out several popular options, I eventually selected Whisper, an open-source speech recognition project by OpenAI. In practical tests, it demonstrated high efficiency.
We can modify our previous process slightly to achieve our goal of automatically analyzing video content. The entire process can be broken down into 4 smaller tasks.
Download video audio
We can use the Python module yt-dlp to download video audio. You’ll need to install both yt-dlp and ffmpeg.
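Both dependencies can be installed as follows (the `apt` line assumes a Debian/Ubuntu system; use your platform's package manager otherwise):

```shell
# install the yt-dlp Python package
pip install yt-dlp
# install ffmpeg, which yt-dlp uses to extract MP3 audio
# (Debian/Ubuntu; on macOS use: brew install ffmpeg)
sudo apt install ffmpeg
```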
import os
import yt_dlp

video_url = 'https://www.youtube.com/watch?v=NBUVP7Hegso'
video_id = video_url.split('v=')[1].split('&')[0]
mp3_file = os.path.join(your_path_to_mp3, f"{video_id}.mp3")
ydl_opts = {
    'format': 'bestaudio/best',
    'outtmpl': os.path.join(your_path_to_mp3, video_id),
    'quiet': True,
    'postprocessors': [{
        'key': 'FFmpegExtractAudio',  # requires ffmpeg on your PATH
        'preferredcodec': 'mp3',
        'preferredquality': '192',
    }],
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([video_url])
After running this script, you’ll have the audio file of your target video. Remember to set the path where you want to store the MP3 files.
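The simple string split above works for standard watch URLs but fails on short youtu.be links. A more robust sketch using only the standard library (the function name is my own):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(url):
    """Return the YouTube video ID from a watch or youtu.be URL."""
    parsed = urlparse(url)
    if parsed.hostname == 'youtu.be':
        # short links put the ID in the path: youtu.be/<id>
        return parsed.path.lstrip('/')
    # standard links carry the ID in the ?v= query parameter
    return parse_qs(parsed.query)['v'][0]

print(extract_video_id('https://www.youtube.com/watch?v=NBUVP7Hegso'))  # NBUVP7Hegso
```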
Speech to text
Next, use Whisper for the speech-to-text conversion; only a few lines of code are needed. Whisper's installation is more involved than yt-dlp's, but with patience and by following the project's installation instructions, it can be done.
import whisper

model = whisper.load_model(path_to_your_whisper_model)
result = model.transcribe(mp3_file)
transcript = result['text']
Use AI to organize and analyze subtitles
You will need access to an AI tool's API, such as ChatGPT, Claude, DeepSeek, or any other online or local AI tool. The example below shows how to use DeepSeek's API to summarize the transcript. The output is provided in Markdown format, making it easy to read.
from openai import OpenAI
api_key = 'your api key here'
base_url = 'https://api.deepseek.com'
assistant = OpenAI(api_key=api_key, base_url=base_url)
response = assistant.chat.completions.create(
    model='deepseek-reasoner',  # DeepSeek R1 model used here
    messages=[
        {"role": "system", "content": "You are an office assistant"},
        {"role": "user", "content": f"The text below is the transcript of an audio recording. Make it more readable and summarize it without translating: {transcript}"},
    ],
    max_tokens=8192,
    temperature=0.8,
    stream=False,
)
summary = response.choices[0].message.content
Format text as HTML output
The summary comes back in Markdown format, which imports cleanly into note-taking software like Notion and Obsidian for further processing. For standalone viewing, we can also convert it to HTML.
import markdown  # pip install markdown if not installed yet

summary_html = markdown.markdown(summary)
summary_file = mp3_file.replace('.mp3', '.html')
with open(summary_file, 'w', encoding='utf-8') as f:
    f.write(summary_html)
After integrating these code segments, we only need to supply a video URL; the program automatically downloads the audio, runs speech recognition to produce a transcript, and has the AI organize and analyze it, finally writing out an HTML file.
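As a sketch, that integration can be expressed as a small driver that chains the four steps. The step functions here are hypothetical names, each standing in for one of the snippets above; passing them in as arguments keeps the driver easy to test in isolation.

```python
def process_video(url, download_audio, transcribe, summarize, render_html):
    """Chain the four steps: audio -> transcript -> summary -> HTML file.

    The four callables are hypothetical wrappers around the snippets
    in this article (yt-dlp, Whisper, DeepSeek, markdown).
    """
    mp3_file = download_audio(url)      # yt-dlp + ffmpeg
    transcript = transcribe(mp3_file)   # Whisper speech-to-text
    summary = summarize(transcript)     # DeepSeek R1 via the OpenAI client
    return render_html(summary)         # Markdown -> HTML
```

Because the steps are injected, each one can be swapped out (say, a different STT tool or AI model) without touching the driver.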