Part 12: Building an End-to-End YouTube Video Transcriber Using Google Gemini Pro

Introduction

In this blog post, we'll build an app that summarizes YouTube videos using Google Gemini Pro. Given a YouTube video link, the app fetches the video's existing transcript and asks Gemini Pro to produce a concise summary of the content. Note that the app does not transcribe audio itself; it relies on a transcript already being available for the video. Let's walk through the steps to build this application from scratch.

Agenda

Here's what we'll cover:

Introduction to the Project
Setting Up Your Environment
Creating the Requirements File
Writing the Code for the Application
Running the Application
Testing the Application
Conclusion

Introduction to the Project

The goal of this project is to fetch the transcript of a YouTube video and summarize it. When you enter a video link, the application extracts the video ID, retrieves the transcript with youtube_transcript_api, and sends the transcript text, together with a summarization prompt, to Google Gemini Pro.

Setting Up Your Environment

Before we get started, ensure you have the following prerequisites:

Python Version: Python 3.10 or higher.
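API Key: A Google Gemini Pro API key. Save it in a .env file in the project folder as GOOGLE_API_KEY=<your key>, since the script reads the key from that variable.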
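
If the key is missing, the failure only shows up later, when the first request to Gemini is made. A minimal sketch of an early check, assuming the variable name GOOGLE_API_KEY used by the script in this post; require_api_key is a hypothetical helper, not part of any library:

```python
import os


def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the key stored under `name`, or raise if it is missing or blank.

    Hypothetical helper for this post; `env` can be any dict-like mapping,
    which makes the check easy to try without touching the real environment.
    """
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to your .env file."
        )
    return key
```

In main.py, this could be called right after load_dotenv() and its result passed to genai.configure.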

Step 1: Create a Virtual Environment

Let's start by setting up a virtual environment to manage our dependencies and keep our project organized.

conda create -p venv python=3.10 -y
conda activate venv/

Step 2: Install Required Packages

Create a requirements.txt file with the following content:

youtube_transcript_api
streamlit
google-generativeai
python-dotenv

Install the packages by running:

pip install -r requirements.txt
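
To confirm the install worked before writing any application code, one option is to check that each module can be located. This is a rough sketch using only the standard library; missing_modules is a hypothetical name, and the list holds the import names used by the script in this post (which differ from some of the pip package names):

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from `module_names` that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised for dotted names when the parent package is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names used by main.py in this post.
REQUIRED = ["streamlit", "dotenv", "google.generativeai", "youtube_transcript_api"]
```

Calling missing_modules(REQUIRED) in the activated environment should return an empty list if every package installed correctly.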

Writing the Code for the Application

Create a Python Script (main.py) for the Application:

import streamlit as st
from dotenv import load_dotenv
import os
import google.generativeai as genai
from youtube_transcript_api import YouTubeTranscriptApi

# Load environment variables
load_dotenv() 

# Configure GenAI Key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Define the prompt for summarization
prompt = """You are a YouTube video summarizer. You will be taking the transcript text
and summarizing the entire video and providing the important summary in points
within 250 words. Please provide the summary of the text given here:  """

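## Function to extract transcript details from YouTube videos
def extract_transcript_details(youtube_video_url):
    # Take the text after "v=" and drop trailing parameters such as "&t=30s"
    video_id = youtube_video_url.split("v=")[-1].split("&")[0]

    # get_transcript is the call this post was written against; if your
    # installed version of the library lacks it, check the library's README
    transcript_segments = YouTubeTranscriptApi.get_transcript(video_id)

    # Each segment is a dict whose "text" key holds one piece of the transcript
    return " ".join(segment["text"] for segment in transcript_segments)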

## Function to generate summary based on prompt from Google Gemini Pro
def generate_gemini_content(transcript_text, prompt):
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(prompt + transcript_text)
    return response.text

# Streamlit UI
st.title("YouTube Transcript to Detailed Notes Converter")
youtube_link = st.text_input("Enter YouTube Video Link:")

if youtube_link:
    # Take the text after "v=" and drop trailing parameters such as "&t=30s"
    video_id = youtube_link.split("v=")[-1].split("&")[0]
    st.image(f"https://img.youtube.com/vi/{video_id}/0.jpg", use_column_width=True)

if st.button("Get Detailed Notes"):
    transcript_text = extract_transcript_details(youtube_link)

    if transcript_text:
        summary = generate_gemini_content(transcript_text, prompt)
        st.markdown("## Detailed Notes:")
        st.write(summary)

Running the Application

To run your Streamlit app, open your terminal and execute:

streamlit run main.py

Testing the Application

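Once the app is running, open the Streamlit interface in your web browser. Paste a YouTube video link into the input box and press Enter; the video's thumbnail appears below it. Then click "Get Detailed Notes" to generate a summary of the video's content. The video needs to have a transcript available, since the app reads an existing transcript rather than processing the audio.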
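
Before testing in the browser, the step that flattens the transcript into one string can be checked offline. The sketch below assumes the transcript arrives as a list of dictionaries with a "text" key, which is how main.py reads it; join_segments is a hypothetical name and the sample segments are made up:

```python
def join_segments(segments):
    """Join the "text" field of each transcript segment into one string."""
    # Collapse newlines and repeated spaces inside each segment.
    pieces = (" ".join(segment["text"].split()) for segment in segments)
    # Skip segments that are empty after cleanup.
    return " ".join(piece for piece in pieces if piece)


# Made-up sample shaped like the entries that main.py iterates over.
sample = [
    {"text": "Hello and welcome", "start": 0.0, "duration": 1.5},
    {"text": "to this\nvideo.", "start": 1.5, "duration": 2.0},
    {"text": "   ", "start": 3.5, "duration": 0.5},
]

print(join_segments(sample))  # Hello and welcome to this video.
```

Checking this piece separately means a bad summary in the browser can be traced to either the transcript text or the prompt, without making an API call each time.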

Conclusion

Congratulations! You've built a YouTube video summarizer using Google Gemini Pro. This project shows how a transcript API and a generative model can be combined to turn a video link into a short set of notes. Feel free to experiment with different prompts and enhance the application further.

Happy coding!