Revanth Reddy Tondapu

Part 7: Building a Chatbot to Interact with Multiple PDF Documents Using Google Gemini and LangChain



Introduction

In this blog post, we will walk through an end-to-end project that lets you chat with multiple PDF documents using Google Gemini and LangChain. The project shows how to integrate Google Gemini Pro with LangChain, create vector embeddings from PDF text, and store and search those embeddings with Facebook AI Similarity Search (FAISS). Let's dive into the project step by step.


Agenda

Here's what we will cover:

  1. Demo of the Multi-PDF Chatbot

  2. Setting Up Your Environment

  3. Creating the Requirements File

  4. Writing the Code Step by Step

  5. Additional Improvements and Enhancements


Demo

In our demo, we will upload multiple PDF files and convert them into vector embeddings. Once processed, you can ask questions about the content of these PDFs, and the chatbot will provide detailed answers based on the context. For example, if you upload research papers and ask, "What is scaled dot product attention?" the chatbot will retrieve and display the relevant information from the PDFs.
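
To make the retrieval step concrete, here is a minimal sketch of what a vector search does conceptually: the question is turned into a vector, and the chunks whose vectors lie closest to it are returned. The three-dimensional vectors, the chunk texts, and the top_k_chunks helper are invented for illustration. Real embeddings come from the embedding model, and FAISS performs this search far more efficiently than the brute-force scan shown here.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_chunks(query_vector, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors lie closest to the query.

    indexed_chunks is a list of (text, vector) pairs. This is a brute-force
    scan for illustration only.
    """
    ranked = sorted(
        indexed_chunks,
        key=lambda item: euclidean_distance(query_vector, item[1]),
    )
    return [text for text, _ in ranked[:k]]

# Toy data, invented for this example.
indexed_chunks = [
    ("Chunk about attention scaling", [0.9, 0.1, 0.0]),
    ("Chunk about multiple attention heads", [0.7, 0.3, 0.1]),
    ("Chunk about training data", [0.0, 0.2, 0.9]),
]
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded question

print(top_k_chunks(query_vector, indexed_chunks, k=2))
```

The two chunks nearest to the query vector are the ones passed on to the language model as context.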


Setting Up Your Environment

Before we get started, ensure you have the following prerequisites:

  • Python Version: Python 3.9 or higher.

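  • API Key: Have your Gemini Pro API key ready. You can generate one in Google AI Studio. The code in this post reads the key from the GOOGLE_API_KEY environment variable, so store it in a .env file in the project folder as GOOGLE_API_KEY=<your key>.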
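
The code later in this post reads the key from the GOOGLE_API_KEY environment variable. As a small sketch, the hypothetical helper below, require_api_key, fails early with a clear message when the key is missing or blank. It uses only the standard library and is not part of the project code.

```python
import os

def require_api_key(name="GOOGLE_API_KEY", env=None):
    """Return the API key stored under `name`, or raise if it is missing or blank.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or shell environment."
        )
    return key

# Example with an explicit mapping, so no real key is needed here.
print(require_api_key(env={"GOOGLE_API_KEY": "placeholder-key"}))
```

Calling require_api_key() right after load_dotenv() would surface a missing key before any request is sent.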


Step 1: Create a Virtual Environment

Let's start by setting up a virtual environment to manage our dependencies and keep our project organized.

conda create -p venv python=3.10
conda activate venv/

Step 2: Install Required Packages

Create a requirements.txt file with the following content:

streamlit
google-generativeai
python-dotenv
langchain
PyPDF2
chromadb
faiss-cpu
langchain_google_genai

Install the packages by running:

pip install -r requirements.txt

Writing the Code

import streamlit as st
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings
import google.generativeai as genai
from langchain.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains.question_answering import load_qa_chain
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Function to get text from PDF
def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            # Guard against pages with no extractable text (e.g. scanned images)
            text += page.extract_text() or ""
    return text

# Function to split text into chunks
def get_text_chunks(text):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=1000)
    chunks = text_splitter.split_text(text)
    return chunks

# Function to create vector store
def get_vector_store(text_chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    vector_store = FAISS.from_texts(text_chunks, embedding=embeddings)
    vector_store.save_local("faiss_index")

# Function to create conversational chain
def get_conversational_chain():
    prompt_template = """
    Answer the question as detailed as possible from the provided context, make sure to provide all the details.
    If the answer is not in the provided context, just say, 'answer is not available in the context'. Don't provide the wrong answer.

    Context:\n {context}\n
    Question: \n{question}\n

    Answer:
    """
    model = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)
    prompt = PromptTemplate(template=prompt_template, input_variables=["context", "question"])
    chain = load_qa_chain(model, chain_type="stuff", prompt=prompt)
    return chain

# Function to handle user input
def user_input(user_question):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
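    # Note: recent LangChain releases require passing allow_dangerous_deserialization=True
    # to load_local, because the saved index is restored with pickle.
    new_db = FAISS.load_local("faiss_index", embeddings)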
    docs = new_db.similarity_search(user_question)
    chain = get_conversational_chain()
    response = chain({"input_documents": docs, "question": user_question}, return_only_outputs=True)
    st.write("Reply: ", response["output_text"])

# Main function to run the Streamlit app
def main():
    st.set_page_config("Chat PDF")
    st.header("Chat with PDF using Gemini💁")

    user_question = st.text_input("Ask a Question from the PDF Files")

    if user_question:
        user_input(user_question)

    with st.sidebar:
        st.title("Menu:")
        pdf_docs = st.file_uploader("Upload your PDF Files and Click on the Submit & Process Button", accept_multiple_files=True)
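        if st.button("Submit & Process"):
            if not pdf_docs:
                # Nothing to index yet; avoid building an empty vector store
                st.warning("Please upload at least one PDF file first.")
            else:
                with st.spinner("Processing..."):
                    raw_text = get_pdf_text(pdf_docs)
                    text_chunks = get_text_chunks(raw_text)
                    get_vector_store(text_chunks)
                    st.success("Done")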

if __name__ == "__main__":
    main()
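
The chunk_size and chunk_overlap arguments in get_text_chunks control how the extracted text is cut up. The sketch below is a simplified fixed-width splitter, not the algorithm RecursiveCharacterTextSplitter actually uses (which prefers to break on separators such as paragraphs and newlines). It only illustrates how overlap makes consecutive chunks share some text, so context near a boundary is not lost. The name split_with_overlap is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters, each starting
    chunk_size - chunk_overlap characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

chunks = split_with_overlap(
    "abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3
)
for chunk in chunks:
    print(chunk)
```

With a chunk size of 10 and an overlap of 3, each chunk begins with the last three characters of the one before it. The values used in the app (10000 and 1000) work the same way at a larger scale.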

Running the Application

To run your Streamlit app, save the code above as multipdfchat.py, open your terminal in the same folder, and execute:

streamlit run multipdfchat.py

Testing the Chatbot

Once the app is running, open the Streamlit interface in your web browser. Upload multiple PDF files and type a question in the input box. For example:

  1. User: What is scaled dot product attention? Bot: Scaled dot product attention is a mechanism used in the transformer model for calculating the attention weights between different positions in a sequence.

  2. User: Provide a summary about Multi-head attention. Bot: Multi-head attention is a technique used in transformer models to improve their ability to attend to different parts of the input sequence.
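
Behind each answer, the "stuff" chain type places the text of all retrieved chunks into the {context} slot of the prompt and sends a single request to the model. The sketch below imitates that with plain string formatting. The build_prompt helper is hypothetical, the chunk texts are placeholders, and joining chunks with a blank line is an assumption about how the documents are combined.

```python
PROMPT_TEMPLATE = (
    "Answer the question as detailed as possible from the provided context.\n"
    "If the answer is not in the provided context, just say, "
    "'answer is not available in the context'.\n\n"
    "Context:\n{context}\n\n"
    "Question:\n{question}\n\n"
    "Answer:"
)

def build_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and fill both slots of the template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)

prompt = build_prompt(
    ["First retrieved chunk.", "Second retrieved chunk."],
    "What is scaled dot product attention?",
)
print(prompt)
```

Because everything is sent in one request, the combined size of the retrieved chunks has to fit within the model's input limit.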


Additional Improvements and Enhancements

Here are some additional improvements you can make:

  • Database Integration: Store conversation histories in a database.

  • Multi-Format Support: Extend support beyond PDFs to other document formats such as Word documents and plain text files.

  • Advanced NLP: Integrate more advanced Natural Language Processing techniques for better query understanding.
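
As one possible starting point for the database idea, the sketch below stores question and answer pairs with Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for this example and are not part of the project code above.

```python
import sqlite3

def open_history(path=":memory:"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chat_history ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "question TEXT NOT NULL, "
        "answer TEXT NOT NULL)"
    )
    return conn

def save_turn(conn, question, answer):
    """Insert one question/answer pair using a parameterized query."""
    conn.execute(
        "INSERT INTO chat_history (question, answer) VALUES (?, ?)",
        (question, answer),
    )
    conn.commit()

def load_history(conn):
    """Return all stored turns in insertion order."""
    rows = conn.execute("SELECT question, answer FROM chat_history ORDER BY id")
    return rows.fetchall()

conn = open_history()  # in-memory database; pass a file path to persist
save_turn(conn, "What is multi-head attention?", "Example answer text.")
print(load_history(conn))
```

In the Streamlit app, save_turn could be called inside user_input right after the reply is written to the page.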


Conclusion

Congratulations! You've successfully built a multi-PDF chatbot using Google Gemini and LangChain. This project shows how generative AI, combined with vector search, can answer questions grounded in your own documents. Keep experimenting with different applications and explore the potential of these technologies.

Happy coding!
