Part 6: Building a Multi-Language Invoice Extractor Using Gemini Pro API

Introduction

In this blog post, we build an end-to-end multi-language invoice extractor with the Gemini Pro API. The app sends an invoice image and a question to the Gemini Pro Vision model, which reads the invoice and answers in plain text, so the same code handles invoices in languages such as Hindi without a separate OCR or translation step. We will walk you through each step, from setting up your environment to writing the code and running the application.

Agenda

Here's what we will cover:

Demo of the Multi-Language Invoice Extractor App
Setting Up Your Environment
Creating the Requirements File
Writing the Code Step by Step
Additional Improvements and Enhancements

Demo

Let’s start by demonstrating the app. In our demo, we upload an invoice written in Hindi and extract details such as the address and date using the Gemini Pro API. For example, if we ask "What is the address in the invoice?", the app returns the address printed on it. It handles other queries the same way, such as "What is the date in the invoice?".

Setting Up Your Environment

Before we get started, ensure you have the following prerequisites:

Python Version: Python 3.9 or higher.
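API Key: Have your Gemini Pro API key ready. You can generate one in Google AI Studio. Save it in a file named .env in the project folder as GOOGLE_API_KEY=your_key_here, since that is the variable name the script reads.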
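
The script in this post calls load_dotenv() and then reads the key from the GOOGLE_API_KEY environment variable. If the variable is missing, the failure only shows up later as an API error. A minimal sketch of a fail-fast check follows; the helper name require_api_key is hypothetical and not part of any library.

```python
import os


def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise a readable error.

    `require_api_key` is a hypothetical helper for this post, not a library call.
    `env` can be any mapping, which makes the function easy to test.
    """
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add a line like {name}=<your key> to a .env "
            "file in the project folder, or export it in your shell."
        )
    return key
```

In the script, one option is to call require_api_key() right after load_dotenv() and pass the result to genai.configure(api_key=...).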

Step 1: Create a Virtual Environment

Let's start by setting up a virtual environment. This will help manage our dependencies and keep our project organized.

conda create -p venv python=3.10
conda activate venv/

Step 2: Install Required Packages

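Next, create a requirements.txt file with the following content. The script in this post only imports streamlit, google-generativeai, and python-dotenv (plus Pillow, which Streamlit installs as a dependency); langchain, PyPDF2, chromadb, and faiss-cpu are not used here and can be left out if you want a smaller install: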

streamlit
google-generativeai
python-dotenv
langchain
PyPDF2
chromadb
faiss-cpu

Install the packages by running:

pip install -r requirements.txt
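
Before writing the app, it can help to confirm that the install worked. The sketch below checks whether the modules imported by invoice.py can be found in the active environment. The helper name missing_modules and the REQUIRED_MODULES list are assumptions made for this example; note that import names differ from pip names (python-dotenv is imported as dotenv, Pillow as PIL).

```python
import importlib.util

# Import names (not pip names) that invoice.py relies on.
REQUIRED_MODULES = ["streamlit", "dotenv", "PIL", "google.generativeai"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

If the script reports a missing module, re-run the pip command inside the activated environment.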

Writing the Code

Create a Python Script (invoice.py):

from dotenv import load_dotenv
import streamlit as st
import os
from PIL import Image
import google.generativeai as genai

# Load environment variables
load_dotenv()

# Configure Gemini Pro API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

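# Function to get a response from the Gemini vision model
def get_gemini_response(input_prompt, image, query):
    # If 'gemini-pro-vision' is not available for your key, substitute a
    # current image-capable Gemini model name here.
    model = genai.GenerativeModel('gemini-pro-vision')
    # The request is a list of parts: instruction prompt, image, user query.
    response = model.generate_content([input_prompt, image[0], query])
    return response.text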

# Function to prepare the uploaded image
def input_image_setup(uploaded_file):
    if uploaded_file is not None:
        bytes_data = uploaded_file.getvalue()
        image_parts = [
            {
                "mime_type": uploaded_file.type,
                "data": bytes_data
            }
        ]
        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")

# Initialize Streamlit app
st.set_page_config(page_title="Gemini Image Demo")
st.header("Gemini Application")

# Input prompt and file uploader
input_query = st.text_input("Input Query: ", key="input")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])
image = ""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image.", use_column_width=True)

submit = st.button("Tell me about the image")

# Default input prompt
input_prompt = """
               You are an expert in understanding invoices.
               You will receive input images as invoices &
               you will have to answer questions based on the input image
               """

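# Handle submit button click
if submit:
    if uploaded_file is None:
        # Avoid an unhandled FileNotFoundError when no invoice was uploaded
        st.error("Please upload an invoice image first.")
    else:
        image_data = input_image_setup(uploaded_file)
        response = get_gemini_response(input_prompt, image_data, input_query)
        st.subheader("The Response is")
        st.write(response)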
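
One fragile spot in the script is the return response.text line. To our knowledge, the text accessor in google-generativeai raises a ValueError when the response contains no usable text part, for example when a request is blocked. The sketch below wraps that access; safe_response_text is a hypothetical helper, and the fallback message is only a placeholder.

```python
def safe_response_text(response, fallback="No answer was returned for this invoice."):
    """Return response.text, or `fallback` if the accessor raises ValueError.

    Hypothetical helper: it only assumes `response` exposes a `.text`
    attribute, so it can be exercised with a stand-in object in tests.
    """
    try:
        text = response.text
    except ValueError:
        # Assumption: the SDK signals "no text available" with ValueError.
        return fallback
    # Treat an empty or whitespace-only answer the same as no answer.
    return text.strip() or fallback
```

To use it, get_gemini_response could return safe_response_text(response) instead of response.text, so the Streamlit page shows a short message rather than a traceback.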

Running the Application

To run your Streamlit app, open your terminal and execute:

streamlit run invoice.py

If you saved the script under a different name, use that file name in place of invoice.py.

Testing the App

Once the app is running, open the Streamlit interface in your web browser. Upload an invoice image and type a query. For example:

User: What is the address in the invoice?
Bot: 123 SBC Building, DEF Street

User: What is the date in the invoice?
Bot: 2012-07-27

Additional Improvements and Enhancements

Here are some additional improvements you can make:

Database Integration: Store conversation histories in a database.
Multi-Format Support: Extend support to other document formats such as PDFs.
Advanced NLP: Integrate more advanced Natural Language Processing techniques for better query understanding.
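
As a starting point for the database integration idea, here is a minimal sketch that stores each query and response in SQLite using Python's standard library. The table layout, the file name invoice_history.db, and the function names are all assumptions chosen for illustration; a production app would likely need a different schema.

```python
import sqlite3
from datetime import datetime, timezone


def open_history(path="invoice_history.db"):
    """Open (or create) the history database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS history (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            asked_at TEXT NOT NULL,
            file_name TEXT,
            query TEXT NOT NULL,
            response TEXT NOT NULL
        )
        """
    )
    conn.commit()
    return conn


def save_exchange(conn, file_name, query, response):
    """Insert one question/answer pair with a UTC timestamp."""
    conn.execute(
        "INSERT INTO history (asked_at, file_name, query, response) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), file_name, query, response),
    )
    conn.commit()


def recent_exchanges(conn, limit=10):
    """Return the newest rows first as (file_name, query, response) tuples."""
    rows = conn.execute(
        "SELECT file_name, query, response FROM history "
        "ORDER BY id DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

In the Streamlit script, one option is to open the connection inside the submit handler and call save_exchange(conn, uploaded_file.name, input_query, response) right after the response is displayed.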

Conclusion

Congratulations! You've successfully built a multi-language invoice extractor using the Gemini Pro API. This project showcases the power and versatility of generative AI in handling complex, real-world tasks. Keep experimenting with different applications and explore the vast potential of AI technologies.

Happy coding!