Building a Multi-Language Invoice Extractor Using Generative AI

In this post, we will build a multi-language invoice extractor application using Generative AI. The app handles invoices in various languages, extracts the relevant information, and answers user queries about them. Along the way we will set up the environment, install the necessary libraries, and write the code step by step.

Agenda

Setting Up the Environment
Creating the Requirements File
Writing the Code Step-by-Step
Additional Improvements and Considerations
Running the Application

Let's start with what the finished app does. Users upload an invoice image in any language and type a question about it. The app sends both to a Google Gemini model through the google-generativeai library and displays the answer. For example, if an invoice is in Telugu or Hindi and you ask for the address, the app will extract and display the address from the invoice.

Setting Up the Environment

First, create an isolated environment so that this project's dependencies do not interfere with other projects. The commands below use conda:

conda create -n invoice_extractor_env python=3.10
conda activate invoice_extractor_env

Next, create a .env file in the project folder to store your API key. Keep this file out of version control:

GOOGLE_API_KEY=your_api_key_here

Creating the Requirements File

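Create a requirements.txt file with the following content. The code in this post imports only streamlit, google-generativeai, python-dotenv, and Pillow (the PIL module); the remaining packages are not used by app.py and can be omitted:

streamlit
google-generativeai
python-dotenv
Pillow
langchain
PyPDF2
chromadb
faiss-cpu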

Install the dependencies:

pip install -r requirements.txt

Writing the Code Step-by-Step

Now, let's write the code. Create a file named app.py and add the code from each of the following steps, in order:

Import Libraries and Load Environment Variables

import streamlit as st
from dotenv import load_dotenv
import os
from PIL import Image
import google.generativeai as genai

# Load environment variables from .env file
load_dotenv()
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
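
If the .env file is missing or the variable name is misspelled, os.getenv returns None and the problem only surfaces later, when the first request is sent. A minimal sketch of failing fast instead; the helper name require_api_key is hypothetical and not part of any library:

```python
import os


def require_api_key(env=os.environ, name="GOOGLE_API_KEY"):
    """Return the API key from env, or raise a clear error if it is unset or blank."""
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set. Add it to your .env file.")
    return key
```

In app.py, the result could be passed to the existing call as genai.configure(api_key=require_api_key()), after load_dotenv() has run.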

Function to Load the Generative AI Model and Get Responses

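def get_gemini_response(input_prompt, image, user_prompt):
    """Send the instruction prompt, the invoice image, and the user's question to Gemini."""
    # Model names change over time; if this one is rejected, substitute a
    # vision-capable model that is available to your API key.
    model = genai.GenerativeModel('gemini-pro-vision')
    response = model.generate_content([input_prompt, image[0], user_prompt])
    return response.text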
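
In the google-generativeai library, reading response.text can raise a ValueError when the response contains no text part, for example when the request was blocked. A hedged sketch of a wrapper that turns that case into a readable message; extract_text is a hypothetical helper and the fallback wording is an assumption:

```python
def extract_text(response, fallback="No text was returned for this invoice."):
    """Return response.text, or a fallback message if the accessor raises ValueError."""
    try:
        return response.text
    except ValueError:
        return fallback
```

With this helper, get_gemini_response could end with return extract_text(response), so the app shows a message instead of a stack trace.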

Function to Process the Uploaded Image

def input_image_setup(uploaded_file):
    if uploaded_file is not None:
        bytes_data = uploaded_file.getvalue()
        image_parts = [
            {
                "mime_type": uploaded_file.type,
                "data": bytes_data
            }
        ]
        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")
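
The file uploader in this app restricts uploads by extension, but the MIME type and the file contents can still be checked before the request is built. A minimal sketch, assuming only JPEG and PNG are accepted; ALLOWED_MIME_TYPES and build_image_parts are illustrative names, not part of Streamlit or the Gemini library:

```python
# Assumption: mirrors the jpg/jpeg/png filter used by the file uploader
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png"}


def build_image_parts(data: bytes, mime_type: str) -> list:
    """Return the image payload in the same shape as input_image_setup, after basic checks."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported file type: {mime_type}")
    if not data:
        raise ValueError("The uploaded file is empty")
    return [{"mime_type": mime_type, "data": data}]
```

Inside input_image_setup, this could be called as build_image_parts(uploaded_file.getvalue(), uploaded_file.type).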

Streamlit App Setup

# Initialize Streamlit app
st.set_page_config(page_title="Multi-Language Invoice Extractor")
st.header("Multi-Language Invoice Extractor")

# Input prompt for the AI model
input_prompt = """
               You are an expert in understanding invoices.
               You will receive input images as invoices &
               you will have to answer questions based on the input image.
               """

# User input and file uploader
user_prompt = st.text_input("Ask a question about the invoice:")
uploaded_file = st.file_uploader("Choose an invoice image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Invoice Image", use_column_width=True)

# Submit button
submit = st.button("Extract Information")

# If submit button is clicked, process the image and get the response
if submit and uploaded_file is not None:
    image_data = input_image_setup(uploaded_file)
    response = get_gemini_response(input_prompt, image_data, user_prompt)
    st.subheader("Extracted Information")
    st.write(response)
elif submit:
    # Nothing was uploaded, so tell the user instead of failing silently
    st.warning("Please upload an invoice image first.")

Additional Improvements and Considerations

While the basic functionality is now in place, there are several improvements and additional features you could consider:

Error Handling: Add more robust error handling to manage issues like unsupported file types or missing API keys.
Multi-Page Support: Enhance the app to handle multi-page invoices.
Language Detection: Automatically detect the language of the invoice and adjust the prompts accordingly.
Database Integration: Store extracted information in a database for further analysis and reporting.
UI/UX Enhancements: Improve the user interface for better user experience, perhaps by adding more descriptive labels or progress indicators.
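
As one way to approach the database integration idea above, each question and answer could be written to a local SQLite file using Python's built-in sqlite3 module. This is only a sketch: the table name, column names, and the helpers init_db and save_extraction are assumptions made for illustration, not part of the original app.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(path=":memory:"):
    """Open the database at path and create the extractions table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS extractions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               file_name TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    return conn


def save_extraction(conn, file_name, question, answer):
    """Insert one question/answer pair and return the id of the new row."""
    created_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO extractions (file_name, question, answer, created_at) "
        "VALUES (?, ?, ?, ?)",
        (file_name, question, answer, created_at),
    )
    conn.commit()
    return cur.lastrowid
```

In app.py, save_extraction(conn, uploaded_file.name, user_prompt, response) could be called right after the response is displayed, with conn created once from a file path such as "invoices.db" (a hypothetical name).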

Running the Application

To start the application, run the following command:

streamlit run app.py

This will open a web interface where you can upload invoice images, ask questions about them, and get responses using the Generative AI model.

Conclusion

Congratulations! You've built an end-to-end multi-language invoice extractor: a Streamlit front end that sends an invoice image and a question to a Gemini vision model and displays the answer. This project shows how Generative AI can automate the extraction of information from complex documents, which makes it useful for businesses of all sizes.

If you have any questions or suggestions, feel free to leave a comment. Happy coding!