Revanth Reddy Tondapu

Part 4: Exploring Gemini Pro: Multimodal AI for Text and Image Applications



Hello everyone! Today we are diving into Gemini Pro, a Large Language Model (LLM) family that accepts both text and image inputs and answers in text. In this blog post, we will build two small Streamlit projects that showcase it: one sends text prompts to the gemini-pro model, and the other sends an image (with an optional prompt) to the gemini-pro-vision model.


Setting Up Your Environment

Before we get started, ensure you have the following prerequisites:

  1. Python Version: Python 3.9 or higher.

  2. API Key: Make sure you have your Gemini API key ready. You can generate one in Google AI Studio.
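
If you are not sure which interpreter your environment picks up, a quick check like the one below can help. This is only a minimal sketch: the helper name python_is_supported is made up for illustration, and the (3, 9) minimum simply mirrors the prerequisite listed above.

```python
import sys

# Minimum version taken from the prerequisites above
MIN_VERSION = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True when the given interpreter version meets the minimum."""
    # Compare only (major, minor); patch releases do not matter here
    return tuple(version_info[:2]) >= minimum
```

Calling python_is_supported() with no arguments checks the interpreter that is currently running.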


Step 1: Create a Virtual Environment

Let's start by setting up a virtual environment. This will help manage our dependencies and keep our project organized.

conda create -p venv python=3.10

conda activate venv/

Step 2: Install Required Packages

Next, create a requirements.txt file with the following content. Pillow is listed explicitly because vision.py in Project 2 imports PIL:

streamlit
google-generativeai
python-dotenv
pillow

Install the packages by running:

pip install -r requirements.txt

Step 3: Set Up Environment Variables

Create a .env file in the project folder so your API key stays out of the source code. Do not commit this file to version control:

GOOGLE_API_KEY=your_api_key_here
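
A common stumbling block is a missing key, or a .env file that still holds the placeholder value. The sketch below shows one way to fail early with a readable message. The helper name require_api_key and the PLACEHOLDER constant are hypothetical, not part of any library; the helper only reads a mapping such as os.environ, so call it after load_dotenv() has run.

```python
import os

# The dummy value shown in the .env example above
PLACEHOLDER = "your_api_key_here"


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from env, or raise a clear error if it is unusable."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is missing or still set to the placeholder; check your .env file."
        )
    return key
```

In the apps below, you could then write genai.configure(api_key=require_api_key()) instead of reading the variable directly.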

Project 1: Text-Based Application

We'll start with a simple text-based application that leverages the Gemini Pro API to generate text responses.

Step 1: Create app.py

Create a file named app.py and add the following code:

import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as genai

# Load environment variables
load_dotenv()

# Configure the API key
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))

# Function to get response from Gemini Pro
def get_gemini_response(question):
    model = genai.GenerativeModel('gemini-pro')
    response = model.generate_content(question)
    return response.text

# Streamlit app setup
st.set_page_config(page_title="Gemini LM Application")
st.header("Gemini LM Application")

# Input box
question = st.text_input("Input: ", key="input")

# Submit button
submit = st.button("Submit")

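if submit:
    if question.strip():
        response = get_gemini_response(question)
        st.subheader("Response:")
        st.write(response)
    else:
        # Avoid sending an empty prompt to the API
        st.warning("Please enter a question first.")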

Step 2: Run the Application

Run your Streamlit application:

streamlit run app.py

You should now see a web interface where you can input a question and get a response generated by Gemini Pro.
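
One thing to watch for: get_gemini_response returns response.text directly. As far as I know, the google-generativeai SDK raises a ValueError from that accessor when the response contains no text part (for example, when a prompt is blocked), so treat that behavior as an assumption here. A minimal sketch of a guard, using a hypothetical helper called safe_text:

```python
def safe_text(response, fallback="No text was returned for this prompt."):
    """Return response.text, or a fallback if the accessor raises ValueError."""
    try:
        return response.text
    except ValueError:
        # Assumed behavior: the SDK raises ValueError when there is no text part
        return fallback
```

To use it, replace return response.text with return safe_text(response) in app.py.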


Project 2: Image-Based Application

Next, let's create an application that can analyze images and generate descriptive text.

Step 1: Create vision.py

Create a new file named vision.py and add the following code:

import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as genai
from PIL import Image

# Load environment variables
load_dotenv()

# Configure the API key
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))

# Function to get response from Gemini Pro Vision
def get_gemini_response(input_text, image):
    model = genai.GenerativeModel('gemini-pro-vision')
    if input_text:
        response = model.generate_content([input_text, image])
    else:
        response = model.generate_content(image)
    return response.text

# Streamlit app setup
st.set_page_config(page_title="Gemini Image Application")
st.header("Gemini Image Application")

# Input box
input_text = st.text_input("Describe what you want to know about the image:")

# Image upload
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

# Open the upload once and reuse the PIL image for the preview and the request
image = None
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)

# Submit button
submit = st.button("Submit")

if submit:
    if image is not None:
        response = get_gemini_response(input_text, image)
        st.subheader("Response:")
        st.write(response)
    else:
        st.warning("Please upload an image first.")

Step 2: Run the Application

Run your Streamlit application:

streamlit run vision.py

You should now see a web interface where you can upload an image and get a descriptive response generated by Gemini Pro Vision.
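
The if/else inside get_gemini_response decides whether the prompt text travels with the image. The same decision can be sketched as a small pure function that is easy to test without calling the API. The name build_parts is hypothetical, and this version always returns a list; I am assuming generate_content accepts a one-item list the same way it accepts the bare image.

```python
def build_parts(input_text, image):
    """Return the content list for the request: optional prompt, then image."""
    # Treat None or whitespace-only input as "no prompt"
    prompt = (input_text or "").strip()
    return [prompt, image] if prompt else [image]
```

With this helper, the body of get_gemini_response could shrink to a single model.generate_content(build_parts(input_text, image)) call.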


Conclusion

In this blog post, we explored the capabilities of Gemini Pro by creating two different applications: one for text-based interactions and the other for image analysis. Gemini Pro's multimodal capabilities make it a versatile tool for a wide range of applications, from text summarization and Q&A to image description and more.

Stay tuned for more tutorials on how to leverage advanced AI models to build powerful applications. Happy coding!
