Hello everyone! Today, we are diving into Gemini Pro, a Large Language Model (LLM) from Google that can work with both text and images. In this post's examples, the gemini-pro model takes text prompts and the gemini-pro-vision model takes an image, optionally with a text prompt; both respond in text. We will build two small projects that showcase these capabilities: one focused on text and the other on images.
Setting Up Your Environment
Before we get started, ensure you have the following prerequisites:
Python Version: Python 3.9 or higher.
API Key: Have your Gemini API key ready. You can generate one in Google AI Studio.
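A quick way to confirm that your interpreter meets the version requirement above is a small check like the following. The helper name meets_minimum is made up for this sketch, and it uses only the standard library.

```python
import sys

# Hypothetical helper for this post: compare a version tuple against the
# minimum Python version listed in the prerequisites (3.9).
MINIMUM_PYTHON = (3, 9)


def meets_minimum(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; 3.9 or higher is needed.")
```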
Step 1: Create a Virtual Environment
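Let's start by setting up a virtual environment with conda (this assumes conda is already installed). It keeps the project's dependencies isolated from the rest of your system. The -p venv option creates the environment in a venv folder inside the project directory.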
conda create -p venv python=3.10
conda activate venv/
Step 2: Install Required Packages
Next, create a requirements.txt file with the following content (pillow provides the PIL module used in the image project):
streamlit
google-generativeai
python-dotenv
pillow
Install the packages by running:
pip install -r requirements.txt
Step 3: Set Up Environment Variables
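Create a .env file in the project folder to keep your API key out of the source code. If you use Git, add .env to your .gitignore so the key is never committed: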
GOOGLE_API_KEY=your_api_key_here
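If the key is missing or misspelled in .env, os.getenv returns None and the problem only shows up later as an API error. Here is a minimal sketch of an early check, assuming the variable name GOOGLE_API_KEY from above. The function name require_api_key and the comparison against the placeholder value are illustrative and not part of any library.

```python
import os

PLACEHOLDER = "your_api_key_here"  # the sample value from the .env file above


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the API key from `env`, or raise a clear error if it is unusable.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    if env is None:
        env = os.environ
    key = (env.get(name) or "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file before running the app."
        )
    return key
```

In the apps below, this could be called right after load_dotenv(), for example genai.configure(api_key=require_api_key()).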
Project 1: Text-Based Application
We'll start with a simple text-based application that leverages the Gemini Pro API to generate text responses.
Step 1: Create app.py
Create a file named app.py and add the following code:
import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as genai
# Load environment variables
load_dotenv()
# Configure the API key
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
# Function to get response from Gemini Pro
def get_gemini_response(question):
    model = genai.GenerativeModel('gemini-pro')
    response = model.generate_content(question)
    return response.text
# Streamlit app setup
st.set_page_config(page_title="Gemini LM Application")
st.header("Gemini LM Application")
# Input box
question = st.text_input("Input: ", key="input")
# Submit button
submit = st.button("Submit")
if submit:
    response = get_gemini_response(question)
    st.subheader("Response:")
    st.write(response)
Step 2: Run the Application
Run your Streamlit application:
streamlit run app.py
You should now see a web interface where you can input a question and get a response generated by Gemini Pro.
Project 2: Image-Based Application
Next, let's create an application that can analyze images and generate descriptive text.
Step 1: Create vision.py
Create a new file named vision.py and add the following code:
import os
import streamlit as st
from dotenv import load_dotenv
import google.generativeai as genai
from PIL import Image
# Load environment variables
load_dotenv()
# Configure the API key
genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
# Function to get response from Gemini Pro Vision
def get_gemini_response(input_text, image):
    model = genai.GenerativeModel('gemini-pro-vision')
    if input_text:
        response = model.generate_content([input_text, image])
    else:
        response = model.generate_content(image)
    return response.text
# Streamlit app setup
st.set_page_config(page_title="Gemini Image Application")
st.header("Gemini Image Application")
# Input box
input_text = st.text_input("Describe what you want to know about the image:")
# Image upload
uploaded_file = st.file_uploader("Upload an image", type=["jpg", "png"])
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image", use_column_width=True)
# Submit button
submit = st.button("Submit")
if submit and uploaded_file is not None:
    image = Image.open(uploaded_file)  # Convert to PIL Image
    response = get_gemini_response(input_text, image)
    st.subheader("Response:")
    st.write(response)
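The branch inside get_gemini_response exists because the text prompt is optional: with a prompt, the model receives a list containing the text and the image, and without one it receives the image alone. The same decision can be written as a small pure function, which is easier to test on its own. The name build_parts is hypothetical, and treating a whitespace-only prompt as empty is a choice made for this sketch.

```python
def build_parts(input_text, image):
    """Return what vision.py passes to generate_content.

    With a non-empty prompt this is [prompt, image]; otherwise it is the
    image on its own, mirroring the if/else in get_gemini_response.
    """
    prompt = (input_text or "").strip()
    if prompt:
        return [prompt, image]
    return image


if __name__ == "__main__":
    placeholder_image = object()  # stands in for a PIL image in this demo
    print(build_parts("What is in this picture?", placeholder_image))
    print(build_parts("", placeholder_image) is placeholder_image)
```

With a helper like this, get_gemini_response reduces to a single call: model.generate_content(build_parts(input_text, image)).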
Step 2: Run the Application
Run your Streamlit application:
streamlit run vision.py
You should now see a web interface where you can upload an image and get a descriptive response generated by Gemini Pro Vision.
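Model identifiers such as 'gemini-pro' and 'gemini-pro-vision' are retired and replaced over time, so the names used in this post may not be available when you read it. The library's genai.list_models() call returns the models your key can use, each with a name and a supported_generation_methods list. The sketch below filters such a list for models that support generateContent. The helper usable_model_names is hypothetical, and the entries in the demo are made up for illustration rather than real API output.

```python
from types import SimpleNamespace


def usable_model_names(models, method="generateContent"):
    """Return names of models whose supported methods include `method`.

    `models` is any iterable of objects with `name` and
    `supported_generation_methods` attributes, for example the result of
    genai.list_models() after genai.configure(...).
    """
    return [
        m.name
        for m in models
        if method in getattr(m, "supported_generation_methods", [])
    ]


if __name__ == "__main__":
    # Made-up entries so the demo runs without an API key.
    sample = [
        SimpleNamespace(name="models/example-text",
                        supported_generation_methods=["generateContent"]),
        SimpleNamespace(name="models/example-embedding",
                        supported_generation_methods=["embedContent"]),
    ]
    print(usable_model_names(sample))
```

If either app fails with a "model not found" style error, running this against genai.list_models() and swapping in a name from the result is a reasonable first step.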
Conclusion
In this blog post, we explored the capabilities of Gemini Pro by creating two different applications: one for text-based interactions and the other for image analysis. Gemini Pro's multimodal capabilities make it a versatile tool for a wide range of applications, from text summarization and Q&A to image description and more.
Stay tuned for more tutorials on how to leverage advanced AI models to build powerful applications. Happy coding!