Building Amazing UIs with Google's Gemini 1.5 Pro

Welcome! Today, we’re diving into a tool from Google that can change the way you build user interfaces. Google recently launched Gemini 1.5 Pro, a flagship model with a context window of up to 2 million tokens. What sets this model apart is its multimodal capability: it can process images, video, and audio alongside text. That opens up many possibilities for generating content and developing applications.

The Power of Gemini 1.5 Pro

Gemini 1.5 Pro is designed to handle very large inputs in a single request. Google's launch figures for a 1 million token window were about an hour of video, 11 hours of audio, more than 30,000 lines of code, or over 700,000 words, and the 2 million token window extends that further. This capacity is particularly useful when a prompt has to carry long documents, codebases, or media. Output length is capped separately (the script below sets max_output_tokens to 8192). Plus, its multimodal nature allows you to send in images or videos and get contextual responses.

Gemini UI to Code

One of the standout applications of Gemini 1.5 Pro is its ability to convert UI screenshots into code. Today, we’ll be exploring a project called Gemini UI to Code, which uses Gemini 1.5 Pro to generate usable UIs from screenshots. The tool takes only a few minutes to set up, which makes it a handy way for developers to get a first draft of a layout.

How It Works

The beauty of Gemini UI to Code lies in its simplicity and effectiveness. Here’s a step-by-step breakdown of how it operates:

Upload a Screenshot: You start by providing a screenshot of the UI you want to generate.
Generate Description: The tool first generates a detailed description of the UI elements it sees in the screenshot.
Refine Description: It then asks the model to compare that description against the screenshot and correct any errors or omissions.
Create HTML: Next, it generates an HTML file for the UI based on the refined description.
Error Checking: The HTML file and the screenshot are sent back to the model for error checking.
Final Output: After fixing any mistakes, the final HTML file is generated and made available for download.
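
The steps above amount to four model calls chained together, each one feeding on the previous reply. Here is a minimal sketch of that flow, not the tool's actual code: run_pipeline and the ask callable are hypothetical names, and the prompts are shortened placeholders. A stub stands in for the model so the control flow can be run without an API key.

```python
from typing import Callable, Dict

# `ask` is a hypothetical stand-in for a function that sends a prompt plus
# the screenshot to the model and returns the text of the reply.
Ask = Callable[[str, str], str]


def run_pipeline(ask: Ask, image_path: str) -> Dict[str, str]:
    """Chain the four model calls: describe, refine, generate, validate."""
    description = ask("Describe this UI in detail.", image_path)
    refined = ask(
        f"Compare this description with the image and correct it: {description}",
        image_path,
    )
    html = ask(f"Create an HTML file for this UI: {refined}", image_path)
    final_html = ask(
        f"Validate this HTML against the image and fix it: {html}", image_path
    )
    return {
        "description": description,
        "refined_description": refined,
        "initial_html": html,
        "final_html": final_html,
    }


def fake_ask(prompt: str, image_path: str) -> str:
    """Stub model: echoes the first word of the prompt so the flow is visible."""
    return f"<{prompt.split()[0].lower()}>"


if __name__ == "__main__":
    for stage, output in run_pipeline(fake_ask, "screenshot.jpg").items():
        print(f"{stage}: {output}")
```

Passing the model call in as a function keeps the sequence easy to test and makes it simple to swap in a different model later.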

Setting Up Gemini UI to Code

Ready to give it a try? Here’s how you can set up and use Gemini UI to Code:

1. Create a Project Directory

Create a directory for your project and navigate into it:

mkdir my_project
cd my_project

2. Create a Virtual Environment

Create a virtual environment to manage your project dependencies:

conda create -p venv python=3.10
conda activate venv/

3. Create a requirements.txt File

Create a requirements.txt file in your project directory to list the necessary dependencies:

streamlit
Pillow
google-generativeai
python-dotenv

4. Install Required Packages

Install the packages listed in requirements.txt:

pip install -r requirements.txt

5. Create a .env File

Create a .env file in your project directory and add your GOOGLE_API_KEY:

GOOGLE_API_KEY=your_api_key_here

6. Create a Python Script

Create a new Python script file (e.g., main.py) in your project directory. Open this file in Visual Studio Code.

7. Write the Code

Copy the following code into your main.py file:

import streamlit as st
import pathlib
from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv

import os

# Load environment variables from .env file
load_dotenv()

# Set up your API key
google_api_key = os.getenv('GOOGLE_API_KEY')

genai.configure(api_key=google_api_key)

generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]

MODEL_NAME = "gemini-1.5-pro-latest"
framework = "Regular CSS use flex grid etc"

model = genai.GenerativeModel(
    model_name=MODEL_NAME,
    safety_settings=safety_settings,
    generation_config=generation_config,
)

chat_session = model.start_chat(history=[])

def send_message_to_model(message, image_path):
    image_input = {
        'mime_type': 'image/jpeg',
        'data': pathlib.Path(image_path).read_bytes()
    }
    response = chat_session.send_message([message, image_input])
    return response.text

def main():
    st.title("Gemini 1.5 Pro, UI to Code 👨‍💻 ")
    st.subheader('For more blogs, visit [Revanth Quick Learn](https://www.revanthquicklearn.com/)')

    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

    if uploaded_file is not None:
        try:
            # Load and display the image
            image = Image.open(uploaded_file)
            st.image(image, caption='Uploaded Image.', use_column_width=True)

            # JPEG cannot store alpha or palette data, so convert any non-RGB mode
            if image.mode != 'RGB':
                image = image.convert('RGB')

            temp_image_path = pathlib.Path("temp_image.jpg")
            image.save(temp_image_path, format="JPEG")

            if st.button("Code UI"):
                st.write("🧑‍💻 Looking at your UI...")
                prompt = "Describe this UI in accurate details. When you reference a UI element put its name and bounding box in the format: [object name (y_min, x_min, y_max, x_max)]. Also Describe the color of the elements."
                description = send_message_to_model(prompt, temp_image_path)
                st.write(description)

                st.write("🔍 Refining description with visual comparison...")
                refine_prompt = f"Compare the described UI elements with the provided image and identify any missing elements or inaccuracies. Also Describe the color of the elements. Provide a refined and accurate description of the UI elements based on this comparison. Here is the initial description: {description}"
                refined_description = send_message_to_model(refine_prompt, temp_image_path)
                st.write(refined_description)

                st.write("🛠️ Generating website...")
                html_prompt = f"Create an HTML file based on the following UI description, using the UI elements described in the previous response. Include {framework} CSS within the HTML file to style the elements. Make sure the colors used are the same as the original UI. The UI needs to be responsive and mobile-first, matching the original UI as closely as possible. Do not include any explanations or comments. Avoid using ```html. and ``` at the end. ONLY return the HTML code with inline CSS. Here is the refined description: {refined_description}"
                initial_html = send_message_to_model(html_prompt, temp_image_path)
                st.code(initial_html, language='html')

                st.write("🔧 Refining website...")
                refine_html_prompt = f"Validate the following HTML code based on the UI description and image and provide a refined version of the HTML code with {framework} CSS that improves accuracy, responsiveness, and adherence to the original design. ONLY return the refined HTML code with inline CSS. Avoid using ```html. and ``` at the end. Here is the initial HTML: {initial_html}"
                refined_html = send_message_to_model(refine_html_prompt, temp_image_path)
                st.code(refined_html, language='html')

                with open("index.html", "w", encoding="utf-8") as file:
                    file.write(refined_html)
                st.success("HTML file 'index.html' has been created.")

                st.download_button(label="Download HTML", data=refined_html, file_name="index.html", mime="text/html")
        except Exception as e:
            st.error(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

8. Run the Script

Open a terminal in Visual Studio Code and run your script:

streamlit run main.py

Let's break down the code step-by-step to understand how it works.

Importing Libraries

import streamlit as st
import pathlib
from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv
import os

streamlit: Used to create the web application interface.
pathlib: Used for file system path manipulation.
PIL (Python Imaging Library): Used for image processing.
google.generativeai: Used to interact with the Google Gemini API.
dotenv: Used to load environment variables from a .env file.
os: Provides a way of using operating system dependent functionality.

Loading Environment Variables

# Load environment variables from .env file
load_dotenv()

# Set up your API key
google_api_key = os.getenv('GOOGLE_API_KEY')

load_dotenv(): Loads environment variables from a .env file.
os.getenv('GOOGLE_API_KEY'): Retrieves the API key stored in the environment variable GOOGLE_API_KEY.
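
Note that os.getenv returns None when the variable is missing, and the script would then only fail later, at the first API call. One option is to fail early with a clear message. The helper below is a sketch of that idea; require_env is a hypothetical name, not part of the original script.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it "
            "in the shell before starting the app."
        )
    return value


# Possible usage, after load_dotenv() has run:
# google_api_key = require_env("GOOGLE_API_KEY")
```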

Configuring the Google Gemini API

genai.configure(api_key=google_api_key)

generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

safety_settings = [
    {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
    {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
]

MODEL_NAME = "gemini-1.5-pro-latest"
framework = "Regular CSS use flex grid etc"

genai.configure(api_key=google_api_key): Configures the Google Gemini API with the provided API key.
generation_config: Configuration for generating responses, including parameters like temperature, top_p, top_k, etc.
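safety_settings: Sets the blocking threshold for each harm category. BLOCK_NONE turns the safety filters off for that category, so the model's replies are not blocked on those grounds. Choose a stricter threshold if you want filtering.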
MODEL_NAME: Specifies the model name to be used.
framework: A free-text hint that is inserted into the HTML prompts to tell the model which styling approach to use (here, regular CSS with Flexbox and Grid).
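
The script uses a temperature of 1, which favors varied output. For code generation you may prefer more repeatable results. The sketch below wraps the same dictionary in a small helper so the temperature can be changed in one place. make_generation_config is a hypothetical name, the other values are copied from the script, and whether a lower temperature actually improves the generated HTML is something to test rather than assume.

```python
def make_generation_config(temperature: float = 1.0,
                           max_output_tokens: int = 8192) -> dict:
    """Build a generation config dict in the same shape the script uses."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    return {
        "temperature": temperature,
        "top_p": 0.95,
        "top_k": 64,
        "max_output_tokens": max_output_tokens,
        "response_mime_type": "text/plain",
    }


# Possible usage: a lower temperature for more repeatable HTML.
generation_config = make_generation_config(temperature=0.2)
```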

Creating the Model and Chat Session

model = genai.GenerativeModel(
    model_name=MODEL_NAME,
    safety_settings=safety_settings,
    generation_config=generation_config,
)

chat_session = model.start_chat(history=[])

genai.GenerativeModel(): Initializes the generative model with the specified settings.
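model.start_chat(history=[]): Starts a chat session with an empty history. The session keeps every exchange, so later prompts (such as the HTML request) can refer to earlier responses.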

Function to Send Messages to the Model

def send_message_to_model(message, image_path):
    image_input = {
        'mime_type': 'image/jpeg',
        'data': pathlib.Path(image_path).read_bytes()
    }
    response = chat_session.send_message([message, image_input])
    return response.text

send_message_to_model(message, image_path): This function sends a message and an image to the model.
image_input: Prepares the image to be sent by reading it as bytes.
chat_session.send_message([message, image_input]): Sends the message and image to the model and returns the response.
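
The hard-coded 'image/jpeg' MIME type is correct here only because main() re-saves every upload as a JPEG before calling this function. If you adapt the function to send files in their original format, the MIME type has to match the file. The following is one way to do that with the standard library; build_image_part is a hypothetical helper, not part of the original script.

```python
import mimetypes
import pathlib


def build_image_part(image_path) -> dict:
    """Read an image file and pair its bytes with a MIME type guessed from its name."""
    path = pathlib.Path(image_path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path.name!r}")
    return {"mime_type": mime_type, "data": path.read_bytes()}
```

The returned dictionary has the same two keys as image_input above, so it could be passed to chat_session.send_message in the same way.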

Main Function for Streamlit App

def main():
    st.title("Gemini 1.5 Pro, UI to Code 👨‍💻 ")
    st.subheader('For more blogs, visit [Revanth Quick Learn](https://www.revanthquicklearn.com/)')

    uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

    if uploaded_file is not None:
        try:
            # Load and display the image
            image = Image.open(uploaded_file)
            st.image(image, caption='Uploaded Image.', use_column_width=True)

            # JPEG cannot store alpha or palette data, so convert any non-RGB mode
            if image.mode != 'RGB':
                image = image.convert('RGB')

            temp_image_path = pathlib.Path("temp_image.jpg")
            image.save(temp_image_path, format="JPEG")

            if st.button("Code UI"):
                st.write("🧑‍💻 Looking at your UI...")
                prompt = "Describe this UI in accurate details. When you reference a UI element put its name and bounding box in the format: [object name (y_min, x_min, y_max, x_max)]. Also Describe the color of the elements."
                description = send_message_to_model(prompt, temp_image_path)
                st.write(description)

                st.write("🔍 Refining description with visual comparison...")
                refine_prompt = f"Compare the described UI elements with the provided image and identify any missing elements or inaccuracies. Also Describe the color of the elements. Provide a refined and accurate description of the UI elements based on this comparison. Here is the initial description: {description}"
                refined_description = send_message_to_model(refine_prompt, temp_image_path)
                st.write(refined_description)

                st.write("🛠️ Generating website...")
                html_prompt = f"Create an HTML file based on the following UI description, using the UI elements described in the previous response. Include {framework} CSS within the HTML file to style the elements. Make sure the colors used are the same as the original UI. The UI needs to be responsive and mobile-first, matching the original UI as closely as possible. Do not include any explanations or comments. Avoid using ```html. and ``` at the end. ONLY return the HTML code with inline CSS. Here is the refined description: {refined_description}"
                initial_html = send_message_to_model(html_prompt, temp_image_path)
                st.code(initial_html, language='html')

                st.write("🔧 Refining website...")
                refine_html_prompt = f"Validate the following HTML code based on the UI description and image and provide a refined version of the HTML code with {framework} CSS that improves accuracy, responsiveness, and adherence to the original design. ONLY return the refined HTML code with inline CSS. Avoid using ```html. and ``` at the end. Here is the initial HTML: {initial_html}"
                refined_html = send_message_to_model(refine_html_prompt, temp_image_path)
                st.code(refined_html, language='html')

                with open("index.html", "w", encoding="utf-8") as file:
                    file.write(refined_html)
                st.success("HTML file 'index.html' has been created.")

                st.download_button(label="Download HTML", data=refined_html, file_name="index.html", mime="text/html")
        except Exception as e:
            st.error(f"An error occurred: {e}")

if __name__ == "__main__":
    main()

st.title() and st.subheader(): Set the title and subheader for the Streamlit app.
st.file_uploader(): Allows the user to upload an image file.
Image.open(uploaded_file): Opens the uploaded image file.
st.image(): Displays the uploaded image in the Streamlit app.
image.convert('RGB'): Converts the image to RGB mode so that it can be saved as a JPEG, which has no alpha channel.
image.save(temp_image_path, format="JPEG"): Saves the uploaded image temporarily.
st.button("Code UI"): Adds a button to trigger the UI-to-code generation process.
send_message_to_model(prompt, temp_image_path): Sends the prompt and image to the model to generate descriptions and HTML code.
st.write(): Displays text or data in the Streamlit app.
st.code(): Displays code in a formatted manner in the Streamlit app.
with open("index.html", "w") as file: Saves the generated HTML code to a file.
st.download_button(): Adds a button to download the generated HTML file.
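
The prompts ask the model not to wrap its answer in Markdown code fences, but a model may still do so, and the fences would then end up in index.html. A small cleanup step before writing the file is one way to guard against that. The helper below is a sketch under that assumption; strip_code_fences is a hypothetical name and is not part of the original script.

```python
import re

# Three backticks, built this way so the marker never appears literally here.
FENCE = "`" * 3

# Matches text that is entirely wrapped in one fence, with an optional
# language tag (such as "html") after the opening marker.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[a-zA-Z]*\s*\n(.*?)\n?\s*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fences(text: str) -> str:
    """Remove a single wrapping Markdown code fence, if present."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


# Possible usage inside main(), before st.code() and file.write():
# refined_html = strip_code_fences(refined_html)
```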

Generate Code: Once the application is running, open the local URL that Streamlit prints in the terminal, upload your screenshot, and click the "Code UI" button. The tool will process the screenshot and generate the corresponding HTML.

Examples and Use Cases

To illustrate the tool’s capabilities, let's walk through a few examples:

Chat Interface

Upload a screenshot of a chat interface. The tool will generate a detailed description and then create a matching HTML file. This HTML file can be further customized or used as-is for building similar interfaces.

Google Homepage

Try uploading a screenshot of the Google homepage. The generated code may not be 100% accurate, but it will provide a solid starting point for developing a similar layout.

Custom Landing Pages

You can also use the tool to create boilerplate code for custom landing pages. Simply upload a screenshot of your desired layout, and let the tool generate the initial HTML and CSS.

Customization and Flexibility

One of the standout features of Gemini UI to Code is its flexibility. Because it’s built using Streamlit, you can easily modify the application to suit your needs. Whether you want to change the prompts, add new steps, or integrate with other models, the possibilities are endless.
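
As one example of such a change, the framework hint could be made selectable instead of hard-coded. The sketch below builds the HTML prompt from a chosen label. FRAMEWORK_HINTS and build_html_prompt are hypothetical names, the Tailwind and Bootstrap hint strings are illustrative assumptions rather than tested prompts, and the prompt text is a shortened version of the one in the script.

```python
# Maps a label shown to the user to the hint inserted into the prompt.
# Only the first entry comes from the original script.
FRAMEWORK_HINTS = {
    "Regular CSS": "Regular CSS use flex grid etc",
    "Tailwind": "Tailwind CSS utility classes",
    "Bootstrap": "Bootstrap 5 classes",
}


def build_html_prompt(framework_label: str, refined_description: str) -> str:
    """Build the HTML-generation prompt for the chosen framework."""
    if framework_label not in FRAMEWORK_HINTS:
        raise KeyError(f"Unknown framework: {framework_label!r}")
    hint = FRAMEWORK_HINTS[framework_label]
    return (
        "Create an HTML file based on the following UI description. "
        f"Include {hint} CSS within the HTML file to style the elements. "
        "ONLY return the HTML code. "
        f"Here is the refined description: {refined_description}"
    )
```

In the app, the label could come from a Streamlit widget such as st.selectbox("Framework", list(FRAMEWORK_HINTS)), with the result passed to build_html_prompt in place of the fixed framework string.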

Conclusion

Gemini 1.5 Pro and the Gemini UI to Code tool offer a new way to create and interact with user interfaces. With its multimodal capabilities and large context window, Gemini 1.5 Pro makes it practical to generate a usable first draft of a UI from a screenshot. Whether you’re a developer looking to streamline your workflow or a designer wanting to quickly prototype new ideas, this tool is worth trying.

If you found this post helpful, consider sharing it with your network. Your feedback and insights are always appreciated. Thank you for reading, and happy coding!