Exploring OpenAI Vision: A Local Tool for Vision Models

Revanth Reddy Tondapu

Jul 15, 20242 min read

OpenAI Vision: A Local Tool for Vision Models

In the world of large language models, we have a plethora of choices for text-to-text formats. However, when it comes to local tools for vision models, our options are quite limited. Today, I want to introduce you to OpenAI Vision, a free, open-source tool that is API-based and allows you to interact with your images locally, freely, and privately. This blog post will guide you through the installation process and demonstrate how to use this tool on Windows, Mac, or any flavor of Linux.

What is OpenAI Vision?

OpenAI Vision is an API server that you can install and run on your local machine. It enables you to interact with your images without the need for external API calls, ensuring that your data remains private and offline. The tool is lightweight, focusing on providing a simple yet effective way to chat with your images.

Key Features

Local and Private: Your data stays on your machine, ensuring privacy.
API-Based: Easy to integrate with your existing applications.
Supports Multiple Models: Compatible with a wide range of vision models.
Cross-Platform: Works on Windows, Mac, and Linux.

Installation Guide

Let's walk through the installation process on a Linux system. The steps are similar for other operating systems.

Prerequisites

Docker: Ensure Docker is installed on your system. If not, you can find installation guides online.
Python: Make sure you have Python installed.

Step-by-Step Installation

Clone the Repository:

git clone https://github.com/matatonic/openedai-vision.git
cd openedai-vision

Create a Virtual Environment:

conda create -p venv python==3.10
conda activate venv/

Install Requirements:

pip install -r requirements.txt

Configure Environment Variables: Rename the sample environment files:

cp vision.sample.env vision.env
cp vision.dasal.sample.env vision.dasal.env

Edit vision.env to specify your model and any necessary tokens.

CLI_COMMAND="python vision.py -m vikhyatk/mondream2 --use-flash-attn --load-in-4bit"

Run Docker Compose:

sudo chmod 666 /var/run/docker.
sudo docker-compose up -d

Verify Installation: Check if the server is running:

sudo docker ps

You should see the server running on localhost:50006.

Using OpenAI Vision

Once the server is up and running, you can start interacting with your images. Here are a few examples:

Chat with an Image from a URL

python chat_with_images.py -1 https://example.com/image.jpg "Describe the image."

Chat with a Local Image

python chat_with_images.py -1 /path/to/your/image.jpg "Describe the image."

Example Outputs

Image Description:

Input: An image of a green leaf.
Output: "This image features a large green leaf with a distinct fan-like shape."

Scene Description:

Input: An image of a sunset with kangaroos.
Output: "A serene sunset scene with silhouettes of kangaroos and birds against a vibrant orange sky. The sun is setting behind a large tree, casting a warm glow over the landscape."

Conclusion

OpenAI Vision is a powerful tool for anyone looking to work with vision models locally. It offers a simple, private, and efficient way to interact with your images. Whether you're a developer integrating vision capabilities into your application or a researcher experimenting with different models, OpenAI Vision provides a robust solution.

Feel free to explore the tool and let us know your thoughts. If you find this content helpful, consider sharing it with your network. Happy experimenting!