Hello everyone! Today, we're diving into an exciting project that combines computer vision and artificial intelligence to create a math gesture program. The idea is simple: you'll be able to use your hands to draw shapes or write equations, and our AI model will solve them for you. We'll be using Python and various libraries to bring this idea to life.
Project Overview
We can break down our project into four main parts:
Detecting Hand Gestures: Using computer vision (OpenCV and cvzone) to detect hand movements and finger positions.
Drawing: Drawing on a virtual canvas when the index finger is raised, and clearing the canvas when the thumb is raised.
Sending Data to AI: Sending the drawing to Google's Gemini model for interpretation and solving when four fingers are raised.
Creating an App: Wrapping everything in a user-friendly interface using Streamlit.
Step 1: Environment Setup
Before we start coding, let's set up our environment. We'll use a virtual environment to manage our dependencies.
1. Create and activate a virtual environment (a non-conda alternative is shown after these steps):
conda create -p venv python=3.10
conda activate venv/
2. Install the required packages:
pip install cvzone opencv-python numpy google-generativeai pillow streamlit python-dotenv
3. Create requirements.txt:
After installing the packages, list them in a requirements.txt file (you can also generate one automatically with pip freeze > requirements.txt):
cvzone
opencv-python
numpy
google-generativeai
Pillow
streamlit
python-dotenv
On a new machine, you can then install everything in one step:
pip install -r requirements.txt
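If you prefer not to use conda, Python's built-in venv module works just as well:
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate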
Step 2: Writing the Code
Create a new Python file, main.py, and start by importing the necessary libraries:
import os
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import google.generativeai as genai
from PIL import Image
import streamlit as st
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
Configure the Streamlit page:
st.set_page_config(layout="wide")
col1, col2 = st.columns([3, 2])

with col1:
    run = st.checkbox('Run', value=True)
    FRAME_WINDOW = st.image([])

with col2:
    st.title("Answer")
    output_text_area = st.subheader("")
Configure the AI model. The API key and model name are read from environment variables (we'll set up the .env file in Step 3):
api_key = os.environ.get("GOOGLE_GENAI_API_KEY")
model_name = os.environ.get("GENAI_MODEL_NAME", "gemini-1.5-flash")

if not api_key:
    raise ValueError("GOOGLE_GENAI_API_KEY environment variable not set!")

genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
Initialize the webcam and hand detector:
cap = cv2.VideoCapture(0)  # 0 is the default camera; use 1 for an external webcam
cap.set(3, 1280)  # Frame width
cap.set(4, 720)   # Frame height
detector = HandDetector(staticMode=False, maxHands=1, modelComplexity=1, detectionCon=0.7, minTrackCon=0.5)
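If you're curious about what the detector actually returns, a quick sanity check (a throwaway snippet, not part of main.py) looks like this. findHands returns a list of hand dictionaries, and fingersUp gives one 0/1 flag per finger, from thumb to pinky:
# Quick sanity check: inspect what cvzone's detector returns for one frame
success, frame = cap.read()
hands, _ = detector.findHands(frame, draw=False, flipType=True)
if hands:
    hand = hands[0]
    print(hand["type"])              # "Left" or "Right"
    print(len(hand["lmList"]))       # 21 landmarks, each [x, y, z]
    print(detector.fingersUp(hand))  # e.g. [0, 1, 0, 0, 0] = only index up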
Define functions for hand detection, drawing, and sending data to the AI:
def getHandInfo(img):
    hands, img = detector.findHands(img, draw=False, flipType=True)
    if hands:
        hand = hands[0]
        lmList = hand["lmList"]
        fingers = detector.fingersUp(hand)
        return fingers, lmList
    return None
def draw(info, prev_pos, canvas):
    fingers, lmList = info
    current_pos = None
    if fingers == [0, 1, 0, 0, 0]:  # Index finger up: drawing mode
        current_pos = lmList[8][0:2]  # Tip of the index finger
        if prev_pos is None:
            prev_pos = current_pos
        cv2.line(canvas, tuple(prev_pos), tuple(current_pos), (255, 0, 255), 10)
    elif fingers == [1, 0, 0, 0, 0]:  # Thumb up: clear the canvas
        canvas = np.zeros_like(canvas)  # Blank canvas with the same shape
    return current_pos, canvas
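To convince yourself the gesture encoding works, here is a small hypothetical smoke test that simulates one "index finger up" frame (fake_lmList is made up purely for illustration):
# Hypothetical smoke test: simulate an "index finger up" frame
test_canvas = np.zeros((720, 1280, 3), dtype=np.uint8)
fake_lmList = [[0, 0, 0]] * 21  # 21 dummy landmarks
fake_lmList[8] = [640, 360, 0]  # place the index fingertip at the center
pos, test_canvas = draw(([0, 1, 0, 0, 0], fake_lmList), None, test_canvas)
print(pos)  # [640, 360] - becomes prev_pos for the next frame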
def sendToAI(model, canvas, fingers):
    if fingers == [1, 1, 1, 1, 0]:  # All fingers except the pinky up: ask the AI
        # PIL expects RGB, while OpenCV images are BGR
        pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
        response = model.generate_content(["Solve this math problem", pil_image])
        return response.text
    return None
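One caveat: as written, this function calls the API on every frame for as long as you hold the gesture. If that becomes a problem, a simple debounce is one option. The sketch below (using a hypothetical ai_triggered flag, not part of the original code) only sends once per gesture:
ai_triggered = False  # hypothetical module-level flag

def sendToAI(model, canvas, fingers):
    global ai_triggered
    if fingers == [1, 1, 1, 1, 0]:
        if not ai_triggered:  # first frame of the gesture only
            ai_triggered = True
            pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
            response = model.generate_content(["Solve this math problem", pil_image])
            return response.text
    else:
        ai_triggered = False  # gesture released: re-arm the trigger
    return None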
Main Loop
Now, let's implement the main loop to capture frames from the webcam, detect hand gestures, draw on the canvas, and send data to the AI model:
prev_pos = None
canvas = None
output_text = ""

while run:
    success, img = cap.read()
    if not success:
        continue  # Skip the iteration if the frame couldn't be read
    img = cv2.flip(img, 1)
    if canvas is None:
        canvas = np.zeros_like(img)
    info = getHandInfo(img)
    if info:
        fingers, lmList = info
        prev_pos, canvas = draw(info, prev_pos, canvas)
        result = sendToAI(model, canvas, fingers)
        if result:
            output_text = result  # Keep the last non-empty answer
    image_combined = cv2.addWeighted(img, 0.7, canvas, 0.3, 0)
    FRAME_WINDOW.image(image_combined, channels="BGR")
    if output_text:
        output_text_area.text(output_text)
    cv2.waitKey(1)
Step 3: Sending Data to AI
When the trigger gesture is detected, sendToAI converts the canvas to a PIL image and asks the Gemini model to interpret the drawing and provide a solution. Rather than hard-coding your API key in the source (genai.configure(api_key="YOUR_API_KEY")), keep it in a .env file that load_dotenv() reads at startup; the configuration code from Step 2 then builds the model from the environment:
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
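A minimal .env for this project looks like the following (the key value is a placeholder; use your own key from Google AI Studio):
GOOGLE_GENAI_API_KEY=your_api_key_here
GENAI_MODEL_NAME=gemini-1.5-flash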
Step 4: Creating the App Interface
We will use Streamlit to create a user-friendly interface for our application. Streamlit makes it easy to build web apps for machine learning and data science projects.
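The one Streamlit idea the app leans on is in-place placeholder updates: a component created once before the loop (like FRAME_WINDOW = st.image([]) and output_text_area = st.subheader("")) can be refreshed on every iteration. Here is a standalone sketch of that pattern (a hypothetical demo file, not part of the project):
# placeholder_demo.py - update one page element in place, five times
import time
import streamlit as st

placeholder = st.empty()             # reserve a slot on the page
for i in range(5):
    placeholder.text(f"Update {i}")  # rewrite the same slot each iteration
    time.sleep(1)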
Final Project Structure
Your project directory should look something like this:
math-gesture-project/
├── main.py
├── requirements.txt
└── .env
Complete main.py Code
import os # For working with environment variables
import cv2 # OpenCV for computer vision tasks
from cvzone.HandTrackingModule import HandDetector # Hand detection module
import numpy as np # Numerical operations (for image arrays)
import google.generativeai as genai # Google Generative AI library
from PIL import Image # Image processing library (for AI interaction)
import streamlit as st # Web app framework
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# --- Streamlit App Configuration ---
st.set_page_config(layout="wide") # Set wide layout for the app
# --- Streamlit Columns ---
col1, col2 = st.columns([3, 2]) # Create two columns (3:2 ratio)
# --- Column 1: Video Feed and Drawing Canvas ---
with col1:
    run = st.checkbox('Run', value=True)  # Checkbox to start/stop the app
    FRAME_WINDOW = st.image([])  # Placeholder to display the video feed
# --- Column 2: AI Answer ---
with col2:
    st.title("Answer")  # Title for the answer section
    output_text_area = st.subheader("")  # Placeholder for AI's response
# --- Load API Key and Model from Environment Variables ---
api_key = os.environ.get("GOOGLE_GENAI_API_KEY")
model_name = os.environ.get("GENAI_MODEL_NAME", "gemini-1.5-flash")
if not api_key:
    raise ValueError("GOOGLE_GENAI_API_KEY environment variable not set!")
# --- Initialize Google Generative AI ---
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
# --- OpenCV Video Capture Setup ---
cap = cv2.VideoCapture(0) # Initialize webcam (0 for default camera)
cap.set(3, 1280) # Set frame width to 1280 pixels
cap.set(4, 720) # Set frame height to 720 pixels
# --- Hand Detector Initialization ---
detector = HandDetector(staticMode=False, maxHands=1, modelComplexity=1,
                        detectionCon=0.7, minTrackCon=0.5)
# --- Helper Functions ---
def getHandInfo(img):
    """Detects a hand and returns finger status and landmark list."""
    hands, img = detector.findHands(img, draw=False, flipType=True)  # Don't draw detector overlays
    if hands:
        hand = hands[0]  # Get the first hand detected
        lmList = hand["lmList"]  # List of 21 landmark coordinates
        fingers = detector.fingersUp(hand)  # Get finger up/down status
        return fingers, lmList
    return None  # No hand detected

def draw(info, prev_pos, canvas):
    """Draws on the canvas based on finger gestures."""
    fingers, lmList = info
    current_pos = None
    if fingers == [0, 1, 0, 0, 0]:  # Index finger up (drawing mode)
        current_pos = lmList[8][0:2]  # Get tip of the index finger
        if prev_pos is None:
            prev_pos = current_pos  # Initialize previous position
        cv2.line(canvas, tuple(prev_pos), tuple(current_pos), (255, 0, 255), 10)  # Draw a magenta line
    elif fingers == [1, 0, 0, 0, 0]:  # Thumb up (clear canvas)
        canvas = np.zeros_like(canvas)  # Blank canvas with the same shape
    return current_pos, canvas

def sendToAI(model, canvas, fingers):
    """Sends the canvas to Google Generative AI for problem-solving."""
    if fingers == [1, 1, 1, 1, 0]:  # All fingers except pinky up (trigger AI)
        # Convert BGR (OpenCV) to RGB (PIL) before building the image
        pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
        response = model.generate_content(["Solve this math problem", pil_image])
        return response.text
    return None
# --- Main Application Loop ---
prev_pos = None  # Previous fingertip position
canvas = None  # Drawing canvas (created on the first frame)
output_text = ""  # Last answer from the AI

while run:  # Loop while the Run checkbox is ticked
    success, img = cap.read()  # Read a frame from the webcam
    if not success:
        continue  # Skip the iteration if the frame couldn't be read
    img = cv2.flip(img, 1)  # Flip the frame horizontally (mirror effect)
    if canvas is None:
        canvas = np.zeros_like(img)  # Create a blank canvas matching the frame size
    info = getHandInfo(img)  # Get hand information from the current frame
    if info:  # If a hand is detected
        fingers, lmList = info
        prev_pos, canvas = draw(info, prev_pos, canvas)  # Update canvas with drawing
        result = sendToAI(model, canvas, fingers)  # Ask the AI (only on the trigger gesture)
        if result:
            output_text = result  # Keep the last non-empty answer
    # Combine video frame and canvas for a transparent drawing effect
    image_combined = cv2.addWeighted(img, 0.7, canvas, 0.3, 0)
    # Update the Streamlit app with the combined image
    FRAME_WINDOW.image(image_combined, channels="BGR")
    # Display the AI's answer in the output text area
    if output_text:
        output_text_area.text(output_text)
    cv2.waitKey(1)  # Brief wait so OpenCV can process window events
Running the Project
To run the project on a new machine:
1. Clone the repository or copy the project folder.
2. Create and activate a virtual environment:
conda create -p venv python=3.10
conda activate venv/
3. Install the dependencies from requirements.txt:
pip install -r requirements.txt
4. Add a .env file with your GOOGLE_GENAI_API_KEY (see Step 3).
5. Run the Streamlit app:
streamlit run main.py
Streamlit will print a local URL (typically http://localhost:8501) and open the app in your browser.
Conclusion
In this post, we created a math gesture program that uses computer vision to detect hand gestures, allows you to draw shapes or write equations, and sends the drawing to an AI model to get the solution. We wrapped everything in a user-friendly interface using Streamlit.
This project demonstrates the power of combining computer vision and AI to create interactive, intelligent applications. Feel free to explore further and customize the project to suit your needs. If you have any questions or run into issues, drop a comment below. Happy coding!