Hello everyone! Today, we're diving into an exciting project that combines computer vision and artificial intelligence to create a math gesture program. The idea is simple: you'll be able to use your hands to draw shapes or write equations, and our AI model will solve them for you. We'll be using Python and various libraries to bring this idea to life.
Project Overview
We can break down our project into four main parts:
Detecting Hand Gestures: Using computer vision (OpenCV and cvzone) to detect hand movements and finger positions.
Drawing: Drawing on a virtual canvas when the index finger is raised, and clearing the canvas when the thumb is raised.
Sending Data to AI: Sending the drawing to Google's Gemini model for interpretation and solving when four fingers are raised.
Creating an App: Wrapping everything in a user-friendly interface using Streamlit.
Step 1: Environment Setup
Before we start coding, let's set up our environment. We'll use a virtual environment to manage our dependencies.
1. Create and activate a virtual environment (a non-conda alternative is shown after these steps):
conda create -p venv python=3.10
conda activate venv/
2. Install the required packages:
pip install cvzone opencv-python numpy google-generativeai pillow streamlit python-dotenv
3. Create requirements.txt:
After installing the packages, list them in a requirements.txt file (you can also generate one automatically with pip freeze > requirements.txt):
cvzone
opencv-python
numpy
google-generativeai
Pillow
streamlit
python-dotenv
On a new machine, you can then install everything in one step:
pip install -r requirements.txt
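If you prefer not to use conda, Python's built-in venv module works just as well:
python -m venv venv
# Windows:
venv\Scripts\activate
# macOS/Linux:
source venv/bin/activate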
Step 2: Writing the Code
Create a new Python file, main.py, and start by importing the necessary libraries:
import os
import cv2
from cvzone.HandTrackingModule import HandDetector
import numpy as np
import google.generativeai as genai
from PIL import Image
import streamlit as st
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
Configure the Streamlit page:
st.set_page_config(layout="wide")
col1, col2 = st.columns([3, 2])

with col1:
    run = st.checkbox('Run', value=True)
    FRAME_WINDOW = st.image([])

with col2:
    st.title("Answer")
    output_text_area = st.subheader("")
Configure the AI model. The API key and model name are read from environment variables (we'll set up the .env file in Step 3):
api_key = os.environ.get("GOOGLE_GENAI_API_KEY")
model_name = os.environ.get("GENAI_MODEL_NAME", "gemini-1.5-flash")

if not api_key:
    raise ValueError("GOOGLE_GENAI_API_KEY environment variable not set!")

genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
Initialize the webcam and hand detector:
cap = cv2.VideoCapture(0)  # 0 is the default camera; use 1 for an external webcam
cap.set(3, 1280)  # Frame width
cap.set(4, 720)   # Frame height
detector = HandDetector(staticMode=False, maxHands=1, modelComplexity=1, detectionCon=0.7, minTrackCon=0.5)
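If you're curious about what the detector actually returns, a quick sanity check (a throwaway snippet, not part of main.py) looks like this. findHands returns a list of hand dictionaries, and fingersUp gives one 0/1 flag per finger, from thumb to pinky:
# Quick sanity check: inspect what cvzone's detector returns for one frame
success, frame = cap.read()
hands, _ = detector.findHands(frame, draw=False, flipType=True)
if hands:
    hand = hands[0]
    print(hand["type"])              # "Left" or "Right"
    print(len(hand["lmList"]))       # 21 landmarks, each [x, y, z]
    print(detector.fingersUp(hand))  # e.g. [0, 1, 0, 0, 0] = only index up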
Define functions for hand detection, drawing, and sending data to the AI:
def getHandInfo(img):
    hands, img = detector.findHands(img, draw=False, flipType=True)
    if hands:
        hand = hands[0]
        lmList = hand["lmList"]
        fingers = detector.fingersUp(hand)
        return fingers, lmList
    return None
def draw(info, prev_pos, canvas):
    fingers, lmList = info
    current_pos = None
    if fingers == [0, 1, 0, 0, 0]:  # Index finger up: drawing mode
        current_pos = lmList[8][0:2]  # Tip of the index finger
        if prev_pos is None:
            prev_pos = current_pos
        cv2.line(canvas, tuple(prev_pos), tuple(current_pos), (255, 0, 255), 10)
    elif fingers == [1, 0, 0, 0, 0]:  # Thumb up: clear the canvas
        canvas = np.zeros_like(canvas)  # Blank canvas with the same shape
    return current_pos, canvas
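To convince yourself the gesture encoding works, here is a small hypothetical smoke test that simulates one "index finger up" frame (fake_lmList is made up purely for illustration):
# Hypothetical smoke test: simulate an "index finger up" frame
test_canvas = np.zeros((720, 1280, 3), dtype=np.uint8)
fake_lmList = [[0, 0, 0]] * 21  # 21 dummy landmarks
fake_lmList[8] = [640, 360, 0]  # place the index fingertip at the center
pos, test_canvas = draw(([0, 1, 0, 0, 0], fake_lmList), None, test_canvas)
print(pos)  # [640, 360] - becomes prev_pos for the next frame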
def sendToAI(model, canvas, fingers):
    if fingers == [1, 1, 1, 1, 0]:  # All fingers except the pinky up: ask the AI
        # PIL expects RGB, while OpenCV images are BGR
        pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
        response = model.generate_content(["Solve this math problem", pil_image])
        return response.text
    return None
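One caveat: as written, this function calls the API on every frame for as long as you hold the gesture. If that becomes a problem, a simple debounce is one option. The sketch below (using a hypothetical ai_triggered flag, not part of the original code) only sends once per gesture:
ai_triggered = False  # hypothetical module-level flag

def sendToAI(model, canvas, fingers):
    global ai_triggered
    if fingers == [1, 1, 1, 1, 0]:
        if not ai_triggered:  # first frame of the gesture only
            ai_triggered = True
            pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
            response = model.generate_content(["Solve this math problem", pil_image])
            return response.text
    else:
        ai_triggered = False  # gesture released: re-arm the trigger
    return None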
Main Loop
Now, let's implement the main loop to capture frames from the webcam, detect hand gestures, draw on the canvas, and send data to the AI model:
prev_pos = None
canvas = None
output_text = ""

while run:
    success, img = cap.read()
    if not success:
        continue  # Skip the iteration if the frame couldn't be read
    img = cv2.flip(img, 1)
    if canvas is None:
        canvas = np.zeros_like(img)
    info = getHandInfo(img)
    if info:
        fingers, lmList = info
        prev_pos, canvas = draw(info, prev_pos, canvas)
        result = sendToAI(model, canvas, fingers)
        if result:
            output_text = result  # Keep the last non-empty answer
    image_combined = cv2.addWeighted(img, 0.7, canvas, 0.3, 0)
    FRAME_WINDOW.image(image_combined, channels="BGR")
    if output_text:
        output_text_area.text(output_text)
    cv2.waitKey(1)
Step 3: Sending Data to AI
When the trigger gesture is detected, sendToAI converts the canvas to a PIL image and asks the Gemini model to interpret the drawing and provide a solution. Rather than hard-coding your API key in the source (genai.configure(api_key="YOUR_API_KEY")), keep it in a .env file that load_dotenv() reads at startup; the configuration code from Step 2 then builds the model from the environment:
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
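A minimal .env for this project looks like the following (the key value is a placeholder; use your own key from Google AI Studio):
GOOGLE_GENAI_API_KEY=your_api_key_here
GENAI_MODEL_NAME=gemini-1.5-flash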
Step 4: Creating the App Interface
We will use Streamlit to create a user-friendly interface for our application. Streamlit makes it easy to build web apps for machine learning and data science projects.
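The one Streamlit idea the app leans on is in-place placeholder updates: a component created once before the loop (like FRAME_WINDOW = st.image([]) and output_text_area = st.subheader("")) can be refreshed on every iteration. Here is a standalone sketch of that pattern (a hypothetical demo file, not part of the project):
# placeholder_demo.py - update one page element in place, five times
import time
import streamlit as st

placeholder = st.empty()             # reserve a slot on the page
for i in range(5):
    placeholder.text(f"Update {i}")  # rewrite the same slot each iteration
    time.sleep(1)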
Final Project Structure
Your project directory should look something like this:
math-gesture-project/
├── main.py
├── requirements.txt
└── .env
Complete main.py Code
import os # For working with environment variables
import cv2 # OpenCV for computer vision tasks
from cvzone.HandTrackingModule import HandDetector # Hand detection module
import numpy as np # Numerical operations (for image arrays)
import google.generativeai as genai # Google Generative AI library
from PIL import Image # Image processing library (for AI interaction)
import streamlit as st # Web app framework
from dotenv import load_dotenv
# Load environment variables
load_dotenv()
# --- Streamlit App Configuration ---
st.set_page_config(layout="wide") # Set wide layout for the app
# --- Streamlit Columns ---
col1, col2 = st.columns([3, 2]) # Create two columns (3:2 ratio)
# --- Column 1: Video Feed and Drawing Canvas ---
with col1:
    run = st.checkbox('Run', value=True)  # Checkbox to start/stop the app
    FRAME_WINDOW = st.image([])  # Placeholder to display the video feed
# --- Column 2: AI Answer ---
with col2:
    st.title("Answer")  # Title for the answer section
    output_text_area = st.subheader("")  # Placeholder for AI's response
# --- Load API Key and Model from Environment Variables ---
api_key = os.environ.get("GOOGLE_GENAI_API_KEY")
model_name = os.environ.get("GENAI_MODEL_NAME", "gemini-1.5-flash")
if not api_key:
    raise ValueError("GOOGLE_GENAI_API_KEY environment variable not set!")
# --- Initialize Google Generative AI ---
genai.configure(api_key=api_key)
model = genai.GenerativeModel(model_name)
# --- OpenCV Video Capture Setup ---
cap = cv2.VideoCapture(0) # Initialize webcam (0 for default camera)
cap.set(3, 1280) # Set frame width to 1280 pixels
cap.set(4, 720) # Set frame height to 720 pixels
# --- Hand Detector Initialization ---
detector = HandDetector(staticMode=False, maxHands=1, modelComplexity=1,
                        detectionCon=0.7, minTrackCon=0.5)
# --- Helper Functions ---
def getHandInfo(img):
    """Detects a hand and returns finger status and landmark list."""
    hands, img = detector.findHands(img, draw=False, flipType=True)  # Don't draw detector overlays
    if hands:
        hand = hands[0]  # Get the first hand detected
        lmList = hand["lmList"]  # List of 21 landmark coordinates
        fingers = detector.fingersUp(hand)  # Get finger up/down status
        return fingers, lmList
    return None  # No hand detected

def draw(info, prev_pos, canvas):
    """Draws on the canvas based on finger gestures."""
    fingers, lmList = info
    current_pos = None
    if fingers == [0, 1, 0, 0, 0]:  # Index finger up (drawing mode)
        current_pos = lmList[8][0:2]  # Get tip of the index finger
        if prev_pos is None:
            prev_pos = current_pos  # Initialize previous position
        cv2.line(canvas, tuple(prev_pos), tuple(current_pos), (255, 0, 255), 10)  # Draw a magenta line
    elif fingers == [1, 0, 0, 0, 0]:  # Thumb up (clear canvas)
        canvas = np.zeros_like(canvas)  # Blank canvas with the same shape
    return current_pos, canvas

def sendToAI(model, canvas, fingers):
    """Sends the canvas to Google Generative AI for problem-solving."""
    if fingers == [1, 1, 1, 1, 0]:  # All fingers except pinky up (trigger AI)
        # Convert BGR (OpenCV) to RGB (PIL) before building the image
        pil_image = Image.fromarray(cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB))
        response = model.generate_content(["Solve this math problem", pil_image])
        return response.text
    return None
# --- Main Application Loop ---
prev_pos = None  # Previous fingertip position
canvas = None  # Drawing canvas (created on the first frame)
output_text = ""  # Last answer from the AI

while run:  # Loop while the Run checkbox is ticked
    success, img = cap.read()  # Read a frame from the webcam
    if not success:
        continue  # Skip the iteration if the frame couldn't be read
    img = cv2.flip(img, 1)  # Flip the frame horizontally (mirror effect)
    if canvas is None:
        canvas = np.zeros_like(img)  # Create a blank canvas matching the frame size
    info = getHandInfo(img)  # Get hand information from the current frame
    if info:  # If a hand is detected
        fingers, lmList = info
        prev_pos, canvas = draw(info, prev_pos, canvas)  # Update canvas with drawing
        result = sendToAI(model, canvas, fingers)  # Ask the AI (only on the trigger gesture)
        if result:
            output_text = result  # Keep the last non-empty answer
    # Combine video frame and canvas for a transparent drawing effect
    image_combined = cv2.addWeighted(img, 0.7, canvas, 0.3, 0)
    # Update the Streamlit app with the combined image
    FRAME_WINDOW.image(image_combined, channels="BGR")
    # Display the AI's answer in the output text area
    if output_text:
        output_text_area.text(output_text)
    cv2.waitKey(1)  # Brief wait so OpenCV can process window events
Running the Project
To run the project on a new machine:
1. Clone the repository or copy the project folder.
2. Create and activate a virtual environment:
conda create -p venv python=3.10
conda activate venv/
3. Install the dependencies from requirements.txt:
pip install -r requirements.txt
4. Add a .env file with your GOOGLE_GENAI_API_KEY (see Step 3).
5. Run the Streamlit app:
streamlit run main.py
Streamlit will print a local URL (typically http://localhost:8501) and open the app in your browser.
Conclusion
In this post, we created a math gesture program that uses computer vision to detect hand gestures, allows you to draw shapes or write equations, and sends the drawing to an AI model to get the solution. We wrapped everything in a user-friendly interface using Streamlit.
This project demonstrates the power of combining computer vision and AI to create interactive, intelligent applications. Feel free to explore further and customize the project to suit your needs. If you have any questions or run into issues, drop a comment below. Happy coding!