With the release of Google's Gemma 2 model, the landscape of large language models (LLMs) has taken a significant leap forward. The model comes in two variants: a 9-billion-parameter model and a 27-billion-parameter model. What makes Gemma 2 particularly fascinating is its performance, which rivals, and sometimes surpasses, that of prominent models like LLaMA 3 and GPT-3.
In this blog post, we'll delve into the capabilities of Gemma 2, explore its performance on various benchmarks, and demonstrate how to integrate this model into your own applications using tools like LangChain, Llama Index, and Chainlit. Let's get started!
Unpacking Gemma 2
The Gemma 2 model is available in two configurations:
9-billion parameters: Competes well with larger models in terms of efficiency and performance.
27-billion parameters: Matches or even outperforms some of the most advanced models like LLaMA 3 (70-billion parameters).
Key Features
Instruction fine-tuning: Enhanced performance on instruction-based tasks.
Commercial-friendly license: Available for a wide range of applications.
Compatibility: Designed to run inference efficiently on a single GPU such as an Nvidia A100 80GB or H100.
Performance Benchmarks
Gemma 2 has shown impressive performance across various benchmarks:
GSM 8K: Nearly matches the performance of much larger models.
Coding Challenges: Demonstrates strong capabilities in generating functional code.
Logical and Reasoning Tests: Shows high accuracy in answering complex questions.
Getting Started with Gemma 2
To explore the capabilities of Gemma 2, we'll start with a simple coding challenge and then move on to logical and reasoning tests.
Setting Up
First, make sure you have the necessary Python packages installed. You can install them using the following command:
pip install accelerate sentencepiece transformers
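Gemma 2 support landed in transformers 4.42, so make sure your installation is at least that recent. The weights on Hugging Face are also gated behind Google's usage terms; after accepting them on the model page, authenticate locally:

huggingface-cli login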
Coding Challenge
Let's load the model and test it, first with a quick sanity-check prompt and then with a Python coding challenge.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
def generate(input_text, system_prompt="", max_length=512):
    if system_prompt == "":
        system_prompt = "You are a friendly and helpful assistant"
    # Gemma's chat template has no separate system role, so prepend the
    # system prompt to the user message instead
    messages = [
        {"role": "user", "content": system_prompt + '\n\n' + input_text},
    ]
    # Render the chat template into a prompt string
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The template already includes <bos>, so skip adding special tokens again
    inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    # Generate the response (max_length here caps the number of new tokens)
    outputs = model.generate(input_ids=inputs, max_new_tokens=max_length, do_sample=True, temperature=0.1, top_k=50)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(text)
# Example usage: a quick sanity check
generate('Give me a meal plan for today', system_prompt="You are a Helpful Assistant", max_length=512)
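And here is a prompt in the spirit of the section's title; the palindrome task below is our own illustrative example, not one of the published benchmarks:

generate('Write a Python function that checks whether a string is a palindrome, and include a few test cases.', system_prompt="You are an expert Python programmer", max_length=512)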
Logical and Reasoning Test
Next, we test the model's reasoning capabilities with a few GSM 8K-style arithmetic and general-knowledge questions.
questions = [
    "If I have 8 apples and give away 3, how many do I have left?",
    "What is the capital of France?",
    "How many continents are there on Earth?"
]

for question in questions:
    generate(question, system_prompt="You are a Helpful Assistant", max_length=512)
Integrating Gemma 2 with Applications
You can integrate Gemma 2 into your applications using various tools such as Ollama, LangChain, and Llama Index. Here's how you can do it.
Using Ollama
First, install the necessary packages:
pip install ollama langchain_community chainlit llama-index
Then, pull the Gemma 2 model:
ollama pull gemma2
Simple Usage with Ollama
import ollama

stream = ollama.chat(
    model='gemma2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
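If you don't need token-by-token output, the same call works without streaming. A minimal variant that waits for the full reply:

# Single-shot call; returns the complete response at once
response = ollama.chat(
    model='gemma2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])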
Using LangChain
LangChain provides a simple interface to interact with LLMs.
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
print(llm.invoke("Why is the sky blue?"))
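Streaming works here too. If you prefer token-by-token output, a short sketch using LangChain's standard .stream() interface:

for chunk in llm.stream("Why is the sky blue?"):
    print(chunk, end="", flush=True)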
Using Llama Index
Llama Index can be used to manage and query large datasets with LLMs.
from llama_index.llms.ollama import Ollama
llm = Ollama(model="gemma2")
print(llm.complete("Why is the sky blue?"))
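To go beyond one-off completions and actually query your own documents, you can wire the same Ollama-served model into a small indexing pipeline. The sketch below is illustrative and makes a few assumptions: the llama-index-embeddings-ollama package is installed, an embedding model has been pulled locally (e.g. ollama pull nomic-embed-text), and a ./data folder with your files exists.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Use Gemma 2 for answers and a local Ollama embedding model for retrieval
Settings.llm = Ollama(model="gemma2")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")  # assumes this model is pulled

# Load and index whatever documents live in ./data (hypothetical folder)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question grounded in the indexed documents
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the key points of these documents."))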
Creating a User Interface with Chainlit
Chainlit allows you to create a user interface for interacting with the model.
Example Code
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import Runnable
from langchain.schema.runnable.config import RunnableConfig
import chainlit as cl
@cl.on_chat_start
async def on_chat_start():
    await cl.Message(content="Hello there, I am Gemma. How can I help you?").send()
    model = Ollama(model="gemma2")
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You're a very knowledgeable assistant who provides accurate and eloquent answers."),
            ("human", "{question}"),
        ]
    )
    runnable = prompt | model | StrOutputParser()
    cl.user_session.set("runnable", runnable)

@cl.on_message
async def on_message(message: cl.Message):
    runnable = cl.user_session.get("runnable")  # type: Runnable
    msg = cl.Message(content="")

    # Stream the model's answer token by token into the UI
    async for chunk in runnable.astream(
        {"question": message.content},
        config=RunnableConfig(callbacks=[cl.LangchainCallbackHandler()]),
    ):
        await msg.stream_token(chunk)

    await msg.send()
# Note: no __main__ block is needed; Chainlit apps are launched from the CLI, as shown below
Run Chainlit
To run the Chainlit server:
chainlit run ui.py
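During development, the -w flag tells Chainlit to reload the app whenever the file changes:

chainlit run ui.py -w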
Test the User Interface
Open the provided URL in your browser and interact with the model by asking questions.
Conclusion
Gemma 2 is a powerful addition to the landscape of large language models. Its strong performance on coding and reasoning tasks, combined with its ease of integration with tools like Ollama, LangChain, Llama Index, and Chainlit, makes it a valuable asset for developers. Whether you are building sophisticated AI applications or simply exploring state-of-the-art models, Gemma 2 offers a versatile and robust solution.
We hope you found this tutorial helpful. Happy coding! 🚀
If you have any questions or need further assistance, feel free to reach out.