With the release of Google's Gemma 2 model, the landscape of large language models (LLMs) has taken a significant leap forward. The model comes in two variants: a 9-billion-parameter model and a 27-billion-parameter model. What makes Gemma 2 particularly fascinating is its performance, which rivals, and sometimes surpasses, that of prominent models like LLaMA 3 and GPT-3.
In this blog post, we'll delve into the capabilities of Gemma 2, explore its performance on various benchmarks, and demonstrate how to integrate this model into your own applications using tools like LangChain, Llama Index, and Chainlit. Let's get started!
Unpacking Gemma 2
The Gemma 2 model is available in two configurations:
9-billion parameters: Competes well with larger models in terms of efficiency and performance.
27-billion parameters: Matches or even outperforms some of the most advanced models like LLaMA 3 (70-billion parameters).
Key Features
Instruction fine-tuning: Enhanced performance on instruction-based tasks.
Commercial-friendly license: Available for a wide range of applications.
Compatibility: Designed to run inference efficiently on a single GPU such as an Nvidia A100 80GB or H100.
Performance Benchmarks
Gemma 2 has shown impressive performance across various benchmarks:
GSM 8K: Nearly matches the performance of much larger models.
Coding Challenges: Demonstrates strong capabilities in generating functional code.
Logical and Reasoning Tests: Shows high accuracy in answering complex questions.
Getting Started with Gemma 2
To explore the capabilities of Gemma 2, we'll start with a simple coding challenge and then move on to logical and reasoning tests.
Setting Up
First, make sure you have the necessary Python packages installed. You can install them using the following command:
pip install accelerate sentencepiece transformers
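Gemma 2 support landed in transformers 4.42, so make sure your installation is at least that recent. The weights on Hugging Face are also gated behind Google's usage terms; after accepting them on the model page, authenticate locally:

huggingface-cli login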
Coding Challenge
Let's load the model and test it, first with a quick sanity-check prompt and then with a Python coding challenge.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-9b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
def generate(input_text, system_prompt="", max_length=512):
    if system_prompt == "":
        system_prompt = "You are a friendly and helpful assistant"
    # Gemma's chat template has no separate system role, so prepend the
    # system prompt to the user message instead
    messages = [
        {"role": "user", "content": system_prompt + '\n\n' + input_text},
    ]
    # Render the chat template into a prompt string
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # The template already includes <bos>, so skip adding special tokens again
    inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
    # Generate the response (max_length here caps the number of new tokens)
    outputs = model.generate(input_ids=inputs, max_new_tokens=max_length, do_sample=True, temperature=0.1, top_k=50)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(text)
# Example usage: a quick sanity check
generate('Give me a meal plan for today', system_prompt="You are a Helpful Assistant", max_length=512)
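And here is a prompt in the spirit of the section's title; the palindrome task below is our own illustrative example, not one of the published benchmarks:

generate('Write a Python function that checks whether a string is a palindrome, and include a few test cases.', system_prompt="You are an expert Python programmer", max_length=512)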
Logical and Reasoning Test
Next, we test the model's reasoning capabilities with a few GSM 8K-style arithmetic and general-knowledge questions.
questions = [
    "If I have 8 apples and give away 3, how many do I have left?",
    "What is the capital of France?",
    "How many continents are there on Earth?"
]

for question in questions:
    generate(question, system_prompt="You are a Helpful Assistant", max_length=512)
Integrating Gemma 2 with Applications
You can integrate Gemma 2 into your applications using various tools such as Ollama, LangChain, and Llama Index. Here's how you can do it.
Using Ollama
First, install the necessary packages:
pip install ollama langchain_community chainlit llama-index
Then, pull the Gemma 2 model:
ollama pull gemma2
Simple Usage with Ollama
import ollama

stream = ollama.chat(
    model='gemma2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
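If you don't need token-by-token output, the same call works without streaming. A minimal variant that waits for the full reply:

# Single-shot call; returns the complete response at once
response = ollama.chat(
    model='gemma2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
)
print(response['message']['content'])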
Using LangChain
LangChain provides a simple interface to interact with LLMs.
from langchain_community.llms import Ollama
llm = Ollama(model="gemma2")
print(llm.invoke("Why is the sky blue?"))
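Streaming works here too. If you prefer token-by-token output, a short sketch using LangChain's standard .stream() interface:

for chunk in llm.stream("Why is the sky blue?"):
    print(chunk, end="", flush=True)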
Using Llama Index
Llama Index can be used to manage and query large datasets with LLMs.
from llama_index.llms.ollama import Ollama
llm = Ollama(model="gemma2")
print(llm.complete("Why is the sky blue?"))
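To go beyond one-off completions and actually query your own documents, you can wire the same Ollama-served model into a small indexing pipeline. The sketch below is illustrative and makes a few assumptions: the llama-index-embeddings-ollama package is installed, an embedding model has been pulled locally (e.g. ollama pull nomic-embed-text), and a ./data folder with your files exists.

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding

# Use Gemma 2 for answers and a local Ollama embedding model for retrieval
Settings.llm = Ollama(model="gemma2")
Settings.embed_model = OllamaEmbedding(model_name="nomic-embed-text")  # assumes this model is pulled

# Load and index whatever documents live in ./data (hypothetical folder)
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Ask a question grounded in the indexed documents
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the key points of these documents."))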
Creating a User Interface with Chainlit
Chainlit allows you to create a user interface for interacting with the model.
Example Code
from langchain_community.llms import Ollama
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from langchain.schema.runnable import Runnable
from langchain.schema.runnable.config import RunnableConfig
import chainlit as cl
@cl.on_chat_start
async def on_chat_start():
    await cl.Message(content="Hello there, I am Gemma. How can I help you?").send()
    model = Ollama(model="gemma2")
    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", "You're a very knowledgeable assistant who provides accurate and eloquent answers."),
            ("human", "{question}"),
        ]
    )
    runnable = prompt | model | StrOutputParser()
    cl.user_session.set("runnable", runnable)

@cl.on_message
async def on_message(message: cl.Message):
    runnable = cl.user_session.get("runnable")  # type: Runnable
    msg = cl.Message(content="")

    # Stream the model's answer token by token into the UI
    async for chunk in runnable.astream(
        {"question": message.content},
        config=RunnableConfig(callbacks=[cl.LangchainCallbackHandler()]),
    ):
        await msg.stream_token(chunk)

    await msg.send()
# Note: no __main__ block is needed; Chainlit apps are launched from the CLI, as shown below
Run Chainlit
To run the Chainlit server:
chainlit run ui.py
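During development, the -w flag tells Chainlit to reload the app whenever the file changes:

chainlit run ui.py -w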
Test the User Interface
Open the provided URL in your browser and interact with the model by asking questions.
Conclusion
Gemma 2 is a powerful addition to the landscape of large language models. Its strong performance on coding and reasoning tasks, combined with its ease of integration with tools like Ollama, LangChain, Llama Index, and Chainlit, makes it a valuable asset for developers. Whether you are building sophisticated AI applications or simply exploring state-of-the-art models, Gemma 2 offers a versatile and robust solution.
We hope you found this tutorial helpful. Happy coding! 🚀
If you have any questions or need further assistance, feel free to reach out.