The generative AI (GenAI) revolution is in full swing, and text generation with open-source transformer models like Llama 2 has become the talk of the town. AI enthusiasts and developers are looking to leverage the generation capabilities of such models for their own use cases and applications. This article shows how easy it is to generate text with the Llama 2 family of models (7b, 13b, and 70b) using Optimum Habana and a custom pipeline class. You can run the models with just a few lines of code.
This custom pipeline class is designed to provide great flexibility and ease of use. Moreover, it offers a high level of abstraction and performs end-to-end text generation, including pre- and post-processing. There are several ways to use the pipeline: you can run the run_pipeline.py script from the Optimum Habana repository, add the pipeline class to your own Python scripts, or initialize LangChain classes with it.
Prerequisites
Since the Llama 2 models are part of a gated repo, you need to request access if you haven't done so already. First, visit the Meta website and accept the terms and conditions. After you are granted access by Meta (which can take one to two days), request access on Hugging Face using the same email address you provided in the Meta form.
Once access is granted, run the following command to log in to your Hugging Face account (you will need an access token, which you can obtain from your user profile page):
huggingface-cli login
You will also need to install the latest version of Optimum Habana and clone the repository to access the pipeline scripts. Here are the commands to do so:
pip install optimum-habana==1.10.4
git clone -b v1.10-release https://github.com/huggingface/optimum-habana.git
If you plan to run distributed inference, install DeepSpeed according to your SynapseAI version. In this case, we are using SynapseAI 1.14.0.
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.14.0
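If you want to sanity-check the installations before proceeding (a suggested check on our part, not one of the original setup steps), the following commands should report the installed package versions:

pip show optimum-habana
python -c "import deepspeed; print(deepspeed.__version__)"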
You are now all set to run text generation with the pipeline!
Using the pipeline
First, go to the following directory in your optimum-habana checkout, where the pipeline scripts are located, and follow the instructions in the README to update your PYTHONPATH:
cd optimum-habana/examples/text-generation
pip install -r requirements.txt
cd text-generation-pipeline
If you want to generate a sequence of text from a prompt of your choice, here is a sample command:
python run_pipeline.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --prompt "This is my prompt."
You can also pass multiple prompts as input and change the temperature and top_p values used for generation, as follows:
python run_pipeline.py --model_name_or_path meta-llama/Llama-2-13b-hf --use_hpu_graphs --use_kv_cache --max_new_tokens 100 --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello World" "How are you?"
Below is a sample command to launch the pipeline with DeepSpeed and generate text with a large model such as Llama-2-70b:
python ../../gaudi_spawn.py --use_deepspeed --world_size 8 run_pipeline.py --model_name_or_path meta-llama/Llama-2-70b-hf --max_new_tokens 100 --bf16 --use_hpu_graphs --use_kv_cache --do_sample --temperature 0.5 --top_p 0.95 --prompt "Hello World" "How are you?" "This is my prompt." "Once upon a time"
Usage in Python scripts
You can also use the pipeline class in your own scripts, as shown in the example below. Run the following sample script from optimum-habana/examples/text-generation/text-generation-pipeline:
import argparse
import logging

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
                    datefmt="%m/%d/%Y %H:%M:%S", level=logging.INFO)
logger = logging.getLogger(__name__)

# Set up an argument parser and define the pipeline arguments
parser = argparse.ArgumentParser()
args = setup_parser(parser)
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-7b-hf"
args.max_new_tokens = 100
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True

# Initialize the pipeline and generate text from string prompts
pipe = GaudiTextGenerationPipeline(args, logger)
prompts = ["He is working on", "Once upon a time", "Far far away"]
for prompt in prompts:
    print(f"Prompt: {prompt}")
    output = pipe(prompt)
    print(f"Generated text: {repr(output)}")
You must run the above script with python <name_of_script>.py --model_name_or_path a_model_name, since --model_name_or_path is a required command-line argument. However, the model name can also be changed programmatically, as shown in the snippet above.
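For example, assuming the script above is saved as sample_script.py (a hypothetical filename), the invocation could look like the command below. Since the script assigns args.model_name_or_path after parsing, the value passed on the command line just needs to be a valid model name; it is overridden programmatically:

python sample_script.py --model_name_or_path meta-llama/Llama-2-7b-hf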
This shows that the pipeline class operates on string inputs and takes care of data pre- and post-processing for you.
LangChain Compatibility
The text-generation pipeline can be fed as input to LangChain classes via the use_with_langchain constructor argument. You can install LangChain as follows:
pip install langchain==0.0.191
Below is a sample script that shows how the pipeline class can be used with LangChain:
import argparse
import logging

from langchain.chains import LLMChain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

from pipeline import GaudiTextGenerationPipeline
from run_generation import setup_parser

# Define a logger
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
                    datefmt="%m/%d/%Y %H:%M:%S", level=logging.INFO)
logger = logging.getLogger(__name__)

# Set up an argument parser and define the pipeline arguments
parser = argparse.ArgumentParser()
args = setup_parser(parser)
args.num_return_sequences = 1
args.model_name_or_path = "meta-llama/Llama-2-13b-chat-hf"
args.max_input_tokens = 2048
args.max_new_tokens = 1000
args.use_hpu_graphs = True
args.use_kv_cache = True
args.do_sample = True
args.temperature = 0.2
args.top_p = 0.95

# Initialize the pipeline with LangChain support and wrap it in a LangChain object
pipe = GaudiTextGenerationPipeline(args, logger, use_with_langchain=True)
llm = HuggingFacePipeline(pipeline=pipe)

template = """Use the following context to answer the question at the end. If you don't know the answer,\
just say that you don't know; don't try to make up an answer.

Context: Large Language Models (LLMs) are the latest models used in NLP.
Their superior performance over smaller models has made them incredibly
useful for developers building NLP-enabled applications. These models
can be accessed via Hugging Face's `transformers` library, via OpenAI
using the `openai` library, and via Cohere using the `cohere` library.

Question: {question}
Answer: """

prompt = PromptTemplate(input_variables=["question"], template=template)
llm_chain = LLMChain(prompt=prompt, llm=llm)

# Use the LangChain object to answer questions about the context
question = "Which libraries and model providers offer LLMs?"
response = llm_chain(prompt.format(question=question))
print(f"Question 1: {question}")
print(f"Response 1: {response['text']}")

question = "What context was provided?"
response = llm_chain(prompt.format(question=question))
print(f"\nQuestion 2: {question}")
print(f"Response 2: {response['text']}")
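As with the previous example, a model name must be supplied on the command line even though it is overridden programmatically. Assuming the script is saved as langchain_script.py (a hypothetical filename), the invocation could look like:

python langchain_script.py --model_name_or_path meta-llama/Llama-2-13b-chat-hf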
Note that the pipeline class has only been validated with LangChain version 0.0.191 and may not work with other versions of the package.
Conclusion
We presented a custom text-generation pipeline on the Intel® Gaudi® 2 AI accelerator that accepts single or multiple prompts as input. This pipeline offers great flexibility in terms of model size as well as the parameters affecting text-generation quality. Furthermore, it is very easy to use, simple to plug into your scripts, and compatible with LangChain.
Use of the pretrained model is subject to compliance with third-party licenses, including the Llama 2 Community License Agreement (LLAMAV2). For guidance on the intended use of the LLAMA2 model, who the intended users are, and what is considered misuse, out-of-scope use, and additional terms, please read the instructions at this link: https://ai.meta.com/llama/license/. Users bear sole liability and responsibility for following and complying with any third-party licenses, and Habana Labs disclaims any liability with respect to users' use of, or compliance with, third-party licenses. To be able to run a gated model like this Llama-2-70b-hf, you need to:
- Accept the terms of use of the model in its model card on the HF Hub
- Set up a read access token
- Log in to your account using the HF CLI: run huggingface-cli login before launching your script