Agentic RAG with Llama 3.2 3B
Last updated: September 26, 2024
Meta released two small yet powerful language models in its Llama 3.2 collection.
In this notebook, we will use the 3B model to build an agentic Retrieval-Augmented Generation application.
🎯 Our goal is to create a system that answers questions using a knowledge base focused on the Seven Wonders of the Ancient World. If the retrieved documents do not contain the answer, the application falls back to web search for additional context.
Stack
- 🏗️ Haystack: an open-source LLM orchestration framework that simplifies the development of your LLM applications.
- 🦙 Llama-3.2-3B-Instruct: a small and capable language model.
- 🦆🌐 DuckDuckGo API Websearch: to fetch results from the web.
Setup
! pip install haystack-ai duckduckgo-api-haystack transformers sentence-transformers datasets
Create our knowledge base
In this section, we download a dataset about the Seven Wonders of the Ancient World, add a semantic vector to each document, and store the documents in an in-memory database.
To better understand this process, you can explore the introductory Haystack tutorials.
from datasets import load_dataset
from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
document_store = InMemoryDocumentStore()
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
151
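The returned value, 151, is the number of documents written to the store. As a quick sanity check (a minimal sketch using the standard document store API), you can query the count at any time:
# Verify that all documents landed in the store
document_store.count_documents()  # -> 151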
Load and try Llama 3.2
We will load the model on Colab, using Hugging Face Transformers.
There are many other options for using open models with Haystack, such as Ollama for local inference or Groq for serving.
(📕 Choosing the right generator.)
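For instance, the Ollama route might look like the following sketch. It assumes the ollama-haystack integration is installed and an Ollama server is running locally; the model tag and endpoint below are assumptions to adapt to your setup.
from haystack_integrations.components.generators.ollama import OllamaGenerator
# Hypothetical alternative: local inference via Ollama instead of Transformers
ollama_generator = OllamaGenerator(
    model="llama3.2:3b",  # assumed Ollama model tag; check `ollama list`
    url="http://localhost:11434",  # default Ollama endpoint (older versions expect .../api/generate)
    generation_kwargs={"num_predict": 256},  # Ollama's counterpart of max_new_tokens
)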
Authorization
- You need a Hugging Face account
- You need to accept Meta's terms here: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct and then wait until you are granted access.
import getpass, os
os.environ["HF_TOKEN"] = getpass.getpass("Your Hugging Face token")
Your Hugging Face token··········
import torch
from haystack.components.generators import HuggingFaceLocalGenerator
generator = HuggingFaceLocalGenerator(
model="meta-llama/Llama-3.2-3B-Instruct",
huggingface_pipeline_kwargs={"device_map":"auto",
"torch_dtype":torch.bfloat16},
generation_kwargs={"max_new_tokens": 256})
generator.warm_up()
prompt = """<|begin_of_text|><|start_header_id|>user<|end_header_id|>
What is the capital of France?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>"""
generator.run(prompt)
{'replies': ['\n\nThe capital of France is Paris.']}
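Note that the prompt above writes Llama's special tokens by hand. A roughly equivalent way to build the same string is the Transformers chat-template API, sketched below (the model's chat template may additionally insert a default system header):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
# Render the user message through the model's own chat template
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,
)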
Building the 🕵🏻 Agentic RAG Pipeline
Here's the idea 👇
- Perform a vector search on our knowledge base using the query.
- Pass the top 5 documents to Llama, injected into a specific prompt.
- In the prompt, instruct the model to reply with 'no_answer' if it cannot infer the answer from the documents; otherwise, provide the answer.
- If 'no_answer' is returned, run a web search and inject the results into a new prompt.
- Use Llama to generate a final answer based on the web search results.
For a detailed explanation of a similar use case, see this tutorial: Building Fallbacks to Websearch with Conditional Routing.
The retrieval part
Let's initialize the components to use in the initial retrieval phase.
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
retriever = InMemoryEmbeddingRetriever(document_store, top_k=5)
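Before wiring these into a pipeline, a quick standalone check (a sketch that calls the components' run methods directly) confirms that retrieval works:
# Standalone sanity check: embed a query and retrieve the top 5 documents
text_embedder.warm_up()
query_embedding = text_embedder.run(text="Why did people build the Great Pyramid of Giza?")["embedding"]
print(len(retriever.run(query_embedding=query_embedding)["documents"]))  # -> 5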
Prompt template
Let's define the first prompt template, which instructs the model to:
- answer the query based on the retrieved documents, when possible
- otherwise, reply with 'no_answer'
from haystack.components.builders import PromptBuilder
prompt_template = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Answer the following query given the documents.
If the answer is not contained within the documents reply with 'no_answer'.
If the answer is contained within the documents, start the answer with "FROM THE KNOWLEDGE BASE: ".
Documents:
{% for document in documents %}
{{document.content}}
{% endfor %}
Query: {{query}}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
prompt_builder = PromptBuilder(template=prompt_template)
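To see what the rendered prompt looks like, you can run the builder on its own; here is a minimal sketch with a made-up document (reusing the Document class imported earlier):
# Standalone run: the Jinja template is rendered into a single prompt string
sample_docs = [Document(content="The Great Pyramid of Giza was built as a tomb.")]
print(prompt_builder.run(documents=sample_docs, query="Why was the Great Pyramid built?")["prompt"])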
Conditional Router
This is the component that will route the data, based on the reply of the language model.
from haystack.components.routers import ConditionalRouter
routes = [
{
"condition": "{{'no_answer' in replies[0]}}",
"output": "{{query}}",
"output_name": "go_to_websearch",
"output_type": str,
},
{
"condition": "{{'no_answer' not in replies[0]}}",
"output": "{{replies[0]}}",
"output_name": "answer",
"output_type": str,
},
]
router = ConditionalRouter(routes)
router.run(replies=["this is the answer!"])
{'answer': 'this is the answer!'}
router.run(replies=["no_answer"], query="my query")
{'go_to_websearch': 'my query'}
Web search
from duckduckgo_api_haystack import DuckduckgoApiWebSearch
websearch = DuckduckgoApiWebSearch(top_k=5)
# Perform a search
results = websearch.run(query="Where is Tanzania?")
# Access the search results
documents = results["documents"]
links = results["links"]
print("Found documents:")
for doc in documents:
print(f"Content: {doc.content}")
print("\nSearch Links:")
for link in links:
print(link)
Found documents:
Content: Tanzania is a country in East Africa within the African Great Lakes region. It is bordered by Uganda, Kenya, the Indian Ocean, Mozambique, Malawi, Zambia, Rwanda, Burundi, and the Democratic Republic of the Congo.
Content: Tanzania is a country in East Africa's Great Lakes Region, located just below the Equator. It is bordered by eight countries and the Indian Ocean, and has diverse geographical features such as mountains, lakes, rivers, and islands.
Content: Tanzania is an East African country formed by the union of Tanganyika and Zanzibar in 1964. It has diverse landscapes, including Mount Kilimanjaro, Lake Victoria, and the Great Rift Valley, and a rich cultural heritage.
Content: Tanzania is the largest and most populous country in East Africa, with a total area of 947,300 sq km and a coastline of 1,424 km. It has diverse natural features, including mountains, lakes, rivers, and islands, and borders eight other countries.
Content: Tanzania is a country in Eastern Africa, bordering the Indian Ocean, between Kenya and Mozambique. It has many lakes, national parks, and mountains, including Mount Kilimanjaro, the highest point in Africa.
Search Links:
https://en.wikipedia.org/wiki/Tanzania
https://www.worldatlas.com/maps/tanzania
https://www.britannica.com/place/Tanzania
https://www.cia.gov/the-world-factbook/countries/tanzania/
https://en.wikipedia.org/wiki/Geography_of_Tanzania
Prompt template after web search
prompt_template_after_websearch = """
<|begin_of_text|><|start_header_id|>user<|end_header_id|>
Answer the following query given the documents retrieved from the web.
Start the answer with "FROM THE WEB: ".
Documents:
{% for document in documents %}
{{document.content}}
{% endfor %}
Query: {{query}}<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
"""
prompt_builder_after_websearch = PromptBuilder(template=prompt_template_after_websearch)
Assembling the Pipeline
Now that we have all the components, we can assemble the full pipeline.
To handle the different prompt sources, we use a BranchJoiner. It allows us to connect multiple output sockets (carrying prompts) to our language model. In our case, the prompt will come either from the initial prompt_builder or from prompt_builder_after_websearch.
from haystack.components.joiners import BranchJoiner
prompt_joiner = BranchJoiner(str)
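Standalone, the joiner simply forwards whichever single value arrives on its variadic input, as this small sketch shows:
# BranchJoiner forwards the one prompt it receives, from either branch
prompt_joiner.run(value=["a prompt from one of the two builders"])
# -> {'value': 'a prompt from one of the two builders'}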
from haystack import Pipeline
pipe = Pipeline()
pipe.add_component("text_embedder", text_embedder)
pipe.add_component("retriever", retriever)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("prompt_joiner", prompt_joiner)
pipe.add_component("llm", generator)
pipe.add_component("router", router)
pipe.add_component("websearch", websearch)
pipe.add_component("prompt_builder_after_websearch", prompt_builder_after_websearch)
pipe.connect("text_embedder", "retriever")
pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "prompt_joiner")
pipe.connect("prompt_joiner", "llm")
pipe.connect("llm.replies", "router.replies")
pipe.connect("router.go_to_websearch", "websearch.query")
pipe.connect("router.go_to_websearch", "prompt_builder_after_websearch.query")
pipe.connect("websearch.documents", "prompt_builder_after_websearch.documents")
pipe.connect("prompt_builder_after_websearch", "prompt_joiner")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7cd028903ca0>
🚅 Components
- text_embedder: SentenceTransformersTextEmbedder
- retriever: InMemoryEmbeddingRetriever
- prompt_builder: PromptBuilder
- prompt_joiner: BranchJoiner
- llm: HuggingFaceLocalGenerator
- router: ConditionalRouter
- websearch: DuckduckgoApiWebSearch
- prompt_builder_after_websearch: PromptBuilder
🛤️ Connections
- text_embedder.embedding -> retriever.query_embedding (List[float])
- retriever.documents -> prompt_builder.documents (List[Document])
- prompt_builder.prompt -> prompt_joiner.value (str)
- prompt_joiner.value -> llm.prompt (str)
- llm.replies -> router.replies (List[str])
- router.go_to_websearch -> websearch.query (str)
- router.go_to_websearch -> prompt_builder_after_websearch.query (str)
- websearch.documents -> prompt_builder_after_websearch.documents (List[Document])
- prompt_builder_after_websearch.prompt -> prompt_joiner.value (str)
pipe.show()
Agentic RAG in action! 🔎
def get_answer(query):
result = pipe.run({"text_embedder": {"text": query}, "prompt_builder": {"query": query}, "router": {"query": query}})
print(result["router"]["answer"])
query = "Why did people build Great Pyramid of Giza?"
get_answer(query)
FROM THE KNOWLEDGE BASE: The Great Pyramid of Giza was built as the tomb of Fourth Dynasty pharaoh Khufu, and its construction is believed to have taken around 27 years to complete.
query = "Where is Munich?"
get_answer(query)
FROM THE WEB: Munich is located in the south of Germany, and is the capital of the federal state of Bavaria. It is connected to other major cities in Germany and Austria, and has direct access to Italy.
query = "What does Rhodes Statue look like?"
get_answer(query)
FROM THE KNOWLEDGE BASE: The head of the Colossus of Rhodes was of a standard rendering at the time, with curly hair and evenly spaced spikes of bronze or silver flame radiating from it, similar to the images found on contemporary Rhodian coins.
query = "Was the the Tower of Pisa part of the 7 wonders of the ancient world?"
get_answer(query)
FROM THE WEB: No, the Leaning Tower of Pisa was one of the Seven Wonders of the Medieval World, but not of the ancient world.
query = "Who was general Muawiyah?"
get_answer(query)
FROM THE KNOWLEDGE BASE: Muawiyah I was a Muslim general who conquered Rhodes in 653.
(Notebook by Stefano Fiorucci)
