
Chroma Indexing and RAG Examples


Install dependencies

# Install the Chroma integration; Haystack will come as a dependency
!pip install -U chroma-haystack "huggingface_hub>=0.22.0"

Indexing Pipeline: preprocess, split, and index documents

In this section, we index documents into a Chroma DB collection by building a Haystack indexing pipeline. Here, we index pages from the VIM user manual into Haystack's ChromaDocumentStore.

The ChromaDocumentStore example folder holds these pages as .txt files, so we use the TextFileToDocument and DocumentWriter components to build this indexing pipeline.

# Fetch data files from the GitHub repo
!curl -sL https://github.com/deepset-ai/haystack-core-integrations/tarball/main -o main.tar
!mkdir main
!tar xf main.tar -C main --strip-components 1
!mv main/integrations/chroma/example/data .
import os
from pathlib import Path

from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

from haystack_integrations.document_stores.chroma import ChromaDocumentStore

file_paths = ["data" / Path(name) for name in os.listdir("data")]

# Chroma is used in-memory so we use the same instances in the two pipelines below
document_store = ChromaDocumentStore()

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"sources": file_paths}})
{'writer': {'documents_written': 36}}

Query Pipeline: build a retrieval-augmented generation (RAG) pipeline

Once the documents are in the ChromaDocumentStore, we can use the companion Chroma retriever to build a query pipeline. The query pipeline below is a simple retrieval-augmented generation (RAG) pipeline that uses Chroma's query API.

You can switch both the indexing and query pipelines here to embedding search by using one of the Haystack Embedders together with the ChromaEmbeddingRetriever.

In this example, we use:

  • An OpenAIChatGenerator with gpt-4o-mini. (You need an OpenAI API key to use this model.) You can replace it with any other Generators.
  • A ChatPromptBuilder that holds the prompt template. You can adjust it to a prompt of your choice.
  • A ChromaQueryTextRetriever, which expects a list of queries and retrieves the top_k most relevant documents from your Chroma collection.
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
Enter OpenAI API key: ········
from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses.chat_message import ChatMessage

prompt = """
Answer the query based on the provided context.
If the context does not contain the answer, say 'Answer not found'.
Context:
{% for doc in documents %}
  {{ doc.content }}
{% endfor %}
query: {{query}}
Answer:
"""

template = [ChatMessage.from_user(prompt)]
prompt_builder = ChatPromptBuilder(template=template)

llm = OpenAIChatGenerator()
retriever = ChromaQueryTextRetriever(document_store)

querying = Pipeline()
querying.add_component("retriever", retriever)
querying.add_component("prompt_builder", prompt_builder)
querying.add_component("llm", llm)

querying.connect("retriever.documents", "prompt_builder.documents")
querying.connect("prompt_builder", "llm")
ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.
<haystack.core.pipeline.pipeline.Pipeline object at 0x308f29880>
🚅 Components
  - retriever: ChromaQueryTextRetriever
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.messages (List[ChatMessage])
query = "Should I write documentation for my plugin?"
results = querying.run({"retriever": {"query": query, "top_k": 3}, "prompt_builder": {"query": query}})
print(results["llm"]["replies"][0].text)
Yes, it is a good idea to write documentation for your plugin. This helps users understand how to use it, especially when its behavior can be changed by the user.