Maintained by deepset

Integration: Pinecone

Use a Pinecone database with Haystack

Authors
deepset
Ashwin Mathur
Varun Mathur


Overview

Pinecone is a fast and scalable vector database that you can use in Haystack pipelines with the PineconeDocumentStore.

For a detailed overview of all the available PineconeDocumentStore methods and settings, visit the Haystack API Reference.

Installation

pip install pinecone-haystack

Usage

To use Pinecone as the data storage for your Haystack LLM pipelines, you must have a Pinecone account and an API key. Once you have those, you can initialize a PineconeDocumentStore for Haystack:

from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
  index="YOUR_INDEX_NAME",
  metric="cosine",
  dimension=768,
  spec={"serverless": {"region": "us-east-1", "cloud": "aws"}},
)
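As the comment above notes, the document store reads your API key from the `PINECONE_API_KEY` environment variable. If you prefer to set it from Python (for example, in a notebook), a minimal sketch, where the key value is a placeholder:

```python
import os

# Set the key only if it is not already exported in your shell;
# replace the placeholder with your real Pinecone API key.
os.environ.setdefault("PINECONE_API_KEY", "YOUR_PINECONE_API_KEY")
```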

Writing Documents to PineconeDocumentStore

To write documents to your PineconeDocumentStore, create an indexing pipeline or use the write_documents() function. At this step, you can make use of the available converters and preprocessors, as well as other integrations that help you fetch data from other resources. Below is an example indexing pipeline that indexes your Markdown files into a Pinecone database.

Indexing Pipeline

from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.writers import DocumentWriter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
  index="YOUR_INDEX_NAME",
  metric="cosine",
  dimension=768,
  spec={"serverless": {"region": "us-east-1", "cloud": "aws"}},
)

indexing = Pipeline()
indexing.add_component("converter", MarkdownToDocument())
indexing.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")

indexing.run({"converter": {"sources": ["filename.md"]}})
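The `DocumentSplitter(split_by="sentence", split_length=2)` step above chunks each converted document into groups of two sentences before embedding. Conceptually, it works like this plain-Python sketch (illustrative only — the actual component also handles abbreviations, overlap, and metadata):

```python
# Illustrative only: group every two sentences into one chunk, roughly
# what split_by="sentence", split_length=2 produces.
text = "Pinecone stores vectors. It scales well. Haystack builds pipelines. They work together."
sentences = [s.strip() + "." for s in text.split(".") if s.strip()]
chunks = [" ".join(sentences[i:i + 2]) for i in range(0, len(sentences), 2)]
print(chunks)
# ['Pinecone stores vectors. It scales well.', 'Haystack builds pipelines. They work together.']
```

Smaller chunks generally retrieve more precisely; tune `split_length` to your content.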

Using Pinecone in a RAG Pipeline

Once you have documents in your PineconeDocumentStore, you can use them in any Haystack pipeline. Use the PineconeEmbeddingRetriever to retrieve data from your PineconeDocumentStore. For example, below is a pipeline that makes use of a custom prompt, designed to answer questions based on the retrieved documents.

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.document_stores.pinecone import PineconeDocumentStore
from haystack_integrations.components.retrievers.pinecone import PineconeEmbeddingRetriever

# Make sure you have the PINECONE_API_KEY environment variable set
document_store = PineconeDocumentStore(
  index="YOUR_INDEX_NAME",
  metric="cosine",
  dimension=768,
  spec={"serverless": {"region": "us-east-1", "cloud": "aws"}},
)
prompt_template = """Answer the following query based on the provided context. If the context does
                     not include an answer, reply with 'I don't know'.\n
                     Query: {{query}}
                     Documents:
                     {% for doc in documents %}
                        {{ doc.content }}
                     {% endfor %}
                     Answer: 
                  """

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component("retriever", PineconeEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
query_pipeline.add_component("generator", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="gpt-4"))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")

query = "What is Pinecone?"
results = query_pipeline.run(
    {
        "text_embedder": {"text": query},
        "prompt_builder": {"query": query},
    }
)
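The run returns a dictionary keyed by component name, and the generated answer lives in the generator's `replies` list. A sketch with a mocked result (the reply text and metadata here are made up; your actual output depends on the indexed documents and model):

```python
# Mocked output illustrating the shape of `results` from the run above;
# only the access pattern matters, not the literal values.
results = {
    "generator": {
        "replies": ["Pinecone is a fast and scalable vector database."],
        "meta": [{"model": "gpt-4"}],
    }
}

answer = results["generator"]["replies"][0]
print(answer)  # Pinecone is a fast and scalable vector database.
```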