📘 **TELUS Agriculture & Consumer Goods** 如何通过 **Haystack Agents** 转变促销交易
由 deepset 维护

集成:OpenSearch

用于存储和检索 OpenSearch 文档的文档存储

作者
Thomas Stadelmann
Julian Risch
deepset

目录

概述

PyPI - Version PyPI - Python Version test


安装

使用 pip 安装 OpenSearch

pip install opensearch-haystack

使用

安装完成后,初始化您的 OpenSearch 数据库,以便与 Haystack 一起使用

from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

document_store = OpenSearchDocumentStore()

将文档写入 OpenSearchDocumentStore

要将文档写入 OpenSearchDocumentStore,请创建一个索引管道。

from haystack.components.file_converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")
indexing.run({"converter": {"paths": file_paths}})

混合检索器

此集成还提供了一个混合检索器。OpenSearchHybridRetriever 结合了向量搜索和关键字搜索的功能。它使用 OpenSearch 文档存储,根据语义和关键字查询来检索文档。

您可以将 OpenSearchHybridRetrieverOpenSearchDocumentStore 一起使用来执行混合检索。

有关如何索引文档和使用混合检索器的示例,请参见下面的示例

from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore

# Initialize the document store
doc_store = OpenSearchDocumentStore(
    hosts=["https://:9200"],
    index="document_store",
    embedding_dim=384,
)

# Create some sample documents
docs = [
    Document(content="Machine learning is a subset of artificial intelligence."),
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Natural language processing is a field of AI."),
    Document(content="Reinforcement learning is a type of machine learning."),
    Document(content="Supervised learning is a type of machine learning."),
]

# Embed the documents and add them to the document store
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = doc_embedder.run(docs)

# Write the documents to the OpenSearch document store
doc_store.write_documents(docs['documents'])

# Initialize some haystack text embedder, in this case the SentenceTransformersTextEmbedder
embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

# Initialize the hybrid retriever
retriever = OpenSearchHybridRetriever(
    document_store=doc_store,
    embedder=embedder,
    top_k_bm25=3,
    top_k_embedding=3,
    join_mode="reciprocal_rank_fusion"
)

# Run the retriever
results = retriever.run(query="What is reinforcement learning?", filters_bm25=None, filters_embedding=None)

>> results['documents']
{'documents': [Document(id=..., content: 'Reinforcement learning is a type of machine learning.', score: 1.0),
  Document(id=..., content: 'Supervised learning is a type of machine learning.', score: 0.9760624679979518),
  Document(id=..., content: 'Deep learning is a subset of machine learning.', score: 0.4919354838709677),
  Document(id=..., content: 'Machine learning is a subset of artificial intelligence.', score: 0.4841269841269841)]}

您可以在 文档 中了解有关 OpenSearchHybridRetriever 的更多信息。

许可证

opensearch-haystackApache-2.0 许可协议下分发。