Maintained by deepset
Integration: OpenSearch
A Document Store for storing and retrieving documents from OpenSearch
Overview
Installation
Use pip to install the OpenSearch integration for Haystack:
pip install opensearch-haystack
Usage
Once installed, initialize an OpenSearchDocumentStore to use your OpenSearch database with Haystack:
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
document_store = OpenSearchDocumentStore()
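The call above relies on the store's defaults. As a minimal sketch of one way to make the connection configurable, the helper below collects the keyword arguments `hosts`, `index` and `embedding_dim` (the same ones used in the hybrid retriever example on this page) from environment variables. The helper name, the variable names `OPENSEARCH_HOST`, `OPENSEARCH_INDEX` and `OPENSEARCH_EMBEDDING_DIM`, and the fallback host `http://localhost:9200` are assumptions made for illustration, not part of the integration.

```python
import os


def opensearch_settings_from_env(env=None):
    """Build keyword arguments for OpenSearchDocumentStore from a mapping.

    The variable names and fallback values are hypothetical choices for this sketch.
    """
    env = os.environ if env is None else env
    return {
        "hosts": [env.get("OPENSEARCH_HOST", "http://localhost:9200")],
        "index": env.get("OPENSEARCH_INDEX", "document_store"),
        "embedding_dim": int(env.get("OPENSEARCH_EMBEDDING_DIM", "384")),
    }


# Override only the index name; the other settings fall back to the sketch's defaults
settings = opensearch_settings_from_env({"OPENSEARCH_INDEX": "articles"})
print(settings)

# With opensearch-haystack installed, the settings can be passed on like this:
# document_store = OpenSearchDocumentStore(**settings)
```

Keeping the settings in a plain dictionary means the same values can be logged or checked before any connection to the cluster is attempted.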
Writing Documents to OpenSearchDocumentStore
To write documents to your OpenSearchDocumentStore, create an indexing pipeline.
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument
from haystack.components.writers import DocumentWriter

# Build an indexing pipeline: convert text files to Documents, then write them to the store
indexing = Pipeline()
indexing.add_component("converter", TextFileToDocument())
indexing.add_component("writer", DocumentWriter(document_store))
indexing.connect("converter", "writer")

# file_paths is a list of paths to the text files to index
indexing.run({"converter": {"sources": file_paths}})
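The pipeline run above expects `file_paths`, a list of paths to text files, which the snippet does not define. One possible way to build it is sketched here with the standard library only. The helper name `collect_text_files` and the folder name `data` are hypothetical.

```python
from pathlib import Path


def collect_text_files(root):
    """Return the sorted paths of all .txt files below root, as strings."""
    return sorted(str(path) for path in Path(root).rglob("*.txt") if path.is_file())


# "data" is a hypothetical folder name; point this at your own documents
file_paths = collect_text_files("data")
print(f"Found {len(file_paths)} text files")
```

Sorting the paths keeps the indexing order stable between runs, which makes it easier to compare results when the pipeline is rerun.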
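Hybrid Retriever
This integration also provides a hybrid retriever. The OpenSearchHybridRetriever combines embedding-based (vector) search and keyword search. It uses the OpenSearch document store to retrieve documents based on both semantic and keyword queries.
You can use the OpenSearchHybridRetriever together with the OpenSearchDocumentStore to perform hybrid retrieval.
The example below shows how to index documents and use the hybrid retriever: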
from haystack import Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack_integrations.components.retrievers.opensearch import OpenSearchHybridRetriever
from haystack_integrations.document_stores.opensearch import OpenSearchDocumentStore
# Initialize the document store
doc_store = OpenSearchDocumentStore(
    hosts=["https://:9200"],
    index="document_store",
    embedding_dim=384,
)
# Create some sample documents
docs = [
    Document(content="Machine learning is a subset of artificial intelligence."),
    Document(content="Deep learning is a subset of machine learning."),
    Document(content="Natural language processing is a field of AI."),
    Document(content="Reinforcement learning is a type of machine learning."),
    Document(content="Supervised learning is a type of machine learning."),
]
# Embed the documents and add them to the document store
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = doc_embedder.run(docs)
# Write the documents to the OpenSearch document store
doc_store.write_documents(docs['documents'])
# Initialize some haystack text embedder, in this case the SentenceTransformersTextEmbedder
embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
# Initialize the hybrid retriever
retriever = OpenSearchHybridRetriever(
    document_store=doc_store,
    embedder=embedder,
    top_k_bm25=3,
    top_k_embedding=3,
    join_mode="reciprocal_rank_fusion",
)
# Run the retriever
results = retriever.run(query="What is reinforcement learning?", filters_bm25=None, filters_embedding=None)
>> results
{'documents': [Document(id=..., content: 'Reinforcement learning is a type of machine learning.', score: 1.0),
               Document(id=..., content: 'Supervised learning is a type of machine learning.', score: 0.9760624679979518),
               Document(id=..., content: 'Deep learning is a subset of machine learning.', score: 0.4919354838709677),
               Document(id=..., content: 'Machine learning is a subset of artificial intelligence.', score: 0.4841269841269841)]}
You can learn more about the OpenSearchHybridRetriever in the documentation.
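To give a feel for what `join_mode="reciprocal_rank_fusion"` does with the two result lists, here is a standalone sketch of reciprocal rank fusion in plain Python. It is not the integration's code. The constant `k=61`, the zero-based ranks, the equal weighting of both lists and the normalization step are assumptions, chosen because they reproduce the scores in the sample output above. The two rankings are also hypothetical: they are one ordering of the BM25 and embedding results that is consistent with those scores.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=61):
    """Fuse ranked lists of document ids into one list of (doc_id, score) pairs.

    k=61 and zero-based ranks are assumptions of this sketch.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes more for documents it ranks higher
            scores[doc_id] += 1.0 / (k + rank)
    # Normalize so that a document ranked first in every list scores 1.0
    best_possible = len(rankings) / k
    fused = [(doc_id, score / best_possible) for doc_id, score in scores.items()]
    return sorted(fused, key=lambda pair: pair[1], reverse=True)


# Hypothetical top-3 rankings for the query "What is reinforcement learning?"
bm25_ranking = ["reinforcement", "supervised", "machine"]
embedding_ranking = ["reinforcement", "deep", "supervised"]

for doc_id, score in reciprocal_rank_fusion([bm25_ranking, embedding_ranking]):
    print(f"{doc_id}: {score:.4f}")
```

Under these assumptions, a document found by both retrievers (such as the reinforcement and supervised learning entries) ends up well ahead of a document found by only one of them, which is the pattern visible in the sample scores.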
License
opensearch-haystack is distributed under the terms of the Apache-2.0 license.
