Maintained by deepset

Integration: MongoDB

Use a MongoDB Atlas database with Haystack

Authors

deepset

GitHub Repo | PyPI Package

Overview

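MongoDB is a document database designed to simplify application development and scaling. MongoDB Atlas is a multi-cloud database service built by the team behind MongoDB. It simplifies deploying and managing databases while giving you the flexibility to build resilient, high-performance global applications on the cloud providers of your choice.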

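You can use MongoDB Atlas's full-text and semantic search capabilities through MongoDBAtlasFullTextRetriever and MongoDBAtlasEmbeddingRetriever. For a detailed overview of all the settings of MongoDBAtlasDocumentStore, visit the Haystack documentation.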

Installation

pip install mongodb-atlas-haystack

Usage

To use MongoDBAtlasDocumentStore, you need a running MongoDB Atlas database. For details, see the Atlas getting started guide.

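Once your database is set up, set the environment variable MONGO_CONNECTION_STRING to the connection string of your MongoDB Atlas database. The format should look like this: "mongodb+srv://{mongo_atlas_username}:{mongo_atlas_password}@{mongo_atlas_host}/?{mongo_atlas_params_string}"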
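Usernames and passwords that contain characters such as @, : or / have to be percent-encoded before they go into a MongoDB connection URI. A minimal sketch, using only the Python standard library, of assembling the string and exporting it for the current process. The helper name build_connection_string, the sample credentials, the host, and the query parameters are all hypothetical and only illustrate the format:

```python
import os
from urllib.parse import quote_plus


def build_connection_string(username: str, password: str, host: str, params: str = "") -> str:
    """Assemble a mongodb+srv URI, percent-encoding the credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    uri = f"mongodb+srv://{user}:{pwd}@{host}/"
    # Append the query string only when parameters are given
    return f"{uri}?{params}" if params else uri


# Hypothetical credentials and host, for illustration only
connection_string = build_connection_string(
    "app_user", "p@ss/word", "cluster0.example.mongodb.net", "retryWrites=true&w=majority"
)
os.environ["MONGO_CONNECTION_STRING"] = connection_string
```

Setting os.environ this way only affects the running Python process; for anything longer-lived, exporting the variable in the shell or a secrets manager is probably the better choice.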

You can then initialize a MongoDBAtlasDocumentStore for Haystack with the configuration you need.

from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore

document_store = MongoDBAtlasDocumentStore(
    database_name="haystack_test",
    collection_name="test_collection",
    vector_search_index="test_vector_search_index",
)
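The vector_search_index name has to refer to an Atlas Vector Search index that already exists on the collection. A sketch of what such an index definition might look like, built as a Python dict and printed as JSON for pasting into the Atlas JSON editor. It assumes that embeddings are stored in a field named embedding, that they have 768 dimensions (the output size of the intfloat/e5-base-v2 model), and that cosine similarity is wanted; the helper vector_index_definition is hypothetical, so adjust the values to match your own setup:

```python
import json


def vector_index_definition(path: str = "embedding", dimensions: int = 768, similarity: str = "cosine") -> dict:
    """Return an Atlas Vector Search index definition for one vector field."""
    if similarity not in {"cosine", "euclidean", "dotProduct"}:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition()
print(json.dumps(definition, indent=2))
```

The numDimensions value must match the embedding model exactly; if you switch embedders, the index likely needs to be recreated with the new size.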

Example pipelines

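Below is example code for an end-to-end RAG application built on MongoDB Atlas: an indexing pipeline that embeds the documents, and a generative pipeline that can be used for question answering.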

from haystack import Pipeline, Document
from haystack.document_stores.types import DuplicatePolicy
from haystack.components.writers import DocumentWriter
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

# Create some example documents
documents = [
    Document(content="My name is Jean and I live in Paris."),
    Document(content="My name is Mark and I live in Berlin."),
    Document(content="My name is Giorgio and I live in Rome."),
]

document_store = MongoDBAtlasDocumentStore(
    database_name="haystack_test",
    collection_name="test_collection",
    vector_search_index="test_vector_search_index",
)

# Define some more components
doc_writer = DocumentWriter(document_store=document_store, policy=DuplicatePolicy.SKIP)
doc_embedder = SentenceTransformersDocumentEmbedder(model="intfloat/e5-base-v2")
query_embedder = SentenceTransformersTextEmbedder(model="intfloat/e5-base-v2")

# Pipeline that ingests document for retrieval
indexing_pipe = Pipeline()
indexing_pipe.add_component(instance=doc_embedder, name="doc_embedder")
indexing_pipe.add_component(instance=doc_writer, name="doc_writer")

indexing_pipe.connect("doc_embedder.documents", "doc_writer.documents")
indexing_pipe.run({"doc_embedder": {"documents": documents}})

# Build a RAG pipeline with a Retriever to get documents relevant to 
# the query, a PromptBuilder to create a custom prompt and the OpenAIGenerator (LLM)
prompt_template = """
Given these documents, answer the question.\nDocuments:
{% for doc in documents %}
    {{ doc.content }}
{% endfor %}

\nQuestion: {{question}}
\nAnswer:
"""
rag_pipeline = Pipeline()
rag_pipeline.add_component(instance=query_embedder, name="query_embedder")
rag_pipeline.add_component(instance=MongoDBAtlasEmbeddingRetriever(document_store=document_store), name="retriever")
rag_pipeline.add_component(instance=PromptBuilder(template=prompt_template), name="prompt_builder")
# OpenAIGenerator reads the API key from the OPENAI_API_KEY environment variable
rag_pipeline.add_component(instance=OpenAIGenerator(), name="llm")
rag_pipeline.connect("query_embedder", "retriever.query_embedding")
rag_pipeline.connect("retriever", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "llm")

# Ask a question on the data you just added.
question = "Where does Mark live?"
result = rag_pipeline.run(
    {
        "query_embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
)
print(result)
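print(result) shows the whole output dictionary, keyed by component name. A small sketch of pulling out just the generated answer, assuming the generator is registered under the name llm (as above) and returns its text in a replies list, which is how OpenAIGenerator reports its output. The helper first_reply and the sample dictionary are hypothetical stand-ins so the snippet runs without a database or an API key:

```python
from typing import Any, Dict, Optional


def first_reply(result: Dict[str, Any], component: str = "llm") -> Optional[str]:
    """Return the first generated reply of a pipeline result, or None if absent."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Hypothetical result shaped like the output of rag_pipeline.run(...)
sample_result = {"llm": {"replies": ["Mark lives in Berlin."], "meta": [{}]}}
answer = first_reply(sample_result)
print(answer)
```

With the real pipeline, the call would be first_reply(result) on the value returned by rag_pipeline.run.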
