Integration: langfuse
Monitor and trace your Haystack requests.
Overview
langfuse-haystack integrates tracing into Haystack pipelines via Langfuse. The package improves visibility into pipeline runs by capturing comprehensive details of execution traces, including API calls, context data, prompts, and more. Whether you are monitoring model performance, pinpointing areas for improvement, or creating datasets for fine-tuning and testing from your pipeline executions, langfuse-haystack is the right tool for you.
Features
- Easy integration with Haystack pipelines
- Captures the full context of execution
- Tracks model usage and cost
- Collects user feedback
- Identifies low-quality outputs
- Builds datasets for fine-tuning and testing
To use this integration, sign up for a Langfuse account. See the Langfuse documentation for the latest feature and pricing information.
Installation
pip install langfuse-haystack
Usage
Components
This integration introduces one component:
- LangfuseConnector: connects the Haystack LLM framework with Langfuse so that operations and data flow across the pipeline's components can be traced. Simply add this component to your pipeline, but do not connect it to any other component; LangfuseConnector will automatically trace the operations and data flow in the pipeline. Note that you must set the LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY environment variables to use this component. These are the keys provided by Langfuse; you can obtain them by signing up for an account on the Langfuse website. You must also set the HAYSTACK_CONTENT_TRACING_ENABLED environment variable to true to enable Haystack tracing in your pipeline. The code examples below additionally require the OPENAI_API_KEY environment variable. Haystack is model-agnostic, and you can use any supported model provider by swapping out the generator in the code examples below.
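For example, before running any of the pipelines below, the required environment variables can be exported in your shell. The key values shown here are hypothetical placeholders; substitute the keys from your own Langfuse project and OpenAI account:

```shell
# Keys from your Langfuse project settings (placeholders shown)
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
# Enable Haystack content tracing so traces include pipeline inputs/outputs
export HAYSTACK_CONTENT_TRACING_ENABLED="true"
# Required by the OpenAI generators used in the examples below
export OPENAI_API_KEY="sk-..."
```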
Using LangfuseConnector in a RAG pipeline
First, install a few additional dependencies.
pip install sentence-transformers datasets
from datasets import load_dataset
from haystack import Document, Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.connectors.langfuse import LangfuseConnector
def get_pipeline(document_store: InMemoryDocumentStore):
    retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=2)

    template = """
    Given the following information, answer the question.

    Context:
    {% for document in documents %}
        {{ document.content }}
    {% endfor %}

    Question: {{question}}
    Answer:
    """

    prompt_builder = PromptBuilder(template=template)

    basic_rag_pipeline = Pipeline()
    # Add components to your pipeline
    basic_rag_pipeline.add_component("tracer", LangfuseConnector("Basic RAG Pipeline"))
    basic_rag_pipeline.add_component(
        "text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
    )
    basic_rag_pipeline.add_component("retriever", retriever)
    basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
    basic_rag_pipeline.add_component("llm", OpenAIGenerator(generation_kwargs={"n": 2}))

    # Now, connect the components to each other
    # NOTE: the tracer component doesn't need to be connected to anything in order to work
    basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
    basic_rag_pipeline.connect("retriever", "prompt_builder.documents")
    basic_rag_pipeline.connect("prompt_builder", "llm")

    return basic_rag_pipeline
document_store = InMemoryDocumentStore()
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
embedder = SentenceTransformersDocumentEmbedder("sentence-transformers/all-MiniLM-L6-v2")
embedder.warm_up()
docs_with_embeddings = embedder.run([Document(**ds) for ds in dataset]).get("documents") or [] # type: ignore
document_store.write_documents(docs_with_embeddings)
pipeline = get_pipeline(document_store)
question = "What does Rhodes Statue look like?"
response = pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
# {'tracer': {'name': 'Basic RAG Pipeline', 'trace_url': 'https://cloud.langfuse.com/trace/3d52b8cc-87b6-4977-8927-5e9f3ff5b1cb'}, 'llm': {'replies': ['The Rhodes Statue was described as being about 105 feet tall, with iron tie bars and brass plates forming the skin. It was built on a white marble pedestal near the Rhodes harbour entrance. The statue was filled with stone blocks as construction progressed.', 'The Rhodes Statue was described as being about 32 meters (105 feet) tall, built with iron tie bars, brass plates for skin, and filled with stone blocks. It stood on a 15-meter-high white marble pedestal near the Rhodes harbor entrance.'], 'meta': [{'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 100, 'prompt_tokens': 453, 'total_tokens': 553}}, {'model': 'gpt-4o-mini', 'index': 1, 'finish_reason': 'stop', 'usage': {'completion_tokens': 100, 'prompt_tokens': 453, 'total_tokens': 553}}]}}
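The `response` dictionary above can be unpacked programmatically, for example to log the trace URL or aggregate token usage for cost accounting. The sketch below uses a stubbed response with the same shape as the output shown above; the trace URL, reply texts, and token counts are illustrative placeholders:

```python
# Stubbed response mirroring the pipeline output shape shown above;
# the trace URL, replies, and token counts are illustrative placeholders.
response = {
    "tracer": {"name": "Basic RAG Pipeline", "trace_url": "https://cloud.langfuse.com/trace/EXAMPLE"},
    "llm": {
        "replies": ["First candidate answer.", "Second candidate answer."],
        "meta": [
            {"model": "gpt-4o-mini", "usage": {"completion_tokens": 100, "prompt_tokens": 453, "total_tokens": 553}},
            {"model": "gpt-4o-mini", "usage": {"completion_tokens": 100, "prompt_tokens": 453, "total_tokens": 553}},
        ],
    },
}

# The Langfuse trace URL lets you jump straight to this run in the dashboard.
trace_url = response["tracer"]["trace_url"]
replies = response["llm"]["replies"]
# Sum token usage across all candidate replies (n=2 in the example above).
total_tokens = sum(m["usage"]["total_tokens"] for m in response["llm"]["meta"])

print(trace_url)
print(len(replies), total_tokens)  # 2 1106
```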
After running these code examples, you can also view and interact with the traces in the Langfuse dashboard.
Using LangfuseConnector in a pipeline with OpenAIChatGenerator and ChatPromptBuilder
from haystack import Pipeline
from haystack.components.builders import ChatPromptBuilder
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.connectors.langfuse import LangfuseConnector
pipe = Pipeline()
pipe.add_component("tracer", LangfuseConnector("Chat example"))
pipe.add_component("prompt_builder", ChatPromptBuilder())
pipe.add_component("llm", OpenAIChatGenerator())
pipe.connect("prompt_builder.prompt", "llm.messages")
messages = [
    ChatMessage.from_system("Always respond in German even if some input data is in other languages."),
    ChatMessage.from_user("Tell me about {{location}}"),
]

response = pipe.run(
    data={"prompt_builder": {"template_variables": {"location": "Berlin"}, "template": messages}}
)
print(response["llm"]["replies"][0])
print(response["tracer"]["trace_url"])
# ChatMessage(content='Berlin ist die Hauptstadt von Deutschland und zugleich eines der bekanntesten kulturellen Zentren Europas. Die Stadt hat eine faszinierende Geschichte, die bis in die Zeiten des Zweiten Weltkriegs und des Kalten Krieges zurückreicht. Heute ist Berlin für seine vielfältige Kunst- und Musikszene, seine historischen Stätten wie das Brandenburger Tor und die Berliner Mauer sowie seine lebendige Street-Food-Kultur bekannt. Berlin ist auch für seine grünen Parks und Seen beliebt, die den Bewohnern und Besuchern Raum für Erholung bieten.', role=<ChatRole.ASSISTANT: 'assistant'>, name=None, meta={'model': 'gpt-4o-mini', 'index': 0, 'finish_reason': 'stop', 'usage': {'completion_tokens': 137, 'prompt_tokens': 29, 'total_tokens': 166}})
# https://cloud.langfuse.com/trace/YOUR_UNIQUE_IDENTIFYING_STRING
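ChatPromptBuilder fills Jinja-style placeholders such as {{location}} in the message templates with the values passed as template_variables. A minimal, dependency-free sketch of that substitution step (an illustration only, not the builder's actual implementation) looks like this:

```python
import re

def render(template: str, variables: dict) -> str:
    # Replace {{ name }} placeholders with matching values from `variables`;
    # placeholders with no matching variable are left untouched.
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

print(render("Tell me about {{location}}", {"location": "Berlin"}))
# Tell me about Berlin
```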
License
langfuse-haystack is distributed under the terms of the Apache-2.0 license.
