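Integration: Flow Judge

Evaluate Haystack pipelines using Flow Judge

Authors

Flow AI

GitHub repository / PyPI package

Overview

This integration lets you evaluate Haystack pipelines with Flow Judge.

Flow Judge is an open-source, lightweight (3.8B-parameter) language model optimized for evaluating LLM systems. It is designed for accuracy, speed, and customizability.

See the Flow Judge technical report for details.

Installation

To run Flow Judge with the vLLM engine: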

pip install 'flow-judge[vllm]'
pip install 'flash_attn>=2.6.3' --no-build-isolation

To run Flow Judge with Transformers:

pip install 'flow-judge[hf]'

If you are using Flash Attention:

pip install 'flash_attn>=2.6.3' --no-build-isolation

To run Flow Judge with Llamafile on macOS:

pip install 'flow-judge[llamafile]'
pip install 'flash_attn>=2.6.3' --no-build-isolation

To learn more about installation, visit the Flow Judge installation page.

Finally, install Haystack:

pip install haystack-ai

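Usage

The integration plugs Flow Judge into your Haystack workflows as an evaluator, so you can score the outputs of your Haystack pipelines and use the feedback to improve your LLM system.

Flow Judge offers a set of built-in metrics and makes it easy to create custom ones.

Available built-in metrics

The built-in metrics come in 3 different scoring scales (binary, 3-point Likert, and 5-point Likert):

Response correctness
Response faithfulness
Response relevance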
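
Because the three scales use different ranges, scores from different metric variants are not directly comparable. The sketch below shows one way to map them onto a common 0 to 1 range. It is not part of Flow Judge: `SCALE_RANGES` and `normalize_score` are hypothetical names, and the ranges (0-1 for binary, 1-3 and 1-5 for the Likert scales) are assumptions that should be checked against the rubric of the metric you actually use.

```python
# Assumed score ranges per scale; verify them against the metric's rubric.
SCALE_RANGES = {
    "binary": (0, 1),
    "3point": (1, 3),
    "5point": (1, 5),
}


def normalize_score(score: int, scale: str) -> float:
    """Map a raw score onto [0, 1] so that different scales can be compared."""
    low, high = SCALE_RANGES[scale]
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {scale} range {low}-{high}")
    return (score - low) / (high - low)


print(normalize_score(1, "binary"))  # 1.0
print(normalize_score(2, "3point"))  # 0.5
print(normalize_score(4, "5point"))  # 0.75
```

Linear rescaling treats the Likert points as evenly spaced, which is a simplification; for reporting, keeping the raw score alongside the normalized one avoids losing information.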

To check the available metrics, run:

from flow_judge.metrics import list_all_metrics
list_all_metrics()

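While these preset metrics provide a solid foundation for evaluation, the real strength of Flow Judge is that you can define custom metrics tailored to your specific needs, which allows a more nuanced and comprehensive evaluation of your LLM system. For details on creating custom metrics, refer to our tutorial.

Components

This integration introduces the HaystackFlowJudge component, which is used in the same way as the other evaluator components in Haystack.

For details on the usage and parameters of this component, refer to the HaystackFlowJudge class and Haystack's LLMEvaluator component.

Using Flow Judge with Haystack

We have written a comprehensive guide to using Flow Judge with Haystack effectively. The tutorial demonstrates how to use Flow Judge to evaluate a RAG pipeline built with Haystack.

Quick example

The snippet below is a simpler example of integrating Flow Judge with Haystack. We still recommend following the full tutorial for a deeper understanding of the concepts and the implementation.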

from flow_judge.integrations.haystack import HaystackFlowJudge
from flow_judge.metrics.presets import RESPONSE_FAITHFULNESS_5POINT
from flow_judge import Hf

from haystack import Pipeline

# Create a model using Hugging Face Transformers with Flash Attention
model = Hf()  # Vllm and Llamafile are also supported

# Evaluation sample 
questions = ["What is the termination clause in the contract?"] 
contexts = ["This contract may be terminated by either party upon providing thirty (30) days written notice to the other party. In the event of a breach of contract, the non-breaching party may terminate the contract immediately."]
answers = ["The contract can be terminated by either party with thirty days written notice."] 

# Define the HaystackFlowJudge evaluator, we will use the built-in metric for faithfulness 
# For parameters refer to Haystack's [LLMEvaluator](https://docs.haystack.com.cn/reference/evaluators-api#module-llm_evaluator) and HaystackFlowJudge class. 
ff_evaluator = HaystackFlowJudge(
    metric=RESPONSE_FAITHFULNESS_5POINT,
    model=model,
    progress_bar=True,
    raise_on_failure=True,
    save_results=True,
    fail_on_parse_error=False
)

# Setup the pipeline
eval_pipeline = Pipeline()

# Add components to the pipeline
eval_pipeline.add_component("ff_evaluator", ff_evaluator)

# Run the eval pipeline
results = eval_pipeline.run(
    {
        "ff_evaluator": {
            'query': questions,
            'context': contexts,
            'response': answers,
        }
    }
)

# Print eval results 
for result in results['ff_evaluator']['results']:
    score = result['score']
    feedback = result['feedback']
    print(f"Score: {score}")
    print(f"Feedback: {feedback}\n")
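
With more than a handful of samples, printing every result becomes hard to read. The following sketch aggregates results of the same shape as above (a list of dicts with `score` and `feedback` keys). The `summarize` helper, the `threshold` parameter, and the sample values are hypothetical and only illustrate the idea; it assumes every result carries a numeric score.

```python
from statistics import mean

# Stand-in for results["ff_evaluator"]["results"]; these values are made up.
sample_results = [
    {"score": 5, "feedback": "The response is fully supported by the context."},
    {"score": 2, "feedback": "The response adds a claim the context does not contain."},
    {"score": 4, "feedback": "Mostly supported, with one minor omission."},
]


def summarize(results, threshold=3):
    """Return the mean score and the indices of samples scoring below `threshold`."""
    scores = [r["score"] for r in results]
    flagged = [i for i, s in enumerate(scores) if s < threshold]
    return {"mean_score": mean(scores), "flagged": flagged}


summary = summarize(sample_results)
print(f"Mean score: {summary['mean_score']:.2f}")
for i in summary["flagged"]:
    print(f"Sample {i} needs review: {sample_results[i]['feedback']}")
```

The flagged indices line up with the positions in the input lists, so a low-scoring sample can be traced back to its query, context, and response.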

License

This code is licensed under the Apache 2.0 license.
