Maintained by deepset

Integration: DeepEval

Use the DeepEval evaluation framework to calculate model-based metrics

Authors
deepset

Overview

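DeepEval (by Confident AI) is an open-source framework for model-based evaluation of LLM applications. It quantifies their performance on aspects such as faithfulness, answer relevancy, and contextual recall. More information can be found on the documentation page.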

Installation

Install the DeepEval integration:

pip install deepeval-haystack
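
To confirm that the package is importable before building a pipeline, a small check along these lines can help. This is a minimal sketch: the helper name `integration_installed` is hypothetical, and the default module path is simply the one used in the import statement of the usage example on this page.

```python
import importlib.util


def integration_installed(
    module_name: str = "haystack_integrations.components.evaluators.deepeval",
) -> bool:
    """Return True if `module_name` can be located in the current environment.

    Hypothetical helper for illustration; it only looks the module up and
    does not validate any DeepEval configuration.
    """
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted module name is missing.
        return False


if __name__ == "__main__":
    print("DeepEval integration importable:", integration_installed())
```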

Usage

Once installed, you will have access to a DeepEvalEvaluator that supports a variety of model-based evaluation metrics:

  • Answer Relevancy
  • Faithfulness
  • Contextual Precision
  • Contextual Recall
  • Contextual Relevance

In addition to evaluation scores, DeepEval's evaluators provide additional reasoning for each evaluation.
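
To make the idea of a score paired with reasoning concrete, here is a minimal sketch of how such results could be rendered for reading. The `name`, `score`, and `explanation` keys, the nested list layout, and the sample values are all assumptions made for illustration; they are not output produced by DeepEval, so check the integration's documentation for the actual result schema.

```python
from typing import Any, Dict, List

# Hypothetical sample: one list of metric results per evaluated input.
# Keys and values are illustrative assumptions, not real DeepEval output.
SAMPLE_RESULTS: List[List[Dict[str, Any]]] = [
    [{"name": "faithfulness", "score": 1.0,
      "explanation": "The answer is supported by the context."}],
    [{"name": "faithfulness", "score": 0.5,
      "explanation": "One claim is not backed by the context."}],
]


def format_results(results: List[List[Dict[str, Any]]]) -> List[str]:
    """Turn nested metric results into one readable line per metric."""
    lines = []
    for index, per_input in enumerate(results):
        for metric in per_input:
            lines.append(
                f"[{index}] {metric['name']}: {metric['score']:.2f} "
                f"({metric['explanation']})"
            )
    return lines


for line in format_results(SAMPLE_RESULTS):
    print(line)
```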

DeepEvalEvaluator

To use this integration for calculating model-based evaluation metrics, initialize a DeepEvalEvaluator with the metric name and the metric input parameters.

from haystack import Pipeline
from haystack_integrations.components.evaluators.deepeval import DeepEvalEvaluator, DeepEvalMetric

QUESTIONS = [
    "Which is the most popular global sport?",
    "Who created the Python language?",
]
CONTEXTS = [
    [
        "The popularity of sports can be measured in various ways, including TV viewership, social media presence, number of participants, and economic impact.",
        "Football is undoubtedly the world's most popular sport with major events like the FIFA World Cup and sports personalities like Ronaldo and Messi, drawing a followership of more than 4 billion people.",
    ],
    [
        "Python, created by Guido van Rossum in the late 1980s, is a high-level general-purpose programming language.",
        "Its design philosophy emphasizes code readability, and its language constructs aim to help programmers write clear, logical code for both small and large-scale software projects.",
    ],
]
RESPONSES = [
    "Football is the most popular sport with around 4 billion followers worldwide",
    "Python language was created by Guido van Rossum.",
]

pipeline = Pipeline()
evaluator = DeepEvalEvaluator(
    metric=DeepEvalMetric.FAITHFULNESS,
    metric_params={"model": "gpt-4"},
)
pipeline.add_component("evaluator", evaluator)

# Each metric expects a specific set of parameters as input. Refer to the
# DeepEvalMetric class' documentation for more details.
results = pipeline.run({"evaluator": {"questions": QUESTIONS, "contexts": CONTEXTS, "responses": RESPONSES}})

for output in results["evaluator"]["results"]:
    print(output)
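
Because the evaluator receives parallel lists (one question, one list of contexts, and one response per example), it may be worth checking that they line up before calling `pipeline.run`. The helper below is a hypothetical sketch, not part of the integration: `build_evaluator_inputs` and its `component_name` parameter are names introduced here, and it covers only the three inputs used in the example above. Other metrics may expect additional inputs.

```python
from typing import Dict, List, Sequence


def build_evaluator_inputs(
    questions: Sequence[str],
    contexts: Sequence[Sequence[str]],
    responses: Sequence[str],
    component_name: str = "evaluator",
) -> Dict[str, Dict[str, List]]:
    """Validate that the inputs are aligned and build the pipeline payload.

    Hypothetical helper for illustration; it mirrors the dictionary passed
    to `pipeline.run` in the example above.
    """
    lengths = {
        "questions": len(questions),
        "contexts": len(contexts),
        "responses": len(responses),
    }
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Input lists must have the same length, got {lengths}")
    return {
        component_name: {
            "questions": list(questions),
            "contexts": [list(ctx) for ctx in contexts],
            "responses": list(responses),
        }
    }


payload = build_evaluator_inputs(
    ["Who created the Python language?"],
    [["Python was created by Guido van Rossum."]],
    ["Python language was created by Guido van Rossum."],
)
print(payload["evaluator"]["questions"])
```

With the pipeline from the example above, the payload could then be passed as `pipeline.run(build_evaluator_inputs(QUESTIONS, CONTEXTS, RESPONSES))`.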