
Calculating Hallucination Scores with the OpenAI ChatGenerator


In this cookbook, we show how to calculate hallucination risk based on the research paper "LLMs are Bayesian, in Expectation, not in Realization" and the accompanying GitHub repository: https://github.com/leochlon/hallbayes

In this notebook, we use the OpenAIChatGenerator from haystack-experimental.

Set up the environment

%pip install haystack-experimental -q

Set the OpenAI API key

import os
from getpass import getpass

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")
Enter OpenAI API key: ········

Closed-book example

Based on this example from the original GitHub repository

from haystack.dataclasses import ChatMessage

from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")

closed_book_result = llm.run(
    messages=[ChatMessage.from_user(text="Who won the 2019 Nobel Prize in Physics?")],
    hallucination_score_config=HallucinationScoreConfig(
        skeleton_policy="closed_book" # NOTE: We set "closed_book" here for closed-book hallucination risk calculation
    ),
)
print(f"Decision: {closed_book_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {closed_book_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {closed_book_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{closed_book_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.000
Rationale: Δ̄=8.2088 nats, B2T=1.8947, ISR=4.332 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.000; y='answer'
Answer:
The 2019 Nobel Prize in Physics was awarded to three scientists for their contributions to understanding the universe. Half of the prize went to James Peebles for his theoretical discoveries in physical cosmology. The other half was jointly awarded to Michel Mayor and Didier Queloz for their discovery of an exoplanet orbiting a solar-type star.
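The rationale string reports Δ̄ (the average information budget, in nats), B2T (bits-to-trust), and ISR, the information sufficiency ratio. Judging from the numbers above, ISR appears to be Δ̄/B2T, and the generator answers when ISR clears the threshold of 1.0. A minimal sketch of that relationship (the function and parameter names are illustrative, not the library's API):

```python
# Illustrative sketch: how the quantities in the rationale appear to relate.
# Names (delta_bar, b2t, threshold) are ours, not hallbayes' internal API.

def decide(delta_bar: float, b2t: float, threshold: float = 1.0) -> tuple[float, str]:
    """Compute the information sufficiency ratio and the resulting decision."""
    isr = delta_bar / b2t  # ISR = Δ̄ / B2T
    return isr, "ANSWER" if isr >= threshold else "REFUSE"

# Values taken from the closed-book rationale above:
isr, decision = decide(8.2088, 1.8947)
print(f"ISR ≈ {isr:.3f} → {decision}")
```

Plugging in the values from the rationale reproduces the reported `ISR=4.332 (thr=1.000)` and the `ANSWER` decision; a Δ̄ of 0 nats, as in the refusal example further below, yields ISR=0 and `REFUSE`.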

Evidence-based example

Based on this example from the original GitHub repository

from haystack.dataclasses import ChatMessage

from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
llm = OpenAIChatGenerator(model="gpt-4o")

rag_result = llm.run(
    messages=[
        ChatMessage.from_user(
            text="Task: Answer strictly based on the evidence provided below.\n"
            "Question: Who won the Nobel Prize in Physics in 2019?\n"
            "Evidence:\n"
            "- Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).\n"
            "Constraints: If evidence is insufficient or conflicting, refuse."
        )
    ],
    hallucination_score_config=HallucinationScoreConfig(
        skeleton_policy="evidence_erase"  # NOTE: We set "evidence_erase" here for evidence-based hallucination risk calculation
    ),
)
print(f"Decision: {rag_result['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {rag_result['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {rag_result['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{rag_result['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles, who received half of the prize, and to Michel Mayor and Didier Queloz, who shared the other half of the prize.

RAG-based example

Create a Document Store and index some documents

from haystack import Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()

docs = [
    Document(content="Nobel Prize press release (2019): James Peebles (1/2); Michel Mayor & Didier Queloz (1/2)"),
    Document(content="Nikola Tesla was a Serbian-American engineer, futurist, and inventor. He is known for his contributions to the design of the modern alternating current (AC) electricity supply system.")
]
document_store.write_documents(docs)
2

Create the RAG question-answering pipeline

from haystack import Pipeline
from haystack.dataclasses import ChatMessage
from haystack.components.builders import ChatPromptBuilder
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever

from haystack_experimental.utils.hallucination_risk_calculator.dataclasses import HallucinationScoreConfig
from haystack_experimental.components.generators.chat.openai import OpenAIChatGenerator
# Create the pipeline
pipe = Pipeline()

# Add components
user_template = """Task: Answer strictly based on the evidence provided below.
Question: {{query}}
Evidence:
{%- for document in documents %}
- {{document.content}}
{%- endfor %}
Constraints: If evidence is insufficient or conflicting, refuse.
"""
pipe.add_component("retriever", InMemoryBM25Retriever(document_store))
pipe.add_component(
    "prompt_builder",
    ChatPromptBuilder(template=[ChatMessage.from_user(user_template)], required_variables="*")
)
pipe.add_component("llm", OpenAIChatGenerator(model="gpt-4o"))

# Connect the components
pipe.connect("retriever.documents", "prompt_builder.documents")
pipe.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x1426632e0>
🚅 Components
  - retriever: InMemoryBM25Retriever
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - retriever.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])

Run a query that can be answered from the evidence

query = "Who won the Nobel Prize in Physics in 2019?"

result = pipe.run(
    data={
        "retriever": {"query": query},
        "prompt_builder": {"query": query},
        "llm": {
            "hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
        }
    }
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: ANSWER
Risk bound: 0.541
Rationale: Δ̄=12.0000 nats, B2T=1.8947, ISR=6.333 (thr=1.000), extra_bits=0.200; EDFL RoH bound=0.541; y='answer'
Answer:
The Nobel Prize in Physics in 2019 was awarded to James Peebles (1/2), and Michel Mayor & Didier Queloz (1/2).

Run a query that should not be answered

query = "Who won the Nobel Prize in Physics in 2022?"

result = pipe.run(
    data={
        "retriever": {"query": query},
        "prompt_builder": {"query": query},
        "llm": {
            "hallucination_score_config": HallucinationScoreConfig(skeleton_policy="evidence_erase")
        }
    }
)
print(f"Decision: {result['llm']['replies'][0].meta['hallucination_decision']}")
print(f"Risk bound: {result['llm']['replies'][0].meta['hallucination_risk']:.3f}")
print(f"Rationale: {result['llm']['replies'][0].meta['hallucination_rationale']}")
print(f"Answer:\n{result['llm']['replies'][0].text}")
Decision: REFUSE
Risk bound: 1.000
Rationale: Δ̄=0.0000 nats, B2T=1.8947, ISR=0.000 (thr=1.000), extra_bits=0.200; EDFL RoH bound=1.000; y='refuse'
Answer:
The evidence provided does not include information about the Nobel Prize in Physics for the year 2022. Therefore, I cannot answer the question based on the evidence provided.
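In an application, the hallucination metadata can gate whether the model's text reaches the user at all. A minimal sketch, using a plain dict in place of the reply's `meta` so it runs without an API key (the `max_risk` threshold and the fallback message are illustrative choices, not part of Haystack):

```python
# Sketch: gate downstream use of a reply on its hallucination metadata.
# With Haystack, pass reply.text and reply.meta from result["llm"]["replies"][0].

FALLBACK = "I don't have enough evidence to answer that reliably."

def gate(text: str, meta: dict, max_risk: float = 0.2) -> str:
    """Return the model's answer only if it decided to answer and the risk bound is low."""
    if meta.get("hallucination_decision") != "ANSWER":
        return FALLBACK
    if meta.get("hallucination_risk", 1.0) > max_risk:
        return FALLBACK
    return text

answered = gate("James Peebles (1/2); Michel Mayor & Didier Queloz (1/2).",
                {"hallucination_decision": "ANSWER", "hallucination_risk": 0.0})
refused = gate("The 2022 prize went to ...",
               {"hallucination_decision": "REFUSE", "hallucination_risk": 1.0})
```

When the decision is REFUSE the model's own text is usually already a refusal, as in the output above, so the fallback simply standardizes the message; tightening `max_risk` would also suppress answers like the evidence-based example, whose risk bound of 0.541 exceeded 0.2 despite the ANSWER decision.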