Build a GitHub Issue Resolver Agent
Last updated: June 9, 2025
In this tutorial, we'll create a GitHub issue resolver agent using Anthropic Claude 4 Sonnet. Given an issue URL, the agent will:
- Fetch and parse the issue description and comments
- Identify the relevant repository, directories, and files
- Retrieve and process file contents
- Determine resolution steps and post them as a comment
For this, we'll use the new Agent component. Agent is a Haystack component that implements tool-calling functionality with provider-agnostic chat models. We can use the Agent as a standalone component or within a pipeline.
Here's what our GitHub issue resolver pipeline looks like:
Install dependencies
!pip install anthropic-haystack github-haystack -q
import os
from getpass import getpass
from typing import List
from haystack import logging, Document, Pipeline
from haystack.components.agents import Agent
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from haystack.tools.from_function import tool
from haystack_integrations.components.generators.anthropic.chat.chat_generator import AnthropicChatGenerator
logger = logging.getLogger(__name__)
Initialize a GitHubIssueViewer component
The GitHubIssueViewer component takes a GitHub issue URL and returns a list of Haystack documents. The first document contains the main issue content; the subsequent documents contain the issue comments.
from haystack_integrations.components.connectors.github import GitHubIssueViewer
issue_viewer = GitHubIssueViewer()
issue_viewer.run(url="https://github.com/deepset-ai/haystack/issues/8903")["documents"]
[Document(id=401aeab38ff82756caddcf20be6191917e0a8d262347f4acc2adb24869c842e9, content: '**Is your feature request related to a problem? Please describe.**
Most of our components require so...', meta: {'type': 'issue', 'title': 'Proposal to make input variables to `PromptBuilder` and `ChatPromptBuilder` required by default', 'number': 8903, 'state': 'closed', 'created_at': '2025-02-21T14:03:22Z', 'updated_at': '2025-03-21T14:53:27Z', 'author': 'sjrl', 'url': 'https://github.com/deepset-ai/haystack/issues/8903'}),
Document(id=463748463715f2c4f988273caf73d5006e5a95beeecd04c91a142fa93ce78354, content: 'Old related issue: https://github.com/deepset-ai/haystack/issues/7441', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-02-21T14:07:54Z', 'updated_at': '2025-02-21T14:07:54Z', 'author': 'anakin87', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2674648879'}),
Document(id=d7eb9351f9c74a0d8eac616bfc92f97e06bdb6276c54b3e6ec437e3fc7378cb2, content: '@sjrl with the new run-logic released in 2.10 the component will not always trigger anymore. It need...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-02-21T21:32:12Z', 'updated_at': '2025-02-21T21:32:12Z', 'author': 'mathislucka', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2675585679'}),
Document(id=1c5ea7c3f07f3db061bf2169b11d59ea675bd8269b7168598e88f5a072ed3a5e, content: '@mathislucka thanks for the additional info. I'll need to talk with @ju-gu again about how exactly h...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-02-24T07:02:02Z', 'updated_at': '2025-02-24T07:02:02Z', 'author': 'sjrl', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2677577510'}),
Document(id=3994326a4e33ee897938e8cff215d6e407d63d9b800fa088df27aec2cb24ad03, content: '> PromptBuilder with documents (pipeline provided) and query (user provided) will trigger even if it...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-02-25T14:55:26Z', 'updated_at': '2025-02-25T14:55:26Z', 'author': 'ju-gu', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2682266205'}),
Document(id=630987248d564b2a538b56a1bc2ada63f66d4d53031708b9c004bfc9e1bf9346, content: '> I think this can still cause problems as it can run before the correct input is created inside the...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-02-26T08:01:12Z', 'updated_at': '2025-02-26T08:01:12Z', 'author': 'mathislucka', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2684214013'}),
Document(id=622bc0e5219da00bfcb908519611a649f7eaa28eebedb3c30ad90d66cc0191ab, content: '> for the PromptBuilder, and ChatPromptBuilder we set all Jinja2 variables as optional by default.
...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-03-11T10:11:54Z', 'updated_at': '2025-03-11T10:12:29Z', 'author': 'LastRemote', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2713495950'}),
Document(id=22d69cc52cb789306cf54f56a568d1526cdfe766d42818d18f6fc7ec2f9163ad, content: '@sjrl and I decided that we don't want to make breaking changes to the current behavior of the `Prom...', meta: {'type': 'comment', 'issue_number': 8903, 'created_at': '2025-03-21T14:53:24Z', 'updated_at': '2025-03-21T14:53:24Z', 'author': 'julian-risch', 'url': 'https://github.com/deepset-ai/haystack/issues/8903#issuecomment-2743603353'})]
Initialize a GitHubRepoViewer tool
This tool retrieves content from a GitHub repository for a given repo and path:
- If path points to a directory, it returns a list of documents (one per item), where each document contains the item's name (file or directory) along with the full path and metadata in Document.meta.
- If path points to a file, it returns a single document containing the file content, along with the full path and metadata in Document.meta.
- If an error occurs, it returns a single document containing the error message, with type="error" in Document.meta.
You can also try the tool directly, as shown in the sketch after the next cell.
from haystack_integrations.tools.github import GitHubRepoViewerTool
github_repo_viewer_tool = GitHubRepoViewerTool()
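Invoking the tool directly lets you inspect what the agent will see. A minimal sketch, assuming the parameter names match the repo/path description above and that the tool returns the output of its underlying component; check your installed version if it differs:
# Direct-invocation sketch (assumptions: parameters are `repo` and `path`,
# and the result mirrors the underlying component's output; version-dependent).
result = github_repo_viewer_tool.invoke(repo="deepset-ai/haystack", path="README.md")
print(result)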
Mocking a GitHubIssueCommenter tool from a function
With the @tool decorator, we can easily turn a function into a tool, using its docstring as the description.
Now, let's create a tool (a mock) that allows the agent to write a comment on a GitHub issue. This tool will also serve as the agent's exit condition, signaling when its task is complete.
If you like, you can later replace this mock with the GitHubIssueCommenterTool; a hedged sketch of the swap follows the function below. You'll need a GitHub personal access token to enable commenting on GitHub. See the GitHub-Haystack integration page for details.
@tool
def write_github_comment(comment: str) -> str:
"""
Use this to create a comment on GitHub once you have finished your exploration.
"""
return comment
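If you want to post real comments instead, the swap might look roughly like this. This is a sketch only: it assumes the default initialization picks up your personal access token (for example from a GITHUB_TOKEN environment variable); consult the GitHub-Haystack integration page for the exact parameters.
from haystack_integrations.tools.github import GitHubIssueCommenterTool

# Sketch: assumed to pick up a GitHub personal access token from the
# environment (e.g. GITHUB_TOKEN); see the integration page for details.
issue_commenter_tool = GitHubIssueCommenterTool()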
Create the "Issue Resolver Agent" with tools
To initialize the Agent, we need:
- a list of tools (✅)
- a chat generator
- a system prompt
We'll start by creating the ChatGenerator. In this example, we'll use the AnthropicChatGenerator with the claude-sonnet-4-20250514 model.
os.environ["ANTHROPIC_API_KEY"] = getpass("Anthropic Key: ")
Anthropic Key: ··········
chat_generator = AnthropicChatGenerator(model="claude-sonnet-4-20250514", generation_kwargs={"max_tokens": 8000})
In this example, we'll use a predefined system prompt that instructs the agent to analyze GitHub issues, explore the relevant files in the repository, and generate a detailed comment with resolution steps. Of course, you can also use your own custom prompt.
from haystack_integrations.prompts.github import SYSTEM_PROMPT
print(SYSTEM_PROMPT[:100]+"...")
The assistant is Haystack-Agent, created by deepset.
Haystack-Agent helps developers to develop soft...
Finally, we create the Agent with the chat_generator, the SYSTEM_PROMPT, and the tools. We set exit_conditions=["write_github_comment"] to ensure the agent stops once the write_github_comment tool has been used. For state_schema, we define {"documents": {"type": List[Document]}}, which lets the agent accumulate the documents retrieved by tools such as github_repo_viewer_tool.
issue_resolver_agent = Agent(
chat_generator=chat_generator,
system_prompt=SYSTEM_PROMPT,
tools=[github_repo_viewer_tool, write_github_comment],
exit_conditions=["write_github_comment"],
state_schema={"documents": {"type": List[Document]}},
)
💡 Tip: You can pass the built-in print_streaming_chunk or your own custom function to the Agent to enable streaming and watch tool calls and results in real time.
from haystack.components.generators.utils import print_streaming_chunk
issue_resolver_agent = Agent(
chat_generator=chat_generator,
system_prompt=SYSTEM_PROMPT,
tools=[github_repo_viewer_tool, write_github_comment],
exit_conditions=["write_github_comment"],
state_schema={"documents": {"type": List[Document]}},
streaming_callback=print_streaming_chunk
)
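Since the Agent is a regular Haystack component, you can also run it standalone before wiring it into a pipeline. A minimal sketch with an abbreviated, hand-written issue text (in the pipeline below, this text is built automatically from the fetched issue documents):
# Standalone run sketch: the Agent accepts a list of ChatMessage objects directly.
standalone_result = issue_resolver_agent.run(
    messages=[ChatMessage.from_user("Issue from: <issue-url>\n\n<issue-comment>Bug: ...</issue-comment>")]
)
# With exit_conditions=["write_github_comment"], the run ends on that tool call,
# so the last message carries the tool call result.
print(standalone_result["last_message"].tool_call_result.result)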
Build the Issue Resolver Pipeline
With all the components in place, we can now assemble the issue resolver pipeline.
issue_viewer = GitHubIssueViewer()
issue_template = """
Issue from: {{ url }}
{% for document in documents %}
{% if loop.index == 1 %}
**Title: {{ document.meta.title }}**
{% endif %}
<issue-comment>
{{document.content}}
</issue-comment>
{% endfor %}
"""
issue_builder = ChatPromptBuilder(template=[ChatMessage.from_user(issue_template)], required_variables="*")
issue_resolver = Pipeline()
issue_resolver.add_component("issue_viewer", issue_viewer)
issue_resolver.add_component("issue_builder", issue_builder)
issue_resolver.add_component("issue_resolver_agent", issue_resolver_agent)
issue_resolver.connect("issue_viewer.documents", "issue_builder.documents")
issue_resolver.connect("issue_builder.prompt", "issue_resolver_agent.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7c205115ab90>
🚅 Components
- issue_viewer: GitHubIssueViewer
- issue_builder: ChatPromptBuilder
- issue_resolver_agent: Agent
🛤️ Connections
- issue_viewer.documents -> issue_builder.documents (List[Document])
- issue_builder.prompt -> issue_resolver_agent.messages (List[ChatMessage])
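Before running the full pipeline, you can sanity-check the rendered prompt by running the builder on documents fetched earlier. A quick sketch using the issue from the beginning of this tutorial and a fresh builder instance:
# Prompt-preview sketch: render the Jinja template against previously fetched documents.
preview_builder = ChatPromptBuilder(template=[ChatMessage.from_user(issue_template)], required_variables="*")
preview_url = "https://github.com/deepset-ai/haystack/issues/8903"
preview_docs = GitHubIssueViewer().run(url=preview_url)["documents"]
preview = preview_builder.run(url=preview_url, documents=preview_docs)
print(preview["prompt"][0].text[:300])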
Let's try our pipeline
Now, let's run the pipeline with an issue URL and see how the agent does.
issue_url = "https://github.com/deepset-ai/haystack-core-integrations/issues/1819"
result = issue_resolver.run({"url": issue_url})
Let's look at the comment our pipeline generated to resolve the given issue.
print(result["issue_resolver_agent"]["last_message"].tool_call_result.result)
# Implementation: Adding `component_info` Support to `AmazonBedrockChatGenerator`
I've analyzed the codebase and the PR mentioned in the issue. Here's my proposed implementation to add `component_info` support to `AmazonBedrockChatGenerator`, following the same pattern used in the main Haystack repository.
## Changes Required
### 1. Update Imports in `utils.py`
The `ComponentInfo` class needs to be imported:
```python
from haystack.dataclasses import (
AsyncStreamingCallbackT,
ChatMessage,
ChatRole,
ComponentInfo, # Add this import
StreamingChunk,
SyncStreamingCallbackT,
ToolCall,
)
```
### 2. Update `_convert_event_to_streaming_chunk` Function
Modify the function signature to accept an optional `component_info` parameter and include it in all `StreamingChunk` creations:
```python
def _convert_event_to_streaming_chunk(
event: Dict[str, Any],
model: str,
component_info: Optional[ComponentInfo] = None
) -> StreamingChunk:
"""
Convert a Bedrock streaming event to a Haystack StreamingChunk.
Handles different event types (contentBlockStart, contentBlockDelta, messageStop, metadata) and extracts relevant
information to create StreamingChunk objects in the same format used by Haystack's OpenAIChatGenerator.
:param event: Dictionary containing a Bedrock streaming event.
:param model: The model ID used for generation, included in chunk metadata.
:param component_info: An optional `ComponentInfo` object containing information about the component that
generated the chunk, such as the component name and type.
:returns: StreamingChunk object containing the content and metadata extracted from the event.
"""
# Initialize an empty StreamingChunk to return if no relevant event is found
# (e.g. for messageStart and contentBlockStop)
streaming_chunk = StreamingChunk(
content="",
component_info=component_info, # Add component_info here
meta={"model": model, "received_at": datetime.now(timezone.utc).isoformat()}
)
if "contentBlockStart" in event:
# contentBlockStart always has the key "contentBlockIndex"
block_start = event["contentBlockStart"]
block_idx = block_start["contentBlockIndex"]
if "start" in block_start and "toolUse" in block_start["start"]:
tool_start = block_start["start"]["toolUse"]
streaming_chunk = StreamingChunk(
content="",
component_info=component_info, # Add component_info here
meta={
"model": model,
# This is always 0 b/c it represents the choice index
"index": 0,
# We follow the same format used in the OpenAIChatGenerator
"tool_calls": [ # Optional[List[ChoiceDeltaToolCall]]
{
"index": block_idx, # int
"id": tool_start["toolUseId"], # Optional[str]
"function": { # Optional[ChoiceDeltaToolCallFunction]
# Will accumulate deltas as string
"arguments": "", # Optional[str]
"name": tool_start["name"], # Optional[str]
},
"type": "function", # Optional[Literal["function"]]
}
],
"finish_reason": None,
"received_at": datetime.now(timezone.utc).isoformat(),
},
)
elif "contentBlockDelta" in event:
# contentBlockDelta always has the key "contentBlockIndex" and "delta"
block_idx = event["contentBlockDelta"]["contentBlockIndex"]
delta = event["contentBlockDelta"]["delta"]
# This is for accumulating text deltas
if "text" in delta:
streaming_chunk = StreamingChunk(
content=delta["text"],
component_info=component_info, # Add component_info here
meta={
"model": model,
# This is always 0 b/c it represents the choice index
"index": 0,
"tool_calls": None,
"finish_reason": None,
"received_at": datetime.now(timezone.utc).isoformat(),
},
)
# This only occurs when accumulating the arguments for a toolUse
# The content_block for this tool should already exist at this point
elif "toolUse" in delta:
streaming_chunk = StreamingChunk(
content="",
component_info=component_info, # Add component_info here
meta={
"model": model,
# This is always 0 b/c it represents the choice index
"index": 0,
"tool_calls": [ # Optional[List[ChoiceDeltaToolCall]]
{
"index": block_idx, # int
"id": None, # Optional[str]
"function": { # Optional[ChoiceDeltaToolCallFunction]
# Will accumulate deltas as string
"arguments": delta["toolUse"].get("input", ""), # Optional[str]
"name": None, # Optional[str]
},
"type": "function", # Optional[Literal["function"]]
}
],
"finish_reason": None,
"received_at": datetime.now(timezone.utc).isoformat(),
},
)
elif "messageStop" in event:
finish_reason = event["messageStop"].get("stopReason")
streaming_chunk = StreamingChunk(
content="",
component_info=component_info, # Add component_info here
meta={
"model": model,
# This is always 0 b/c it represents the choice index
"index": 0,
"tool_calls": None,
"finish_reason": finish_reason,
"received_at": datetime.now(timezone.utc).isoformat(),
},
)
elif "metadata" in event and "usage" in event["metadata"]:
metadata = event["metadata"]
streaming_chunk = StreamingChunk(
content="",
component_info=component_info, # Add component_info here
meta={
"model": model,
# This is always 0 b/c it represents the choice index
"index": 0,
"tool_calls": None,
"finish_reason": None,
"received_at": datetime.now(timezone.utc).isoformat(),
"usage": {
"prompt_tokens": metadata["usage"].get("inputTokens", 0),
"completion_tokens": metadata["usage"].get("outputTokens", 0),
"total_tokens": metadata["usage"].get("totalTokens", 0),
},
},
)
return streaming_chunk
```
### 3. Update Streaming Response Parsing Functions
Update both sync and async versions to create `ComponentInfo` and pass it to the conversion function:
```python
def _parse_streaming_response(
response_stream: EventStream,
streaming_callback: SyncStreamingCallbackT,
model: str,
component_info: Optional[ComponentInfo] = None, # Add this parameter
) -> List[ChatMessage]:
"""
Parse a streaming response from Bedrock.
:param response_stream: EventStream from Bedrock API
:param streaming_callback: Callback for streaming chunks
:param model: The model ID used for generation
:param component_info: An optional `ComponentInfo` object containing information about the component that
generated the chunk, such as the component name and type.
:return: List of ChatMessage objects
"""
chunks: List[StreamingChunk] = []
for event in response_stream:
streaming_chunk = _convert_event_to_streaming_chunk(
event=event,
model=model,
component_info=component_info # Pass component_info here
)
streaming_callback(streaming_chunk)
chunks.append(streaming_chunk)
replies = [_convert_streaming_chunks_to_chat_message(chunks=chunks)]
return replies
async def _parse_streaming_response_async(
response_stream: EventStream,
streaming_callback: AsyncStreamingCallbackT,
model: str,
component_info: Optional[ComponentInfo] = None, # Add this parameter
) -> List[ChatMessage]:
"""
Parse a streaming response from Bedrock.
:param response_stream: EventStream from Bedrock API
:param streaming_callback: Callback for streaming chunks
:param model: The model ID used for generation
:param component_info: An optional `ComponentInfo` object containing information about the component that
generated the chunk, such as the component name and type.
:return: List of ChatMessage objects
"""
chunks: List[StreamingChunk] = []
async for event in response_stream:
streaming_chunk = _convert_event_to_streaming_chunk(
event=event,
model=model,
component_info=component_info # Pass component_info here
)
await streaming_callback(streaming_chunk)
chunks.append(streaming_chunk)
replies = [_convert_streaming_chunks_to_chat_message(chunks=chunks)]
return replies
```
### 4. Update Chat Generator Import
In `chat_generator.py`, add the `ComponentInfo` import:
```python
from haystack.dataclasses import ChatMessage, StreamingCallbackT, ComponentInfo, select_streaming_callback
```
### 5. Update Chat Generator Methods
Update both `run` and `run_async` methods to create and pass `ComponentInfo`:
```python
# In the run method, update the streaming section:
if callback:
response = self.client.converse_stream(**params)
response_stream: EventStream = response.get("stream")
if not response_stream:
msg = "No stream found in the response."
raise AmazonBedrockInferenceError(msg)
# Create ComponentInfo from this component instance
component_info = ComponentInfo.from_component(self)
replies = _parse_streaming_response(response_stream, callback, self.model, component_info)
else:
response = self.client.converse(**params)
replies = _parse_completion_response(response, self.model)
# In the run_async method, update the streaming section:
if callback:
response = await async_client.converse_stream(**params)
response_stream: EventStream = response.get("stream")
if not response_stream:
msg = "No stream found in the response."
raise AmazonBedrockInferenceError(msg)
# Create ComponentInfo from this component instance
component_info = ComponentInfo.from_component(self)
replies = await _parse_streaming_response_async(response_stream, callback, self.model, component_info)
else:
response = await async_client.converse(**params)
replies = _parse_completion_response(response, self.model)
```
## Testing
The existing tests should continue to pass with these changes. The `ComponentInfo` will be automatically populated for streaming chunks, making the component information available to any streaming callback functions.
You can test the implementation by running streaming inference and checking that `chunk.component_info` is properly populated:
```python
def test_streaming_callback(chunk):
assert chunk.component_info is not None
assert "AmazonBedrockChatGenerator" in chunk.component_info.type
# component name will be None unless the component is added to a pipeline with a specific name
generator = AmazonBedrockChatGenerator(model="anthropic.claude-3-5-sonnet-20240620-v1:0", streaming_callback=test_streaming_callback)
```
This implementation follows the exact same pattern as the OpenAI chat generator and ensures consistency across all chat generators in the Haystack ecosystem.
# Render it in markdown format
from IPython.display import Markdown, display
display(Markdown("# Comment from Agent\n\n" + result["issue_resolver_agent"]["last_message"].tool_call_result.result))
By looking at the other messages, you can observe the iterative process of our Issue Resolver Agent as it works toward the GitHub comment step by step, making tool calls and processing their results.
result["issue_resolver_agent"]["messages"]
We can also see which files the agent looked at:
for document in result["issue_resolver_agent"]["documents"]:
if document.meta["type"] in ["file_content"]:
display(Markdown(f"[{document.meta['url']}]({document.meta['url']})"))
https://github.com/deepset-ai/haystack/blob/main/haystack/dataclasses/streaming_chunk.py
https://github.com/deepset-ai/haystack/blob/main/haystack/components/generators/chat/openai.py
