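Hacker News Summaries with Custom Components
Last updated: March 10, 2025
By Tuana Celik: Twitter, LinkedIn
📚 For a detailed walkthrough of this example, read the article Customizing RAG Pipelines to Summarize Latest Hacker News Posts with Haystack.
Install dependencies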
!pip install newspaper3k
!pip install haystack-ai
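Create a custom Haystack component
The HackernewsNewestFetcher component fetches the last_k newest posts on Hacker News and returns their contents as a list of Haystack Document objects. Posts that have no external URL, or whose page cannot be downloaded, are skipped, so the list may contain fewer than last_k documents.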
from typing import List

import requests
from haystack import component, Document
from newspaper import Article


@component
class HackernewsNewestFetcher:
    @component.output_types(articles=List[Document])
    def run(self, last_k: int):
        # Fetch the IDs of the newest stories.
        newest_list = requests.get(url="https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty")
        urls = []
        for story_id in newest_list.json()[0:last_k]:
            item = requests.get(url=f"https://hacker-news.firebaseio.com/v0/item/{story_id}.json?print=pretty").json()
            # Keep only posts that link to an external page; the guard also skips empty (null) items.
            if item and "url" in item:
                urls.append(item["url"])

        docs = []
        for url in urls:
            try:
                # Download and parse the linked page with newspaper3k.
                article = Article(url)
                article.download()
                article.parse()
                docs.append(Document(content=article.text, meta={"title": article.title, "url": url}))
            except Exception:
                print(f"Couldn't download {url}, skipped")
        return {"articles": docs}
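The URL-filtering step of the fetcher can be tried offline. The following is a minimal sketch, not part of the original notebook: collect_story_urls and fake_items are hypothetical names, and the fetch_item argument stands in for the HTTP request to the Hacker News item endpoint. It shows why fewer than last_k URLs can come back when some posts (for example, text-only posts) have no 'url' field.

```python
from typing import Callable, Dict, List, Optional


def collect_story_urls(
    story_ids: List[int],
    fetch_item: Callable[[int], Optional[Dict]],
    last_k: int,
) -> List[str]:
    """Return the external URLs of the first last_k stories that have one.

    fetch_item is a hypothetical stand-in for the HTTP call to the
    Hacker News item endpoint; it may return None for a missing item.
    """
    urls = []
    for story_id in story_ids[:last_k]:
        item = fetch_item(story_id) or {}
        if "url" in item:
            urls.append(item["url"])
    return urls


# Made-up items for illustration only: item 2 has no external link.
fake_items = {
    1: {"title": "Show HN: a tool", "url": "https://example.com/tool"},
    2: {"title": "Ask HN: a question"},
    3: {"title": "A blog post", "url": "https://example.com/post"},
}

urls = collect_story_urls([1, 2, 3, 4], fake_items.get, last_k=3)
print(urls)  # ['https://example.com/tool', 'https://example.com/post']
```

Passing the fetch function as an argument is only a convenience for this sketch; the component above calls requests directly.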
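Create a Haystack 2.0 RAG pipeline
This pipeline uses components that were available in the Haystack 2.0 preview at the time of writing (22 September 2023), together with the custom component created above.
The result is a RAG pipeline that produces a list of summaries of the last_k newest posts on Hacker News, each followed by its source URL.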
from getpass import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass("OpenAI Key: ")
from haystack import Pipeline
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
prompt_template = """
You will be provided a few of the latest posts in HackerNews, followed by their URL.
For each post, provide a brief summary followed by the URL the full post can be found in.
Posts:
{% for article in articles %}
{{article.content}}
URL: {{article.meta['url']}}
{% endfor %}
"""
prompt_builder = PromptBuilder(template=prompt_template)
llm = OpenAIGenerator(model="gpt-4")
fetcher = HackernewsNewestFetcher()
pipe = Pipeline()
pipe.add_component("hackernews_fetcher", fetcher)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", llm)
pipe.connect("hackernews_fetcher.articles", "prompt_builder.articles")
pipe.connect("prompt_builder.prompt", "llm.prompt")
result = pipe.run(data={"hackernews_fetcher": {"last_k": 3}})
print(result['llm']['replies'][0])
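To get a feel for what the LLM receives, the loop in the prompt template can be mimicked in plain Python. This is only a rough sketch: FakeDocument and render_prompt are hypothetical names that stand in for Haystack's Document and the Jinja rendering done by PromptBuilder, the sample contents are made up, and the exact whitespace of the real rendered prompt will differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a Haystack Document (content plus meta)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)


def render_prompt(articles: List[FakeDocument]) -> str:
    """Approximate the template: instructions, then each post and its URL."""
    lines = [
        "You will be provided a few of the latest posts in HackerNews, followed by their URL.",
        "For each post, provide a brief summary followed by the URL the full post can be found in.",
        "Posts:",
    ]
    for article in articles:
        lines.append(article.content)
        lines.append(f"URL: {article.meta['url']}")
    return "\n".join(lines)


# Made-up documents for illustration only.
docs = [
    FakeDocument("First article text.", {"url": "https://example.com/a"}),
    FakeDocument("Second article text.", {"url": "https://example.com/b"}),
]

prompt = render_prompt(docs)
print(prompt)
```

Because every article's full text is placed in the prompt, the prompt grows with last_k and with the length of each article.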
