Analyze the Sentiment of Your Instagram Comments with Apify and Haystack


Last updated: October 3, 2024

Author: Jiri Spilka (Apify)
Idea: Bilge Yücel (deepset.ai)

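Have you ever wondered whether your Instagram posts really resonate with your audience? In this guide, we show you how to use the Instagram Comment Scraper Actor to download comments from your Instagram posts and analyze them with a large language model. All of this is done within the Haystack ecosystem through the apify-haystack integration.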

We first download the comments with the Actor, clean the data with DocumentCleaner, and then use OpenAIGenerator to discover the "vibe" of the Instagram posts.

Install dependencies

!pip install apify-haystack==0.1.4 haystack-ai

Set up the API keys

You need an Apify account and an APIFY_API_TOKEN.

You also need an OpenAI account and an OPENAI_API_KEY.

import os
from getpass import getpass

os.environ["APIFY_API_TOKEN"] = getpass("Enter YOUR APIFY_API_TOKEN")
os.environ["OPENAI_API_KEY"] = getpass("Enter YOUR OPENAI_API_KEY")

Enter YOUR APIFY_API_TOKEN··········
Enter YOUR OPENAI_API_KEY··········
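
When the notebook is re-run in a session where the keys are already set, prompting again is unnecessary. The following is a minimal sketch of one way to handle that; `ensure_env` is a hypothetical helper introduced here for illustration and is not part of Haystack or apify-haystack.

```python
import os
from getpass import getpass


def ensure_env(name: str, prompt_fn=getpass) -> str:
    """Return the value of environment variable `name`.

    Hypothetical helper (not part of any library): it prompts for the
    value only when the variable is unset or empty, then stores it.
    """
    value = os.environ.get(name)
    if not value:
        value = prompt_fn(f"Enter YOUR {name}")
        os.environ[name] = value
    return value
```

Calling `ensure_env("APIFY_API_TOKEN")` and `ensure_env("OPENAI_API_KEY")` would then replace the two `getpass` lines above.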

Use a Haystack Pipeline to orchestrate the Instagram Comment Scraper, comment cleaning, and analysis with an LLM

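Now, let's decide which post to analyze. We can start with these two posts, which may reveal some interesting insights:

@tiffintech on How to easily keep up with tech?
@kamalaharris on Affordable Care Act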

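We will use the Instagram Comment Scraper Actor to download the comments. But first, we need to understand the Actor's output format.

The output has the following format: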

[
  {
    "text": "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
    "timestamp": "2024-09-02T16:27:09.000Z",
    "ownerUsername": "codingmermaid.ai",
    "ownerProfilePicUrl": "....",
    "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
  },
  {
    "text": "Will check it out🙌",
    "timestamp": "2024-09-02T16:29:28.000Z",
    "ownerUsername": "author.parijat",
    "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/"
  }
]

We will use a dataset_mapping_function to convert this JSON into Haystack Documents, as follows:

from haystack import Document

def dataset_mapping_function(dataset_item: dict) -> Document:
    return Document(content=dataset_item.get("text"), meta={"ownerUsername": dataset_item.get("ownerUsername")})
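
To see which fields survive the mapping, the sketch below applies the same field selection to the two sample items shown earlier. It uses a plain dict as a stand-in for the Haystack Document so that it runs without Haystack installed; `preview_mapping` is a hypothetical helper for illustration only.

```python
# Sample items copied from the Actor output format shown in this guide.
sample_output = [
    {
        "text": "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
        "timestamp": "2024-09-02T16:27:09.000Z",
        "ownerUsername": "codingmermaid.ai",
        "ownerProfilePicUrl": "....",
        "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/",
    },
    {
        "text": "Will check it out🙌",
        "timestamp": "2024-09-02T16:29:28.000Z",
        "ownerUsername": "author.parijat",
        "postUrl": "https://www.instagram.com/p/C_a9jcRuJZZ/",
    },
]


def preview_mapping(dataset_item: dict) -> dict:
    """Hypothetical stand-in for dataset_mapping_function.

    Keeps the same two fields (comment text and author username) but
    returns a plain dict instead of a Haystack Document.
    """
    return {
        "content": dataset_item.get("text"),
        "meta": {"ownerUsername": dataset_item.get("ownerUsername")},
    }


previews = [preview_mapping(item) for item in sample_output]
for preview in previews:
    print(preview)
```

Only the comment text and the author's username are kept; the timestamp, profile picture URL, and post URL are dropped. Because the mapping uses `dict.get`, an item without a `text` field would yield `None` as its content rather than raising an error.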

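Once we understand the Actor's output format and have the dataset_mapping_function, we can set up the Haystack component that handles the interaction between Haystack and Apify.

First, we need to provide the actor_id, the dataset_mapping_function, and the input parameters in run_input.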

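We can define run_input in three ways, which can also be combined:

i) when creating the ApifyDatasetFromActorCall instance
ii) as an argument in the pipeline
iii) as an argument to ApifyDatasetFromActorCall.run() when calling it directly

In this guide, we combine i) and ii).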

For a detailed description of the input parameters, visit the Instagram Comments Scraper page.
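
Combining options i) and ii) means that keys set on the component and keys passed through the pipeline end up in a single Actor input. The sketch below illustrates the idea with a shallow dict merge in which run-time keys take precedence. That merge rule is an assumption made for illustration, and `merge_run_input` is a hypothetical helper, not the apify-haystack implementation.

```python
from typing import Optional


def merge_run_input(init_input: Optional[dict], runtime_input: Optional[dict]) -> dict:
    """Hypothetical helper: shallow-merge two run_input dicts.

    Assumption for illustration: keys supplied at run time override
    keys supplied when the component was created.
    """
    return {**(init_input or {}), **(runtime_input or {})}


# run_input given at construction time (option i) plus the post URL
# supplied when the pipeline runs (option ii).
merged = merge_run_input(
    {"resultsLimit": 50},
    {"directUrls": ["https://www.instagram.com/p/C_a9jcRuJZZ/"]},
)
print(merged)
```

Under this assumption, the Actor would receive both the results limit and the post URL in one input.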

Let's set up ApifyDatasetFromActorCall:

from apify_haystack import ApifyDatasetFromActorCall

document_loader = ApifyDatasetFromActorCall(
    actor_id="apify/instagram-comment-scraper",
    run_input={"resultsLimit": 50},
    dataset_mapping_function=dataset_mapping_function,
)

Next, we define a prompt for the LLM and connect all the components in a Pipeline.

from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.components.preprocessors import DocumentCleaner

prompt = """
Analyze these Instagram comments to determine if the post is generating positive energy, excitement,
or high engagement. Focus on sentiment, emotional tone, and engagement patterns to conclude if
the post is 'vibrating' with high energy. Be concise.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Analysis:
"""

cleaner = DocumentCleaner(remove_empty_lines=True, remove_extra_whitespaces=True, remove_repeated_substrings=True)
prompt_builder = PromptBuilder(template=prompt)
generator = OpenAIGenerator(model="gpt-4o-mini")


pipe = Pipeline()
pipe.add_component("loader", document_loader)
pipe.add_component("cleaner", cleaner)
pipe.add_component("prompt_builder", prompt_builder)
pipe.add_component("llm", generator)
pipe.connect("loader", "cleaner")
pipe.connect("cleaner", "prompt_builder")
pipe.connect("prompt_builder", "llm")

<haystack.core.pipeline.pipeline.Pipeline object at 0x7b45ef117be0>
🚅 Components
  - loader: ApifyDatasetFromActorCall
  - cleaner: DocumentCleaner
  - prompt_builder: PromptBuilder
  - llm: OpenAIGenerator
🛤️ Connections
  - loader.documents -> cleaner.documents (list[Document])
  - cleaner.documents -> prompt_builder.documents (List[Document])
  - prompt_builder.prompt -> llm.prompt (str)
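
To get a feel for what the LLM receives, the sketch below assembles a similar prompt by hand with plain string formatting, using the two sample comments from the Actor output shown earlier. It only approximates what PromptBuilder renders from the Jinja template (whitespace may differ), and `build_prompt_preview` is a hypothetical helper for illustration.

```python
from typing import List

# Instruction text taken from the prompt template in this guide.
INSTRUCTIONS = (
    "Analyze these Instagram comments to determine if the post is generating positive energy, excitement,\n"
    "or high engagement. Focus on sentiment, emotional tone, and engagement patterns to conclude if\n"
    "the post is 'vibrating' with high energy. Be concise."
)


def build_prompt_preview(comments: List[str]) -> str:
    """Hypothetical helper: approximate the rendered prompt without Jinja.

    Each comment is placed on its own indented line under "Context:",
    mirroring the for-loop in the template.
    """
    context = "\n".join(f"    {comment}" for comment in comments)
    return f"{INSTRUCTIONS}\n\nContext:\n{context}\n\nAnalysis:\n"


preview = build_prompt_preview(
    [
        "You've just uncovered the goldmine for me 😍 but I still love your news and updates!",
        "Will check it out🙌",
    ]
)
print(preview)
```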

After that, we can run the pipeline. Execution and analysis take approximately 30-60 seconds.

# @tiffintech on How to easily keep up with tech?
url = "https://www.instagram.com/p/C_a9jcRuJZZ/"

res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]

'Overall, the Instagram comments on the post reflect positive energy, excitement, and high engagement. The use of emojis such as 😂, 😍, 🙌, ❤️, and 🔥 indicate enthusiasm and excitement. Many comments express gratitude, appreciation, and eagerness to explore the resources mentioned in the post. There are also interactions between users tagging each other and discussing their interest in the topic, further increasing engagement. Overall, the post seems to be generating high energy and positive vibes from the audience.'
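
The expression `res.get("llm", {}).get("replies", ["No response"])[0]` falls back to the default only when the `replies` key is missing; an empty `replies` list would still raise an IndexError. A slightly more defensive sketch follows, where `first_reply` is a hypothetical helper and not part of Haystack.

```python
def first_reply(result: dict, default: str = "No response") -> str:
    """Hypothetical helper: return the first LLM reply from a pipeline result.

    Falls back to `default` when the "llm" entry or the "replies" key is
    missing, and also when the list of replies is empty.
    """
    replies = result.get("llm", {}).get("replies") or [default]
    return replies[0]


# Small hand-made result dicts to illustrate the three cases.
print(first_reply({"llm": {"replies": ["High energy and positive vibes."]}}))
print(first_reply({"llm": {"replies": []}}))
print(first_reply({}))
```

With a real run, `first_reply(res)` could be used in place of the chained `get` calls.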

Now, let's run the same analysis, this time on a post by @kamalaharris.

# @kamalaharris on Affordable Care Act
url = "https://www.instagram.com/p/C_RgBzogufK/"

res = pipe.run({"loader": {"run_input": {"directUrls": [url]}}})
res.get("llm", {}).get("replies", ["No response"])[0]

'The comments on this post are highly polarized, with strong opinions expressed on both sides of the political spectrum. There is a mix of negative and positive sentiment, with some users expressing excitement and support for the current administration (e.g., emojis like 💙💙💙💙, Kamala 👏👏) while others criticize past policies and individuals associated with them (e.g., Trump 2024, lack of education). Overall, the engagement on this post is high, with users actively debating and defending their viewpoints. Despite the divisive nature of the comments, the post is generating a high level of energy and engagement.'

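The analysis shows that the first post, How to easily keep up with tech?, is vibrating with high energy:

The Instagram comments reveal a high level of engagement and positive energy. Emojis such as 😍, 😂, ❤️, 🙌, and 🔥 are used frequently, indicating excitement and enthusiasm. Commenters express gratitude, excitement, and appreciation for the content. The overall tone is very positive, supportive, and encouraging, with many users tagging others to share the content. Overall, this post is generating a positive and highly engaged response.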

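However, the post by @kamalaharris on the Affordable Care Act is (not surprisingly) generating a lot of controversy and negative comments:

The comments on this post are generating negative energy but high engagement. The comments focus heavily on political views, particularly regarding insurance companies, the Affordable Care Act, Trump, and Biden. Many comments express frustration, criticism, and disagreement, with some users discussing party affiliation or support for specific politicians. Others mention misinformation and conspiracy theories. Engagement is high, with many comment threads digging into various political issues. Overall, this post is charged with intense energy driven by political opinions, disagreement, and active discussion.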

💡 You might receive slightly different results, as the comments may have changed since the last run.