Streaming Model Explorer

Last updated: March 10, 2025

Notebook by Tilde Thurium: Mastodon || Twitter || LinkedIn

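Problem: there are so many LLMs these days! Which model is the best fit for my use case?

This notebook uses Haystack to send the same prompt to several different models and compare the results.

This is a very basic demo, and it can only compare a handful of models that support streaming responses. I'd like to support more models in the future, so watch this space for updates.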

Models

Haystack's OpenAIGenerator and CohereGenerator support streaming out of the box.

The other models use the HuggingFaceAPIGenerator.
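To make "streaming" concrete: the generators in this notebook are given a callback that is invoked once per chunk of text as it arrives. The sketch below imitates that contract with a stand-in class that makes no network calls. `FakeChunk` and `FakeStreamingGenerator` are hypothetical names invented for illustration and are not part of Haystack; the only detail taken from this notebook is that the chunk's text is read from a `content` attribute.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeChunk:
    # Hypothetical stand-in for the chunk object passed to a streaming callback.
    content: str


class FakeStreamingGenerator:
    """Hypothetical stand-in for a streaming generator; makes no network calls."""

    def __init__(self, model: str, canned_reply: str):
        self.model = model
        self.canned_reply = canned_reply
        self.streaming_callback: Optional[Callable[[FakeChunk], None]] = None

    def run(self, prompt: str) -> dict:
        # Emit the canned reply one word at a time, the way a streaming API
        # would deliver text incrementally instead of all at once.
        for word in self.canned_reply.split():
            if self.streaming_callback is not None:
                self.streaming_callback(FakeChunk(content=word))
        return {"replies": [self.canned_reply]}


received: List[str] = []
generator = FakeStreamingGenerator("fake-model", "a black cat prowls the neon city")
generator.streaming_callback = lambda chunk: received.append(chunk.content)
result = generator.run("Tell me a cyberpunk story about a black cat.")
print(received)
```

Because the callback is just a callable, anything that accepts a chunk can be plugged in: a list appender as here, or a widget writer as used later in this notebook.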

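Prerequisites

You need HuggingFace, Cohere, and OpenAI API keys. Save them as secrets in your Colab. Click on the key icon in the left menu or see detailed instructions here.
To use Mistral-7B-v0.1, you should also accept Mistral's conditions here: https://hugging-face.cn/mistralai/Mistral-7B-v0.1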

!pip install -U haystack-ai cohere-haystack "huggingface_hub>=0.22.0"

For userdata.get to work, these keys must be saved as secrets in your Colab. Click on the key icon in the left menu or see detailed instructions here.
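If you want to run the same code outside Colab, one option is to fall back to environment variables when `google.colab` is not importable. The helper below is a minimal sketch of that idea; `get_secret` and `EXAMPLE_API_KEY` are hypothetical names introduced here, not something the notebook or Haystack provides.

```python
import os


def get_secret(name: str) -> str:
    """Hypothetical helper: read a key from Colab secrets, else from the environment."""
    try:
        # google.colab is only importable inside a Colab runtime.
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
            if value:
                return value
        except Exception:
            # Secret missing or notebook access not granted; try the environment.
            pass

    value = os.environ.get(name)
    if not value:
        raise KeyError(f"{name} is not set as a Colab secret or environment variable")
    return value


# Demonstration only: seed a dummy value so the lookup succeeds outside Colab.
os.environ.setdefault("EXAMPLE_API_KEY", "dummy-value")
print(bool(get_secret("EXAMPLE_API_KEY")))
```

With a helper like this, a call such as `Secret.from_token(get_secret('OPENAI_API_KEY'))` would behave the same in both environments, under the assumption that the variable names match your Colab secret names.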

from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.components.generators.cohere import CohereGenerator
from haystack.components.generators import HuggingFaceAPIGenerator
from haystack.utils import Secret
from google.colab import userdata

open_ai_generator = OpenAIGenerator(api_key=Secret.from_token(userdata.get('OPENAI_API_KEY')))

cohere_generator = CohereGenerator(api_key=Secret.from_token(userdata.get('COHERE_API_KEY')))

hf_generator = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "mistralai/Mistral-7B-Instruct-v0.1"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))


hf_generator_2 = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "tiiuae/falcon-7b-instruct"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))


hf_generator_3 = HuggingFaceAPIGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "bigscience/bloom"},
    token=Secret.from_token(userdata.get('HF_API_KEY')))

MODELS = [open_ai_generator, cohere_generator, hf_generator, hf_generator_2, hf_generator_3]

The AppendToken dataclass formats the output so that the model name is printed first, and the text then appears in groups of 5 tokens.

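from dataclasses import dataclass, field
import ipywidgets as widgets

@dataclass
class AppendToken:
  output: widgets.Output
  # default_factory gives each instance its own list; a bare class-level
  # `chunks = []` would be shared by every model's callback.
  chunks: list = field(default_factory=list)
  chunk_size: int = 5

  def __call__(self, chunk):
      with self.output:
        # The streamed chunk carries its text in `content`.
        text = getattr(chunk, 'content', '')
        self.chunks.append(text)
        # Once 5 tokens are buffered, display them as one line and reset.
        if len(self.chunks) == self.chunk_size:
          output_string = ' '.join(self.chunks)
          self.output.append_display_data(output_string)
          self.chunks.clear()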

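def multiprompt(prompt, models=MODELS):
  # One bordered output box per model, laid out side by side.
  outputs = [widgets.Output(layout={'border': '1px solid black'}) for _ in models]
  display(widgets.HBox(children=outputs))

  for i, model in enumerate(models):
    model_name = getattr(model, 'model', '')
    outputs[i].append_display_data(f'Model name: {model_name}')
    callback = AppendToken(outputs[i])
    model.streaming_callback = callback
    model.run(prompt)
    # Show any tokens left over after the last full group of 5.
    if callback.chunks:
      outputs[i].append_display_data(' '.join(callback.chunks))
      callback.chunks.clear()

multiprompt("Tell me a cyberpunk story about a black cat.")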
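The "groups of 5 tokens" behaviour can be tried in isolation, without widgets or API keys. The sketch below is a simplified illustration under stated assumptions, not the notebook's own class: `ChunkBuffer`, `emit`, and `flush` are hypothetical names, and plain strings stand in for streamed chunks. It shows two details worth getting right in any buffering callback: each instance owns its own list, and a final partial group is flushed instead of being dropped.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ChunkBuffer:
    """Hypothetical buffer: collects tokens and emits them in fixed-size groups."""

    emit: Callable[[str], None]
    chunk_size: int = 5
    # default_factory gives every instance its own list.
    tokens: List[str] = field(default_factory=list)

    def __call__(self, token: str) -> None:
        self.tokens.append(token)
        if len(self.tokens) == self.chunk_size:
            self.flush()

    def flush(self) -> None:
        # Emit whatever is buffered, including a final partial group.
        if self.tokens:
            self.emit(" ".join(self.tokens))
            self.tokens.clear()


lines: List[str] = []
buffer = ChunkBuffer(emit=lines.append)
for token in "one two three four five six seven".split():
    buffer(token)
buffer.flush()  # without this, "six seven" would never be emitted
print(lines)  # ['one two three four five', 'six seven']
```

Passing `emit` as a parameter keeps the grouping logic separate from where the text ends up, so the same buffer could write to a list, to stdout, or to a widget.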

This was a very silly example prompt. If you found this demo useful, let me know what kinds of prompts you tested it with!

Mastodon || Twitter || LinkedIn

Thanks for following along.