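RAG pipeline using FastEmbed for embeddings generation
Last Updated: July 8, 2025
FastEmbed is a lightweight, fast Python library built for embedding generation and maintained by Qdrant. It is well suited to generating embeddings efficiently on CPU-only machines.
In this notebook, we will use the FastEmbed-Haystack integration to generate embeddings for indexing and RAG.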
Install dependencies
!pip install fastembed-haystack qdrant-haystack wikipedia transformers
Download content and create documents
favourite_bands="""Audioslave
Green Day
Muse (band)
Foo Fighters (band)
Nirvana (band)""".split("\n")
import wikipedia
from haystack.dataclasses import Document
raw_docs = []
for title in favourite_bands:
    # auto_suggest=False keeps the exact page title instead of a suggested one
    page = wikipedia.page(title=title, auto_suggest=False)
    doc = Document(content=page.content, meta={"title": page.title, "url": page.url})
    raw_docs.append(doc)
Clean, split and index documents on Qdrant
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy
document_store = QdrantDocumentStore(
    ":memory:",
    embedding_dim=384,
    recreate_index=True,
    return_embedding=True,
    wait_result_from_api=True,
)
cleaner = DocumentCleaner()
splitter = DocumentSplitter(split_by='period', split_length=3)
splitted_docs = splitter.run(cleaner.run(raw_docs)["documents"])
len(splitted_docs["documents"])
493
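To make the splitting step more concrete, here is a minimal pure-Python sketch of what splitting by period with a length of 3 roughly amounts to. The function name `split_by_period` is hypothetical and this is only an approximation for illustration; the actual `DocumentSplitter` also handles overlap, metadata and other details.

```python
def split_by_period(text, split_length=3):
    """Hypothetical helper: group period-delimited units into chunks."""
    units = text.split(".")
    # Re-attach the period to every unit except the trailing remainder
    units = [u + "." for u in units[:-1]] + [units[-1]]
    chunks = []
    for start in range(0, len(units), split_length):
        chunk = "".join(units[start:start + split_length])
        if chunk.strip():  # skip chunks that contain only whitespace
            chunks.append(chunk)
    return chunks


chunks = split_by_period("A. B. C. D. E.")
print(chunks)
```

With the toy input above, the first chunk holds three period-terminated units and the second holds the remaining two.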
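FastEmbed Document Embedder
Here we initialize the FastEmbed Document Embedder and use it to generate embeddings for the documents. We use a small, good-quality model, `BAAI/bge-small-en-v1.5`, and set the `parallel` parameter to 0 so that all available CPU cores are used for embedding generation.
⚠️ If you are running this notebook on Google Colab, note that Google Colab only provides 2 CPU cores, so embedding generation may not be as fast as on a standard machine.
For more information on the FastEmbed-Haystack integration, see the documentation and the API reference.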
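As a rough illustration of how the `parallel` setting can be read, the sketch below maps a value to a worker count. The helper `resolve_worker_count` is hypothetical and not part of FastEmbed or Haystack; it assumes the commonly documented semantics (None means no data-parallel processing, 0 means all available cores, larger values mean that many workers).

```python
import os


def resolve_worker_count(parallel):
    """Hypothetical helper: translate a `parallel` value into a worker count."""
    if parallel is None:
        return 1  # assumption: no data-parallel processing
    if parallel == 0:
        return os.cpu_count() or 1  # assumption: use every available core
    return parallel  # assumption: use the requested number of workers


print(resolve_worker_count(0))
```

On Google Colab this would typically print 2, which is why the speed-up there is limited.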
document_embedder = FastembedDocumentEmbedder(
    model="BAAI/bge-small-en-v1.5",
    parallel=0,  # 0 = use all available CPU cores
    meta_fields_to_embed=["title"],
)
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(splitted_docs["documents"])
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 148034.26it/s]
Calculating embeddings: 100%|██████████| 493/493 [00:35<00:00, 13.73it/s]
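The embedder above is configured with `meta_fields_to_embed=["title"]`, so the page title is embedded together with each chunk. A minimal sketch of that idea follows; the function `build_text_to_embed` and the newline separator are assumptions for illustration, not the integration's actual code.

```python
def build_text_to_embed(content, meta, meta_fields_to_embed, separator="\n"):
    """Hypothetical helper: prepend selected metadata values to the content."""
    meta_values = [
        str(meta[key])
        for key in meta_fields_to_embed
        if meta.get(key) is not None  # skip missing or empty fields
    ]
    return separator.join(meta_values + [content])


text = build_text_to_embed(
    "Dave Grohl founded the band in 1994.",
    {"title": "Foo Fighters", "url": "https://example.org"},  # toy metadata
    ["title"],
)
print(text)
```

Embedding the title alongside the text can help short chunks that never mention the band by name.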
document_store.write_documents(documents_with_embeddings.get("documents"), policy=DuplicatePolicy.OVERWRITE)
500it [00:00, 4262.26it/s]
493
RAG pipeline using Qwen 2.5 7B
from haystack import Pipeline
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder
from haystack.components.generators.chat import HuggingFaceAPIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
from pprint import pprint
# Enter your Hugging Face Token
# this is needed to call Qwen 2.5 7B through the Hugging Face Inference API
from getpass import getpass
import os
os.environ["HF_API_TOKEN"] = getpass("Enter your Hugging Face Token: https://huggingface.co/settings/tokens ")
generator = HuggingFaceAPIChatGenerator(
    api_type="serverless_inference_api",
    api_params={"model": "Qwen/Qwen2.5-7B-Instruct", "provider": "together"},
    generation_kwargs={"max_tokens": 500},
)
# define the template
template = [ChatMessage.from_user("""
Using only the information contained in these documents return a brief answer (max 50 words).
If the answer cannot be inferred from the documents, respond \"I don't know\".
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Question: {{question}}
Answer:
""")]
query_pipeline = Pipeline()
# FastembedTextEmbedder is used to embed the query
query_pipeline.add_component("text_embedder", FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5", parallel = 0, prefix="query:"))
query_pipeline.add_component("retriever", QdrantEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("prompt_builder", ChatPromptBuilder(template=template))
query_pipeline.add_component("generator", generator)
# connect the components
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt_builder.documents")
query_pipeline.connect("prompt_builder", "generator")
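Conceptually, the retriever step ranks stored document embeddings by their similarity to the query embedding and passes the best matches to the prompt builder. The sketch below shows that idea with brute-force cosine similarity over toy 3-dimensional vectors; the function names and data are hypothetical, and Qdrant itself uses an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, docs, k=2):
    """Return the texts of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_embedding, emb), text) for text, emb in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy embeddings, made up for illustration only
toy_docs = [
    ("Foo Fighters passage", [0.9, 0.1, 0.0]),
    ("Muse passage", [0.0, 1.0, 0.0]),
    ("Nirvana passage", [0.7, 0.0, 0.7]),
]
toy_query = [1.0, 0.0, 0.0]
print(top_k(toy_query, toy_docs, k=2))
```

In the real pipeline, the query vector comes from `FastembedTextEmbedder` and has 384 dimensions, matching the `embedding_dim` of the document store.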
Try the pipeline
question = "Who is Dave Grohl?"
results = query_pipeline.run(
    {
        "text_embedder": {"text": question},
        "prompt_builder": {"question": question},
    }
)
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00, 24.62it/s]
for d in results["generator"]["replies"]:
    pprint(d.text)
(' Dave Grohl is the founder and lead vocalist of the American rock band Foo '
'Fighters, which he formed in 1994 after the breakup of Nirvana, in which he '
'was the drummer.')
