📘 **TELUS Agriculture & Consumer Goods** 如何通过 **Haystack Agents** 转变促销交易
由 deepset 维护

集成:FastEmbed

使用 FastEmbed 嵌入模型

作者
deepset
Nicola Procopio

目录

概述

FastEmbed 是一个轻量级、快速的 Python 库,专为嵌入生成和文档排序而构建。

  • 轻巧快速:量化模型权重;通过 Optimum 使用 ONNX Runtime 进行推理。
  • 高性能嵌入模型:支持的模型列表 - 包括多语言模型。
  • 支持稀疏嵌入模型。
  • 与 Qdrant 文档存储和检索器良好集成。

安装

pip install fastembed-haystack

使用

组件

fastembed-haystack 集成提供了以下组件

  • 嵌入器
    • FastembedTextEmbedder:为文本创建密集嵌入(用于查询/RAG 管道)。
    • FastembedDocumentEmbedder:使用密集嵌入丰富文档(用于索引管道)。
    • FastembedSparseTextEmbedder:为文本创建稀疏嵌入(用于查询/RAG 管道)。
    • FastembedSparseDocumentEmbedder:使用稀疏嵌入丰富文档(用于索引管道)。
  • Ranker
    • FastembedRanker:根据查询对文档进行排序(用于检索后的查询/RAG 管道)。

密集嵌入示例

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder, FastembedTextEmbedder

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

document_embedder = FastembedDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query = "Who supports fastembed?"

result = query_pipeline.run({"text_embedder": {"text": query}})

有关更详细的示例,请参阅此笔记本

稀疏嵌入示例

目前,稀疏嵌入检索仅受 QdrantDocumentStore 支持。您可以按如下方式安装该包

pip install qdrant-haystack
from haystack import Document, Pipeline
from haystack_integrations.components.retrievers.qdrant import QdrantSparseEmbeddingRetriever
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder, FastembedTextEmbedder

document_store = QdrantDocumentStore(
    ":memory:",
    recreate_index=True,
    use_sparse_embeddings=True
)

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

sparse_document_embedder = FastembedSparseDocumentEmbedder()
sparse_document_embedder.warm_up()
documents_with_sparse_embeddings = sparse_document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_sparse_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("sparse_text_embedder", FastembedSparseTextEmbedder())
query_pipeline.add_component("sparse_retriever", QdrantSparseEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("sparse_text_embedder.sparse_embedding", "retriever.query_sparse_embedding")

query = "Who supports fastembed?"

result = query_pipeline.run({"sparse_text_embedder": {"text": query}})

有关更详细的示例,请参阅此笔记本

Ranker 示例

from haystack import Document, Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.embedders.fastembed import FastembedDocumentEmbedder, FastembedTextEmbedder
from haystack_integrations.components.rankers.fastembed import FastembedRanker

document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

query = "Who supports fastembed?"

documents = [
    Document(content="My name is Wolfgang and I live in Berlin"),
    Document(content="I saw a black horse running"),
    Document(content="Germany has many big cities"),
    Document(content="fastembed is supported by and maintained by Qdrant."),
]

document_embedder = FastembedDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", FastembedTextEmbedder())
query_pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
query_pipeline.add_component("ranker", FastembedRanker(top_k=2))
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "ranker.documents")


result = query_pipeline.run({"text_embedder": {"text": query}, "ranker": { "query" : query }})

许可证

fastembed-haystack Apache-2.0 许可证的条款下分发。