
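Cohere v3 for Multilingual QA


Notebook by Bilge Yucel

Multilingual Generative Question Answering with Cohere and Haystack

In this notebook, we'll look closely at multilingual retrieval and multilingual generation, and demonstrate how to build a retrieval-augmented generation (RAG) pipeline that generates answers from multilingual hotel reviews using Cohere models and Haystack. 🏡

Useful Haystack resources

For Haystack 1.x, see the article: Multilingual Generative Question Answering with Haystack and Cohere

Installation

Let's start by installing the Cohere integration for Haystack.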

!pip install cohere-haystack

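Storing Multilingual Embeddings

To build a question answering system over hotel reviews, the first thing we need is a document store. We'll use an InMemoryDocumentStore to hold the hotel reviews and their embeddings.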

from haystack.document_stores.in_memory import InMemoryDocumentStore

document_store = InMemoryDocumentStore()

Getting a Cohere API Key

After signing up, you can get a free Cohere API key to start using Cohere models.

from getpass import getpass

COHERE_API_KEY = getpass("Enter Cohere API key:")
Enter Cohere API key:··········
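
If you run the notebook repeatedly, typing the key each time gets tedious. Here is a minimal sketch, using only the standard library, that reads the key from an environment variable and prompts only when it is unset. The helper name load_api_key and the variable name COHERE_API_KEY are assumptions made for this illustration.

```python
import os
from getpass import getpass


def load_api_key(env_var: str = "COHERE_API_KEY") -> str:
    """Return the key from the environment, or prompt for it if unset."""
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the input.
    return getpass(f"Enter {env_var}: ")
```

With this in place, the cell above could become COHERE_API_KEY = load_api_key().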

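Creating an Indexing Pipeline

Let's create an indexing pipeline that writes hotel reviews in different languages to our document store. For this, we'll use DocumentSplitter to split long reviews, and CohereDocumentEmbedder with the embed-multilingual-v3.0 model to create a multilingual embedding for each document.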

from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.writers import DocumentWriter
from haystack.components.preprocessors import DocumentSplitter
from haystack_integrations.components.embedders.cohere import CohereDocumentEmbedder
from haystack.utils import Secret

documents = [Document(content="O ar condicionado de um dos quartos deu problema, mas levaram um ventilador para ser utilizado. Também por ser em uma área bem movimentada, o barulho da rua pode ser ouvido. Porém, eles deixam protetores auriculares para o uso. Também senti falta de um espelho de corpo inteiro no apartamento. Só havia o do banheiro que mostra apenas a parte superior do corpo."),
             Document(content="Durchgängig Lärm, weil direkt an der Partymeile; schmutziges Geschirr; unvollständige Küchenausstattung; Abzugshaube über Herd ging für zwei Stunden automatisch an und lies sich nicht abstellen; Reaktionen auf Anfragen entweder gar nicht oder unfreundlich"),
             Document(content="Das Personal ist sehr zuvorkommend! Über WhatsApp war man im guten Kontakt und konnte alles erfragen. Auch das Angebot des Shuttleservices war super und würde ich empfehlen - sehr unkompliziert! Unser Flug hatte Verspätung und der Shuttle hat auf uns gewartet. Die Lage zur Innenstadt ist sehr gut,jedoch ist die Fensterfront direkt zur Club-Straße deshalb war es nachts bis drei/vier Uhr immer recht laut. Die Kaffeemaschine oder auch die Couch hätten sauberer sein können. Ansonsten war das Appartement aber völlig ok."),
             Document(content="Super appartement. Juste au dessus de plusieurs bars qui ferment très tard. A savoir à l'avance. (Bouchons d'oreilles fournis !)"),
             Document(content="Zapach moczu przy wejściu do budynku, może warto zainstalować tam mocne światło na czujnik ruchu, dla gości to korzystne a dla kogoś kto chciałby zrobić tam coś innego niekorzystne :-). Świetne lokalizacje w centrum niestety są na to narażane."),
             Document(content="El apartamento estaba genial y muy céntrico, todo a mano. Al lado de la librería Lello y De la Torre de los clérigos. Está situado en una zona de marcha, así que si vais en fin de semana , habrá ruido, aunque a nosotros no nos molestaba para dormir"),
             Document(content="The keypad with a code is convenient and the location is convenient. Basically everything else, very noisy, wi-fi didn't work, check-in person didn't explain anything about facilities, shower head was broken, there's no cleaning and everything else one may need is charged."),
             Document(content="It is very central and appartement has a nice appearance (even though a lot IKEA stuff), *W A R N I N G** the appartement presents itself as a elegant and as a place to relax, very wrong place to relax - you cannot sleep in this appartement, even the beds are vibrating from the bass of the clubs in the same building - you get ear plugs from the hotel -> now I understand why -> I missed a trip as it was so loud and I could not hear the alarm next day due to the ear plugs.- there is a green light indicating 'emergency exit' just above the bed, which shines very bright at night - during the arrival process, you felt the urge of the agent to leave as soon as possible. - try to go to 'RVA clerigos appartements' -> same price, super quiet, beautiful, city center and very nice staff (not an agency)- you are basically sleeping next to the fridge, which makes a lot of noise, when the compressor is running -> had to switch it off - but then had no cool food and drinks. - the bed was somehow broken down - the wooden part behind the bed was almost falling appart and some hooks were broken before- when the neighbour room is cooking you hear the fan very loud. I initially thought that I somehow activated the kitchen fan"),
             Document(content="Un peu salé surtout le sol. Manque de service et de souplesse"),
             Document(content="De comfort zo centraal voor die prijs."),
             Document(content="Die Lage war sehr Zentral und man konnte alles sehenswertes zu Fuß erreichen. Wer am Wochenende nachts schlafen möchte, sollte diese Unterkunft auf keinen Fall nehmen. Party direkt vor der Tür so das man denkt, man schläft mitten drin. Sehr Sehr laut also und das bis früh 5 Uhr. Ab 7 kommt dann die Straßenreinigung die keineswegs leiser ist."),
             Document(content="Ótima escolha! Apartamento confortável e limpo! O RoofTop é otimo para beber um vinho! O apartamento é localizado entre duas ruas de movimento noturno. Porem as janelas, blindam 90% do barulho. Não nos incomodou"),
             Document(content="Nous avons passé un séjour formidable. Merci aux personnes , le bonjours à Ricardo notre taxi man, très sympathique. Je pense refaire un séjour parmi vous, après le confinement, tout était parfait, surtout leur gentillesse, aucune chaude négative. Je n'ai rien à redire de négative, Ils étaient a notre écoute, un gentil message tout les matins, pour nous demander si nous avions besoins de renseignement et savoir si tout allait bien pendant notre séjour."),
             Document(content="Boa localização. Bom pequeno almoço. A tv não se encontrava funcional."),
             Document(content="Céntrico. Muy cómodo para moverse y ver Oporto. Edificio con terraza propia en la última planta. Todo reformado y nuevo. Te traen un estupendo desayuno todas las mañanas al apartamento. Solo que se puede escuchar algo de ruido de la calle a primeras horas de la noche. Es un zona de ocio nocturno. Pero respetan los horarios.")
]

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="word", split_length=200))
indexing_pipeline.add_component("embedder", CohereDocumentEmbedder(api_key=Secret.from_token(COHERE_API_KEY), model="embed-multilingual-v3.0"))
indexing_pipeline.add_component("writer", DocumentWriter(document_store=document_store))
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")

indexing_pipeline.run({"splitter": {"documents": documents}})
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00,  3.29it/s]





{'embedder': {'meta': {'api_version': {'version': '1'},
   'billed_units': {'input_tokens': 1137}}},
 'writer': {'documents_written': 16}}
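
Note that 16 documents were written for 15 reviews, presumably because one review is longer than 200 words and was split in two. The sketch below shows the idea of word-based splitting in plain Python. It is an illustration only, not DocumentSplitter's actual implementation, and the function name split_by_word is hypothetical.

```python
def split_by_word(text: str, split_length: int = 200) -> list[str]:
    """Split text into chunks of at most split_length whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + split_length])
        for i in range(0, len(words), split_length)
    ]


# A short review fits in a single chunk.
short_review = "De comfort zo centraal voor die prijs."
# Synthetic 450-word text standing in for a long review.
long_review = " ".join(f"word{i}" for i in range(450))

print(len(split_by_word(short_review)))  # 1
print([len(chunk.split()) for chunk in split_by_word(long_review)])  # [200, 200, 50]
```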

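Building a RAG Pipeline

Now that the multilingual embeddings are indexed in the document store, we'll create the pipeline that users interact with the most: the retrieval-augmented generation (RAG) pipeline.

A RAG pipeline consists of two parts: document retrieval and answer generation.

Multilingual Document Retrieval

In the document retrieval step of the RAG pipeline, CohereTextEmbedder creates an embedding for the query in the multilingual vector space, and InMemoryEmbeddingRetriever retrieves the top_k documents most similar to the query from the document store. In our case, the retrieved documents will be hotel reviews.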
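
To make the retrieval step concrete, here is a minimal sketch of top_k retrieval over embeddings in plain Python. The three-dimensional vectors are made up for illustration (real embeddings from the model are much larger), and cosine similarity is used as one common choice; the similarity function the document store actually applies depends on its configuration. The names cosine_similarity and retrieve are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, docs, top_k=3):
    """Return the top_k docs ranked by similarity to the query embedding."""
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_embedding, d["embedding"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy embeddings (assumed values): reviews about noise point in a similar
# direction regardless of language, the breakfast review does not.
toy_docs = [
    {"content": "Sehr laut bis früh 5 Uhr.", "embedding": [0.9, 0.1, 0.0]},
    {"content": "Bom pequeno almoço.", "embedding": [0.0, 0.2, 0.9]},
    {"content": "Very noisy at night.", "embedding": [0.8, 0.3, 0.1]},
]
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a noise-related question

for doc in retrieve(query, toy_docs, top_k=2):
    print(doc["content"])
```

Because documents and queries share one vector space, a question in English can match reviews written in German or Portuguese.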

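Multilingual Answer Generation

In the generation step of the RAG pipeline, we'll use CohereGenerator with Cohere's command model to generate an answer based on the retrieved documents.

Let's create a prompt template for hotel reviews. The template has two prompt variables: {{documents}} and {{question}}. These are filled in later with the user's question and the multilingual hotel reviews returned by the retriever.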

from haystack import Pipeline
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack_integrations.components.embedders.cohere import CohereTextEmbedder
from haystack_integrations.components.generators.cohere import CohereGenerator

template = """
You will be provided with reviews in multiple languages for an accommodation.
Create a concise and informative answer for a given question based solely on the given reviews.

\nReviews:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

\nQuestion: {{question}};
\nAnswer:
"""
rag_pipe = Pipeline()
rag_pipe.add_component("embedder", CohereTextEmbedder(api_key=Secret.from_token(COHERE_API_KEY), model="embed-multilingual-v3.0"))
rag_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store, top_k=3))
rag_pipe.add_component("prompt_builder", PromptBuilder(template=template))
rag_pipe.add_component("llm", CohereGenerator(api_key=Secret.from_token(COHERE_API_KEY), model="command"))
rag_pipe.connect("embedder.embedding", "retriever.query_embedding")
rag_pipe.connect("retriever", "prompt_builder.documents")
rag_pipe.connect("prompt_builder", "llm")
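
To see roughly what the model receives once the template variables are filled in, here is a plain-Python stand-in for the rendering step. It is only a sketch: PromptBuilder renders the Jinja2 template itself, so the exact whitespace of the real prompt may differ, and the function name render_prompt is hypothetical.

```python
def render_prompt(reviews: list[str], question: str) -> str:
    """Approximate the filled-in template using plain string formatting."""
    lines = [
        "You will be provided with reviews in multiple languages for an accommodation.",
        "Create a concise and informative answer for a given question based solely on the given reviews.",
        "",
        "Reviews:",
    ]
    # One indented line per retrieved review, mirroring the for loop in the template.
    lines.extend(f"    {review}" for review in reviews)
    lines += ["", f"Question: {question};", "Answer:"]
    return "\n".join(lines)


prompt = render_prompt(
    ["Boa localização. Bom pequeno almoço.", "Super appartement."],
    "What is good about this place?",
)
print(prompt)
```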

Asking Questions

Ask your own questions to find out whether this hotel is a good place to stay!

question = "Is this place too noisy to sleep?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The general consensus from the reviews is that this accommodation is very loud and not ideal for sleeping. The first review warns of loud club music which vibrates the beds, while the second review describes loud nightlife noise and loud cleaning practices in the early morning. 

Based on this evidence, it is fair to conclude that this accommodation is too noisy for guests to get adequate rest. 
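
Each question has to be passed to both the embedder and the prompt builder, so a small helper can cut the repetition. The sketch below uses hypothetical names (build_inputs, ask) and a stub pipeline so that it runs without an API key; in the notebook you would pass rag_pipe instead of the stub.

```python
def build_inputs(question: str) -> dict:
    """Route the same question to the embedder and the prompt builder."""
    return {
        "embedder": {"text": question},
        "prompt_builder": {"question": question},
    }


def ask(pipeline, question: str) -> str:
    """Run the pipeline and return the first generated reply."""
    result = pipeline.run(build_inputs(question))
    return result["llm"]["replies"][0]


class StubPipeline:
    """Stand-in for rag_pipe that echoes the question instead of calling Cohere."""

    def run(self, data: dict) -> dict:
        question = data["prompt_builder"]["question"]
        return {"llm": {"replies": [f"(stub reply to: {question})"]}}


print(ask(StubPipeline(), "How is the wifi?"))  # (stub reply to: How is the wifi?)
```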

You can try other questions too 👇

question = "What are the problems about this place?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Some of the main issues with this accommodation include loud vibrations and poor sound insulation, likely due to the proximity of clubs in the same building. Also, the WiFi did not work and the equipment in the apartment, including the shower head and bed, was broken and aged. Finally, the staff who checked them in offered no guidance or support and charged additionally for any needed amenities. 

question = "What is good about this place?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The reviews highlight the following positive aspects of the accommodation: 

- Great location 
- Good breakfast 
- Friendly staff and taxi driver (Ricardo)
- Gentleness and care of the staff, with a kind message each morning to ensure all was well. 

It seems like the guests greatly appreciated the staff and service of the accommodation, and recommend it, wishing to return again in the future. 

question = "Should I stay at this hotel?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 The reviews for this hotel are mixed. Some guests had an enjoyable experience at this hotel, highlighting the convenient location, breakfast, and friendly staff. However, it's important to note that there are also more critical reviews that point out some lacking amenities and an unclean atmosphere. 

If you are looking for a hotel with a more consistent reputation, it might be worth considering other options in the area. Ultimately, it is up to you to decide whether this hotel's potential strengths match your preferences and whether the reported issues are deal-breakers for your stay. 

question = "How is the wifi?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Based on the reviews provided, the wifi is functional but may not work sometimes. There also seems to be an issue with the television set and the cleanliness of the room. 

question = "Are there pubs near by?"
result = rag_pipe.run({
    "embedder": {"text": question},
    "prompt_builder": {"question": question}
})

print(result["llm"]["replies"][0])
 Yes, the apartment is located above multiple bars. However the noise levels at night are mentioned as "ferment(ing) très tard", so you will want to consider ahead of time whether this might be an issue for you.