Integration: AssemblyAI

Use AssemblyAI transcription, summarization, and speaker diarization models with Haystack

Authors

AssemblyAI

GitHub Repo PyPI Package

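Introduction

You can use AssemblyAI transcripts in your Haystack pipelines through the AssemblyAITranscriber component.

With this integration, you can perform speech recognition, speaker diarization, and summarization.

More info on AssemblyAI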

Installation

pip install assemblyai-haystack

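Usage

The AssemblyAITranscriber uses the AssemblyAI API to run speech-to-text processing and loads the transcribed text into documents. To use this component, pass your AssemblyAI API key (ASSEMBLYAI_API_KEY) as the api_key argument.

Depending on the parameters passed, the results of transcription, summarization, and speaker diarization are returned in separate document lists:

transcription
summarization
speaker_labels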
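
To illustrate how these three lists might be consumed, the sketch below walks a result dictionary and reports which lists are present and how many documents each holds. The dictionary is a hand-built stand-in, and the Doc class and describe_outputs helper are hypothetical names used only for illustration; the real component returns Haystack Document objects.

```python
from dataclasses import dataclass, field

# Output names listed above; assumed to be the keys of the dict returned by run().
OUTPUT_NAMES = ("transcription", "summarization", "speaker_labels")


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (content plus meta)."""
    content: str
    meta: dict = field(default_factory=dict)


def describe_outputs(result: dict) -> dict:
    """Return {output_name: number_of_documents} for each non-empty list."""
    return {
        name: len(result[name])
        for name in OUTPUT_NAMES
        if result.get(name)
    }


# Hand-built example: a transcript plus two diarized utterances, no summary.
fake_result = {
    "transcription": [Doc("full transcript text")],
    "speaker_labels": [
        Doc("first utterance", {"speaker": "A"}),
        Doc("second utterance", {"speaker": "B"}),
    ],
}
print(describe_outputs(fake_result))
```

Checking for the key before reading it avoids a KeyError when a feature such as summarization was not requested.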

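Speech-to-Text

Transcribe your audio file with the AssemblyAITranscriber. By default, it outputs a single Document object. For finer-grained preprocessing, you can add a DocumentSplitter.

The following example shows an indexing pipeline that combines AssemblyAITranscriber, DocumentSplitter, and SentenceTransformersDocumentEmbedder to preprocess audio content and store it, together with dense embeddings, in an InMemoryDocumentStore.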

from haystack.components.writers import DocumentWriter
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack import Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from assemblyai_haystack.transcriber import AssemblyAITranscriber

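document_store = InMemoryDocumentStore()
# Replace the placeholder with your AssemblyAI API key.
assemblyai_api_key = "YOUR_ASSEMBLYAI_API_KEY"
transcriber = AssemblyAITranscriber(api_key=assemblyai_api_key)
# Split the transcript into 150-word chunks that overlap by 50 words.
document_splitter = DocumentSplitter(
    split_by="word",
    split_length=150,
    split_overlap=50,
)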
document_writer = DocumentWriter(document_store)
document_embedder = SentenceTransformersDocumentEmbedder()

preprocessing_pipeline = Pipeline()
preprocessing_pipeline.add_component(instance=transcriber, name="transcriber")
preprocessing_pipeline.add_component(instance=document_splitter, name="document_splitter")
preprocessing_pipeline.add_component(instance=document_embedder, name="document_embedder")
preprocessing_pipeline.add_component(instance=document_writer, name="document_writer")

preprocessing_pipeline.connect("transcriber.transcription", "document_splitter")
preprocessing_pipeline.connect("document_splitter", "document_embedder")
preprocessing_pipeline.connect("document_embedder", "document_writer")

file_path = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
preprocessing_pipeline.run({"transcriber": {"file_path": file_path}})

The expected output should indicate that 9 documents were written to the document store:

{'document_writer': {'documents_written': 9}}

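Note: Calling preprocessing_pipeline.run() blocks until the transcription is finished.

The metadata of the transcription document contains the transcript ID and the URL of the uploaded audio file.

# {'transcript_id': '73089e32-...-4ae9-97a4-eca7fe20a8b1',
#  'audio_url': 'https://storage.googleapis.com/aai-docs-samples/nbc.mp3',
# }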
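
The number of documents written depends on how the transcript is split. The pure-Python sketch below approximates word-based splitting with overlap: each chunk holds up to split_length words, and consecutive chunks start split_length - split_overlap words apart. It illustrates the idea only and is not the DocumentSplitter's actual implementation, whose edge-case handling may differ; split_words is a hypothetical helper name.

```python
def split_words(text, split_length=150, split_overlap=50):
    """Split text into overlapping word windows (illustrative sketch only)."""
    if split_length <= 0:
        raise ValueError("split_length must be positive")
    if not 0 <= split_overlap < split_length:
        raise ValueError("split_overlap must be in [0, split_length)")

    words = text.split()
    step = split_length - split_overlap  # distance between chunk starts
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + split_length]))
        # Stop once the current window already reaches the last word.
        if start + split_length >= len(words):
            break
    return chunks


# Small demonstration with 10 made-up "words", windows of 4, overlap of 2.
words = " ".join(f"w{i}" for i in range(10))
for chunk in split_words(words, split_length=4, split_overlap=2):
    print(chunk)
```

A larger overlap repeats more context between neighboring chunks, at the cost of producing more documents to embed and store.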

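Summarization

You can summarize audio with the AssemblyAITranscriber by setting "summarization": True. When enabled, the AssemblyAITranscriber returns both the transcription and the summarization outputs.

The example below shows a generative QA pipeline that combines AssemblyAITranscriber and OpenAIGenerator. The pipeline answers a given question based on the summary of the transcript.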

from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from assemblyai_haystack.transcriber import AssemblyAITranscriber

template = """
Given the following information, answer the question.

Context: 
{{summary[0].content}}

Question: {{ question }}
"""
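# Replace the placeholder with your AssemblyAI API key.
assemblyai_api_key = "YOUR_ASSEMBLYAI_API_KEY"

summary_qa = Pipeline()
summary_qa.add_component("transcriber", AssemblyAITranscriber(api_key=assemblyai_api_key))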
summary_qa.add_component("prompt_builder", PromptBuilder(template=template))
summary_qa.add_component("llm", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="gpt-3.5-turbo"))
summary_qa.connect("transcriber.summarization", "prompt_builder.summary")
summary_qa.connect("prompt_builder", "llm")

question = "What are the air quality warnings?"
file_path = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
summary_qa.run({
    "transcriber": {"summarization": True, "file_path": file_path},
    "prompt_builder": {"question": question},
})
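
The call returns a dictionary keyed by component name. Assuming the generator's entry has the same shape as the sample output shown later on this page (a replies list next to a meta list), the answer text can be pulled out as sketched below. The sample_result dictionary is hand-written stand-in data, not real model output, and first_reply is a hypothetical helper.

```python
from typing import Optional


def first_reply(result: dict, component: str = "llm") -> Optional[str]:
    """Return the first reply produced by the named component, or None."""
    replies = result.get(component, {}).get("replies", [])
    return replies[0] if replies else None


# Hand-written stand-in for the pipeline output; not real model output.
sample_result = {
    "llm": {
        "replies": ["<answer text>"],
        "meta": [{"finish_reason": "stop"}],
    }
}
print(first_reply(sample_result))
```

Returning None instead of raising keeps the calling code simple when the generator produced no reply.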

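Speaker Diarization

To perform speaker diarization, pass "speaker_labels": True to the AssemblyAITranscriber. With this setting, the transcriber returns the utterances as Document objects in its speaker_labels output. Each utterance is an uninterrupted segment of speech from a single speaker, and the speaker label is kept in the document's meta field.

The example below shows how to index the diarized utterances and run a query pipeline with a filter so that only the speech text of speaker A is retrieved.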

from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack import Pipeline
from haystack.utils import Secret
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.builders.prompt_builder import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from assemblyai_haystack.transcriber import AssemblyAITranscriber

## Write utterances into InMemoryDocumentStore
document_store = InMemoryDocumentStore()
file_path = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"
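# Replace the placeholder with your AssemblyAI API key.
assemblyai_api_key = "YOUR_ASSEMBLYAI_API_KEY"
transcriber = AssemblyAITranscriber(api_key=assemblyai_api_key)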
result = transcriber.run(file_path=file_path, speaker_labels=True)
document_store.write_documents(result["speaker_labels"])

## Build a generative QA pipeline
template = """
Answer the question, based on the content in the documents. If you can't answer based on the documents, say so.
Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}
Question: {{ question }}
"""
pipe = Pipeline()
pipe.add_component("retriever", InMemoryBM25Retriever(document_store=document_store, top_k=3))
pipe.add_component("prompt_builder", PromptBuilder(template=template))
pipe.add_component("llm", OpenAIGenerator(api_key=Secret.from_token("YOUR_OPENAI_API_KEY"), model="gpt-3.5-turbo"))

pipe.connect("retriever", "prompt_builder.documents")
pipe.connect("prompt_builder", "llm")

## Run the pipeline and only include the speech text from speaker A
question = "Who is more affected by wildfires?"
pipe.run({
    "prompt_builder": {"question": question},
    "retriever": {
        "query": question,
        "filters": {
            "operator": "AND",
            "conditions": [{"field": "meta.speaker", "operator": "==", "value": "A"}],
        },
    },
})

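Because this filter returns only the text spoken by speaker A, no relevant result is found. Run the same pipeline with the filter set to speaker B to get an answer.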

{'llm': {'replies': ['The documents do not provide explicit information on who is more affected by wildfires.'],
  'meta': [{'model': 'gpt-3.5-turbo-0613',
    'index': 0,
    'finish_reason': 'stop',
    'usage': {'completion_tokens': 15,
     'prompt_tokens': 177,
     'total_tokens': 192}}]}}
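
To make the effect of the metadata filter concrete, here is a minimal pure-Python sketch that evaluates a filter of the shape used above against a document's meta dictionary. It handles only the AND and OR logical operators and the == and != comparisons, and it is not how Haystack implements filtering; matches_filter and the sample utterances are hypothetical.

```python
def matches_filter(meta: dict, flt: dict) -> bool:
    """Evaluate a simplified filter dict against a document's meta."""
    if "conditions" in flt:
        # Logical node: combine the results of the nested conditions.
        results = [matches_filter(meta, cond) for cond in flt["conditions"]]
        if flt["operator"] == "AND":
            return all(results)
        if flt["operator"] == "OR":
            return any(results)
        raise ValueError(f"unsupported logical operator: {flt['operator']}")

    # Comparison leaf: "meta.speaker" refers to the "speaker" key of meta.
    key = flt["field"].split(".", 1)[-1]
    actual = meta.get(key)
    if flt["operator"] == "==":
        return actual == flt["value"]
    if flt["operator"] == "!=":
        return actual != flt["value"]
    raise ValueError(f"unsupported comparison operator: {flt['operator']}")


# Made-up utterances standing in for the diarized documents.
utterances = [
    {"content": "placeholder utterance 1", "meta": {"speaker": "A"}},
    {"content": "placeholder utterance 2", "meta": {"speaker": "B"}},
    {"content": "placeholder utterance 3", "meta": {"speaker": "A"}},
]
speaker_a = {
    "operator": "AND",
    "conditions": [{"field": "meta.speaker", "operator": "==", "value": "A"}],
}
kept = [u["content"] for u in utterances if matches_filter(u["meta"], speaker_a)]
print(kept)
```

Changing "value" to "B" in the condition keeps only the second utterance, which mirrors the suggestion to rerun the query for speaker B.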
