使用 Qdrant 和 FastEmbed 进行稀疏嵌入检索
最后更新:2024 年 9 月 24 日
在本 Notebook 中,我们将了解如何在 Haystack 中使用稀疏嵌入检索技术(例如 SPLADE)。
我们将使用 Qdrant 文档存储和 FastEmbed 稀疏嵌入器。
为什么选择 SPLADE?
- 稀疏关键词检索(基于 BM25 算法或类似算法)简单快速,资源消耗少,但依赖于词汇匹配,难以捕捉语义含义。
- 密集嵌入检索考虑语义,但需要大量的计算资源,通常在新的领域效果不佳,并且不考虑精确的措辞。
虽然通过结合这两种方法(教程)可以取得不错的结果,但 SPLADE(用于信息检索的稀疏词汇和扩展模型)引入了一种新方法,它融合了这两种技术的优点。特别是,SPLADE 使用 BERT 等语言模型来权衡查询中不同术语的相关性,并执行自动术语扩展,从而减少了词汇不匹配问题(查询和相关文档通常缺乏术语重叠)。
主要特点
- 在精确关键词匹配方面优于密集嵌入检索器
- 在语义匹配方面优于 BM25
- 比 BM25 慢
- 与 BM25 和密集嵌入相比仍处于实验阶段:模型较少;仅少数文档存储支持
资源
安装依赖项
!pip install -U fastembed-haystack qdrant-haystack wikipedia transformers
稀疏嵌入检索
索引
创建一个 Qdrant 文档存储
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
document_store = QdrantDocumentStore(
":memory:",
recreate_index=True,
return_embedding=True,
use_sparse_embeddings=True # set this parameter to True, otherwise the collection schema won't allow to store sparse vectors
)
下载维基百科页面并创建原始文档
我们下载几篇关于动物的维基百科页面,并从中创建 Haystack 文档。
nice_animals=["Capybara", "Dolphin"]
import wikipedia
from haystack.dataclasses import Document
raw_docs=[]
for title in nice_animals:
page = wikipedia.page(title=title, auto_suggest=False)
doc = Document(content=page.content, meta={"title": page.title, "url":page.url})
raw_docs.append(doc)
初始化一个 FastembedSparseDocumentEmbedder
FastembedSparseDocumentEmbedder 使用其稀疏嵌入丰富文档列表。
我们正在使用 prithvida/Splade_PP_en_v1,这是一个具有宽松许可的良好稀疏嵌入模型。
我们还希望嵌入文档的标题,因为它包含相关信息。
有关更多自定义选项,请参阅文档。
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder
sparse_doc_embedder = FastembedSparseDocumentEmbedder(model="prithvida/Splade_PP_en_v1",
meta_fields_to_embed=["title"])
sparse_doc_embedder.warm_up()
# let's try the embedder
print(sparse_doc_embedder.run(documents=[Document(content="An example document")]))
Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]
.gitattributes: 0%| | 0.00/1.52k [00:00<?, ?B/s]
README.md: 0%| | 0.00/133 [00:00<?, ?B/s]
generation_config.json: 0%| | 0.00/90.0 [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/712k [00:00<?, ?B/s]
config.json: 0%| | 0.00/755 [00:00<?, ?B/s]
tokenizer_config.json: 0%| | 0.00/1.38k [00:00<?, ?B/s]
special_tokens_map.json: 0%| | 0.00/695 [00:00<?, ?B/s]
vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]
model.onnx: 0%| | 0.00/532M [00:00<?, ?B/s]
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 12.05it/s]
{'documents': [Document(id=cd69a8e89f3c179f243c483a337c5ecb178c58373a253e461a64545b669de12d, content: 'An example document', sparse_embedding: vector with 19 non-zero elements)]}
索引管道
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from haystack import Pipeline
indexing = Pipeline()
indexing.add_component("cleaner", DocumentCleaner())
indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=4))
indexing.add_component("sparse_doc_embedder", sparse_doc_embedder)
indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("cleaner", "splitter")
indexing.connect("splitter", "sparse_doc_embedder")
indexing.connect("sparse_doc_embedder", "writer")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7f21068632e0>
🚅 Components
- cleaner: DocumentCleaner
- splitter: DocumentSplitter
- sparse_doc_embedder: FastembedSparseDocumentEmbedder
- writer: DocumentWriter
🛤️ Connections
- cleaner.documents -> splitter.documents (List[Document])
- splitter.documents -> sparse_doc_embedder.documents (List[Document])
- sparse_doc_embedder.documents -> writer.documents (List[Document])
让我们索引我们的文档!
⚠️ 如果您正在 Google Colab 上运行此 Notebook,请注意,Google Colab 只提供 2 个 CPU 核心,因此稀疏嵌入的生成速度可能不如在标准机器上快。
indexing.run({"documents":raw_docs})
Calculating sparse embeddings: 100%|██████████| 152/152 [02:29<00:00, 1.02it/s]
200it [00:00, 2418.48it/s]
{'writer': {'documents_written': 152}}
document_store.count_documents()
152
检索
检索管道
现在,我们创建一个简单的检索管道
FastembedSparseTextEmbedder:将查询转换为稀疏嵌入QdrantSparseEmbeddingRetriever:根据稀疏嵌入的相似性查找相关文档
from haystack import Pipeline
from haystack_integrations.components.retrievers.qdrant import QdrantSparseEmbeddingRetriever
from haystack_integrations.components.embedders.fastembed import FastembedSparseTextEmbedder
sparse_text_embedder = FastembedSparseTextEmbedder(model="prithvida/Splade_PP_en_v1")
query_pipeline = Pipeline()
query_pipeline.add_component("sparse_text_embedder", sparse_text_embedder)
query_pipeline.add_component("sparse_retriever", QdrantSparseEmbeddingRetriever(document_store=document_store))
query_pipeline.connect("sparse_text_embedder.sparse_embedding", "sparse_retriever.query_sparse_embedding")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7f21067cf3d0>
🚅 Components
- sparse_text_embedder: FastembedSparseTextEmbedder
- sparse_retriever: QdrantSparseEmbeddingRetriever
🛤️ Connections
- sparse_text_embedder.sparse_embedding -> sparse_retriever.query_sparse_embedding (SparseEmbedding)
尝试检索管道
question = "Where do capybaras live?"
results = query_pipeline.run({"sparse_text_embedder": {"text": question}})
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 9.02it/s]
import rich
for d in results['sparse_retriever']['documents']:
rich.print(f"\nid: {d.id}\n{d.content}\nscore: {d.score}\n---")
id: 6a485709ae51c55b78252571c0808ef17129b32e930ea7d461c12d9afaf40672 Its karyotype has 2n = 66 and FN = 102, meaning it has 66 chromosomes with a total of 102 arms. == Ecology == Capybaras are semiaquatic mammals found throughout all countries of South America except Chile. They live in densely forested areas near bodies of water, such as lakes, rivers, swamps, ponds, and marshes, as well as flooded savannah and along rivers in the tropical rainforest. They are superb swimmers and can hold their breath underwater for up to five minutes at a time. score: 0.5607053126371688 ---
id: fcc9a816e7f2312988dbd20146e4a5c07d8d8b409a373c4c3986d85c26dc0d61 Capybaras have adapted well to urbanization in South America. They can be found in many areas in zoos and parks, and may live for 12 years in captivity, more than double their wild lifespan. Capybaras are docile and usually allow humans to pet and hand-feed them, but physical contact is normally discouraged, as their ticks can be vectors to Rocky Mountain spotted fever. The European Association of Zoos and Aquaria asked Drusillas Park in Alfriston, Sussex, England, to keep the studbook for capybaras, to monitor captive populations in Europe. score: 0.5577329835824506 ---
id: d70f54cc66a83b56210c801ecd49c95bae5fef4ab38989d38b26dc53449b192d In 2011, one specimen was spotted on the Central Coast of California. These escaped populations occur in areas where prehistoric capybaras inhabited; late Pleistocene capybaras inhabited Florida and Hydrochoerus hesperotiganites in California and Hydrochoerus gaylordi in Grenada, and feral capybaras in North America may actually fill the ecological niche of the Pleistocene species. === Diet and predation === Capybaras are herbivores, grazing mainly on grasses and aquatic plants, as well as fruit and tree bark. They are very selective feeders and feed on the leaves of one species and disregard other species surrounding it. score: 0.5567185168202262 ---
id: a1cb26bcd9d053fc8e7a3c8b6716801b37ca37940c6f8b7865d6f6bb50b38f2f The capybara inhabits savannas and dense forests, and lives near bodies of water. It is a highly social species and can be found in groups as large as 100 individuals, but usually live in groups of 10–20 individuals. The capybara is hunted for its meat and hide and also for grease from its thick fatty skin. == Etymology == Its common name is derived from Tupi ka'apiûara, a complex agglutination of kaá (leaf) + píi (slender) + ú (eat) + ara (a suffix for agent nouns), meaning "one who eats slender leaves", or "grass-eater". score: 0.5562936393318461 ---
id: 15755cebd1049a00c4656aaa7cf6c417966b81e482732e1c97288d58a08b53b2 The capybara or greater capybara (Hydrochoerus hydrochaeris) is a giant cavy rodent native to South America. It is the largest living rodent and a member of the genus Hydrochoerus. The only other extant member is the lesser capybara (Hydrochoerus isthmius). Its close relatives include guinea pigs and rock cavies, and it is more distantly related to the agouti, the chinchilla, and the nutria. score: 0.5559830683084014 ---
id: d8bd93eefa0c2feeb162972cd915fe29e3b13c98181a976d4751296c51547c77 Capybara have flourished in cattle ranches. They roam in home ranges averaging 10 hectares (25 acres) in high-density populations. Many escapees from captivity can also be found in similar watery habitats around the world. Sightings are fairly common in Florida, although a breeding population has not yet been confirmed. score: 0.5550636661344844 ---
id: 5429437120fdd0611ba2f14db51a28dfd894cd7221264c805dfe43de6ba95f7e In Japan, following the lead of Izu Shaboten Zoo in 1982, multiple establishments or zoos in Japan that raise capybaras have adopted the practice of having them relax in onsen during the winter. They are seen as an attraction by Japanese people. Capybaras became popular in Japan due to the popular cartoon character Kapibara-san. In August 2021, Argentine and international media reported that capybaras had been causing serious problems for residents of Nordelta, an affluent gated community north of Buenos Aires built atop wetland habitat. score: 0.5535668539880305 ---
id: 4261a2ae42f4edf8885c6e4c483356c994ad7699fd814e3b8f5da115aa5560ea
Alloparenting has been observed in this species. Breeding peaks between April and May in Venezuela and between
October and November in Mato Grosso, Brazil. === Activities ===
Though quite agile on land, capybaras are equally at home in the water. They are excellent swimmers, and can remain
completely submerged for up to five minutes, an ability they use to evade predators.
score: 0.553533523584911
---
id: b819073f1e3b6973dd1d22299a34a882acc348ba45257456cc85c3960fedd9ce
The studbook includes information about all births, deaths and movements of capybaras, as well as how they are
related.
Capybaras are farmed for meat and skins in South America. The meat is considered unsuitable to eat in some areas,
while in other areas it is considered an important source of protein. In parts of South America, especially in
Venezuela, capybara meat is popular during Lent and Holy Week as the Catholic Church previously issued special
dispensation to allow it to be eaten while other meats are generally forbidden.
score: 0.5527757086170884
---
id: a1fc3b907198e9d3e1d95d4c08f0fefae6861ce810d1b32479b50753326bdf58
== Conservation and human interaction ==
Capybaras are not considered a threatened species; their population is stable throughout most of their South
American range, though in some areas hunting has reduced their numbers. Capybaras are hunted for their meat and
pelts in some areas, and otherwise killed by humans who see their grazing as competition for livestock. In some
areas, they are farmed, which has the effect of ensuring the wetland habitats are protected. Their survival is
aided by their ability to breed rapidly.
score: 0.5503703349299455
---
理解 SPLADE 向量
(灵感来源:FastEmbed SPLADE Notebook)
我们已经看到我们的模型将文本编码为稀疏向量(= 包含许多零的向量)。稀疏向量的有效表示方法是保存非零元素的索引和值。
让我们尝试理解这些向量中包含什么信息……
question = "Where do capybaras live?"
sparse_embedding = sparse_text_embedder.run(text=question)["sparse_embedding"]
rich.print(sparse_embedding.to_dict())
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 10.06it/s]
{ 'indices': [ 2015, 2100, 2427, 2444, 2555, 2693, 3224, 3269, 3295, 3562, 4111, 4761, 4982, 5430, 5917, 6178, 6552, 7713, 8843, 9089, 9230, 9277, 10746, 14627, 15267, 20709 ], 'values': [ 0.7443006634712219, 1.2232322692871094, 0.7982208132743835, 1.8504852056503296, 0.031874656677246094, 0.22175012528896332, 0.17087453603744507, 0.03717103973031044, 1.8334054946899414, 0.18768127262592316, 0.03902499005198479, 0.5681754946708679, 0.07937325537204742, 0.30040717124938965, 0.33065155148506165, 2.4437994956970215, 1.7612168788909912, 0.0731465145945549, 0.18527895212173462, 0.33103543519973755, 0.29275140166282654, 0.04728797823190689, 0.04782348498702049, 0.0030497254338115454, 0.6497660875320435, 2.6444451808929443 ] }
from transformers import AutoTokenizer
# we need the tokenizer vocabulary
tokenizer = AutoTokenizer.from_pretrained("Qdrant/Splade_PP_en_v1") # ONNX export of the original model
tokenizer_config.json: 0%| | 0.00/1.38k [00:00<?, ?B/s]
vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/712k [00:00<?, ?B/s]
special_tokens_map.json: 0%| | 0.00/695 [00:00<?, ?B/s]
def get_tokens_and_weights(sparse_embedding, tokenizer):
token_weight_dict = {}
for i in range(len(sparse_embedding.indices)):
token = tokenizer.decode([sparse_embedding.indices[i]])
weight = sparse_embedding.values[i]
token_weight_dict[token] = weight
# Sort the dictionary by weights
token_weight_dict = dict(sorted(token_weight_dict.items(), key=lambda item: item[1], reverse=True))
return token_weight_dict
rich.print(get_tokens_and_weights(sparse_embedding, tokenizer))
{ '##bara': 2.6444451808929443, 'cap': 2.4437994956970215, 'live': 1.8504852056503296, 'location': 1.8334054946899414, 'habitat': 1.7612168788909912, '##y': 1.2232322692871094, 'species': 0.7982208132743835, '##s': 0.7443006634712219, 'predator': 0.6497660875320435, 'origin': 0.5681754946708679, 'nest': 0.33103543519973755, 'tribe': 0.33065155148506165, 'cave': 0.30040717124938965, 'migration': 0.29275140166282654, 'move': 0.22175012528896332, 'genus': 0.18768127262592316, 'breed': 0.18527895212173462, 'forest': 0.17087453603744507, 'grow': 0.07937325537204742, 'shelter': 0.0731465145945549, 'habitats': 0.04782348498702049, 'refuge': 0.04728797823190689, 'animal': 0.03902499005198479, 'plant': 0.03717103973031044, 'region': 0.031874656677246094, 'reproduction': 0.0030497254338115454 }
非常棒!🦫
- token 按相关性排序
- 查询已扩展为相关 token/术语:“location”、“habitat”……
混合检索
理想情况下,SPLADE 等技术旨在取代其他方法(BM25 和密集嵌入检索)及其组合。
然而,有时将密集嵌入检索和稀疏嵌入检索结合起来可能是有意义的。您可以在本文档的附录中找到一些正面示例(混合检索的融合函数分析)。确保这适用于您的用例并进行评估。
下面我们展示如何在 Haystack 中创建此类应用程序。
在此示例中,我们使用 Qdrant 混合检索器:它比较密集和稀疏的查询和文档嵌入,并检索最相关的文档,使用倒数排名融合(Reciprocal Rank Fusion)合并分数。
如果您想进一步自定义行为,请参阅混合检索管道(教程)。
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack.components.preprocessors import DocumentCleaner, DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.embedders.fastembed import FastembedSparseDocumentEmbedder, FastembedDocumentEmbedder
from haystack.document_stores.types import DuplicatePolicy
from haystack import Pipeline
document_store = QdrantDocumentStore(
":memory:",
recreate_index=True,
return_embedding=True,
use_sparse_embeddings=True,
embedding_dim = 384
)
hybrid_indexing = Pipeline()
hybrid_indexing.add_component("cleaner", DocumentCleaner())
hybrid_indexing.add_component("splitter", DocumentSplitter(split_by='sentence', split_length=4))
hybrid_indexing.add_component("sparse_doc_embedder", FastembedSparseDocumentEmbedder(model="prithvida/Splade_PP_en_v1", meta_fields_to_embed=["title"]))
hybrid_indexing.add_component("dense_doc_embedder", FastembedDocumentEmbedder(model="BAAI/bge-small-en-v1.5", meta_fields_to_embed=["title"]))
hybrid_indexing.add_component("writer", DocumentWriter(document_store=document_store, policy=DuplicatePolicy.OVERWRITE))
hybrid_indexing.connect("cleaner", "splitter")
hybrid_indexing.connect("splitter", "sparse_doc_embedder")
hybrid_indexing.connect("sparse_doc_embedder", "dense_doc_embedder")
hybrid_indexing.connect("dense_doc_embedder", "writer")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7f1fe8292170>
🚅 Components
- cleaner: DocumentCleaner
- splitter: DocumentSplitter
- sparse_doc_embedder: FastembedSparseDocumentEmbedder
- dense_doc_embedder: FastembedDocumentEmbedder
- writer: DocumentWriter
🛤️ Connections
- cleaner.documents -> splitter.documents (List[Document])
- splitter.documents -> sparse_doc_embedder.documents (List[Document])
- sparse_doc_embedder.documents -> dense_doc_embedder.documents (List[Document])
- dense_doc_embedder.documents -> writer.documents (List[Document])
hybrid_indexing.run({"documents":raw_docs})
Fetching 9 files: 0%| | 0/9 [00:00<?, ?it/s]
special_tokens_map.json: 0%| | 0.00/695 [00:00<?, ?B/s]
README.md: 0%| | 0.00/28.0 [00:00<?, ?B/s]
tokenizer.json: 0%| | 0.00/711k [00:00<?, ?B/s]
.gitattributes: 0%| | 0.00/1.52k [00:00<?, ?B/s]
ort_config.json: 0%| | 0.00/1.27k [00:00<?, ?B/s]
config.json: 0%| | 0.00/706 [00:00<?, ?B/s]
tokenizer_config.json: 0%| | 0.00/1.24k [00:00<?, ?B/s]
model_optimized.onnx: 0%| | 0.00/66.5M [00:00<?, ?B/s]
vocab.txt: 0%| | 0.00/232k [00:00<?, ?B/s]
Calculating sparse embeddings: 100%|██████████| 152/152 [02:14<00:00, 1.13it/s]
Calculating embeddings: 100%|██████████| 152/152 [00:41<00:00, 3.68it/s]
200it [00:00, 655.45it/s]
{'writer': {'documents_written': 152}}
document_store.filter_documents()[0]
Document(id=5e2d65ac05a8a238b359773c3d855e026aca6e617df8a011964b401d8b242a1e, content: ' Overall, they tend to be dwarfed by other Cetartiodactyls. Several species have female-biased sexua...', meta: {'title': 'Dolphin', 'url': 'https://en.wikipedia.org/wiki/Dolphin', 'source_id': '6584a10fad50d363f203669ff6efc19e7ae2a5a28ca9351f5cceb5ba88f8e847'}, embedding: vector of size 384, sparse_embedding: vector with 129 non-zero elements)
from haystack_integrations.components.retrievers.qdrant import QdrantHybridRetriever
from haystack_integrations.components.embedders.fastembed import FastembedTextEmbedder
hybrid_query = Pipeline()
hybrid_query.add_component("sparse_text_embedder", FastembedSparseTextEmbedder(model="prithvida/Splade_PP_en_v1"))
hybrid_query.add_component("dense_text_embedder", FastembedTextEmbedder(model="BAAI/bge-small-en-v1.5", prefix="Represent this sentence for searching relevant passages: "))
hybrid_query.add_component("retriever", QdrantHybridRetriever(document_store=document_store))
hybrid_query.connect("sparse_text_embedder.sparse_embedding", "retriever.query_sparse_embedding")
hybrid_query.connect("dense_text_embedder.embedding", "retriever.query_embedding")
<haystack.core.pipeline.pipeline.Pipeline object at 0x7f1fe8293190>
🚅 Components
- sparse_text_embedder: FastembedSparseTextEmbedder
- dense_text_embedder: FastembedTextEmbedder
- retriever: QdrantHybridRetriever
🛤️ Connections
- sparse_text_embedder.sparse_embedding -> retriever.query_sparse_embedding (SparseEmbedding)
- dense_text_embedder.embedding -> retriever.query_embedding (List[float])
question = "Where do capybaras live?"
results = hybrid_query.run(
{"dense_text_embedder": {"text": question},
"sparse_text_embedder": {"text": question}}
)
Calculating sparse embeddings: 100%|██████████| 1/1 [00:00<00:00, 9.95it/s]
Calculating embeddings: 100%|██████████| 1/1 [00:00<00:00, 12.05it/s]
import rich
for d in results['retriever']['documents']:
rich.print(f"\nid: {d.id}\n{d.content}\nscore: {d.score}\n---")
id: fcc9a816e7f2312988dbd20146e4a5c07d8d8b409a373c4c3986d85c26dc0d61 Capybaras have adapted well to urbanization in South America. They can be found in many areas in zoos and parks, and may live for 12 years in captivity, more than double their wild lifespan. Capybaras are docile and usually allow humans to pet and hand-feed them, but physical contact is normally discouraged, as their ticks can be vectors to Rocky Mountain spotted fever. The European Association of Zoos and Aquaria asked Drusillas Park in Alfriston, Sussex, England, to keep the studbook for capybaras, to monitor captive populations in Europe. score: 0.6666666666666666 ---
id: 6a485709ae51c55b78252571c0808ef17129b32e930ea7d461c12d9afaf40672 Its karyotype has 2n = 66 and FN = 102, meaning it has 66 chromosomes with a total of 102 arms. == Ecology == Capybaras are semiaquatic mammals found throughout all countries of South America except Chile. They live in densely forested areas near bodies of water, such as lakes, rivers, swamps, ponds, and marshes, as well as flooded savannah and along rivers in the tropical rainforest. They are superb swimmers and can hold their breath underwater for up to five minutes at a time. score: 0.6666666666666666 ---
id: d8bd93eefa0c2feeb162972cd915fe29e3b13c98181a976d4751296c51547c77 Capybara have flourished in cattle ranches. They roam in home ranges averaging 10 hectares (25 acres) in high-density populations. Many escapees from captivity can also be found in similar watery habitats around the world. Sightings are fairly common in Florida, although a breeding population has not yet been confirmed. score: 0.6428571428571428 ---
id: a1cb26bcd9d053fc8e7a3c8b6716801b37ca37940c6f8b7865d6f6bb50b38f2f The capybara inhabits savannas and dense forests, and lives near bodies of water. It is a highly social species and can be found in groups as large as 100 individuals, but usually live in groups of 10–20 individuals. The capybara is hunted for its meat and hide and also for grease from its thick fatty skin. == Etymology == Its common name is derived from Tupi ka'apiûara, a complex agglutination of kaá (leaf) + píi (slender) + ú (eat) + ara (a suffix for agent nouns), meaning "one who eats slender leaves", or "grass-eater". score: 0.45 ---
id: d70f54cc66a83b56210c801ecd49c95bae5fef4ab38989d38b26dc53449b192d In 2011, one specimen was spotted on the Central Coast of California. These escaped populations occur in areas where prehistoric capybaras inhabited; late Pleistocene capybaras inhabited Florida and Hydrochoerus hesperotiganites in California and Hydrochoerus gaylordi in Grenada, and feral capybaras in North America may actually fill the ecological niche of the Pleistocene species. === Diet and predation === Capybaras are herbivores, grazing mainly on grasses and aquatic plants, as well as fruit and tree bark. They are very selective feeders and feed on the leaves of one species and disregard other species surrounding it. score: 0.45 ---
id: 15755cebd1049a00c4656aaa7cf6c417966b81e482732e1c97288d58a08b53b2 The capybara or greater capybara (Hydrochoerus hydrochaeris) is a giant cavy rodent native to South America. It is the largest living rodent and a member of the genus Hydrochoerus. The only other extant member is the lesser capybara (Hydrochoerus isthmius). Its close relatives include guinea pigs and rock cavies, and it is more distantly related to the agouti, the chinchilla, and the nutria. score: 0.30952380952380953 ---
id: 4261a2ae42f4edf8885c6e4c483356c994ad7699fd814e3b8f5da115aa5560ea
Alloparenting has been observed in this species. Breeding peaks between April and May in Venezuela and between
October and November in Mato Grosso, Brazil. === Activities ===
Though quite agile on land, capybaras are equally at home in the water. They are excellent swimmers, and can remain
completely submerged for up to five minutes, an ability they use to evade predators.
score: 0.2361111111111111
---
id: a1fc3b907198e9d3e1d95d4c08f0fefae6861ce810d1b32479b50753326bdf58
== Conservation and human interaction ==
Capybaras are not considered a threatened species; their population is stable throughout most of their South
American range, though in some areas hunting has reduced their numbers. Capybaras are hunted for their meat and
pelts in some areas, and otherwise killed by humans who see their grazing as competition for livestock. In some
areas, they are farmed, which has the effect of ensuring the wetland habitats are protected. Their survival is
aided by their ability to breed rapidly.
score: 0.20202020202020202
---
id: 5429437120fdd0611ba2f14db51a28dfd894cd7221264c805dfe43de6ba95f7e In Japan, following the lead of Izu Shaboten Zoo in 1982, multiple establishments or zoos in Japan that raise capybaras have adopted the practice of having them relax in onsen during the winter. They are seen as an attraction by Japanese people. Capybaras became popular in Japan due to the popular cartoon character Kapibara-san. In August 2021, Argentine and international media reported that capybaras had been causing serious problems for residents of Nordelta, an affluent gated community north of Buenos Aires built atop wetland habitat. score: 0.125 ---
id: a87985b28681d12e9897eae531fb5e93ecd0c702a5419708ddcbc03ae13c0ed0 They can have a lifespan of 8–10 years, but tend to live less than four years in the wild due to predation from big cats like the jaguars and pumas and non-mammalian predators like eagles and the caimans. The capybara is also the preferred prey of the green anaconda. == Social organization == Capybaras are known to be gregarious. While they sometimes live solitarily, they are more commonly found in groups of around 10–20 individuals, with two to four adult males, four to seven adult females, and the remainder juveniles. score: 0.1 ---
📚 Haystack 中稀疏嵌入支持文档
(Notebook 由 Stefano Fiorucci 编写)
