This is almost an ideal use case for Cassandra.
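Indexing URLs by the keywords they contain is very similar to what Cassandra was originally designed for at Facebook: inbox search. A wide-row layout, where the row key is a keyword and each column is a URL, works very well for mapping keywords to URLs. To map URLs back to keywords, use the URL as the row key and one column per keyword.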
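As a rough illustration of the two wide-row layouts, here is a minimal in-memory sketch in Python. It does not talk to Cassandra at all; the names `urls_by_keyword`, `keywords_by_url`, and `index_page` are hypothetical, and each dictionary simply stands in for one table where the dictionary key plays the role of the row key and the set members play the role of the columns.

```python
from collections import defaultdict

# In-memory stand-ins for two wide-row tables (hypothetical names).
urls_by_keyword = defaultdict(set)   # row key: keyword, columns: URLs
keywords_by_url = defaultdict(set)   # row key: URL, columns: keywords


def index_page(url, keywords):
    """Record every keyword found at `url` in both directions."""
    for kw in keywords:
        urls_by_keyword[kw].add(url)   # keyword -> URLs lookup
        keywords_by_url[url].add(kw)   # URL -> keywords lookup


index_page("http://example.com/a", ["cassandra", "nosql"])
index_page("http://example.com/b", ["cassandra", "mongodb"])

# All URLs that contain the keyword "cassandra".
print(sorted(urls_by_keyword["cassandra"]))
```

Writing each fact twice, once per lookup direction, is the usual trade-off with this kind of model: reads in either direction touch a single row, at the cost of extra writes.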
To track first-order relationships between keywords, you can use one row per keyword, where each column in the row is another keyword that was found at the same URL. If you want to store more information, such as the number of times the two keywords appeared together, use one of Cassandra's built-in distributed counters as each column value. Counters are designed to handle a high volume of increments and make it possible to have millions of active, distinct counters.
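The co-occurrence layout can be sketched the same way. The snippet below is only an in-memory model, assuming a hypothetical `record_page` helper and a `cooccurrence` mapping; in Cassandra, each `+= 1` here would instead be an increment of a counter column in the row for the first keyword.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Row key: keyword. Columns: co-occurring keywords. Values: counts.
# (Hypothetical in-memory stand-in for a table of counter columns.)
cooccurrence = defaultdict(Counter)


def record_page(keywords):
    """Increment the pair counter for every ordered pair of distinct keywords."""
    for a, b in permutations(set(keywords), 2):
        cooccurrence[a][b] += 1


record_page(["cassandra", "nosql", "database"])
record_page(["cassandra", "nosql"])

# How many pages contained both "cassandra" and "nosql"?
print(cooccurrence["cassandra"]["nosql"])
```

Storing both ordered pairs means the relationship can be read from either keyword's row without a second lookup.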
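It sounds like this data set could become very large. If so, you should seriously consider Cassandra instead of MongoDB. Mongo does not handle larger-than-memory data sets well (because Mongo relies on mmap), whereas Cassandra was designed with a strong emphasis on efficient writes and reads of larger-than-memory data sets.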