English 中文(简体)
B. 绘制关键词关系图所需要的非行技术和方法建议[封闭]
原标题:Need DB tech and methodology suggestion on mapping keyword relationships [closed]

I have to spider over 2 million URLs and harvest their meta keywords. I then need to store each unique keyword and via the DB, keep track of which keyword appears with / is linked to other keywords.

我在努力应对行将使用的东西——标准RDBM似乎为这项任务而密集——我的格言告诉我,莫戈亚解决办法可能是行走的道路......但我对它来说是很新的。

Open to all experienced suggestions.

最佳回答

这几乎是卡桑德拉理想的使用案例。

将URLs列入其中关键词的索引,与Casses最初在Facebook上设计的内容非常相似:在方框搜索中。 采用一种广泛的行文形式,如果行文钥匙是关键词,每个栏目是URL,则将非常有助于绘制URLs的关键词。 为了将URL改写为关键词,将URL作为逐个关键词和一栏。

To track first-order relationships between keywords, you can use one row per keyword, and each column in the row can be another keyword that was found at the same URL. If you want to store more information, such as the number of times the two keywords appeared together, use one of Cassandra s built-in distributed counters for each column value. They are designed to handle a high volume of increments as well as make it possible to have millions of active, distinct counters.

如同这一数据集一样,这种数据集可能变得非常大。 如果是的话,你就应当认真考虑将Casses代替MongoDB。 Mongo根本不处理超出记忆的数据集(因为Mongo依赖微粒),而Casses的设计则着重强调高效撰写和阅读超过记忆数据集。

问题回答

这可以在蒙戈邦很好地发挥作用。 DB. 您可为每个欧洲语言中心编写一份文件。 该文件中有一阵标,列出了所用的关键词。 这一阵列的索引使你能够迅速找到提及任何具体关键词的URL。

用地图总结:使用一个图象,显示每种URL所使用的关键词(以斜体表示)的每2克(或克),然后减少,以计算独特的组合。 假设是按频率进行新的收集和分类。





相关问题
SQL SubQuery getting particular column

I noticed that there were some threads with similar questions, and I did look through them but did not really get a convincing answer. Here s my question: The subquery below returns a Table with 3 ...

please can anyone check this while loop and if condition

<?php $con=mysql_connect("localhost","mts","mts"); if(!con) { die( unable to connect . mysql_error()); } mysql_select_db("mts",$con); /* date_default_timezone_set ("Asia/Calcutta"); $date = ...

php return a specific row from query

Is it possible in php to return a specific row of data from a mysql query? None of the fetch statements that I ve found return a 2 dimensional array to access specific rows. I want to be able to ...

Character Encodings in PHP and MySQL

Our website was developed with a meta tag set to... <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" /> This works fine for M-dashes and special quotes, etc. However, I ...

Pagination Strategies for Complex (slow) Datasets

What are some of the strategies being used for pagination of data sets that involve complex queries? count(*) takes ~1.5 sec so we don t want to hit the DB for every page view. Currently there are ~...

Averaging a total in mySQL

My table looks like person_id | car_id | miles ------------------------------ 1 | 1 | 100 1 | 2 | 200 2 | 3 | 1000 2 | 4 | 500 I need to ...

热门标签