Graph Database to Count Direct Relations

I'm trying to graph the linking structure of a web site so I can model how pages on a given domain link to each other. Note that I'm not graphing links to sites outside the root domain.

Obviously this graph could be considerable in size. One of the main queries I want to perform is to count how many pages link directly to a given URL. I want to run this against the whole graph (shudder) so that I end up with a list of URLs and the count of incoming links for each URL.

I know one popular way of doing this would be some kind of map-reduce, and I may still end up going that way. However, I have a requirement to be able to view this report in (near) real time, which isn't generally map-reduce friendly.

I've had a quick look at Neo4j and OrientDB. While both of these could model the relationship I want, it's not clear whether I could query them to generate the report I want. At this point I'm not committed to any particular technology.

Any help would be greatly appreciated. Thanks, Paul

Best answer

Both OrientDB and Neo4j support Blueprints as a common API for graph operations such as traversal, counting, etc.

If I've understood your use case correctly, your graph seems pretty simple: you have "URL" vertices that link to each other through a single type of edge, "Links".

To execute operations against graphs, take a look at Gremlin.
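To make the model concrete, here is a minimal sketch in plain Python. It uses neither Blueprints nor Gremlin; it only illustrates the "URL" vertex / "Links" edge structure and the incoming-link count the question asks for. The sample URLs and the `count_incoming_links` helper are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: each pair is (source_url, target_url),
# i.e. one "Links" edge between two "URL" vertices.
edges = [
    ("/", "/about"),
    ("/", "/products"),
    ("/about", "/products"),
    ("/products", "/"),
    ("/blog", "/products"),
]


def count_incoming_links(edges):
    """Return a Counter mapping each URL to its number of incoming links."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 0  # make sure pages with no incoming links still appear
        counts[target] += 1
    return counts


in_degree = count_incoming_links(edges)
for url, count in in_degree.most_common():
    print(url, count)
```

In a graph database the same report would be the in-degree of every URL vertex over "Links" edges; how that query is written depends on the product and query language you pick.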

Other answers

You might have a look at structr. It is an open source CMS running on top of Neo4j, and it has exactly those types of inter-page links.

To get the number of links pointing to a page, you just have to iterate over the incoming LINKS_TO relationships of that page node.

What is the use case for your query? A popular-pages list? So would it contain just the top-n pages? You might then start at random places in the graph, traverse the incoming LINKS_TO relationships of your current node(s) in parallel, and put them into a sorted structure, so that you always start or continue with the 20 or so page nodes that already have the highest number of incoming links (until they're finished).
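If the report really is a top-n list, the selection step could look like the sketch below. It is deliberately simpler than the parallel traversal described above: it assumes the incoming-link counts are already available as a dictionary and only shows how to pick the n largest without sorting everything. The counts and the `top_pages` helper are hypothetical.

```python
import heapq

# Hypothetical incoming-link counts, keyed by URL.
incoming_counts = {
    "/": 12,
    "/about": 3,
    "/products": 40,
    "/blog": 7,
    "/contact": 1,
}


def top_pages(incoming_counts, n):
    """Return the n (url, count) pairs with the highest incoming-link counts."""
    return heapq.nlargest(n, incoming_counts.items(), key=lambda item: item[1])


top_three = top_pages(incoming_counts, 3)
print(top_three)
```

`heapq.nlargest` keeps only n candidates in memory at a time, which suits a large graph when only the head of the list is displayed.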

Marko Rodriguez has some similar "page-rank" examples in the Gremlin documentation. He's also got several blog posts where he talks about this.

Well, with Neo4j you won't be able to split the graph across servers to distribute the load. You could replicate the database to distribute the computation, but then updates will be slow (as you have to replicate them). I would attack the problem by keeping a count of inbound links as a property of each node and updating it as new relationships are added. Neo4j has excellent write performance. Of course, you don't strictly need to persist this information, because direct relationships are cheap to retrieve (you don't get a collection of all related nodes, just an iterator).
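The counter-on-write idea can be sketched as follows. This is an in-memory stand-in written in plain Python, not the Neo4j API: the `LinkGraph` class and its method names are hypothetical, and in a real graph database the counter would be a node property updated in the same transaction that creates the relationship.

```python
from collections import defaultdict


class LinkGraph:
    """In-memory stand-in for a graph store (hypothetical, not the Neo4j API)."""

    def __init__(self):
        self.outgoing = defaultdict(set)        # url -> set of urls it links to
        self.incoming_count = defaultdict(int)  # url -> number of pages linking to it

    def add_link(self, source, target):
        """Add a link and bump the target's counter; duplicate links are ignored."""
        if target in self.outgoing[source]:
            return False
        self.outgoing[source].add(target)
        self.incoming_count[target] += 1
        return True

    def remove_link(self, source, target):
        """Remove a link and decrement the target's counter if the link existed."""
        if target not in self.outgoing[source]:
            return False
        self.outgoing[source].remove(target)
        self.incoming_count[target] -= 1
        return True


graph = LinkGraph()
graph.add_link("/", "/products")
graph.add_link("/about", "/products")
graph.add_link("/", "/products")  # duplicate, ignored
print(graph.incoming_count["/products"])
```

With the count maintained on every write, the report becomes a read of one value per node instead of a traversal of the whole graph, at the cost of a little extra work on each insert and delete.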

You should also take a look at a highly scalable graph database product, such as InfiniteGraph. If you email their technical support, I think they will be able to point you to some sample code that does a large part of what you've described here.




