I'm trying to graph the linking structure of a web site so I can model how pages on a given domain link to each other. Note that I'm not graphing links to sites outside the root domain.
Obviously this graph could be considerable in size. One of the main queries I want to perform is to count how many pages directly link into a given url. I want to run this against the whole graph (shudder) such that I end up with a list of urls and the count of incoming links to that url.
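To make the report concrete, here is a minimal in-memory sketch of the count I'm after. It assumes the crawl has already produced (source, target) URL pairs; the function name, the example.com URLs, and the rule that the same page linking twice to one URL counts once are all my own assumptions for illustration, and it is not meant as the real solution at scale.

```python
from collections import Counter
from urllib.parse import urlparse


def count_incoming_links(links, root_domain):
    """Count how many distinct pages link directly to each on-domain URL.

    links: iterable of (source_url, target_url) pairs (assumed crawl output).
    root_domain: hypothetical root domain, e.g. "example.com".
    """
    seen = set()        # (source, target) pairs already counted
    counts = Counter()  # target URL -> number of distinct linking pages
    for source, target in links:
        host = urlparse(target).hostname or ""
        # Skip links that point outside the root domain.
        if host != root_domain and not host.endswith("." + root_domain):
            continue
        # Assumption: a page linking to the same URL twice counts once.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        counts[target] += 1
    return counts


# Made-up sample data, only to show the shape of the report.
links = [
    ("http://example.com/", "http://example.com/about"),
    ("http://example.com/blog", "http://example.com/about"),
    ("http://example.com/blog", "http://example.com/about"),  # duplicate link
    ("http://example.com/about", "http://example.com/"),
    ("http://example.com/", "http://other.org/page"),  # off-domain, ignored
]

report = count_incoming_links(links, "example.com")
for url, n in report.most_common():
    print(url, n)
```

This is fine for a handful of pages, but it holds every edge in memory and recomputes everything from scratch, which is exactly what I'd like to avoid on the full graph.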
I know one popular way of doing this would be via some kind of map reduce - and I may still end up going that way - however, I have a requirement to be able to view this report in (near) realtime, which isn't generally map reduce friendly.
I've had a quick look at Neo4j and OrientDB. While both of these could model the relationship I want, it's not clear whether I could query them to generate the report I want. At this point I'm not committed to any particular technology.
Any help would be greatly appreciated. Thanks, Paul