Any scalable OLAP database (web app scale)?

I have an application that requires analytics at different levels of aggregation, which is an OLAP workload. I also want to update my database fairly frequently.

For example, here is what my updates look like (the schema is: time, dest, source ip, browser -> visits):

(15:00-1-2-2010, www.stackoverflow.com, 128.19.1.1, safari) -->  105

(15:00-1-2-2010, www.stackoverflow.com, 128.19.2.1, firefox) --> 110

...

(15:00-1-5-2010, www.cnn.com, 128.19.5.1, firefox) --> 110

Then I want to ask: what is the total number of visits to www.stackoverflow.com from a Firefox browser last month?
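A minimal sketch of this query, using Python's built-in sqlite3 module only to make the schema and aggregation concrete (it is not a scalable OLAP engine). The table name `visits`, the column names, and the helper `total_visits` are hypothetical, and the timestamps assume that "1-2-2010" means 2 January 2010.

```python
import sqlite3

# Hypothetical table and column names; timestamps are stored as ISO strings,
# assuming the question's "1-2-2010" means 2 January 2010.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE visits ("
    "ts TEXT, dest TEXT, source_ip TEXT, browser TEXT, visits INTEGER)"
)
rows = [
    ("2010-01-02 15:00", "www.stackoverflow.com", "128.19.1.1", "safari", 105),
    ("2010-01-02 15:00", "www.stackoverflow.com", "128.19.2.1", "firefox", 110),
    ("2010-01-05 15:00", "www.cnn.com", "128.19.5.1", "firefox", 110),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?, ?, ?)", rows)


def total_visits(conn, dest, browser, start, end):
    """Sum visits for one destination and browser with start <= ts < end."""
    cur = conn.execute(
        "SELECT COALESCE(SUM(visits), 0) FROM visits "
        "WHERE dest = ? AND browser = ? AND ts >= ? AND ts < ?",
        (dest, browser, start, end),
    )
    return cur.fetchone()[0]


# "Last month" is passed as an explicit half-open date range.
result = total_visits(
    conn, "www.stackoverflow.com", "firefox", "2010-01-01", "2010-02-01"
)
print(result)
```

ISO-formatted timestamps compare correctly as plain strings, which is why the range filter works without a date type.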

I understand that Vertica can do this well in terms of performance and scalability, though probably not cheaply in terms of cost. I have two questions here.

1) Is there an open-source product that I can build upon to solve this problem? In particular, how well does Mondrian work in terms of scalability and performance?

2) Is there an HBase- or Hypertable-based solution for this? Obviously, bare HBase/Hypertable can't do this, but if there is a project built on HBase/Hypertable, scalability probably won't be an issue, IMO.

Thanks!

Answers

You can download a free edition (the single-node edition) of the Greenplum database. I haven't tried it myself, but I would guess it is a powerful beast. Read here: http://www.dbms2.com/2009/10/19/greenplum-free-single-node-edition/

Another option is MongoDB. It is fast and free, and you can write MapReduce functions in JavaScript to do analytics.

My reputation here is too low to add a hyperlink to MongoDB, so you will have to google it. I can add only one hyperlink per post.
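The MapReduce style of analytics mentioned for MongoDB can be illustrated with a small sketch. This is plain Python, not MongoDB's API or its JavaScript functions; the names `map_visit`, `reduce_visits`, and `map_reduce` are hypothetical, and the documents mirror the sample rows from the question.

```python
from collections import defaultdict


def map_visit(doc):
    # Emit one (key, value) pair per document, keyed by destination and browser.
    yield (doc["dest"], doc["browser"]), doc["visits"]


def reduce_visits(key, values):
    # Combine all values emitted for one key into a single total.
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


docs = [
    {"dest": "www.stackoverflow.com", "browser": "safari", "visits": 105},
    {"dest": "www.stackoverflow.com", "browser": "firefox", "visits": 110},
    {"dest": "www.cnn.com", "browser": "firefox", "visits": 110},
]
totals = map_reduce(docs, map_visit, reduce_visits)
print(totals[("www.stackoverflow.com", "firefox")])
```

In a real deployment the map and reduce steps would run inside the database across many documents; the sketch only shows the shape of the two functions.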

The zohmg project aims to solve this problem using Hadoop and HBase.

Facebook also built Hive on top of Hadoop. It is pretty simple to get going and has a reasonable query API too.

http://mirror.facebook.net/facebook/hive/

Is your data model more complex than that? If it isn't, you might be better off just writing custom code for it. Then you can really tune it to your data. Real products have to offer a lot of flexibility, need a lot of complexity to achieve that, and suffer in speed as a result.
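As a rough illustration of what such custom code could look like, here is a sketch that keeps running totals per (month, dest, browser) so that a query is a single dictionary lookup. The class `VisitRollup`, its method names, and the choice of rollup key are assumptions for this example, not something from the question; timestamps are assumed to be ISO strings.

```python
from collections import defaultdict


class VisitRollup:
    """Running totals per (month, dest, browser); raw rows are not kept."""

    def __init__(self):
        self._totals = defaultdict(int)

    def update(self, ts, dest, source_ip, browser, visits):
        # source_ip is ignored because this rollup does not group by it.
        month = ts[:7]  # "YYYY-MM" prefix of an ISO timestamp
        self._totals[(month, dest, browser)] += visits

    def total(self, month, dest, browser):
        # Missing keys mean no visits were recorded for that combination.
        return self._totals.get((month, dest, browser), 0)


rollup = VisitRollup()
rollup.update("2010-01-02 15:00", "www.stackoverflow.com", "128.19.1.1", "safari", 105)
rollup.update("2010-01-02 15:00", "www.stackoverflow.com", "128.19.2.1", "firefox", 110)
rollup.update("2010-01-05 15:00", "www.cnn.com", "128.19.5.1", "firefox", 110)
print(rollup.total("2010-01", "www.stackoverflow.com", "firefox"))
```

The trade-off is the one described above: the structure answers only the queries it was built for, and any new grouping (for example by source IP) needs its own rollup.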

Your question is not clear in one respect: what do you mean by scalable? Are you collecting data from lots of sites but serving only a limited number of querying users, or do you also have a lot of users? The two situations lead to significantly different models.




