CouchDB for chat history persistence and user statistics
Is CouchDB or Couchbase suitable as a NoSQL persistence solution for storing users' chat history and statistics? Since chat history would probably involve more writes than reads, what should the document structure be for a single user's history with some statistics: a single entity representing the user, with embedded or separate documents for the history data (lots of small docs) and some stats (a small number of docs)?

Best answer

Yes, CouchDB or Couchbase is suitable.

Since chat history requires many writes, I am thinking of something that makes writing easy: just drop a document and let CouchDB worry about aggregating it. In one quick POST you could describe the chat message, who sent it, the timestamp, which chat room it belongs to, etc.
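As a rough sketch of that single POST, assuming a database reachable at some URL such as http://localhost:5984/chat: the field names `type`, `room`, `text` and `timestamp` are made up for illustration, while the date parts are stored as separate fields so that a map function can emit them directly. The helper names `buildMessageDoc` and `postMessage` are hypothetical too.

```javascript
// Build one small document per chat message.
// Field names other than the date parts are illustrative assumptions.
function buildMessageDoc(username, room, text, when = new Date()) {
  return {
    type: "message",
    username,
    room,
    text,
    timestamp: when.toISOString(),
    year: when.getUTCFullYear(),
    month: when.getUTCMonth() + 1, // JavaScript months are 0-based
    day: when.getUTCDate(),
    hour: when.getUTCHours(),
    minute: when.getUTCMinutes(),
  };
}

// POSTing to the database URL lets CouchDB assign the document _id.
// dbUrl is an assumed location, e.g. "http://localhost:5984/chat".
async function postMessage(dbUrl, doc) {
  const res = await fetch(dbUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(doc),
  });
  if (!res.ok) throw new Error(`CouchDB returned ${res.status}`);
  return res.json(); // an object with ok, id and rev
}
```

Because every message gets its own new document, no two writers ever touch the same document.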

CouchDB view collation can then assemble the single entity representing a user together with their historical data. For example, if you want to know a user's message volume, your map function will emit a key like this:

emit([doc.username, doc.year, doc.month, doc.day, doc.hour, doc.minute], 1);

And the reduce function adds up all the values. Now you can query a user's annual volume,

group_level=2&startkey=["somebody",2011,null]&endkey=["somebody",2011,{}]

or (by increasing the group level) monthly volume, daily volume, hourly volume, etc.
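To make the grouping concrete, here is a minimal sketch of what such a view could look like, plus a plain JavaScript simulation of how the group level truncates the key. The design document name `_design/stats` and the view name `volume` are assumptions; `_sum` is CouchDB's built-in reduce. `groupedSum` is not a CouchDB API. It only imitates, in memory, what the server does: keep the first N key elements and add up the values.

```javascript
// Hypothetical design document; the map function is stored as a string.
const designDoc = {
  _id: "_design/stats",
  views: {
    volume: {
      map: "function (doc) { emit([doc.username, doc.year, doc.month, doc.day, doc.hour, doc.minute], 1); }",
      reduce: "_sum",
    },
  },
};

// Local stand-in for the map function: returns the [key, value] pair
// that the view would emit for one message document.
function mapMessage(doc) {
  return [[doc.username, doc.year, doc.month, doc.day, doc.hour, doc.minute], 1];
}

// Imitates reduce with group_level: truncate each key to its first
// groupLevel elements and sum the values that share the truncated key.
function groupedSum(docs, groupLevel) {
  const totals = new Map();
  for (const doc of docs) {
    const [key, value] = mapMessage(doc);
    const groupKey = JSON.stringify(key.slice(0, groupLevel));
    totals.set(groupKey, (totals.get(groupKey) || 0) + value);
  }
  return totals;
}
```

Grouping on the first two key elements gives one total per user and year, three elements gives one per month, and so on down to the minute.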

Considerations

This technique has costs and benefits. The basic trade-off is, updates should be easy, reports should be reasonable. In your example of 10,000 updates per day, I get nervous thinking about 409 Conflict rejections, or maintaining conflict-resolution code, or making the client gracefully recover from an error when more messages are piling up!

The suggested technique helps. Each update is isolated from the others, updates can occur out-of-order, error recovery is not too bad. Just retry a few times in the background. (Note, I am personally an advocate that updates should be easy—maybe I am biased.)
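The "retry a few times in the background" part might look roughly like the sketch below. `sendWithRetry` is a hypothetical helper, `send` is any function returning a promise (for example a POST of one message document), and the attempt count and delays are arbitrary defaults rather than recommendations.

```javascript
// Retry an async operation a few times with exponential backoff.
// Safe here because each message is an independent new document,
// so a late or out-of-order retry does not clash with other writes.
async function sendWithRetry(send, { attempts = 5, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError; // give up after the final attempt
}
```

One caveat with this sketch: if a request succeeded but its response was lost, a blind retry of a POST would store the message twice. Choosing the document ID on the client and using PUT is one way around that.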

The cost is "wasting" disk space, and retrieving data is (relatively) more work. CouchDB is slow and wasteful like lorries are slow and wasteful. In reality, lorries are common in wealthy places and uncommon in poor places because they are a better long-term deal. Emotionally, we see lorries lumber about and vomit black smoke, but rationally, we know they are more efficient.

Most stats can be direct map/reduce views. However, you can also maintain "summary" documents with aggregated or independent results, or whatever else you need. Frequent updates are not a problem (on this scale: 86,400 updates per day is still just 1/sec). But you might want a dedicated "updater" client for those documents. With only one client updating the special documents, you won't get 409 Conflicts, since nobody else is fighting to update the same document.
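One way to picture the dedicated updater is a single process that applies changes to summary documents strictly one at a time. The sketch below is only an in-process illustration: `SummaryUpdater` is a made-up name, and `store` is a hypothetical object with async `get(id)` and `put(doc)` methods standing in for whatever CouchDB client you use (a real `put` would also carry the document's `_rev`).

```javascript
// Serializes all summary-document updates through one queue, so each
// read-modify-write finishes before the next one starts.
class SummaryUpdater {
  constructor(store) {
    this.store = store; // hypothetical: { get(id), put(doc) }, both async
    this.tail = Promise.resolve();
  }

  // Queue a change; `change` receives the current doc and returns the new one.
  enqueue(id, change) {
    const run = async () => {
      const current = (await this.store.get(id)) || { _id: id };
      await this.store.put(change(current));
    };
    // Run after the previous update settles, whether it succeeded or failed.
    this.tail = this.tail.then(run, run);
    return this.tail;
  }
}
```

Since every update to a summary document goes through the same queue, no update reads a stale copy of that document, which is the situation that produces a 409.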
