English 中文(简体)
MongoDB nested sets
原标题:
  • 时间:2009-12-05 22:49:25
  •  标签:
  • mongodb

What re the best practices to store nested sets (like trees of comments) in MongoDB?

I mean, every comment can have a parent comment and children-comments (answers).

Storing them like this:

{
   title: "Hello",
   body: "Please comment me!",
   comments: [
        {
            author: "Peter",
            text: "Hi there",
            answers: [
                  {
                      author: "Peter",
                      text: "Hi there",
                      answers: [
                                 { author: "Ivan", text: "Hi there" },
                                 { author: "Nicholas", text: "Hi there" }
                      ]
                  },
                  { author: "Ivan", text: "Hi there" },
                  { author: "Nicholas", text: "Hi there" },
            ]
        },
        { author: "Ivan", text: "Hi there" },
        { author: "Nicholas", text: "Hi there" },
   ]
}

is not cool, because we can t, for example, ask for "all post which are commented by Peter" without map/reduce.

最佳回答

I think there is no perfect solution - depends on what operations are more important for your app. I believe Silicon Alley Insider stores comments nested with MongoDB for example. That does make the query you mention harder.

One option is store at top-level in the post a list of all commenters in an array. Think of that as denormalized data. Then one can easily find all posts which involve a certain commenter. Then to drill down, you use map/reduce or db.eval() to get the nested post info within.

One other note - if you are dealing with a single document, db.eval() is probably lighter-weight than map/reduce. $where is also an option but can be slow so I like the additional list of commenters mentioned above - not it is also easy to index that array too (see Multikey in the docs).

See also: http://groups.google.com/group/mongodb-user/browse_thread/thread/df8250573c91f75a/e880d9c57e343b52?lnk=gst&q=trees#e880d9c57e343b52

问题回答

In the link from dm s post Dwight Merriman mentions using a path key and doing regex matches

{
  path : "a.b.c.d.e.f"
}

Another way to do this would be with arrays

{
  path : ["a", "b", "c", "d", "e", "f"]
}

db.test.ensureIndex({path: 1})

that should make it pretty fast.

if each node can only be in a single path then you wouldn t need to do worry about where it is located in the list

db.test.find({path: "a"})

would find all children of "a"

Instead of path names I would probably use the _id of the nodes.

Update

  • one thing to be careful of is that an index can only have one array in it.
  • Be careful to use explain on your queries

    db.test.find({path: {$in: ["a", "b"]})

gives you

 db.test.find({path: {$in: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1 multi",
        "nscanned" : 2,
        "nscannedObjects" : 2,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ],
                        [
                                "b",
                                "b"
                        ]
                ]
        }
}

but

 db.test.find({path: {$all: ["a", "b"]}}).explain()
{
        "cursor" : "BtreeCursor path_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "isMultiKey" : true,
        "indexOnly" : false,
        "indexBounds" : {
                "path" : [
                        [
                                "a",
                                "a"
                        ]
                ]
        }
}

only uses the first element and then scans all matching results for b.
If a is your root element or is in most of your records then your doing a nearly full scan of the records instead of an efficient index query.





相关问题
Access DB Ref MongoDB

Whats the best way to access/query a DB Ref: UPDATE: users: name, groupref : {$ref:"groups",$id:"ObjectId ..." } } groups: name, topic, country,...,.. Assumption is that user belongs to only one ...

MongoDB nested sets

What re the best practices to store nested sets (like trees of comments) in MongoDB? I mean, every comment can have a parent comment and children-comments (answers). Storing them like this: { ...

MongoMapper and migrations

I m building a Rails application using MongoDB as the back-end and MongoMapper as the ORM tool. Suppose in version 1, I define the following model: class SomeModel include MongoMapper::Document ...

MongoDB takes long for indexing

I have the following setup: Mac Pro with 2 GB of RAM (yes, not that much) MongoDB 1.1.3 64-bit 8 million entries in a single collection index for one field (integer) wanted Calling .ensureIndex(...) ...

Storing and accessing large amounts of data

My application creates pieces of data that, in xml, would look like this: <resource url="someurl"> <term> <name>somename</name> <frequency>somenumber</...

热门标签