I need suggestions for a distributed media storage data store

I want to develop a multimedia system that needs to store millions of videos and images, so I am looking for a distributed storage subsystem. Can anyone give me some suggestions? Thanks!

Best answer

@yi_H

You can configure your writes to be replicated to multiple nodes before they return to the client. Whether that is needed is of course up to the use case, and it definitely involves a performance hit. So if you are implementing a write-heavy analytical database, it will have a significant impact on write throughput.
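To make the trade-off concrete, here is a minimal in-memory sketch of "acknowledge only after N replicas confirm". The `Node` class, `replicated_write`, and the `min_acks` parameter are hypothetical names for illustration, not any particular database's API; real systems expose this as a configurable write setting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory storage node; `up` simulates availability."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key: str, value: bytes) -> bool:
        # A node that is down cannot acknowledge the write.
        if not self.up:
            return False
        self.data[key] = value
        return True


def replicated_write(nodes, key, value, min_acks):
    """Send the write to every node; succeed only if at least
    `min_acks` of them confirm it."""
    acks = sum(1 for node in nodes if node.store(key, value))
    if acks < min_acks:
        raise IOError(f"only {acks} of {min_acks} required replicas acknowledged")
    return acks


nodes = [Node("a"), Node("b"), Node("c", up=False)]
print(replicated_write(nodes, "video-1", b"payload", min_acks=2))
```

The client waits for every required acknowledgement before the call returns, which is where the write-throughput cost comes from.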

I second all the other points you make about the question, such as the lack of requirements.

Having a replicated file system with the metadata in a NoSQL database is a very common way of doing things. @why, did you consider this kind of approach?

Have you taken a look at MongoDB GridFS? I have never used it, but it is something I would take a look at to see if it gives you any ideas.

Answers

I guess the best option for millions of videos and images is a content distribution/delivery network (CDN):

A CDN is a server setup that allows faster, more efficient delivery of your media files. It does this by maintaining copies of your media at different points of presence (POPs) along a global network to ensure quick client access and the fastest delivery possible.

If you use a CDN, you do not need to worry about many problems (distribution, fast access). Integration with a CDN should also be very simple.

You gave us (near) zero information about what your requirements are. E.g.:

  • Do you want atomic transactions?
  • Is the system read or write heavy?
  • Do you need fast queries or want to batch-process the data set?
  • How big are the videos?
  • Do you want to distribute data locally (on a LAN) or spanning multiple data centers / continents?

How are we supposed to pick the right tool if we don't know what it needs to support?

Without any knowledge of the system I would advise using some kind of FS replication for the videos and images and then storing the metadata associated with the items either in MongoDB, MySQL Master-Master or MySQL Cluster.
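A rough sketch of that split, assuming plain directories stand in for the replicated file system and SQLite stands in for the metadata store (MongoDB or MySQL in practice). The `store_media` and `load_media` helpers and the `media` table layout are made up for illustration.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_media(data: bytes, filename: str, replica_dirs, db):
    """Copy the blob to each replica directory and record metadata once."""
    digest = hashlib.sha256(data).hexdigest()
    for root in replica_dirs:
        # Shard by the first two hex characters to keep directories small.
        target = Path(root) / digest[:2] / digest
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    db.execute(
        "INSERT OR REPLACE INTO media (digest, filename, size) VALUES (?, ?, ?)",
        (digest, filename, len(data)),
    )
    db.commit()
    return digest


def load_media(digest: str, replica_dirs) -> bytes:
    """Read from the first replica that still has the file."""
    for root in replica_dirs:
        path = Path(root) / digest[:2] / digest
        if path.exists():
            return path.read_bytes()
    raise FileNotFoundError(digest)


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE media (digest TEXT PRIMARY KEY, filename TEXT, size INTEGER)")
replicas = [tempfile.mkdtemp(), tempfile.mkdtemp()]
key = store_media(b"fake image bytes", "cat.jpg", replicas, db)
print(load_media(key, replicas) == b"fake image bytes")
```

The database only ever holds small rows (name, size, content hash), while the large binaries stay on the file system, where replication can be handled separately.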

Distributed related to what?

If you are talking of replication to distribute:

MongoDB is restricted to master-slave replication, so only one node is able to read/write, which leaves you with a single point of failure for a really distributed system. CouchDB is able to replicate peer-to-peer.

Find a very good comparison here, and here one that also compares with HBase.

With CouchDB you also have to be aware that you are going to talk HTTP to the database and have built-in web services.

Regards, Chris

An alternative is to use MongoDB's GridFS, serving as a (very easily manageable) redundant and distributed filesystem.

Some will say that it's slow on reads (and it is, mostly because of the nature of its design), but that doesn't have to mean it's a dealbreaker for your system as a whole, because if you need performance later on, you could always put Varnish or Squid in front of the filesystem tier.

For all I know, Squid also supports on-disk cache for all the less-hot files.
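To illustrate why a cache in front of a slow storage tier helps, here is a small read-through LRU cache sketch. It is not how Varnish or Squid are configured or implemented; `ReadThroughCache` and `slow_backend` are hypothetical stand-ins for the proxy and the filesystem tier.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Serve hot items from memory; fall back to `fetch` on a miss."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # callable that reads from the slow tier
        self.capacity = capacity    # maximum number of cached items
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            # Mark as most recently used.
            self.items.move_to_end(key)
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.fetch(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            # Evict the least recently used item.
            self.items.popitem(last=False)
        return value


def slow_backend(key):
    # Stand-in for a read from the storage tier.
    return f"bytes of {key}".encode()


cache = ReadThroughCache(slow_backend, capacity=2)
for name in ["a.jpg", "a.jpg", "b.jpg", "c.jpg", "a.jpg"]:
    cache.get(name)
print(cache.hits, cache.misses)
```

Only misses reach the backend, so the more skewed the access pattern is toward a few popular files, the less the slow read path matters.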

Sources:

http://www.mongodb.org/display/DOCS/GridFS

http://www.squid-cache.org/Doc/config/cache_dir/




