Data structure to sync a file-tree

I'm in the process of writing an application that needs to synchronize a file structure between a client and an HTTP server.

The file structure is essentially a list of file paths, where each path is a string associated with one or more data-block IDs (256-bit references to the actual data blocks). A data block can be referenced by several files, so there's an n-to-m relation between paths and IDs. Right now it is just a list of paths with their IDs, but it can easily be converted to the tree structure that the paths represent, if that's necessary for the synchronization.
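To make the shape of the data concrete, here is a minimal Python sketch of the structure described above. The example paths, the short hex block IDs (real ones would be 256-bit values), and the helper names `blocks_to_paths` and `to_tree` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical example data: path -> ordered list of data-block IDs.
# Real IDs would be 256-bit values; short hex strings keep the example readable.
files = {
    "docs/a.txt": ["aa11", "bb22"],
    "docs/b.txt": ["bb22"],
    "img/logo.png": ["cc33"],
}


def blocks_to_paths(files):
    """Invert the mapping (block ID -> set of paths) to expose the n-to-m relation."""
    index = defaultdict(set)
    for path, block_ids in files.items():
        for block_id in block_ids:
            index[block_id].add(path)
    return dict(index)


def to_tree(files):
    """Convert the flat path list into nested dicts; leaves hold the block-ID lists."""
    root = {}
    for path, block_ids in files.items():
        *dirs, name = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})
        node[name] = list(block_ids)
    return root
```

Here `blocks_to_paths(files)` shows that block `bb22` is shared by two files, and `to_tree(files)` gives the nested form of the same list.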

I'm looking for a data structure that allows me to sync this data efficiently, mainly achieving two goals:

  1. A change in one file should not force the client to send the entire file structure to the server, only a small subset of it.
  2. If many files are changed, these changes should be grouped together, e.g. so that 1000 changes don't result in 1000 requests to the server.

As you can see, the goals conflict somewhat, so I'm looking for something that finds a good middle ground between them. The second goal can easily be achieved by grouping several changes into one HTTP request, but then the processing the server needs to do to parse all the changes in that request should be computationally very inexpensive.
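As a rough illustration of the grouping idea, the sketch below collapses many changes into a single JSON request body. The JSON layout and the function name `build_batch` are assumptions, and so is the rule that a later change to the same path replaces an earlier one.

```python
import json


def build_batch(changes):
    """Collapse a sequence of (path, block_ids) changes into one JSON request body.

    Assumption: a later change to the same path supersedes earlier ones,
    so only the newest block-ID list per path is sent.
    """
    latest = {}
    for path, block_ids in changes:
        latest[path] = list(block_ids)
    payload = {
        "changes": [
            {"path": path, "blocks": blocks}
            for path, blocks in sorted(latest.items())
        ]
    }
    return json.dumps(payload)
```

The server then only has to parse one flat list per request, which keeps the per-request work small.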

I should also mention that there could be several clients synchronizing the same structure on the server. It must therefore be easy to detect the changes made by one client and then synchronize them to another client (i.e. it's not just an upload to the server).

I'm certainly not the first one doing something like this, so I assume there are some smart solutions available. For instance, I guess both Dropbox and Subversion have similar requirements when they sync their metadata. Does anyone happen to know how they have implemented it?

Best answer

I've decided to solve this using a transaction log. Each client saves all changes to the tree to a transaction log (in addition to the local DB of the tree, which it also keeps) and periodically syncs that log with the server. The log is just a list of entries, each holding a file -> data-block IDs mapping and a timestamp.

Once the log has been sent to the server, it is removed from the client. Before uploading its log, the client also asks for the logs written by other clients to the same tree. These logs are then merged into the local tree.
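The sync cycle described above could be sketched roughly as follows. This is not the actual implementation. The class and method names are hypothetical, `InMemoryServer` stands in for the real HTTP server, and conflicts are resolved by "newest timestamp wins", which is an assumption that also requires client clocks to be roughly in sync.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LogEntry:
    path: str
    block_ids: Tuple[str, ...]
    timestamp: float  # assumption: client clocks are roughly in sync


class InMemoryServer:
    """Hypothetical stand-in for the HTTP server: it only appends and returns entries."""

    def __init__(self):
        self.log: List[Tuple[str, LogEntry]] = []  # (client_id, entry)

    def fetch_from_others(self, client_id, since):
        # Return entries written by other clients after position `since`.
        new = self.log[since:]
        return [e for cid, e in new if cid != client_id], len(self.log)

    def upload(self, client_id, entries):
        self.log.extend((client_id, e) for e in entries)


class Client:
    def __init__(self, client_id):
        self.client_id = client_id
        self.tree: Dict[str, LogEntry] = {}  # local db: path -> newest known entry
        self.pending: List[LogEntry] = []    # transaction log not yet uploaded
        self.cursor = 0                      # how much of the server log was seen

    def record_change(self, path, block_ids, timestamp):
        entry = LogEntry(path, tuple(block_ids), timestamp)
        self.tree[path] = entry
        self.pending.append(entry)

    def merge_remote(self, entries):
        # Assumed merge rule: the entry with the newest timestamp wins.
        for entry in sorted(entries, key=lambda e: e.timestamp):
            current = self.tree.get(entry.path)
            if current is None or entry.timestamp > current.timestamp:
                self.tree[entry.path] = entry

    def sync(self, server):
        # 1. Ask for logs written by other clients and merge them locally.
        remote, self.cursor = server.fetch_from_others(self.client_id, self.cursor)
        self.merge_remote(remote)
        # 2. Upload the local log, then remove it from the client.
        server.upload(self.client_id, self.pending)
        self.pending = []
```

With two clients that both changed the same path, each converges on the newer entry after syncing, and the server never has to interpret the entries itself.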

The log itself will be stored on the server using Azure Blob Storage. The server can periodically remove old entries from the log (if it grows too big).

This way the clients can efficiently communicate their changes to each other, while the server doesn't have to do any expensive processing on each request.

Other answers

Any reason not to use rsync? If you need to programmatically control it, there is librsync.

The Subversion source code is open, so you could check that. Also, I know that Mercurial has a pretty smart wire protocol for minimizing traffic.




