Compress data before storage on Google App Engine

I am trying to store 30-second user MP3 recordings as Blobs in my App Engine datastore. However, to enable this feature (App Engine has a 1 MB limit per upload) and to keep costs down, I would like to compress the file before upload and decompress it every time it is requested. How would you suggest I accomplish this? (It can happen in the background via a task queue, by the way, but an efficient solution is always good.)

Based on my own tests and research, I see two possible approaches to accomplish this:

  • Zlib

For this I need to compress a certain number of blocks at a time using a while loop. However, App Engine doesn't allow you to write to the file system. I thought about using a temporary file to accomplish this, but I haven't had any luck when trying to decompress the content from a temporary file.

  • Gzip

From reading around the web, it appears that the App Engine URL Fetch function already requests content gzipped and then decompresses it. Is there a way to stop the function from decompressing the content, so that I can just put it in the datastore in gzipped format and decompress it when I need to play it back to a user on demand?

Let me know how you would suggest using zlib, gzip, or some other solution to accomplish this. Thanks.

Best answer

"Compressing before upload" implies doing it in the user's browser -- but no text in your question addresses that! It seems to be about compression in your GAE app, where of course the data will only be after the upload. You could do it with a Firefox extension (or other browsers' equivalents), if you can develop those and convince your users to install them, but that has nothing much to do with GAE!-) Not to mention that, as @RageZ's comment mentions, MP3 is, essentially, already compressed, so there's little or nothing to gain (though maybe you could, again with a browser extension for the user, reduce the MP3's bit rate and thus the file's size; that could affect the audio quality, depending on your intended use for those audio files).

So, overall, I have to second @jldupont's suggestion (also in a comment) -- use a different server for storage of large files (S3, Amazon's offering, is surely a possibility, though not the only one).

Other answers

While the technical limitations (mentioned in other answers) of compressing MP3 files via standard compression or re-encoding at a lower bitrate are correct, your aim is to store 30 seconds of MP3-encoded data. Assuming that you can enforce that on your users, you should be all right without applying additional compression techniques if the MP3 bitrate is 256 kbit/s constant bitrate (CBR) or lower. At 256 kbit/s CBR, 30 seconds of audio would require:

(((256 * 1000) / 8) * 30) / 1048576 ≈ 0.92 MB

The maximum standard bitrate is 320 kbit/s, which equates to 1.14 MB, so you'd have to use 256 kbit/s or less. The most commonly used bitrate in the wild is 128 kbit/s.

There are additional overheads that will increase the final file size, such as ID3 tags and framing, but you should be OK. If not, drop down to 224 kbit/s as your maximum (30 secs = 0.80 MB). There are other complexities, such as variable bit rate encoding, for which the file size is not so predictable, and I am ignoring these.
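The arithmetic above can be checked with a short script. This is only a sketch: the function name `cbr_size_mb` and the `LIMIT_MB` constant are made up for illustration, the limit is taken to be exactly 1,048,576 bytes, and ID3 tags and other overhead are ignored, as in the estimate above.

```python
def cbr_size_mb(bitrate_kbit: int, seconds: float) -> float:
    """Approximate size in MB (1 MB = 1,048,576 bytes) of a CBR stream.

    Ignores ID3 tags and any other container overhead.
    """
    size_bytes = bitrate_kbit * 1000 / 8 * seconds
    return size_bytes / 1048576


LIMIT_MB = 1.0  # assumption: the 1 MB per-upload limit from the question

if __name__ == "__main__":
    for rate in (128, 224, 256, 320):
        size = cbr_size_mb(rate, 30)
        print(f"{rate} kbit/s, 30 s: {size:.2f} MB, fits: {size < LIMIT_MB}")
```

A check like this could also run server-side on the uploaded file's length, rejecting anything at or above the limit before trying to store it.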

So your problem is no longer how to compress MP3 files, but how to ensure that your users are aware that they cannot upload more than 30 seconds encoded at 256 kbit/s CBR, and how to enforce that policy.

As Aneto mentions in a comment, you will not be able to meaningfully compress MP3 data with a standard compression library like gzip or zlib. However, you could re-encode the MP3 at a MUCH lower bitrate, possibly with LAME.
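To see this for yourself, here is a minimal sketch that runs zlib block by block entirely in memory (so no file system or temporary file is needed) and compares sizes. The helper name `compress_in_memory` is made up, and the random bytes are only a stand-in for real MP3 data, which is similarly hard to compress; try it on one of your own recordings to get real numbers.

```python
import os
import zlib


def compress_in_memory(data: bytes, chunk_size: int = 64 * 1024) -> bytes:
    """Compress `data` chunk by chunk with zlib, without touching the disk."""
    compressor = zlib.compressobj(9)  # 9 = maximum compression level
    parts = []
    offset = 0
    while offset < len(data):
        parts.append(compressor.compress(data[offset:offset + chunk_size]))
        offset += chunk_size
    parts.append(compressor.flush())  # emit whatever is still buffered
    return b"".join(parts)


# Assumption: random bytes stand in for already-compressed audio.
fake_audio = os.urandom(500000)
packed = compress_in_memory(fake_audio)

# The round trip is lossless, but the "compressed" copy is no smaller.
assert zlib.decompress(packed) == fake_audio

if __name__ == "__main__":
    print("original:", len(fake_audio), "bytes; after zlib:", len(packed), "bytes")
```

The same function does shrink repetitive input such as plain text, so the lack of savings here comes from the data, not from the chunked approach.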

You can store up to 10 MB with a list of Blobs. Search for "google file service". It's much more versatile than BlobStore in my opinion, though I only started using the BlobStore API yesterday and I'm still figuring out whether it is possible to access the data bytewise, as in changing doc to pdf, jpeg to gif...

You can store Blobs of 1 MB * 10 = 10 MB (the max entity size, I think), or you can use the BlobStore API and get the same 10 MB, or 50 MB if you enable billing (you can enable it, and if you don't exceed the free quota you don't pay).
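The "list of Blobs" idea boils down to cutting the recording into pieces that each stay under the per-Blob limit and joining them again on read. A minimal sketch in plain Python follows; it does not use any App Engine API, the helper names are made up, and the `CHUNK_SIZE` value is an assumption chosen to stay a little below 1 MB.

```python
from typing import List

# Assumption: keep each piece a little under the 1 MB limit mentioned above.
CHUNK_SIZE = 1000 * 1000


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut `data` into consecutive pieces of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Reassemble the original bytes from chunks kept in their original order."""
    return b"".join(chunks)


if __name__ == "__main__":
    recording = bytes(2500000)  # placeholder for an uploaded file
    pieces = split_into_chunks(recording)
    print(len(pieces), "pieces; round trip ok:", join_chunks(pieces) == recording)
```

Each piece would then go into its own Blob property or entity, with the order recorded so the file can be rebuilt when it is requested.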




