English 中文(简体)
Functional testing of output files, when output is non-deterministic (or with low control)
原标题:

A long time ago, I had to test a program generating a postscript file image. One quick way to figure out if the program was producing the correct, expected output was to do an md5 of the result to compare against the md5 of a "known good" output I checked beforehand.

Unfortunately, Postscript contains the current time within the file. This time is, of course, different depending on when the test runs, therefore changing the md5 of the result even if the expected output is obtained. As a fix, I just stripped off the date with sed.

This is a nice and simple scenario. We are not always so lucky. For example, now I am programming a writer program, which creates a big fat RDF file containing a bunch of anonymous nodes and uuids. It is basically impossible to check the functionality of the whole program with a simple md5, and the only way would be to read the file with a reader, and then validate the output through this reader. As you probably realize, this opens a new can of worms: first, you have to write a reader (which can be time consuming), second, you are assuming the reader is functionally correct and at the same time in sync with the writer. If both the reader and the writer are in sync, but on incorrect assumptions, the reader will say "no problem", but the file format is actually wrong.

This is a general issue when you have to perform functional testing of a file format, and the file format is not completely reproducible through the input you provide. How do you deal with this case?

问题回答

In the past I have used a third party application to validate such output (preferably converting it into some other format which can be mechanically verified). The use of a third party ensures that my assumptions are at least shared by others, if not strictly correct. At the very least this approach can be used to verify syntax. Semantic correctness will probably require the creation of a consumer for the test data which will likely always be prone to the "incorrect assumptions" pitfall you mention.

Is the randomness always in the same places? I.e. is most of the file fixed but there are some parts that always change? If so, you might be able to take several outputs and use a programmatic diff to determine the nondeterministic parts. Once those are known, you could use the information to derive a mask and then do a comparison (md5 or just a straight compare). Think about pre-processing the file to remove (or overwrite with deterministic data) the parts that are non-deterministic.

If the whole file is non-deterministic then you ll have to come up with a different solution. I did testing of MPEG-2 decoders which are non-deterministic. In that case we were able to do a PSNR and fail if it was above some threshold. That may or may not work depending on your data but something similar might be possible.





相关问题
Selenium not working with Firefox 3.x on linux

I am using selenium-server , selenium rc for UI testing in my application . My dev box is Windows with FireFox 3.5 and every thing is running fine and cool. But when i try to run selenium tests on my ...

Best browser for testing under Safari Mobile on Linux?

I have an iPhone web app I m producing on a Linux machine. What s the best browser I can use to most closely mimic the feature-limited version of Safari present on the iPhone? (It s a "slimmed down" ...

Code Coverage Tools & Visual Studio 2008 Pro

Just wondering what people are using for code coverage tools when using MS Visual Studio 2008 Pro. We are using the built-in MS test project and unit testing tool (the one that come pre-installed ...

Is there any error checking web app cralwers out there?

Wondering if there was some sort of crawler we could use to test and re-test everything when changes are made to the web app so we know some new change didn t error out any existing pages. Or maybe a ...

热门标签