A long time ago, I had to test a program generating a postscript file image. One quick way to figure out if the program was producing the correct, expected output was to do an md5 of the result to compare against the md5 of a "known good" output I checked beforehand.
Unfortunately, Postscript contains the current time within the file. This time is, of course, different depending on when the test runs, therefore changing the md5 of the result even if the expected output is obtained. As a fix, I just stripped off the date with sed.
This is a nice and simple scenario. We are not always so lucky. For example, now I am programming a writer program, which creates a big fat RDF file containing a bunch of anonymous nodes and uuids. It is basically impossible to check the functionality of the whole program with a simple md5, and the only way would be to read the file with a reader, and then validate the output through this reader. As you probably realize, this opens a new can of worms: first, you have to write a reader (which can be time consuming), second, you are assuming the reader is functionally correct and at the same time in sync with the writer. If both the reader and the writer are in sync, but on incorrect assumptions, the reader will say "no problem", but the file format is actually wrong.
This is a general issue when you have to perform functional testing of a file format, and the file format is not completely reproducible through the input you provide. How do you deal with this case?