The idea behind testing rendering is quite simple: to test a function, apply its inverse and check whether the input and output match (in your case, a match is not strict equality):
f(f^-1(x)) = x
To test a rendering algorithm, you would encode the raw input, render the encoded values, and analyze the difference between the rendered output and the raw input. One problem is obtaining the raw input when encoding/decoding random input is not appropriate. Another challenge is evaluating the differences between the raw input and the rendered output. I suppose if you're writing rendering software, you should be able to do a frequency analysis on the data. (Some transformation should pop into your head now.)
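As a rough sketch of such a round trip with a tolerance instead of strict equality: the encode and render functions below are hypothetical stand-ins (simple 8-bit quantization), not your actual codec, and the RMS error is only one possible difference measure. A frequency-domain comparison could take its place.

```python
import math
import random

def encode(samples, levels=256):
    """Hypothetical encoder: quantize floats in [0, 1] to integers 0..levels-1."""
    return [round(s * (levels - 1)) for s in samples]

def render(encoded, levels=256):
    """Hypothetical renderer: map quantized integers back to floats in [0, 1]."""
    return [e / (levels - 1) for e in encoded]

def rms_error(a, b):
    """Root-mean-square difference between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Seeded so the run is repeatable; the raw input here is random on purpose.
rng = random.Random(0)
raw = [rng.random() for _ in range(1000)]
rendered = render(encode(raw))

# Quantization can be off by at most half a step per sample, so the RMS
# error must stay within that bound. This is the "match" criterion.
half_step = (1 / 255) / 2
assert rms_error(raw, rendered) <= half_step
```

The tolerance is derived from what the encoding is allowed to lose, which is usually easier to justify than an arbitrary threshold.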
If possible, generate your test data. Test fixtures are a real maintenance problem. They only shine in the beginning. If they change in any way, everything breaks down. The main problem is that if you're using a fixture, your tests are going to repeat the fixture's content. This makes the intent of your tests harder to interpret. If there is a magic value in your test, what's the significant part of this value?
Fixture:
actual = parse("file.xml")
expected = "magic value"
assert(actual == expected)
Generated values:
expected = generate()
input = render(expected)
actual = parse(input)
assert(actual == expected)
The nice thing about generators is that you can build quite complex object graphs with them, starting from primitive types and fields (there is, for example, a Python version of QuickCheck).
Generator-based tests are not deterministic by nature. But given enough trials, they follow the law of large numbers.
Their additional value is that they produce good coverage of the test value range (which is hard to achieve with test fixtures). They tend to find unanticipated bugs in your code.
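To make the generator idea concrete, here is a minimal hand-rolled sketch using only the standard library. JSON stands in for your own render/parse pair, and all the gen_* names are made up for illustration. A QuickCheck-style library would give you the same composition with less code.

```python
import json
import random
import string

def gen_int(rng, depth):
    return rng.randint(-1000, 1000)

def gen_text(rng, depth):
    length = rng.randint(0, 8)
    return "".join(rng.choice(string.ascii_letters) for _ in range(length))

def gen_list(rng, depth):
    return [gen_value(rng, depth + 1) for _ in range(rng.randint(0, 3))]

def gen_dict(rng, depth):
    # JSON object keys must be strings, so keys come from gen_text.
    return {gen_text(rng, depth): gen_value(rng, depth + 1)
            for _ in range(rng.randint(0, 3))}

def gen_value(rng, depth=0):
    """Compose primitive generators into a nested object graph."""
    choices = [gen_int, gen_text]
    if depth < 3:  # stop nesting at some point
        choices += [gen_list, gen_dict]
    return rng.choice(choices)(rng, depth)

def check_round_trip(trials=200, seed=None):
    """Generate, render, parse, compare. Returns the number of trials passed."""
    rng = random.Random(seed)
    for _ in range(trials):
        expected = gen_value(rng)
        rendered = json.dumps(expected)   # stand-in for render()
        actual = json.loads(rendered)     # stand-in for parse()
        assert actual == expected, f"seed={seed!r} value={expected!r}"
    return trials

check_round_trip(trials=200, seed=1)
```

Passing a seed and printing it on failure is one way to live with the non-determinism: a failing run can be replayed exactly.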
An alternative approach is to test against an equivalent function:
f(x) = f'(x)
For example, if you have a rendering function to compare against. This approach is useful if you have a working function that serves as your benchmark. It cannot be used in production because it is too slow or uses too much memory, but it can be easily debugged or proven correct.
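A small sketch of this kind of test, assuming a 1D box filter as a stand-in for a rendering step: both functions are invented for illustration, with the slow one playing the role of the benchmark and the prefix-sum one being the code under test.

```python
import random

def box_sum_reference(values, radius):
    """Slow but easy to verify: re-sum the window for every position."""
    n = len(values)
    return [sum(values[max(0, i - radius):min(n, i + radius + 1)])
            for i in range(n)]

def box_sum_fast(values, radius):
    """Version under test: uses prefix sums instead of re-summing."""
    n = len(values)
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [prefix[min(n, i + radius + 1)] - prefix[max(0, i - radius)]
            for i in range(n)]

def check_against_reference(trials=200, seed=None):
    """Compare both implementations on generated inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(-50, 50) for _ in range(rng.randint(0, 30))]
        radius = rng.randint(0, 5)
        assert box_sum_fast(values, radius) == box_sum_reference(values, radius)
    return trials

check_against_reference(trials=200, seed=7)
```

Integers keep the comparison exact here. With floating-point output you would again compare within a tolerance rather than with ==.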