Background
I have written a very simple BBCode parser in C# which transforms BBCode to HTML. Currently it supports only the [b], [i] and [u] tags. I know that BBCode is always considered valid, regardless of what the user has typed. I cannot find a strict specification of how to transform BBCode to HTML.
Question
- Does a standard "BBCode to HTML" specification exist?
- How should I handle "[b][b][/b][/b]"? For now the parser yields "<b>[b][/b]</b>".
- How should I handle "[b][i][u]zzz[/b][/i][/u]" input? Currently my parser is smart enough to produce "<b><i><u>zzz</u></i></b>" for such a case, but I wonder whether this approach is "too smart".
More details
I have found some ready-to-use BBCode parser implementations, but they are too heavy and complex for me and, what is worse, they use tons of regular expressions and do not produce the markup I expect. Ideally, I want to get XHTML as output. To infer the "BBCode to HTML" transformation rules I am using this online parser: http://www.bbcode.org/playground.php. It produces HTML that is intuitively correct in my opinion. The only thing I dislike is that it does not produce XHTML. For example, "[b][i]zzz[/b][/i]" is transformed to "<b><i>zzz</b></i>" (note the order of the closing tags). Firebug of course shows this as "<b><i>zzz</i></b><i></i>". As I understand it, browsers fix such incorrectly ordered closing tags, but I am in doubt:
- Should I rely on this browser feature and not try to produce XHTML?
- Maybe "[b][i]zzz[/b]ccc[/i]" must be understood as "<b>[i]zzz</b>ccc[/i]"? That looks logical for such improper formatting, but it conflicts with the output of popular forums, which render "zzz" in bold italic followed by "ccc" in italic, rather than a bold "[i]zzz" followed by a literal "ccc[/i]".
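To make the forum-style behaviour concrete, here is a minimal C# sketch of a "close and reopen" strategy that keeps the output well-formed. It is not my actual parser: the names BbCodeSketch, ToXhtml and TryReadTag are made up for illustration, and it assumes that only [b], [i] and [u] exist, that a closing tag without a matching opener stays literal text, and that tags still open at the end of the input are closed automatically.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical illustration only; not the parser described in this question.
public static class BbCodeSketch
{
    // Assumption: only the three tags mentioned in the question are recognised.
    private static readonly HashSet<string> Known = new HashSet<string> { "b", "i", "u" };

    public static string ToXhtml(string input)
    {
        var output = new StringBuilder();
        var open = new List<string>(); // currently open tags, outermost first
        int pos = 0;

        while (pos < input.Length)
        {
            if (TryReadTag(input, pos, out string name, out bool closing, out int length))
            {
                if (!closing)
                {
                    open.Add(name);
                    output.Append('<').Append(name).Append('>');
                    pos += length;
                    continue;
                }

                int index = open.LastIndexOf(name);
                if (index >= 0)
                {
                    // Tags opened after the matching one must be closed first
                    // and reopened afterwards to keep the nesting valid.
                    List<string> reopen = open.GetRange(index + 1, open.Count - index - 1);
                    for (int i = open.Count - 1; i >= index; i--)
                        output.Append("</").Append(open[i]).Append('>');
                    open.RemoveRange(index, open.Count - index);
                    foreach (string tag in reopen)
                    {
                        open.Add(tag);
                        output.Append('<').Append(tag).Append('>');
                    }
                    pos += length;
                    continue;
                }
                // Closing tag without an opener: fall through and emit it as text.
            }

            AppendEscaped(output, input[pos]);
            pos++;
        }

        // Assumption: anything still open at the end is closed, innermost first.
        for (int i = open.Count - 1; i >= 0; i--)
            output.Append("</").Append(open[i]).Append('>');
        return output.ToString();
    }

    // Reads "[name]" or "[/name]" at pos; returns true only for known tags.
    private static bool TryReadTag(string s, int pos, out string name, out bool closing, out int length)
    {
        name = ""; closing = false; length = 0;
        if (s[pos] != '[') return false;
        int end = s.IndexOf(']', pos);
        if (end < 0) return false;
        string body = s.Substring(pos + 1, end - pos - 1);
        closing = body.StartsWith("/", StringComparison.Ordinal);
        name = (closing ? body.Substring(1) : body).ToLowerInvariant();
        length = end - pos + 1;
        return Known.Contains(name);
    }

    // Escapes the characters that are special in XHTML text.
    private static void AppendEscaped(StringBuilder sb, char c)
    {
        switch (c)
        {
            case '<': sb.Append("&lt;"); break;
            case '>': sb.Append("&gt;"); break;
            case '&': sb.Append("&amp;"); break;
            case '"': sb.Append("&quot;"); break;
            default: sb.Append(c); break;
        }
    }
}
```

With this sketch, BbCodeSketch.ToXhtml("[b][i]zzz[/b]ccc[/i]") gives "<b><i>zzz</i></b><i>ccc</i>", which matches what the forums render. Note that for "[b][b][/b][/b]" it gives "<b><b></b></b>", which differs from what my current parser yields, so it only shows one possible set of rules.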
Thanks.