Question

I have created a very simple markup parser in PHP. However, it currently uses str_replace to switch between markup and HTML. How can I make a "code" box of sorts (which will eventually use GeSHi) whose contents are left untouched?

Right now, the markup [code][b]Some bold text[/b][/code] is parsed as a code box containing <b>Some bold text</b>.
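For reference, here is a hypothetical reconstruction of the kind of str_replace-based parser the question describes (the tag list and the HTML it emits are assumptions, not the asker's actual code). It shows why the [b] inside the code box gets converted: str_replace rewrites every occurrence and has no idea whether it sits inside [code].

```php
<?php
// Hypothetical str_replace-only parser; the tag list is an assumption.
function naive_parse(string $text): string
{
    $search  = ['[b]', '[/b]', '[code]', '[/code]'];
    $replace = ['<b>', '</b>', '<pre><code>', '</code></pre>'];
    // str_replace has no notion of nesting, so [b] is converted even
    // when it sits inside a [code] box.
    return str_replace($search, $replace, $text);
}

echo naive_parse('[code][b]Some bold text[/b][/code]'), "\n";
// prints: <pre><code><b>Some bold text</b></code></pre>
```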

I need some advice: which option is best?

Have it check each word individually and parse it only if it is not inside a [code] box.
Leave it as is, so users cannot post markup inside [code].
Create another type of code box specifically for HTML markup, and have [code] automatically revert any < or > to [ and ].

Is there maybe even another option? This is a bit tougher than I thought it would be...

EDIT: Is it even worth adding a code box to this parser? I can see how it could be useful, but it is a rather large amount of effort for a small result.

Answer 1

You could break the text into multiple strings for the purposes of using str_replace. Split the string on the [code] and [/code] tags, saving each code box in a separate string, and make a note of where it went in the original string. Then use str_replace on the rest of the text and do whatever parsing you like on the code box strings. Finally, reinsert the parsed code boxes and display the result.
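Here is a rough PHP sketch of that approach, assuming only [b] and [code] tags and no nested code boxes. It uses preg_split() with PREG_SPLIT_DELIM_CAPTURE to do the splitting, so "where it went" is simply each piece's index in the resulting array; parse_markup is a hypothetical name.

```php
<?php
// Sketch of the split/parse/reinsert idea. Assumptions: the only tags are
// [b] and [code], and [code] boxes are never nested.
function parse_markup(string $text): string
{
    // The capture group plus PREG_SPLIT_DELIM_CAPTURE keeps each code box's
    // contents in the result: even indices hold ordinary text, odd indices
    // hold code-box contents, in their original order. (PREG_SPLIT_NO_EMPTY
    // must not be used, or the even/odd pattern would break.)
    $parts = preg_split('/\[code\](.*?)\[\/code\]/s', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Ordinary text: escape any user-typed HTML, then run the
            // usual str_replace pass.
            $out .= str_replace(['[b]', '[/b]'], ['<b>', '</b>'], htmlspecialchars($part));
        } else {
            // Code box: no markup conversion, only HTML escaping. A
            // highlighter such as GeSHi could be called here instead.
            $out .= '<pre><code>' . htmlspecialchars($part) . '</code></pre>';
        }
    }
    return $out;
}

echo parse_markup('[b]bold[/b] [code][b]Some bold text[/b][/code]'), "\n";
// prints: <b>bold</b> <pre><code>[b]Some bold text[/b]</code></pre>
```

Because the pieces are rebuilt in array order, nothing has to be searched for or re-inserted by position afterwards.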

Just a word of warning, though: turning user input into HTML for display strikes me as inherently dangerous. I'd recommend thorough input sanitization and checking before converting it to HTML for redisplay.
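One common way to follow that advice, sketched here under the assumption that only [b] and [i] are allowed: escape the whole input with htmlspecialchars() first, and only then convert the whitelisted markup. The order matters, because escaping afterwards would also escape the tags you just generated. to_html is a hypothetical name.

```php
<?php
// Minimal sketch, assuming a whitelist of [b] and [i] tags only.
function to_html(string $input): string
{
    // Escape first: any HTML the user typed (<script>, onclick=..., etc.)
    // becomes inert text such as &lt;script&gt;.
    $safe = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // Then turn only the whitelisted markup into HTML, so the tags this
    // function generates are the only real tags in the output.
    return str_replace(
        ['[b]', '[/b]', '[i]', '[/i]'],
        ['<b>', '</b>', '<i>', '</i>'],
        $safe
    );
}

echo to_html('[b]hi[/b] <script>alert(1)</script>'), "\n";
// prints: <b>hi</b> &lt;script&gt;alert(1)&lt;/script&gt;
```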

Answer 2

Why would you reinvent the wheel?

There are plenty of markup parsers already.

Anyway, str_replace alone won't help much. You'd have to learn regular expressions, and as they say, now you've got two problems ;)
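A small example of what a regular expression buys you over str_replace, assuming the input has already been escaped with htmlspecialchars(): preg_replace() with a non-greedy pattern converts only a [b] that has a matching [/b], so an unclosed tag stays as literal text instead of leaving an unclosed <b> in the page. bold_pairs is a hypothetical name.

```php
<?php
// Regex pass for matched [b]...[/b] pairs only (input assumed pre-escaped).
function bold_pairs(string $text): string
{
    // .*? is non-greedy, so "[b]a[/b] x [b]c[/b]" gives two separate bold
    // spans; the /s modifier lets a span cross line breaks.
    return preg_replace('/\[b\](.*?)\[\/b\]/s', '<b>$1</b>', $text);
}

echo bold_pairs('[b]a[/b] x [b]unclosed'), "\n";
// prints: <b>a</b> x [b]unclosed
```

On its own this still converts [b] inside a [code] box, so it does not remove the need to treat code boxes separately.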

Answer 3

The PEAR PHP_Beautifier package is pretty sweet: http://pear.php.net/package/PHP_Beautifier. They have a decorator class as well that would probably suit your needs.

Answer 4

To be clear, your problem has two parts. The first is the need for a lexical analyzer (lexer) to break your "code" into the tokens, or keywords, of your "language." Once you have a lexer, you then need a parser: code that accepts those tokens one at a time and processes them in a structured way, usually by recursive descent.
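A compact PHP sketch of that two-part structure, under stated assumptions: only [b], [i] and [code] exist, tag names are lowercase, and unknown or stray tags are printed as text. tokenize() is the lexer and parse_nodes() is a recursive-descent parser; both names are hypothetical. Because the parser sees [code] as a token, it can copy everything up to [/code] verbatim, which is exactly what str_replace cannot do.

```php
<?php
// Minimal lexer + recursive-descent parser sketch. Assumptions: the only
// tags are [b], [i] and [code], tag names are lowercase, and anything
// unrecognised is printed as escaped text.

// Lexer: split the input into tokens [type, name, raw], where type is
// 'open', 'close' or 'text'.
function tokenize(string $input): array
{
    $pieces = preg_split('/(\[\/?[a-z]+\])/', $input, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    $tokens = [];
    foreach ($pieces as $piece) {
        if (preg_match('/^\[(\/?)([a-z]+)\]$/', $piece, $m)) {
            $tokens[] = [$m[1] === '/' ? 'close' : 'open', $m[2], $piece];
        } else {
            $tokens[] = ['text', null, $piece];
        }
    }
    return $tokens;
}

// Parser: consume tokens from $pos until the close tag named $until
// (or the end of input) and return the HTML for them.
function parse_nodes(array $tokens, int &$pos, ?string $until): string
{
    $html = '';
    while ($pos < count($tokens)) {
        [$type, $name, $raw] = $tokens[$pos++];

        if ($type === 'close' && $name === $until) {
            return $html; // matching close tag: this level is finished
        }

        if ($type === 'open' && $name === 'code') {
            // Inside [code], copy tokens verbatim up to [/code]; no parsing.
            $code = '';
            while ($pos < count($tokens)
                && !($tokens[$pos][0] === 'close' && $tokens[$pos][1] === 'code')) {
                $code .= $tokens[$pos++][2];
            }
            $pos++; // skip the [/code] token (harmless if it was missing)
            $html .= '<pre><code>' . htmlspecialchars($code) . '</code></pre>';
        } elseif ($type === 'open' && in_array($name, ['b', 'i'], true)) {
            // Recurse: the tag's contents may contain further tags.
            $html .= "<$name>" . parse_nodes($tokens, $pos, $name) . "</$name>";
        } else {
            // Text, unknown tags and stray close tags are shown literally.
            $html .= htmlspecialchars($raw);
        }
    }
    return $html; // end of input (any still-open tag is closed by the caller)
}

function render(string $input): string
{
    $pos = 0;
    return parse_nodes(tokenize($input), $pos, null);
}

echo render('[b]bold [i]both[/i][/b] [code][b]raw[/b][/code]'), "\n";
// prints: <b>bold <i>both</i></b> <pre><code>[b]raw[/b]</code></pre>
```

An unclosed [b] is simply closed at the end of the input in this sketch; a stricter parser might report an error or print the tag literally instead.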
