English 中文(简体)
Building a markup parser in php
原标题:

I have created a very simple markup parser in php. However, it currently uses str_replace to switch between markup and html. How can I make a "code" box of sorts (will eventually use GeSHI) that has the contents untouched?

Right now, the following markup: [code][b]Some bold text[/b][/code] winds up parsing as the code box with <b>Some bold text</b>.

I need some advice, which option is best?

  • Have it check each word individually, and if it is not inside a [code] box it should parse
  • Leave it as is, let users be unable to post markup inside of [code].
  • Create another type of code box specifically for HTML markup, have [code] autorevert any < or > to [ and ].

Is there maybe even another option? This is a bit tougher than I thought it would be...

EDIT: Is it even worth adding a code box type thing to this parser? I mean, I see how it could be useful, but it is a rather large amount of effort for a small result.

最佳回答

You could break it down into multiple strings for the purposes of using the str_replace. Split the strings on the [code] and [/code] tags - saving the code box in a separate string. Make note of where it went in the original string somehow. Then use str_replace on the original string and do whatever parsing you like on the code box string. Finally reinsert the parsed code boxes and display.

Just a word of warning though, turning input into html for display strikes me as inherently dangerous. I d recommend a large amount of input sanitization and checking before converting to html for redisplay.

问题回答

Why would you reinvent the wheel?

There s plenty of markup parsers already.

Anyway, just str_replace won t help much. You d have to learn regular expressions and as they say, now you ve got two problems ;)

HTML beautifier is pretty sweet. http://pear.php.net/package/PHP_Beautifier . The have a decorator class as well that would probably suit your needs.

To be clear, your problem is in two parts. The first part is the need for a lexical analyzer to break your "code" into the keywords for your "language." Once you have a lexical analyzer, you then need a parser. A parser is code that accepts the keywords for your language one-at-a-time in a logical (usually recursive-descent way) manner.





相关问题
Parse players currently in lobby

I m attempting to write a bash script to parse out the following log file and give me a list of CURRENT players in the room (so ignoring players that left, but including players that may have rejoined)...

How to get instance from string in C#?

Is it possible to get the property of a class from string and then set a value? Example: string s = "label1.text"; string value = "new value"; label1.text = value; <--and some code that makes ...

XML DOM parsing br tag

I need to parse a xml string to obtain the xml DOM, the problem I m facing is with the self closing html tag like <br /> giving me the error of Tag mismatch expected </br>. I m aware this ...

Ruby parser in Java

The project I m doing is written in Java and parsers source code files. (Java src up to now). Now I d like to enable parsing Ruby code as well. Therefore I am looking for a parser in Java that parses ...

Locating specific string and capturing data following it

I built a site a long time ago and now I want to place the data into a database without copying and pasting the 400+ pages that it has grown to so that I can make the site database driven. My site ...

热门标签