English 中文(简体)
Deriving a state machine from a BNF grammar
原标题:

I am trying to put together a proof of concept of an XSS-safe string interpolation scheme.

Given a string with substitutions,

"Hello <b>$planetoid</b>!"

I want break it into literal portions and substitutions ("Hello<b>" planetoid "</b>!") and then run a state machine left to right over the literal portions. When I reach an interpolated value (planetoid in the above), I need to be able to get from the state to an appropriate escaping function.

Does anyone know of any examples of how to use lex/yacc/bison to derive a state machine and be able to associate labels in the grammar with output states? I want to derive a state machine that I can use both in javascript, and to try and replace PHP s underlying string implementation.

My reasons for doing this are described here.

cheers, mike

最佳回答

In general, it is not possible to create a state machine for grammars that can be represented in BNF. State machines can only recognize regular languages and BNF can specify context-free languages. Yacc can create parsers. Would that be sufficient?

问题回答

It looks like I can put markers in the grammar, so if I use two different production types, one with no side effect, and that consumes characters, and one that consumes no characters, but updates a state variable

ST_EXPECT_TAG_NAME : { state = TAG_NAME };
TAG_BODY
    :  <  ST_EXPECT_TAG_NAME TAG_NAME ATTRS SPACES  >  ST_OUT_OF_TAG
    ;

the compiled output associates state names in a switch statement

YY_REDUCE_PRINT (yyn);
switch (yyn)
  {
      case 118:
#line 74 "tmp/html-combo.y"
    { state = TAG_NAME ;}
    break;

There may be a way to extract the tables without parsing C, but I m so ignorant of yacc/bison.

You can use yacc/bison for this. Initially looking at bison its hard to tell where you can implement a state machine. Rules in bison are resolved left-right-up. Ie; If you have a rule (called rule0) which derives rule1 rule2 rule3, the actions are called in this order: rule1,rule2,rule3,rule0. You can combine that using a global state machine, with dynamic return values for rules (I use a union with different types like a string, an int or even a container for return values).





相关问题
How to turn this into a parser

If I just add on to the following yacc file, will it turn into a parser? /* C-Minus BNF Grammar */ %token ELSE %token IF %token INT %token RETURN %token VOID %token WHILE %token ID %token NUM %...

Bison/Yacc, make literal token return its own value?

Below is my rule, when i replace $2 with = my code works. I know by default all literal tokens uses their ascii value (hence why multi character token require a definition) The below doesnt work. ...

What s wrong with this yacc file?

When I run yacc -d parser.y on the following file I get the following errors: parser.y:23.3-24.4: warning: unused value: $4 15 rules never reduced parser.y: warning: 7 useless nonterminals and 15 ...

How to find shift/reduce conflict in this yacc file?

When I try to use yacc on the following file I get the error conflicts: 1 shift/reduce How can I find and fix the conflict? /* C-Minus BNF Grammar */ %token ELSE %token IF %token INT %token RETURN %...

Deriving a state machine from a BNF grammar

I am trying to put together a proof of concept of an XSS-safe string interpolation scheme. Given a string with substitutions, "Hello <b>$planetoid</b>!" I want break it into literal ...

热门标签