Question

The syntax I'm trying to parse includes a continuation indicator in column 71. Identifiers, literals, and almost anything else can be continued onto the next line.

Ideally, I would like to drop the characters that make up the continuation token, so that I'm left with only the identifier characters. However, with the following lexer rules, the setText("") in LINE_CONTINUATION is ignored, so the continuation text pollutes the final IDENTIFIER token.

IDENTIFIER 
    : 
    {getCharPositionInLine() < 71 }? IDENTIFIER_PART
    (
            {getCharPositionInLine() < 71 }? IDENTIFIER_PART  
        |   LINE_CONTINUATION 
    )*
;
fragment IDENTIFIER_PART: (LETTER|DIGIT|'_');
fragment DIGIT: [0-9];
fragment LETTER options { caseInsensitive=true; } : [A-Z];

//A continuation line is non-blank in column 72, followed by anything until EOL,
//then on next line the characters starting after column position 15
LINE_CONTINUATION
    : 
    {getCharPositionInLine() == 71 }? 
    ~[ ] 
    ~[\r\n]* EOL
    ({getCharPositionInLine() <= 15 }? [ ] )+  
    {setText("");}
;

Is there any way of overriding the value of a sub-rule (or fragment) in the same way that root rules can be overridden?

For example, given a list of identifiers defined as:

AAAAAAAAAAAA,BBBBBBBBBBB,CCCCCCCCCCCCCCCCC,DDDDDDDDDDD,EEEEEEEEEE,FFFF* Some comment
FFFF,GGGGGGGG

I'm trying to get the following token texts:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFFFFFF
GGGGGGGG

However, I receive:

AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFF* Some comment
FFFF
GGGGGGGG

Answer 1

That is not possible. Try this instead:

IDENTIFIER
 : {getCharPositionInLine() < 71 }? IDENTIFIER_PART
   ( {getCharPositionInLine() < 71 }? IDENTIFIER_PART  
   | LINE_CONTINUATION 
   )*
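   {
     // Strip each continuation: the indicator, the rest of that line, the line
     // break, and up to 15 leading blanks on the next line. Assumes the
     // indicator is not itself an identifier character; a leading \S would
     // already match the identifier's first character.
     String text = getText();
     setText(text.replaceAll("[^A-Za-z0-9_][^\r\n]*[\r\n]+[ ]{0,15}", ""));
   }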
;
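
To see what the action does to the token text, the same replacement can be run outside the lexer. This is only a sketch: ContinuationStripper and strip are hypothetical names, not part of ANTLR, and it assumes the continuation indicator is never a letter, digit, or underscore. The match starts at a non-identifier character because a pattern beginning with \S would already match at the first letter of the identifier and remove the leading part of the name.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not part of ANTLR): removes continuation spans from raw token text.
final class ContinuationStripper {
    // One continuation span: the indicator character (assumed not to be an
    // identifier character), the rest of that line, the line break(s), and
    // up to 15 leading blanks on the following line.
    private static final Pattern CONTINUATION =
        Pattern.compile("[^A-Za-z0-9_][^\\r\\n]*[\\r\\n]+[ ]{0,15}");

    static String strip(String rawTokenText) {
        return CONTINUATION.matcher(rawTokenText).replaceAll("");
    }

    public static void main(String[] args) {
        // Raw text as IDENTIFIER would match it for the FFFF... case in the question,
        // assuming the continued line is indented by 15 blanks.
        String raw = "FFFF* Some comment\n" + "               " + "FFFF";
        System.out.println(strip(raw)); // prints FFFFFFFF
    }
}
```

If the indicator can itself be an identifier character, a regex over the token text cannot tell it apart from the name, and the column of the token start would have to be taken into account instead.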
