The syntax I'm trying to parse includes a continuation indicator in column 71. Identifiers, literals, almost anything can be continued onto the next line.
Ideally, I would like to drop the characters that make up the continuation token, so that I'm left with only the identifier characters. However, with the following lexer rules, the setText("") in LINE_CONTINUATION is ignored, so the continuation characters pollute the final IDENTIFIER token.
IDENTIFIER
    : {getCharPositionInLine() < 71}? IDENTIFIER_PART
      ( {getCharPositionInLine() < 71}? IDENTIFIER_PART
      | LINE_CONTINUATION
      )*
    ;
fragment IDENTIFIER_PART: (LETTER | DIGIT | '_');
fragment DIGIT: [0-9];
fragment LETTER options { caseInsensitive=true; } : [A-Z];
// A continuation line is non-blank in column 72, followed by anything until EOL,
// then on the next line the characters starting after column position 15
LINE_CONTINUATION
    : {getCharPositionInLine() == 71}?
      ~[ ]
      ~[\r\n]* EOL
      ( {getCharPositionInLine() <= 15}? [ ] )+
      {setText("");}
    ;
Is there any way of overriding the value of a subrule (or fragment) in the same way that root rules can be overridden?
For example, given a list of identifiers defined as:
AAAAAAAAAAAA,BBBBBBBBBBB,CCCCCCCCCCCCCCCCC,DDDDDDDDDDD,EEEEEEEEEE,FFFF* Some comment
FFFF,GGGGGGGG
I'm trying to get tokens with the text:
AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFFFFFF
GGGGGGGG
However, I'm receiving:
AAAAAAAAAAAA
BBBBBBBBBBB
CCCCCCCCCCCCCCCCC
DDDDDDDDDDD
EEEEEEEEEE
FFFF* Some comment
FFFF
GGGGGGGG
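One possible workaround, sketched here rather than confirmed against the ANTLR runtime, is to stop trying to blank the sub-rule and clean the text of the whole token once instead. The helper below is hypothetical (the class name ContinuationStripper and the method strip are not part of ANTLR). It walks the raw token text while tracking the 0-based column, drops everything from column 71 up to the end of the line, and then drops the leading blanks in columns 0 to 15 of the continuation line, mirroring the predicates in the grammar above.

```java
// Hypothetical helper, not part of ANTLR: removes continuation segments
// from the raw text of a token that may span several lines.
final class ContinuationStripper {
    // 0-based column of the continuation indicator (same value as the predicate).
    static final int CONTINUATION_COLUMN = 71;
    // Leading blanks are skipped while the 0-based column is <= this value.
    static final int LAST_SKIPPED_BLANK_COLUMN = 15;

    private ContinuationStripper() {}

    // raw: the text matched for the whole token
    // startColumn: 0-based column at which the token starts
    static String strip(String raw, int startColumn) {
        StringBuilder out = new StringBuilder(raw.length());
        int col = startColumn;
        boolean inTail = false;   // inside indicator and trailing text, up to EOL
        boolean inIndent = false; // skipping leading blanks of a continuation line
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\n') {
                // End of the continued line: restart column counting.
                col = 0;
                inTail = false;
                inIndent = true;
                continue;
            }
            if (c == '\r') {
                continue; // part of the line terminator, never kept
            }
            if (col >= CONTINUATION_COLUMN) {
                inTail = true;
            }
            if (inTail) {
                col++;
                continue;
            }
            if (inIndent && c == ' ' && col <= LAST_SKIPPED_BLANK_COLUMN) {
                col++;
                continue;
            }
            inIndent = false;
            out.append(c);
            col++;
        }
        return out.toString();
    }
}
```

Assuming the Java target, this could be called from an action at the end of the IDENTIFIER rule, for example {setText(ContinuationStripper.strip(getText(), _tokenStartCharPositionInLine));}, with the setText("") action removed from LINE_CONTINUATION. Whether this fits depends on how the rest of the grammar uses token text, so treat it as a starting point.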