How to use ANTLR to parse an XML document
  • Time: 2009-11-18 13:51:05
  • Tags: antlr

Can anybody tell me how to use the ANTLR tool (in Java) to create our own grammar for XML documents, and how to parse those documents with it?

Answers

Check out ANTXR, my ANTLR derivation that supports XML tags in the grammar itself. You can use SAX or XMLPull as a front end. (Note: it's based on ANTLR 2.x.)

http://javadude.com/tools/antxr/index.html

Short example:

header {
package com.javadude.antlr.sample.xml;

import java.util.List;
import java.util.ArrayList;
}

class PeopleParser extends Parser;


document returns [List results = null]
  : results=<people> EOF
  ;

<people> returns [List results = new ArrayList()]
  { Person p; }
  : ( p=<person>  { results.add(p); }   )*
  ;

<person> returns [Person p = new Person()]
  {
    String first, last;
    p.setId(@id);  // attributes are read using "@xxxx"
  }
  : ( first=<firstName>  { p.setFirstName(first); }
    | last=<lastName>    { p.setLastName(last);   }
    )*
  ;

<firstName> returns [String value = null]
  : pcdata:PCDATA { value = pcdata.getText(); }
  ;

<lastName> returns [String value = null]
  : pcdata:PCDATA { value = pcdata.getText(); }
  ;
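
The grammar above calls setters on a Person class that is not shown. A minimal sketch of what that class might look like, assuming only the three setters the grammar uses (setId, setFirstName, setLastName) and assuming the id attribute arrives as a String; the getters, toString and the sample document in the comment are illustrative additions, not part of ANTXR:

```java
// Hypothetical bean for the PeopleParser grammar; in a real project it would
// live in the grammar's package (com.javadude.antlr.sample.xml).
// A document the grammar is meant to accept would look roughly like:
//   <people>
//     <person id="1"><firstName>Ada</firstName><lastName>Lovelace</lastName></person>
//   </people>
class Person {
    private String id;        // filled from the "id" attribute (@id in the grammar)
    private String firstName; // filled from the <firstName> element
    private String lastName;  // filled from the <lastName> element

    public void setId(String id) { this.id = id; }
    public void setFirstName(String firstName) { this.firstName = firstName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getId() { return id; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }

    @Override
    public String toString() {
        return "Person[" + id + ": " + firstName + " " + lastName + "]";
    }
}
```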

If you want to write a completely conforming (even non-validating) XML parser, you must read the W3C specification (http://www.w3.org/TR/REC-xml/). You will need to deal with internal and external DTD subsets, parameter entities and general entities. This will be a major task, even with ANTLR. You will also need to be able to resolve URLs and deal with namespace URIs. And a lot more.
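
To give a feel for what "dealing with general entities" means, here is a small sketch that uses the parser bundled with the JDK (not ANTLR) on a document whose internal DTD subset declares an entity. The class name EntityDemo and the sample document are made up for illustration. A grammar that only knows about tags and text would not be able to read this input correctly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Illustrative only: shows a conforming parser expanding a general entity
// that is declared in the document's internal DTD subset.
class EntityDemo {
    // Parses the document and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // "who" is declared in the internal subset and referenced as &who;
        String xml = "<!DOCTYPE greeting [<!ENTITY who \"world\">]>"
                   + "<greeting>Hello, &who;!</greeting>";
        System.out.println(rootText(xml));
    }
}
```

With the default JDK parser settings this should print Hello, world!, because the entity reference is replaced by its declared value before the text reaches the application.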

I suspect that you wish to parse only a subset (though I don't think it's a good idea to write non-conformant parsers for standards). In that case the first step is to write the EBNF for your subset. After that it should be fairly straightforward :-)
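
As a rough sketch of that approach, the comment below gives an EBNF for a deliberately tiny XML-like subset (elements, double-quoted attributes and text only), followed by a hand-written recursive-descent parser for it in plain Java rather than ANTLR-generated code. Everything here (the MiniXml class, the Element type, the exact rules) is an assumption made for illustration. It has no DTDs, entities, comments, CDATA sections, processing instructions or namespaces, so it accepts an XML-like language and not XML.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical subset, written as EBNF:
//   document  ::= S? element S?
//   element   ::= '<' Name S? (Attribute S?)* ( '/>' | '>' content '</' Name S? '>' )
//   Attribute ::= Name '=' '"' [^"]* '"'
//   content   ::= ( element | text )*
//   text      ::= [^<]+
//   Name      ::= (Letter | '_') (Letter | Digit | '_' | '-' | '.')*
class MiniXml {
    static class Element {
        final String name;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Element> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();
        Element(String name) { this.name = name; }
    }

    private final String src;
    private int pos = 0;

    private MiniXml(String src) { this.src = src; }

    // document ::= S? element S?
    static Element parse(String src) {
        MiniXml p = new MiniXml(src);
        p.skipSpace();
        Element root = p.element();
        p.skipSpace();
        if (p.pos != src.length()) throw p.error("unexpected trailing content");
        return root;
    }

    private Element element() {
        expect("<");
        Element e = new Element(name());
        skipSpace();
        while (!peek("/>") && !peek(">")) {           // (Attribute S?)*
            String key = name();
            expect("=");
            expect("\"");
            int end = src.indexOf('"', pos);
            if (end < 0) throw error("unterminated attribute value");
            e.attrs.put(key, src.substring(pos, end));
            pos = end + 1;
            skipSpace();
        }
        if (peek("/>")) { pos += 2; return e; }       // empty-element tag
        expect(">");
        while (!peek("</")) {                         // content
            if (peek("<")) {
                e.children.add(element());
            } else {
                int end = src.indexOf('<', pos);
                if (end < 0) throw error("missing end tag for " + e.name);
                e.text.append(src, pos, end);
                pos = end;
            }
        }
        expect("</");
        String close = name();
        if (!close.equals(e.name)) throw error("mismatched end tag " + close);
        skipSpace();
        expect(">");
        return e;
    }

    private String name() {
        int start = pos;
        while (pos < src.length() && isNameChar(src.charAt(pos), pos == start)) pos++;
        if (pos == start) throw error("expected a name");
        return src.substring(start, pos);
    }

    private static boolean isNameChar(char c, boolean first) {
        if (Character.isLetter(c) || c == '_') return true;
        return !first && (Character.isDigit(c) || c == '-' || c == '.');
    }

    private boolean peek(String s) { return src.startsWith(s, pos); }

    private void expect(String s) {
        if (!peek(s)) throw error("expected '" + s + "'");
        pos += s.length();
    }

    private void skipSpace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at offset " + pos);
    }
}
```

Calling MiniXml.parse("<person id=\"1\"><firstName>Ada</firstName></person>") returns an Element named person with one attribute and one child. Each EBNF rule maps onto one method or loop, which is the same correspondence an ANTLR grammar for the subset would express declaratively.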

EDIT: To make it very clear: anything that does not conform to the complete spec is NOT XML. You talk about creating your "own grammar" for XML, but there is already a defined grammar for XML, and it cannot be modified. If you wish to create your own syntax which is "like XML" you can, but anyone who thinks it actually IS XML will be disappointed, as there are many XML constructs you won't support (or will support differently).




