Question

I have a system for querying documents on a REST/Atom server. The queries are inspired by GData and look like this:

http://server/base/feeds/documents?bq=[type in { news }]

I need to parse the "bq" parameter to find out which types of documents will be returned, without actually running the query. For example:

bq=[type =  news ]                      ->  return ["news"]
bq=[type in { news }]                   ->  return ["news"]
bq=[type in { news ,  article }]        ->  return ["news", "article"]
bq=[type =  news ]|[type =  article ]   ->  return ["news", "article"]
bq=[type =  news ]|[title =  My Title ] ->  return ["news"]

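Basically, the query language is a list of predicates that can be combined with OR ("|") or AND (no separator). Each predicate is a constraint on a field. The constraint can be =, <, >, <=, >=, in, etc. Whitespace can appear anywhere it makes sense.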

I'm a bit lost among Regexp, StringTokenizer, StreamTokenizer and so on, and I'm stuck on Java 1.4, so no parser...
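For the narrow goal of listing document types, `java.util.regex` (already part of Java 1.4) may be enough without a full parser. The following is one possible sketch, not a complete solution: it assumes the field is literally named `type`, that only `=` and `in` name document types, and that values contain no `]`, `}` or `,`. The class name `BqTypeExtractor` is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BqTypeExtractor {
    // One predicate: the text between a '[' and the next ']'
    private static final Pattern PREDICATE = Pattern.compile("\\[([^\\]]*)\\]");
    // "type = value" or "type in { v1, v2 }": group 1 = operator, group 2 = operand (trimmed)
    private static final Pattern TYPE_CONSTRAINT =
            Pattern.compile("\\s*type\\s*(=|\\bin\\b)\\s*(.*?)\\s*");

    // Returns the distinct types named by type constraints, in order of first appearance.
    // Uses raw collections and indexed loops so that it stays within the Java 1.4 API.
    public static List extractTypes(String bq) {
        Set types = new LinkedHashSet();
        Matcher predicates = PREDICATE.matcher(bq);
        while (predicates.find()) {
            Matcher m = TYPE_CONSTRAINT.matcher(predicates.group(1));
            if (!m.matches()) {
                continue; // a constraint on another field (e.g. title) or with another operator
            }
            String operand = m.group(2);
            if (m.group(1).equals("=")) {
                types.add(operand);
            } else {
                // "in": drop the surrounding braces, then split the set on commas
                String[] values = operand.replaceAll("^\\{|\\}$", "").split(",");
                for (int i = 0; i < values.length; i++) {
                    String v = values[i].trim();
                    if (v.length() > 0) {
                        types.add(v);
                    }
                }
            }
        }
        return new ArrayList(types);
    }
}
```

Because it only scans for `[...]` predicates and ignores `|`, it does not validate the query: malformed input is silently skipped rather than rejected.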

Can anyone point me in the right direction?

Thanks!

Answer 1

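The proper way is to use a parser generator such as ANTLR or JavaCC (JFlex on its own only generates a lexer, so it would be paired with a parser generator such as CUP).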
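If adding a generator to a Java 1.4 build is not an option, the grammar is small enough to parse by hand. The sketch below is a hand-written recursive-descent parser, not output from ANTLR or JavaCC. The class name `BqParser`, the `String[] {field, op, values...}` representation, the operator list `<=, >=, =, <, >, in`, and the assumption that field names consist of letters, digits and `_` are all illustrative choices based on the question's description. It is meant to stay within Java 1.4 APIs (raw collections, no generics or enhanced `for`).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BqParser {
    private static final String[] OPS = {"<=", ">=", "=", "<", ">", "in"}; // longest match first
    private final String s;
    private int pos = 0;
    private BqParser(String s) { this.s = s; }

    // query := group ('|' group)*. Returns a List of OR-ed groups; each group is a List of
    // AND-ed predicates, and each predicate is a String[] {field, op, value1, value2, ...}.
    public static List parse(String bq) {
        BqParser p = new BqParser(bq);
        List groups = new ArrayList();
        do { groups.add(p.group()); } while (p.accept('|'));
        if (p.peek() != '\0') throw p.error("unexpected trailing input");
        return groups;
    }

    // Distinct values of all "type =" / "type in" predicates, in order of first appearance.
    public static List types(String bq) {
        Set result = new LinkedHashSet();
        List groups = parse(bq);
        for (int g = 0; g < groups.size(); g++) {
            List preds = (List) groups.get(g);
            for (int i = 0; i < preds.size(); i++) {
                String[] pred = (String[]) preds.get(i);
                boolean isType = pred[0].equals("type") && (pred[1].equals("=") || pred[1].equals("in"));
                for (int v = 2; isType && v < pred.length; v++) result.add(pred[v]);
            }
        }
        return new ArrayList(result);
    }

    // group := predicate+   (AND is written by putting predicates side by side)
    private List group() {
        List preds = new ArrayList();
        do { preds.add(predicate()); } while (peek() == '[');
        return preds;
    }

    // predicate := '[' field op ( '{' value (',' value)* '}' | value ) ']'
    private String[] predicate() {
        expect('[');
        List parts = new ArrayList();
        parts.add(identifier());
        parts.add(operator());
        if ("in".equals(parts.get(1))) {
            expect('{');
            do { parts.add(value(",}")); } while (accept(','));
            expect('}');
        } else {
            parts.add(value("]"));
        }
        expect(']');
        return (String[]) parts.toArray(new String[parts.size()]);
    }

    private String identifier() { // assumption: field names are letters, digits and '_'
        peek(); // skip leading whitespace
        int start = pos;
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw error("expected a field name");
        return s.substring(start, pos);
    }

    private String operator() {
        peek(); // skip leading whitespace
        for (int i = 0; i < OPS.length; i++) {
            if (s.startsWith(OPS[i], pos)) { pos += OPS[i].length(); return OPS[i]; }
        }
        throw error("expected an operator");
    }
    // A value runs up to the next stop character and is trimmed ("My Title" keeps its inner space).
    private String value(String stops) {
        int start = pos;
        while (pos < s.length() && stops.indexOf(s.charAt(pos)) < 0) pos++;
        String v = s.substring(start, pos).trim();
        if (v.length() == 0) throw error("expected a value");
        return v;
    }
    // Skips whitespace, then returns the next character without consuming it ('\0' at end).
    private char peek() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
        return pos < s.length() ? s.charAt(pos) : '\0';
    }
    private boolean accept(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }
    private void expect(char c) { if (!accept(c)) throw error("expected '" + c + "'"); }
    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at position " + pos + " in: " + s);
    }
}
```

For example, `BqParser.types("[type =  news ]|[title =  My Title ]")` returns a list containing only `news`, while `BqParser.parse("[type = news")` throws an `IllegalArgumentException` instead of guessing. Like the examples in the question, `types()` simply collects every type value and does not intersect AND-ed type constraints.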

A quick-and-dirty way is:

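import java.util.ArrayList;
import java.util.List;

// split() takes a regex, so the OR separator "|" must be escaped
String[] disjunctedPredicateGroups = query.split("\\|");
// Java 1.4: no generics or enhanced for loop, so use a raw List (of String[]) and an index
List normalizedPredicates = new ArrayList();
for (int i = 0; i < disjunctedPredicateGroups.length; i++) {
   // AND-ed predicates sit side by side, so split on the brackets: "[a][b]" -> "", "a", "", "b"
   normalizedPredicates.add(disjunctedPredicateGroups[i].split("\\[|\\]"));
}
// process each predicate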
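A runnable sketch of how the `// process each predicate` step might look for this particular question, still using only `String.split()` and Java 1.4 APIs. `QuickBqSplit` and `types()` are hypothetical names, and the code assumes values contain no `|`, `[`, `]` or `,`. Note that `split()` takes a regular expression, so the OR separator has to be written as `"\\|"`; a bare `"|"` matches the empty string and splits between every character.

```java
import java.util.ArrayList;
import java.util.List;

public class QuickBqSplit {
    // Collects the values of "type = ..." and "type in { ... }" predicates (duplicates kept).
    // Raw List and indexed loops: stays within the Java 1.4 API.
    public static List types(String query) {
        List result = new ArrayList();
        // OR: "|" is a regex metacharacter, so split() needs it escaped
        String[] groups = query.split("\\|");
        for (int g = 0; g < groups.length; g++) {
            // AND: predicates sit side by side, so split on the brackets themselves
            String[] predicates = groups[g].split("\\[|\\]");
            for (int p = 0; p < predicates.length; p++) {
                String pred = predicates[p].trim();
                if (!pred.startsWith("type")) {
                    continue; // empty piece between brackets, or another field such as title
                }
                String rest = pred.substring("type".length()).trim();
                if (rest.startsWith("=")) {
                    result.add(rest.substring(1).trim());
                } else if (rest.startsWith("in")) {
                    // "in { news , article }": blank out the braces, then split on commas
                    String[] values = rest.substring(2).replace('{', ' ').replace('}', ' ').split(",");
                    for (int v = 0; v < values.length; v++) {
                        String value = values[v].trim();
                        if (value.length() > 0) {
                            result.add(value);
                        }
                    }
                }
            }
        }
        return result;
    }
}
```

Being quick and dirty, it keeps duplicates, accepts malformed queries, and would misread a field such as `typeinfo` as a `type in` constraint.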
