English 中文(简体)
Grammars, Scala Parsing Combinators and Orderless Sets
原标题:

I m writing an application that will take in various "command" strings. I ve been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".

With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):

command = action~content
action = alphanum
content  = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )

So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?

最佳回答

There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.

OTOH, you could write a combinator, if this pattern happens often:

def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
  a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a
问题回答

You can use the "Parser.^?" operator to check a group of parse elements for duplicates.

  def tokens = tokenA | tokenB | tokenC
  def uniqueTokens = (tokens*) ^? (
    { case t if (t == t.removeDuplicates) => t },
    { "duplicate tokens found: " + _ })

Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:

package blevins.example

import scala.util.parsing.combinator._  

case class Stooge(name: String)

object StoogesParser extends RegexParsers {
  def moe = "Moe".r
  def larry = "Larry".r
  def curly = "Curly".r
  def shemp = "Shemp".r
  def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
  def certifiedStooge = stooge | """w+""".r ^? (
    { case s: Stooge => s },
    { "not a stooge: " + _ })

  def stooges = (certifiedStooge*) ^? (
    { case x if (x == x.removeDuplicates) => x.toSet },
    { "duplicate stooge in: " + _ })

  def parse(s: String): String = {
    parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
      case Success(r,_) => r.mkString(" ")
      case Failure(r,_) => "failure: " + r
      case Error(r,_) => "error: " + r
    }
  }

}

And some example usage:

package blevins.example

object App extends Application {

  def printParse(s: String): Unit = println(StoogesParser.parse(s))

  printParse("Moe Shemp Larry")
  printParse("Moe Shemp Shemp")
  printParse("Curly Beyonce")

  /* Output:
     Stooge(Moe) Stooge(Shemp) Stooge(Larry)
     failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
     failure: not a stooge: Beyonce
  */
}

I would not try to enforce this requirement syntactically. I d write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.

Randall Schulz

I don t know what kind of constructs you want to support, but I gather you should be specifying a more specific grammar. From your comment to another answer:

todo message:link Todo class to database

I guess you don t want to accept something like

todo message:database Todo to link class

So you probably want to define some message-level keywords like "link" and "to"...

def token = alphanum~ : ~ "link" ~ alphanum ~ "class" ~ "to" ~ alphanum 
  ^^ { (a:String,b:String,c:String) => /* a == "message", b="Todo", c="database" */ }

I guess you would have to define your grammar at that level.

You could of course write a combination rule that does this for you if you encounter this situation frequently.

On the other hand, maybe the option exists to make "tokenA..C" just "token" and then differentiate inside the handler of "token"





相关问题
Reserved keywords in Objective-C?

At the CocoaHeads Öresund meeting yesterday, peylow had constructed a great ObjC quiz. The competition was intense and three people were left with the same score when the final question was to be ...

ANTLR grammar license [closed]

I m planning to make an implementation of Lua for the DLR, and i would like to use the listed Lua 5.1 grammar here. However i can t see a license that it was released under, so can someone please ...

Does anyone recognise this unfamiliar notation?

I have a question from a test in a Programming Languages class that is confusing me. Give a context-free grammar to generate the following language L = { aibjck | 0 <= i <= j <= i + k } I ...

Question about building a symbol table with a yacc parser

If my yacc parser encounters the following code: int foo(int a, int b) should it add int a and int b as attributes of foo? The way I have it now, it enters a and b as separate table entries.

Tips on Using Bison --graph=[file] on Linux

Recently (about a month ago) I was trying to introduce new constructs to my company s in-house extension language, and struggling with a couple of reduce-reduce errors. While I eventually solved this ...

Yacc program not recognizing function declaration

I think my program should be able to recognize the following as a function declaration: int fn(int i) { int n; return; } but it doesn t. Here s the relevant part of my yacc file: program : ...

Grammars, Scala Parsing Combinators and Orderless Sets

I m writing an application that will take in various "command" strings. I ve been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These ...

What s the matter with this Grammar?

grammar Test; IDHEAD: ( a .. z | A .. Z | _ ); IDTAIL: (IDHEAD | 0 .. 9 ); ID: (IDHEAD IDTAIL*); fragment TYPE: ( text | number | bool ); define: define ID as TYPE; The problem ...

热门标签