Question

I m writing an application that will take in various "command" strings. I ve been looking at the Scala combinator library to tokenize the commands. I find in a lot of cases I want to say: "These tokens are an orderless set, and so they can appear in any order, and some might not appear".

With my current knowledge of grammars I would have to define all combinations of sequences as such (pseudo grammar):

command = action~content
action = alphanum
content  = (tokenA~tokenB~tokenC | tokenB~tokenC~tokenA | tokenC~tokenB~tokenA ....... )

So my question is, considering tokenA-C are unique, is there a shorter way to define a set of any order using a grammar?

Answer 1

There are ways around it. Take a look at the parser here, for example. It accepts 4 pre-defined numbers, which may appear in any other, but must appear once, and only once.

OTOH, you could write a combinator, if this pattern happens often:

def comb3[A](a: Parser[A], b: Parser[A], c: Parser[A]) =
  a ~ b ~ c | a ~ c ~ b | b ~ a ~ c | b ~ c ~ a | c ~ a ~ b | c ~ b ~ a

Answer 2

You can use the "Parser.^?" operator to check a group of parse elements for duplicates.

  def tokens = tokenA | tokenB | tokenC
  def uniqueTokens = (tokens*) ^? (
    { case t if (t == t.removeDuplicates) => t },
    { "duplicate tokens found: " + _ })

Here is an example that allows you to enter any of the four stooges in any order, but fails to parse if a duplicate is encountered:

package blevins.example

import scala.util.parsing.combinator._  

case class Stooge(name: String)

object StoogesParser extends RegexParsers {
  def moe = "Moe".r
  def larry = "Larry".r
  def curly = "Curly".r
  def shemp = "Shemp".r
  def stooge = ( moe | larry | curly | shemp ) ^^ { case s => Stooge(s) }
  def certifiedStooge = stooge | """w+""".r ^? (
    { case s: Stooge => s },
    { "not a stooge: " + _ })

  def stooges = (certifiedStooge*) ^? (
    { case x if (x == x.removeDuplicates) => x.toSet },
    { "duplicate stooge in: " + _ })

  def parse(s: String): String = {
    parseAll(stooges, new scala.util.parsing.input.CharSequenceReader(s)) match {
      case Success(r,_) => r.mkString(" ")
      case Failure(r,_) => "failure: " + r
      case Error(r,_) => "error: " + r
    }
  }

}

And some example usage:

package blevins.example

object App extends Application {

  def printParse(s: String): Unit = println(StoogesParser.parse(s))

  printParse("Moe Shemp Larry")
  printParse("Moe Shemp Shemp")
  printParse("Curly Beyonce")

  /* Output:
     Stooge(Moe) Stooge(Shemp) Stooge(Larry)
     failure: duplicate stooge in: List(Stooge(Moe), Stooge(Shemp), Stooge(Shemp))
     failure: not a stooge: Beyonce
  */
}

Answer 3

I would not try to enforce this requirement syntactically. I d write a production that admits multiple tokens from the set allowed and then use a non-parsing approach to ascertaining the acceptability of the keywords actually given. In addition to allowing a simpler grammar, it will allow you to more easily continue parsing after emitting a diagnostic about the erroneous usage.

Randall Schulz

Answer 4

I don t know what kind of constructs you want to support, but I gather you should be specifying a more specific grammar. From your comment to another answer:

todo message:link Todo class to database

I guess you don t want to accept something like

todo message:database Todo to link class

So you probably want to define some message-level keywords like "link" and "to"...

def token = alphanum~ : ~ "link" ~ alphanum ~ "class" ~ "to" ~ alphanum 
  ^^ { (a:String,b:String,c:String) => /* a == "message", b="Todo", c="database" */ }

I guess you would have to define your grammar at that level.

Answer 5

You could of course write a combination rule that does this for you if you encounter this situation frequently.

On the other hand, maybe the option exists to make "tokenA..C" just "token" and then differentiate inside the handler of "token"

友情链接