How to create a parser for search queries

For example, I'd need to create something like the Google search query parser, to parse expressions such as:

flying hiking or swimming -"walking in boots" author:hamish author:reid

or

house in new york priced over $500000 with a swimming pool

How would I even start building something like this? Are there any good resources?

C#-relevant, please (if possible).

  • Edit: this is something that I should somehow be able to translate into a SQL query.
Best answer

How many keywords do you have (like "or", "in", "priced over", "with a")? If you only have a couple of them, I'd suggest going with simple string processing (regexes) too.

But if you have more than that, you might want to look into implementing a real parser for those search expressions. Irony.net might help you with that (I found it extremely easy to use, as you can express your grammar in a near-BNF form directly in code).

Answers

The Lucene/NLucene project has functionality for boolean queries and some other query formats as well. I don't know whether you can add your own extensions, like "author:" in your case, but it might be worthwhile to check it out.

There are a few ways of doing it. Two of them:

  • Parsing using a grammar (useful for a complex language)
  • Parsing using regular expressions and basic string manipulation (for a simpler language)

Judging by your example, the language is very basic, so splitting the string on the keywords may be the best solution.

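// Split on the fixed keywords; the surrounding spaces keep " in " from matching inside words.
string sentence = "house in new york priced over $500000 with a swimming pool";
string[] values = sentence.Split(new[] { " in ", " priced over ", " with a " },
                                 StringSplitOptions.None);
string type = values[0];        // "house"
string area = values[1];        // "new york"
string price = values[2];       // "$500000"
string accessories = values[3]; // "swimming pool"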

However, some issues may arise: how do you verify that the sentence is in the expected form? What happens if some of the keywords can appear as part of the values?
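For the first of these questions, one option is an anchored regular expression with named groups, which rejects input that does not follow the expected shape instead of returning a short array. This is only a sketch: the class name HouseQuery, the method TryParse, and the pattern itself are assumptions made for illustration, and it does not solve the second issue (a keyword inside a value still splits at its first occurrence).

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class HouseQuery
{
    // Hypothetical pattern for "<type> in <area> priced over $<price> with a <extras>".
    // The ^ and $ anchors make any sentence in a different form fail to match.
    private static readonly Regex Pattern = new Regex(
        @"^(?<type>.+?) in (?<area>.+?) priced over \$(?<price>\d+) with a (?<extras>.+)$",
        RegexOptions.IgnoreCase);

    // Returns false (rather than throwing) when the sentence is not in the expected form.
    public static bool TryParse(string sentence, out string type, out string area,
                                out decimal price, out string extras)
    {
        type = area = extras = string.Empty;
        price = 0;

        Match m = Pattern.Match(sentence);
        if (!m.Success) return false;

        type = m.Groups["type"].Value;
        area = m.Groups["area"].Value;
        price = decimal.Parse(m.Groups["price"].Value, CultureInfo.InvariantCulture);
        extras = m.Groups["extras"].Value;
        return true;
    }
}
```

Calling HouseQuery.TryParse with the example sentence fills in the four parts, while a sentence such as "flat near the park" simply returns false, so the caller can fall back to another strategy or report an error.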

If you run into these issues, there are libraries you can use to parse the input against a defined grammar. Two such libraries that work with .NET are ANTLR and Gold Parser; both are free. The main challenge is defining the grammar.

A grammar would work very well for the second example you gave, but the first (keyword/command strings in any order) would be best handled using Split() and a class to handle the various keywords and commands. You will have to do some initial processing to handle quoted regions before the split (for example, replacing spaces within quoted regions with a rare/unused character).
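The quoted-region preprocessing described above might look roughly like this. It is a minimal sketch: the class name QueryTokenizer and the choice of '\u0001' as the "unused" character are assumptions, and an unbalanced quote is simply treated as running to the end of the input.

```csharp
using System;
using System.Text;

public static class QueryTokenizer
{
    // Placeholder assumed never to occur in user input (an assumption of this sketch).
    private const char Placeholder = '\u0001';

    public static string[] Tokenize(string query)
    {
        var protectedQuery = new StringBuilder(query.Length);
        bool inQuotes = false;
        foreach (char c in query)
        {
            if (c == '"')
            {
                inQuotes = !inQuotes;   // toggle on every quote; the quote itself is dropped
                continue;
            }
            // Hide spaces inside quotes so the later Split() does not break the phrase.
            protectedQuery.Append(inQuotes && c == ' ' ? Placeholder : c);
        }

        string[] tokens = protectedQuery.ToString()
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Restore the hidden spaces now that the split is done.
        for (int i = 0; i < tokens.Length; i++)
            tokens[i] = tokens[i].Replace(Placeholder, ' ');
        return tokens;
    }
}
```

With the first example query, the phrase comes back as the single token -walking in boots, with its leading minus intact, alongside the plain words and the author: tokens.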

The ":" commands are easy to find and pull out of the search string for processing after the split is completed. Simply traverse the array looking for them.
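A sketch of that traversal, assuming the tokens have already been produced by the split. The ParsedQuery class and its member names are hypothetical; a command may appear several times (as author: does in the example), so each name maps to a list of values.

```csharp
using System;
using System.Collections.Generic;

public sealed class ParsedQuery
{
    // Command name (e.g. "author") -> all values given for it.
    public Dictionary<string, List<string>> Commands { get; } =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

    // Everything that is not a command, in its original order.
    public List<string> Terms { get; } = new List<string>();

    public static ParsedQuery FromTokens(IEnumerable<string> tokens)
    {
        var result = new ParsedQuery();
        foreach (string token in tokens)
        {
            int colon = token.IndexOf(':');
            // Treat "name:value" as a command only when both sides are non-empty.
            if (colon > 0 && colon < token.Length - 1)
            {
                string name = token.Substring(0, colon);
                string value = token.Substring(colon + 1);
                if (!result.Commands.TryGetValue(name, out var values))
                {
                    values = new List<string>();
                    result.Commands[name] = values;
                }
                values.Add(value);
            }
            else
            {
                result.Terms.Add(token);
            }
        }
        return result;
    }
}
```

Note that this treats any token containing a colon as a command, including a quoted phrase that happens to contain one; a real implementation would probably check the name against a list of known commands.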

The +/- keywords are also easy to find and add to the SQL query as AND/AND NOT clauses.
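As an illustration of the AND/AND NOT translation, the sketch below builds a parameterized WHERE fragment so that user input never ends up in the SQL text itself. The class name SqlWhereBuilder, the column name Body, and the @p0, @p1, ... parameter naming are all assumptions; LIKE wildcards inside a term are not escaped here.

```csharp
using System;
using System.Collections.Generic;

public static class SqlWhereBuilder
{
    // Builds a WHERE fragment and fills 'parameters' with the values to bind.
    // The column "Body" and the @pN naming scheme are hypothetical.
    public static string Build(IEnumerable<string> terms,
                               IDictionary<string, object> parameters)
    {
        var clauses = new List<string>();
        foreach (string raw in terms)
        {
            // A leading '-' excludes the term; '+' or no prefix requires it.
            bool negate = raw.StartsWith("-", StringComparison.Ordinal);
            string term = raw.TrimStart('+', '-');
            if (term.Length == 0) continue;

            string name = "@p" + parameters.Count;
            parameters[name] = "%" + term + "%";
            clauses.Add((negate ? "Body NOT LIKE " : "Body LIKE ") + name);
        }
        return string.Join(" AND ", clauses);
    }
}
```

The returned fragment and the dictionary can then be handed to whatever data access layer is in use, for example by adding each entry as a parameter on a command object.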

The only place you might run into issues is with the "or", since you'll have to define how it is handled. What if there are multiple "or"s? But the order of keywords in the array is the same as in the query, so that won't be an issue.

I think you should just do some string processing. There is no smart way of doing this.

So replace "OR" with your own or operator (e.g. ||). As far as I know, there is no library for this.
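One possible way to define the "or" handling, shown purely as a sketch: let "or" join the term before it with the term after it, let chained "or"s extend the same group, and AND the resulting groups together. This policy and the OrGrouping class are assumptions, not the only reasonable choice.

```csharp
using System;
using System.Collections.Generic;

public static class OrGrouping
{
    // Groups terms so that "a b or c or d e" becomes [a], [b, c, d], [e]:
    // alternatives inside a group are ORed, and the groups are ANDed.
    public static List<List<string>> Group(IEnumerable<string> tokens)
    {
        var groups = new List<List<string>>();
        bool joinNext = false;   // true right after an "or" token
        foreach (string token in tokens)
        {
            if (string.Equals(token, "or", StringComparison.OrdinalIgnoreCase))
            {
                // A leading "or" has nothing before it to join, so it is ignored.
                joinNext = groups.Count > 0;
                continue;
            }
            if (joinNext)
                groups[groups.Count - 1].Add(token);     // extend the current OR group
            else
                groups.Add(new List<string> { token });  // start a new ANDed group
            joinNext = false;
        }
        return groups;
    }
}
```

Because the array keeps the query's order, a single left-to-right pass like this is enough; a trailing "or" with nothing after it is silently dropped.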

I suggest you go with regexes.




