SQL Select Statement Parser not returning JOIN type

I want to parse a SQL SELECT statement that supports all the features of a normal SQL dialect such as MySQL. I looked for parsing libraries in Python but couldn't find one that does the job. By that I mean I found some parsing libraries, but they could only parse basic SELECT statements (FROM and WHERE, not even ORDER BY). So as an alternative I wrote my own parser (which I know is not a great solution at all). I spent a few hours working on it, but I keep getting a weird error and don't know how to approach it. Before I show the code, I just want to mention that if you know a Python library that can parse SQL statements, not just SELECT but also CREATE TABLE, INSERT, etc., let me know.

Here is my grammar:

select_grammar = """
    start: select_statement ";"

    select_statement: "SELECT" column_list "FROM" table_list join_list? where_clause? groupby_clause? having_clause? orderby_clause?

    column_list: "*" | column_expr ("," column_expr)*

    column_expr: function_call | column_name | subquery
    
    column_name: (table_name ".")? NAME ("AS" NAME)?
    
    table_name: NAME ("AS" NAME)?

    function_call: NAME "(" function_args ")" ("AS" NAME)?

    function_args: expression ("," expression)*

    where_clause: "WHERE" condition

    groupby_clause: "GROUP BY" column_expr ("," column_expr)*

    having_clause: "HAVING" logical_expr

    orderby_clause: "ORDER BY" order_column ("," order_column)*

    order_column: column_expr ["ASC" | "DESC"]?

    condition: logical_expr

    logical_expr: logical_term
                | logical_expr "AND" logical_term
                | logical_expr "OR" logical_term
                | "NOT" logical_term

    logical_term: comparison_expr
                | "(" logical_expr ")"
                | subquery

    comparison_expr: expression OPERATOR expression
                    | expression "IS" ("NULL" | "NOT NULL")

    expression: (table_name ".")? NAME | INT | string | function_call | subquery

    table_list: table_name ("," table_name)* | subquery

    subquery: "(" select_statement ")"

    join_list: join_expr+

    join_expr: join_type (table_name | subquery) "ON" condition

    join_type: "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"

    string: ESCAPED_STRING | / [^ ]* /

    OPERATOR: ">" | "<" | ">=" | "<=" | "=" | "!="

    %import common.CNAME -> NAME
    %import common.INT
    %import common.ESCAPED_STRING
    %import common.WS
    %ignore WS
"""

I also created a Transformer class:

from lark import Lark, Token, Transformer, Tree, v_args


@v_args(inline=True)
class SelectTransformer(Transformer):
    def start(self, *args):
        print("start result: ", args)
        return Tree("SELECT statement", args)

    def column_list(self, *args):
        return args

    def column_expr(self, *args):
        return args[0] if len(args) == 1 else args

    def function_call(self, name, args, alias=None):
        return (name, args, alias)

    def subquery(self, value):
        print("Subquery:", value)

    def where_clause(self, condition=None):
        return condition

    def groupby_clause(self, *args):
        return args

    def having_clause(self, condition=None):
        return condition

    def orderby_clause(self, *args):
        return args

    def order_column(self, *args):
        return args

    def condition(self, *args):
        return args

    def logical_expr(self, *args):
        return args

    def logical_term(self, *args):
        return args

    def comparison_expr(self, *args):
        return args

    def expression(self, *args):
        return args[0] if len(args) == 1 else args

    def column_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args

    def table_list(self, *args):
        return args

    def join_list(self, *args):
        return args

    def join_expr(self, *args):
        return args

    def join_type(self, *args):
        return args

    def subquery(self, *args):
        return args

    def string(self, value):
        return value.strip(" ")

    def table_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args

I don't know if this is important, but I also wrote a small function that displays the final tree:

def format_ast(ast, level=0):
    result = ""
    indent = "  " * level

    if isinstance(ast, tuple):
        for item in ast:
            result += format_ast(item, level + 1)
    elif isinstance(ast, Token):
        result += f"{indent}{ast.type}, Token( {ast.value} )\n"
    elif isinstance(ast, Tree):
        result += f"{indent}Tree({ast.data}), [\n"
        for child in ast.children:
            result += format_ast(child, level + 1)
        result += f"{indent}]\n"
    else:
        result += f"{indent}{ast}\n"

    return result

The SQL statement:

sql_query = """SELECT
         name AS alias,
         COUNT(age) AS age_alias,
         (SELECT department_name FROM departments WHERE department_id = employees.department_id)
         FROM employees AS emp, department
         INNER JOIN departments AS dep ON employees.department_id = departments.id
         LEFT JOIN other_table AS ot ON other_table.id = employees.table_id
         WHERE age > 25
         GROUP BY age, name
         HAVING COUNT(age) > 1
         ORDER BY name ASC, age DESC;"""

The code that runs it:

parser = Lark(select_grammar, parser="lalr", transformer=SelectTransformer())
tree = parser.parse(sql_query)

# Print the custom export format
print(format_ast(tree))

The problem is in the join_type() method of my SelectTransformer class. Somehow *args is always empty, although according to the rule it should contain "INNER JOIN", "LEFT JOIN", "RIGHT JOIN", or "FULL JOIN". My output looks like this:

  Tree(SELECT statement), [
  Tree(select_statement), [
        NAME, Token( name )
        NAME, Token( alias )
        NAME, Token( COUNT )
        Tree(function_args), [
          NAME, Token( age )
        ]
        NAME, Token( age_alias )
        Tree(select_statement), [
            NAME, Token( department_name )
            NAME, Token( departments )
                  NAME, Token( department_id )
                  OPERATOR, Token( = )
                    NAME, Token( employees )
                    NAME, Token( department_id )
        ]
        NAME, Token( employees )
        NAME, Token( emp )
      NAME, Token( department )
          NAME, Token( departments )
          NAME, Token( dep )
                  NAME, Token( employees )
                  NAME, Token( department_id )
                OPERATOR, Token( = )
                  NAME, Token( departments )
                  NAME, Token( id )
          NAME, Token( other_table )
          NAME, Token( ot )
                  NAME, Token( other_table )
                  NAME, Token( id )
                OPERATOR, Token( = )
                  NAME, Token( employees )
                  NAME, Token( table_id )
            NAME, Token( age )
            OPERATOR, Token( > )
            INT, Token( 25 )
      NAME, Token( age )
      NAME, Token( name )
            NAME, Token( COUNT )
            Tree(function_args), [
              NAME, Token( age )
            ]
            None
          OPERATOR, Token( > )
          INT, Token( 1 )
        NAME, Token( name )
        NAME, Token( age )
  ]
]

As you can see, no join type is displayed. I am relatively new to parsing, so I don't really know what to try.

Answer

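The answer is letter case. The grammar defines a combination of rules and terminals. Rules have lowercase names, while terminals have uppercase names. It seems that only terminals capture their corresponding tokens. (There is probably a more formal way to state this, but it is accurate enough for this discussion.)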

So instead of:

    join_expr: join_type (table_name | subquery) "ON" condition

    join_type : "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"

use:

    join_expr: JOIN_TYPE (table_name | subquery) "ON" condition

    JOIN_TYPE: "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"

This produces output that includes JOIN_TYPE, Token( INNER JOIN ).




