I want to parse a SQL SELECT statement that supports all the features of a normal SQL dialect such as MySQL. I looked for parsing libraries in Python but couldn't find one that does the job: the ones I found could only parse basic SELECT statements (FROM and WHERE, not even ORDER BY). So as an alternative I wrote my own parser with Lark (which I know is not a great solution). I spent a few hours on it, but I keep getting a weird result and don't know how to approach it. Before I show the code: if you know a Python library that can parse SQL statements, not just SELECT but also CREATE TABLE, INSERT, etc., let me know.
Here is my grammar:
select_grammar = """
start: select_statement ";"
select_statement: "SELECT" column_list "FROM" table_list join_list? where_clause? groupby_clause? having_clause? orderby_clause?
column_list: "*" | column_expr ("," column_expr)*
column_expr: function_call | column_name | subquery
column_name: (table_name ".")? NAME ("AS" NAME)?
table_name: NAME ("AS" NAME)?
function_call: NAME "(" function_args ")" ("AS" NAME)?
function_args: expression ("," expression)*
where_clause: "WHERE" condition
groupby_clause: "GROUP BY" column_expr ("," column_expr)*
having_clause: "HAVING" logical_expr
orderby_clause: "ORDER BY" order_column ("," order_column)*
order_column: column_expr ["ASC" | "DESC"]?
condition: logical_expr
logical_expr: logical_term
| logical_expr "AND" logical_term
| logical_expr "OR" logical_term
| "NOT" logical_term
logical_term: comparison_expr
| "(" logical_expr ")"
| subquery
comparison_expr: expression OPERATOR expression
| expression "IS" ("NULL" | "NOT NULL")
expression: (table_name ".")? NAME | INT | string | function_call | subquery
table_list: table_name ("," table_name)* | subquery
subquery: "(" select_statement ")"
join_list: join_expr+
join_expr: join_type (table_name | subquery) "ON" condition
join_type: "INNER JOIN" | "LEFT JOIN" | "RIGHT JOIN" | "FULL JOIN"
string: ESCAPED_STRING | /'[^']*'/
OPERATOR: ">" | "<" | ">=" | "<=" | "=" | "!="
%import common.CNAME -> NAME
%import common.INT
%import common.ESCAPED_STRING
%import common.WS
%ignore WS
"""
I also created a Transformer class:
from lark import Lark, Token, Transformer, Tree, v_args

@v_args(inline=True)
class SelectTransformer(Transformer):
    def start(self, *args):
        print("start result: ", args)
        return Tree("SELECT statement", args)

    def column_list(self, *args):
        return args

    def column_expr(self, *args):
        return args[0] if len(args) == 1 else args

    def function_call(self, name, args, alias=None):
        return (name, args, alias)

    # Note: shadowed by the second subquery() defined further down.
    def subquery(self, value):
        print("Subquery:", value)

    def where_clause(self, condition=None):
        return condition

    def groupby_clause(self, *args):
        return args

    def having_clause(self, condition=None):
        return condition

    def orderby_clause(self, *args):
        return args

    def order_column(self, *args):
        return args

    def condition(self, *args):
        return args

    def logical_expr(self, *args):
        return args

    def logical_term(self, *args):
        return args

    def comparison_expr(self, *args):
        return args

    def expression(self, *args):
        return args[0] if len(args) == 1 else args

    def column_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args

    def table_list(self, *args):
        return args

    def join_list(self, *args):
        return args

    def join_expr(self, *args):
        return args

    def join_type(self, *args):
        return args

    def subquery(self, *args):
        return args

    def string(self, value):
        return value.strip("'")

    def table_name(self, *args):
        if len(args) == 1:
            return args[0]  # No alias present
        elif len(args) == 3:
            return args[0], args[2]  # Alias present, return a tuple
        else:
            return args
I don't know if it matters, but I also wrote a small function that displays the final tree:
def format_ast(ast, level=0):
    result = ""
    indent = " " * level
    if isinstance(ast, tuple):
        # Tuples come from the transformer methods that return args
        for item in ast:
            result += format_ast(item, level + 1)
    elif isinstance(ast, Token):
        result += f"{indent}{ast.type}, Token( {ast.value} )\n"
    elif isinstance(ast, Tree):
        result += f"{indent}Tree({ast.data}), [\n"
        for child in ast.children:
            result += format_ast(child, level + 1)
        result += f"{indent}]\n"
    else:
        result += f"{indent}{ast}\n"
    return result
The SQL statement:
sql_query = """SELECT
    name AS alias,
    COUNT(age) AS age_alias,
    (SELECT department_name FROM departments WHERE department_id = employees.department_id)
FROM employees AS emp, department
INNER JOIN departments AS dep ON employees.department_id = departments.id
LEFT JOIN other_table AS ot ON other_table.id = employees.table_id
WHERE age > 25
GROUP BY age, name
HAVING COUNT(age) > 1
ORDER BY name ASC, age DESC;"""
This is how I run it:
parser = Lark(select_grammar, parser='lalr', transformer=SelectTransformer())
tree = parser.parse(sql_query)
# Print the custom export format
print(format_ast(tree))
The problem is in the join_type() method of my SelectTransformer class. Somehow *args is always empty, although according to the rule it should contain "INNER JOIN", "LEFT JOIN", "RIGHT JOIN" or "FULL JOIN". My output looks like this:
Tree(SELECT statement), [
Tree(select_statement), [
NAME, Token( name )
NAME, Token( alias )
NAME, Token( COUNT )
Tree(function_args), [
NAME, Token( age )
]
NAME, Token( age_alias )
Tree(select_statement), [
NAME, Token( department_name )
NAME, Token( departments )
NAME, Token( department_id )
OPERATOR, Token( = )
NAME, Token( employees )
NAME, Token( department_id )
]
NAME, Token( employees )
NAME, Token( emp )
NAME, Token( department )
NAME, Token( departments )
NAME, Token( dep )
NAME, Token( employees )
NAME, Token( department_id )
OPERATOR, Token( = )
NAME, Token( departments )
NAME, Token( id )
NAME, Token( other_table )
NAME, Token( ot )
NAME, Token( other_table )
NAME, Token( id )
OPERATOR, Token( = )
NAME, Token( employees )
NAME, Token( table_id )
NAME, Token( age )
OPERATOR, Token( > )
INT, Token( 25 )
NAME, Token( age )
NAME, Token( name )
NAME, Token( COUNT )
Tree(function_args), [
NAME, Token( age )
]
None
OPERATOR, Token( > )
INT, Token( 1 )
NAME, Token( name )
NAME, Token( age )
]
]
As you can see, no join type is displayed. I am relatively new to parsing, so I don't really know what to try.