Is it safe to include extra columns in the SELECT list of a SQLite GROUP BY query?

I have a simple SQLite table called "message":

sequence INTEGER PRIMARY KEY
type TEXT
content TEXT

I want to get the content of the last message of each type (as determined by its sequence). To my surprise, the following simple query works:

SELECT MAX(sequence), type, content
FROM message
GROUP BY type

Surprise, because I know that MSSQL or Postgres would refuse to include a column in the SELECT list that is not part of the GROUP BY clause or an aggregate function, and I'd have to do a join, like this:

SELECT m.sequence, m.type, m.content
FROM
(
    SELECT MAX(sequence) as sequence, type
    FROM message
    GROUP BY type
) g
JOIN message m
ON g.sequence = m.sequence

My question is: is it safe to use the first, much simpler, form of the query in SQLite? It intuitively makes sense that it selects the "content" value that matches the "MAX(sequence)" value, but the documentation doesn't seem to talk about this at all. Of course, if sequence was not unique then the result would be undefined. But if sequence is unique, as in my case, is this guaranteed or is it simply a lucky implementation detail that's subject to change?

Best answer

You can use these queries "safely," that is, without getting ambiguous results, if the extra columns are functionally dependent on the column(s) you group by:

SELECT c.parent_id, COUNT(*), p.any_column
FROM child_table c 
JOIN parent_table p USING (parent_id)
GROUP BY c.parent_id;

The example above would work in SQLite, and produce an unambiguous result, because there's no way p.any_column could have multiple values per group. However, this query is strictly in violation of the SQL standard, and most brands of RDBMS would raise an error.
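
To see this in action, here is a minimal sketch using Python's built-in sqlite3 module. The table layouts, the child_id column, and the sample rows are made up for illustration; only the query itself comes from the example above.

```python
import sqlite3

# In-memory database with hypothetical parent/child tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_table (parent_id INTEGER PRIMARY KEY, any_column TEXT);
    CREATE TABLE child_table (child_id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO parent_table VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO child_table (parent_id) VALUES (1), (1), (1), (2);
""")

# p.any_column is not grouped or aggregated, but each parent_id maps to
# exactly one parent row, so there is only one candidate value per group.
rows = conn.execute("""
    SELECT c.parent_id, COUNT(*), p.any_column
    FROM child_table c
    JOIN parent_table p USING (parent_id)
    GROUP BY c.parent_id
    ORDER BY c.parent_id
""").fetchall()

print(rows)  # [(1, 3, 'alpha'), (2, 1, 'beta')]
```

The ORDER BY is only there to make the printed order predictable; it is not part of the original query.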

It's too easy to write a query that produces ambiguous results, though. When you name a column that has multiple values per group, you can't control which value is returned in your result set.

In practice, MySQL returns the value from the first row with respect to physical storage, and SQLite returns the value from the last row. But it's totally implementation-dependent and not reliable. If the next version of either software changes its internals, you could get different query results after you upgrade. So it's best not to rely on this behavior.


Regarding your example, you say content should "intuitively" have the value from the row where sequence is MAX. But is this really intuitive? Consider these other cases:

SELECT MAX(sequence), MIN(sequence), type, content
FROM message
GROUP BY type

So which row now supplies the value for content? The row where sequence is MAX, or the row where sequence is MIN?

What if you use a non-unique column (e.g. date), and there are multiple rows with the same MAX value for date, but different values for content?

SELECT MAX(date), type, content
FROM message
GROUP BY type
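
One way to make the tie visible is to join back on the maximum explicitly. This is only a sketch with Python's sqlite3 module: the date column and the three sample rows are assumptions added for illustration, since the table in the question has no date column.

```python
import sqlite3

# Hypothetical variant of the message table with a non-unique date column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (
        sequence INTEGER PRIMARY KEY, type TEXT, content TEXT, date TEXT
    );
    INSERT INTO message VALUES
        (1, 'info', 'first',  '2009-01-01'),
        (2, 'info', 'second', '2009-01-02'),
        (3, 'info', 'third',  '2009-01-02');
""")

# Joining on the per-type MAX(date) returns every row that shares that
# maximum, so the tie shows up as two rows instead of one arbitrary pick.
tied = conn.execute("""
    SELECT m.content
    FROM (SELECT type, MAX(date) AS date FROM message GROUP BY type) g
    JOIN message m ON m.type = g.type AND m.date = g.date
    ORDER BY m.sequence
""").fetchall()

print(tied)  # [('second',), ('third',)]
```

Both 'second' and 'third' are equally valid answers here, which is exactly why a single content value per type cannot be well defined.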

What about other aggregate functions like AVG() or SUM()? It could be that the value of the aggregate corresponds to no individual row in the table. So now which row should supply the value for content?

SELECT AVG(sequence), type, content
FROM message
GROUP BY type
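
A quick sketch of the AVG() case, again with Python's sqlite3 module and made-up rows. It only checks that the average matches no existing sequence; it deliberately does not look at which content value SQLite would hand back.

```python
import sqlite3

# Hypothetical sample rows whose sequences average to a value no row has.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (sequence INTEGER PRIMARY KEY, type TEXT, content TEXT);
    INSERT INTO message VALUES (1, 'info', 'a'), (2, 'info', 'b'), (6, 'info', 'c');
""")

msg_type, avg_sequence = conn.execute(
    "SELECT type, AVG(sequence) FROM message GROUP BY type"
).fetchone()
sequences = [row[0] for row in conn.execute("SELECT sequence FROM message")]

# (1 + 2 + 6) / 3 = 3.0, and there is no row with sequence 3.
print(avg_sequence, avg_sequence in sequences)  # 3.0 False
```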

Answers

I don't know of any database which will "intuitively" solve this sort of problem, where you want to get related row values for a group based upon the result of an aggregate for a specific column. For SQLite, I think you had better stick with your second query.
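
As a rough check of that second query, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and the join condition assumes the column is named sequence, as in the table definition in the question.

```python
import sqlite3

# In-memory copy of the message table with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE message (sequence INTEGER PRIMARY KEY, type TEXT, content TEXT);
    INSERT INTO message (sequence, type, content) VALUES
        (1, 'info',  'starting'),
        (2, 'error', 'disk full'),
        (3, 'info',  'running'),
        (4, 'error', 'timeout');
""")

# The subquery finds the highest sequence per type; the join then fetches
# the full row for each of those sequences.
latest = conn.execute("""
    SELECT m.sequence, m.type, m.content
    FROM (
        SELECT MAX(sequence) AS sequence, type
        FROM message
        GROUP BY type
    ) g
    JOIN message m ON g.sequence = m.sequence
    ORDER BY m.type
""").fetchall()

print(latest)  # [(4, 'error', 'timeout'), (3, 'info', 'running')]
```

Because sequence is the primary key, each group matches exactly one row, so the result does not depend on how SQLite happens to pick bare columns.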

Since you mentioned PostgreSQL, it's worth noting that it supports some non-standard syntax which accomplishes this, in the form of DISTINCT ON:

select distinct on (type) sequence, type, content
from message
order by type, sequence desc

(There could be some errors in that, as I don't have a psql prompt in front of me, but that's the gist of it.)

See http://www.postgresql.org/docs/8.4/interactive/queries-select-lists.html

I bet it just uses a random value for sequence field. MySQL docs for instance explicitly say so.




