`EXCLUDE` Extremely useful, is there a reason why this is something not implemen...

heyda · on May 12, 2022

because columns can be added to tables in production databases, so any time you use select * you run the chance the number of columns changing and breaking anything you wrote.

wruza · on May 12, 2022

This seems reasonable on its own, but then you can add a compound index and forget to join on a second part, or refactor a column in two and only collect one value into aggregation. This spotted babysitting is just stupid. If you’re anxious about query integrity, get some tooling and check your sqls/ddls against some higher-level schema.

Even if that turns out to be a constant source of trouble worth not having, then why SQL can’t provide columnsets at least, so that queries could include, group or join on these predefined sets of columns instead of repeating tens of columns and/or expressions and/or aggregations many times across a single query. You had employees.bio_set=(name, dob), now you add `edu` to it and it just works everywhere, because you think in sets rather than in specific columns. Even group by bio_set works. Heck, I bet most of ORMs partially exist only to generate SQL, because it’s sometimes unbearable as is.

justin_oaks · on May 12, 2022

That's a problem with select * in general, not a problem with using EXCLUDE with select *. So that still doesn't explain why it's not in SQL to begin with.

munk-a · on May 12, 2022

I've always viewed SELECT * as a convinence for schema discovery and a huge bonus for subqueries - our shop excludes its use at the top level in production due to the danger of table definitions changing underneath... but we happily allow subqueries to use SELECT * so long as that column list is clearly defined before we leave the database.

Worst, by far, than a column you didn't expect being added is a column you did expect being removed. Depending on how thorough your integration tests are (and ideally they should be pretty thorough) you could suddenly start getting strange array key access (or object key unfound) errors somewhere on the other side of the codebase.

throwawayboise · on May 12, 2022

Yeah I tend to use "select *" in interactive queries when I'm working out what I want, but then write explicit column names in anything going into production. This helps with the column-being-removed case, as the query will fail immediately selecting a nonexistent column, whereas "select *" will not fail and the error will happen somewhere else.

heyda · on May 12, 2022

"A traditional SQL SELECT query requires that requested columns be explicitly specified, with one notable exception: the * wildcard. SELECT * allows SQL to return all relevant columns. This adds tremendous flexibility, especially when building queries on top of one another. However, we are often interested in almost all columns. In DuckDB, simply specify which columns to EXCLUDE:"

It appears how this works is that is selects all columns and then EXCLUDES only the column's specified, the reason this doesn't exist in normal SQL is because it is a terrible idea. This is something that will break at many companies with large technical debt if it is ever used.

sagarm · on May 13, 2022

It can definitely be misused, but SELECT * is pretty handy for ad-hoc queries and to succinctly get all (or almost all) of the columns for a subquery or CTE.

gmueckl · on May 12, 2022

That's why good database wrappers support referencing columns in result sets by column name. It's good practice.