
That is more reasonable than what I thought you were saying.

However, what happens when your list of columns is long? What happens if you want to include a CASE statement in a field?

I use vim and personally solve the scrolling problem with :split. This is particularly important in making sure that the SELECT and GROUP BY match up. (I am perpetually annoyed that the GROUP BY is not inferred from the SELECT. Unfortunately multiple databases have invented different, inconsistent behaviors for missing items in a GROUP BY, so there is no realistic possibility of getting agreement on the convenient default of grouping on all non-aggregate expressions that appear in the SELECT and HAVING clauses.)
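To make the repetition concrete, here is a minimal sketch of the matching in question (table and column names are invented):

  -- Standard SQL requires repeating every non-aggregate SELECT expression:
  SELECT region, status, COUNT(*) AS n, SUM(amount) AS total
  FROM orders
  GROUP BY region, status  -- could, in principle, be inferred from the SELECT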




Sorry for the confusion.

When my list of columns is long, I split it up into multiple lines, usually 7-8 columns per line (where I work, column names are capped at 30 chars, hence the 200-character estimate), unless one of them is a case statement. In case statements, each case/when clause gets its own line, so you'll have:

  case when condition then result
  when condition_2 then result_2
  ...
  when condition_n then result_n end as column_name

All of my queries and statements follow the same logic and set of rules, I just prefer a more compact view than most it seems (which could indicate that I'm optimizing for a different set of constraints/preferences).

I'll have to look into using :split.

Honestly, I don't think about formatting that much anymore, as I've started storing most of my statements as "metrics" which can be easily repurposed for use in another script. Each metric contains the name of the metric, the columns added, the metric source (usually a table, but can also be a select stmt), the possible join conditions, extra SQL like where, group by, qualify, and any indices. When I want to use a metric in another script, I specify the metric name, the table or select statement I'd like to append the metric's columns to, and my join conditions. It works great for simple to moderately complex queries, and provides a good starting point for really complex ones. Plus, it auto-formats everything how I like it. I also use comments to explain what each statement is doing, so someone should theoretically be able to get a pretty good idea of what's going on just by reading those.

I really only think about formatting when I'm reading others' code and trying to make sense of it, which is where I find formatting like the author mentioned most annoying/frustrating (perhaps because it's different from how I think/do things?).


My personal experience when I tried to clump things together is that I lost track of which columns were where. My eyes scan vertically really fast. And once I was used to that style for large queries, well, what works for large queries works for small ones as well.


I do something different that you might like to borrow. If you're working in a database with real tmp tables (PostgreSQL, MySQL, and MS SQL all have them; Oracle does not), you can control execution plans for complex queries by creating a series of tmp tables in order, adding indexes if you want, then joining them down the chain. I initially did this for performance reasons, but quickly found that I could easily reuse the logic to create specific tmp tables in different places, and that a series of tmp tables was much more readable than complex subqueries.
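A minimal sketch of the chained-tmp-table pattern (PostgreSQL syntax; the tables and columns are invented for illustration):

  -- Step 1: materialize an intermediate result and index it.
  CREATE TEMP TABLE recent_users AS
    SELECT user_id, region
    FROM users
    WHERE last_login >= CURRENT_DATE - 30;
  CREATE INDEX ON recent_users (user_id);

  -- Step 2: each later step joins an earlier tmp table down the chain.
  CREATE TEMP TABLE recent_user_orders AS
    SELECT r.user_id, r.region, o.order_id, o.amount
    FROM recent_users r
    JOIN orders o ON o.user_id = r.user_id;

Each step is independently inspectable and reusable, which is what makes the chain more readable than nested subqueries.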

You may wonder why I specifically dissed Oracle; after all, they claim to have tmp tables. They do... but all tmp table definitions live in a global namespace that requires special permissions to write to, and they are shared across all database handles. All of the other databases I named let you create/destroy tmp tables on the fly without special permissions, and have no issue if two database handles want to use the same tmp table name with different definitions.
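For contrast, an Oracle global temporary table is a one-time shared definition (a sketch; the names are invented, and creating it requires DDL privileges):

  -- Oracle: one global definition, created once by a privileged user; every
  -- session then shares the name and structure, though each sees only its own rows.
  CREATE GLOBAL TEMPORARY TABLE tmp_results (
    user_id NUMBER,
    total   NUMBER
  ) ON COMMIT PRESERVE ROWS;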

This opens you up to all sorts of nasty interaction problems, and increases the cognitive overhead of the technique to the point where it stops being worthwhile.


It would seem we're each doing what works for us.

I work mainly in MySQL and Teradata (whose tmp tables are called volatile tables), and I do exactly what you describe when creating complex queries. My metric system is just a way to build those tmp tables more rapidly.

I use two main functions to manipulate metrics:

  create_metric(metric_name, {cols_added}, join_src, {join_cols}, {extra_sql}, {indices});
This stores the metric for later use.

  add_metric(metric_name, my_other_table, {my_join_conditions});
This retrieves the metric and returns a string of (Teradata) SQL of the form:

  CREATE VOLATILE MULTISET TABLE add_metric_name AS (
    SELECT a.*, b.cols_added_1, b.cols_added_2, ..., b.cols_added_n
    FROM my_other_table a
    LEFT JOIN join_src b
      ON a.my_join_condition_1 = b.join_col_1
      AND a.my_join_condition_2 = b.join_col_2
      ...
      AND a.my_join_condition_n = b.join_col_n
    [any extra SQL, like WHERE a.condition = X or GROUP BY 1, 2, shows up here;
     it can reference the join columns and table name in the add_metric stmt]
  ) WITH DATA PRIMARY INDEX(indice_1, indice_2) ON COMMIT PRESERVE ROWS;
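For illustration, a hypothetical pair of calls (every identifier here is invented, and the exact quoting is my guess from the signatures above) might be:

  create_metric(phone_cnt, {phone_count}, phone_summary, {user_id}, {}, {user_id});
  add_metric(phone_cnt, my_user_base, {user_id});

which would emit a CREATE VOLATILE TABLE statement appending phone_count from phone_summary onto my_user_base, joined on user_id.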
I can also store entire create-tmp-table chains as metrics, with the last table appending all of the information from that chain to another source. (I do this by storing the chain as preparatory SQL, which is run before the CREATE add_metric_name statement.)

It also allows me to search all of my metrics on a number of different dimensions: the common name of the information I am adding (the metric name), column names, tables, join conditions (particularly useful: they help you map how you'll get from one metric to another), indices, or any combination of the above. For example, I can find all metrics that have the word phone in the metric name and are joinable on user_id.

I'm aware of Oracle's lack of tmp tables. My fiancée has to use Oracle SQL at work, and I quickly discovered its lack of tmp tables when trying to help her solve a SQL issue.


Yup, they look like similar solutions to similar problems.

One of the things that I built into reports at that location was the ability to see all of the tmp tables that had been created, and the ability to stop the report on any particular one and display that. I built this as a debugging aid for myself, but was quite surprised when finance came to me one day and said, "Report X is going wrong on step Y - it looks like you're filtering out duplicate records."

I like having users that will debug my stuff. :-)



