
Any SQL wrapper worth its salt provides some mechanism to specify positional (or even named) parameters within that string (without resorting to string interpolation or concatenation) and pass them in when executing the query. Usually it'll be something like

    foos = execute_sql("SELECT * FROM foo WHERE bar = ? AND baz = ?", [bar, baz])
If you're resorting to concatenating strings, then you're almost certainly opening yourself up to SQL injections (and rather needlessly, I might add!).
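For instance, with Python's built-in sqlite3 module (just a sketch — the table and values here are made up for illustration):

```python
import sqlite3

# In-memory database with a throwaway table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (bar TEXT, baz INTEGER)")
conn.execute("INSERT INTO foo VALUES (?, ?)", ("hello", 42))

# The driver sends the parameters separately from the SQL text,
# so a hostile value can't change the query's structure.
bar, baz = "hello'; DROP TABLE foo; --", 42
rows = conn.execute(
    "SELECT * FROM foo WHERE bar = ? AND baz = ?", (bar, baz)
).fetchall()
print(rows)  # [] -- the hostile string is treated as an ordinary value
```

The injection attempt simply matches no rows; the table survives.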



Which doesn't work for table names, or for building up WHERE clauses gradually, etc. SQLAlchemy's SQL Expression layer makes those much easier to deal with. (Though technically, that's not an ORM.)

Also, getting the library to properly expose the formatting interface is a trick in itself. We would do massive loads/dumps that we did not want to route through Python (but we worked in Python), since the DB-API pulls the entire result set into RAM and we were pulling >RAM (and we didn't want to do something lower-level w/ the driver). We'd spawn psql, but that needs the query, as a complete, pre-templated string — it does not support any sort of positional parameters. And the Python library really doesn't (last I checked) have a public interface to the templating portion of the library. (I think this is b/c PG just sends the parameters, and the binding is done server-side. So, really, psql needs an interface to supply those values as arguments to the CLI.)


> We'd spawn psql, but that needs the query, as a complete, pre-templated string — it does not support any sort of positional parameters. [...] So, really, psql needs an interface to supply those values as arguments to the CLI.

I haven't tried this before, so caveat emptor, but per psql's manpage this should be supported:

    QUERY="SELECT * FROM sometable WHERE name = :'foo'"
    echo "$QUERY" | psql -v foo=asdf # ... other args ...
Basically: you can use colon-prefixed variables in the query text (with the variable name optionally quoted to indicate how the value should be quoted, apparently?), and set them with the -v option (which is equivalent to using the \set command within the query string itself).

This also happens to work for table names, at least per the example in the manpage:

    testdb=> \set foo 'my_table'
    testdb=> SELECT * FROM :"foo";
Except in this case it'd be more

    echo 'SELECT * FROM :"foo";' | psql -v foo=my_table # ... other args ...
Still no option there for positional parameters, but it's a start, right?
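(For what it's worth, driving that from Python could look something like the sketch below. The -v flag is psql's real option; the helper function and the return-the-command-only design are mine, so you can inspect the invocation without actually running psql.)

```python
import subprocess  # only needed if you actually run the command

def psql_command(query_vars: dict[str, str], *extra_args: str) -> list[str]:
    # Each variable becomes a -v name=value flag; the query itself is
    # piped to psql's stdin, as in the echo example above.
    cmd = ["psql"]
    for name, value in query_vars.items():
        cmd += ["-v", f"{name}={value}"]
    return cmd + list(extra_args)

cmd = psql_command({"foo": "my_table"})
print(cmd)  # ['psql', '-v', 'foo=my_table']
# subprocess.run(cmd, input='SELECT * FROM :"foo";', text=True, check=True)
```

No shell is involved, so the variable values don't need shell-escaping — psql still does its own SQL-side quoting via :"foo" / :'foo'.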


> Which doesn't work for table names, or for building up where clauses gradually, etc. SQLAlchemy makes those much more trivial to deal w/ the SQL Expression bits. (Though technically, that's not an ORM.)

Thin SQL wrappers/DSLs like the SQLAlchemy expression library are great. IMO those thin wrappers are what most people should reach for first, over a full-blown ORM. A good minimal SQL wrapper will:

* Save people from pain, problems, and security issues involved in building queries through string manipulation.

* Map very closely to raw SQL.

* Make it easy to compose query expressions and queries, using language native features.

* Help devs develop their SQL skills. Learning the wrapper IS learning SQL, for the most part.

I have a hard time seeing many good reasons to add more layers of abstraction on top. That last point is particularly important to me. My first exposure to many advanced SQL techniques was through such libraries. Since the code maps to SQL quite naturally, that learning can be applied in a much wider variety of contexts. Teach someone how to do aggregations using Django's ORM, and you've taught them to do aggregations in Django's ORM, and basically nowhere else.


I think the issue the parent is referring to is when you have optional clauses in your statement. Say you have a table with 5 updatable columns (where someone might want to update any 1 to 5 of them, and you don't know which in advance), and you want some generic code that updates a row. If you're writing raw SQL, then you have to do concatenations to build up the "set clause":

    UPDATE sometable SET {set clause} WHERE id = ?
The "set clause" might just be "foo = ?", but it might be "foo = ?, bar = ?", or "foo = ?, baz = ?", etc.
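One way to keep that concatenation safe-ish is to only ever splice in identifiers from a hard-coded whitelist, with the values themselves still passed as parameters. A sketch against SQLite (the function name, whitelist, and columns are mine, for illustration):

```python
import sqlite3

ALLOWED_COLUMNS = {"foo", "bar", "baz", "bam", "bat"}  # hard-coded whitelist

def update_row(conn, row_id, changes):
    # Refuse anything not in the whitelist, so only trusted identifiers
    # ever reach the SQL text; the values stay parameterized throughout.
    bad = set(changes) - ALLOWED_COLUMNS
    if bad:
        raise ValueError(f"unknown columns: {bad}")
    set_clause = ", ".join(f"{col} = ?" for col in changes)
    params = [*changes.values(), row_id]
    conn.execute(f"UPDATE sometable SET {set_clause} WHERE id = ?", params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, foo, bar, baz, bam, bat)")
conn.execute("INSERT INTO sometable (foo, bar) VALUES ('You', 'are')")
update_row(conn, 1, {"bar": "ain't"})
print(conn.execute("SELECT foo, bar FROM sometable WHERE id = 1").fetchone())
# ('You', "ain't")
```

The string-building is still there, but it can only ever produce one of 31 known-good SET clauses.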


SQL's got you covered here:

    UPDATE sometable SET
        foo = COALESCE(?, foo),
        bar = COALESCE(?, bar),
        baz = COALESCE(?, baz),
        bam = COALESCE(?, bam),
        bat = COALESCE(?, bat)
    WHERE id = ?
And then you'd pass in NULLs for any columns not being updated at the moment.

Confirmed that this works exactly as expected in SQLite:

    $ sqlite3 optional_column_test.db
    sqlite> CREATE TABLE sometable (id INTEGER PRIMARY KEY, foo, bar, baz, bam, bat);
    sqlite> INSERT INTO sometable (foo,bar,baz,bam,bat) VALUES ('You', 'are', 'a', 'bold', 'one');
    sqlite> SELECT * FROM sometable;
    1|You|are|a|bold|one
    sqlite> UPDATE sometable SET
       ...> foo = COALESCE(NULL, foo),
       ...> bar = COALESCE('ain''t', bar),
       ...> baz = COALESCE(NULL, baz),
       ...> bam = COALESCE(NULL, bam),
       ...> bat = COALESCE(NULL, bat)
       ...> WHERE id = 1;
    sqlite> SELECT * FROM sometable;
    1|You|ain't|a|bold|one
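And the same pattern works from Python's sqlite3 with the driver's placeholders — None maps to SQL NULL, so COALESCE leaves unchanged columns alone (a sketch along the same lines as the transcript above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER PRIMARY KEY, foo, bar, baz, bam, bat)")
conn.execute("INSERT INTO sometable (foo, bar, baz, bam, bat) "
             "VALUES ('You', 'are', 'a', 'bold', 'one')")

# Python None becomes SQL NULL, so COALESCE(NULL, col) keeps the old value.
conn.execute(
    """UPDATE sometable SET
           foo = COALESCE(?, foo),
           bar = COALESCE(?, bar),
           baz = COALESCE(?, baz),
           bam = COALESCE(?, bam),
           bat = COALESCE(?, bat)
       WHERE id = ?""",
    (None, "ain't", None, None, None, 1),
)
print(conn.execute("SELECT * FROM sometable").fetchone())
# (1, 'You', "ain't", 'a', 'bold', 'one')
```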


I'd never heard of COALESCE before! I'm very very far from a SQL expert, but I'm surprised that's never popped up for me before.

Are there any bad performance implications here? Like, will it still rewrite columns that don't need to be updated?


Good question.

The answer to that is almost certainly "it depends on your DB". Most query planners should be smart enough to not try to write anything, but some are smarter than others. For example, SQL Server (if I'm understanding right) does perform a log write even if it should be obvious that the value won't change: https://www.sql.kiwi/2010/08/the-impact-of-non-updating-upda...

That being said, I'd be skeptical of the performance impact (if any) being all that significant; even in the worst-case scenario of "yeah, it's going to stupidly write every field every time", the columns for each row should be pretty close together both in memory and on disk, so unless you're writing a whole bunch of giant strings at once (and even then) there shouldn't be a whole lot of seeking happening.

tl;dr: assuming there's a minor performance impact is reasonable, but it should probably be measured before trying to optimize it away.


So you mean I'm writing code like this, in a language somewhat like TypeScript but simplified to make my point.

   function update(id: ID_TYPE, value: Partial<sometable>) {
      const set_clause = Object.keys(value).map(k => k + " = ?").join(", ")
      const params = Object.values(value).concat([id])

      db.query("UPDATE sometable SET " + set_clause + " WHERE id = ?", params)
   }
How is this code not terrible? It shouldn't pass code review. Once you concatenate strings to generate your SQL, all bets are off. There will be a day when your object doesn't match your assumptions. Either you have a series of different functions, each of which will change under different circumstances; or you should just be generating this function directly from the schema, in which case you might as well generate the code you want at build time so there aren't string concatenations anywhere.

Is there some other circumstance when you want that?


It is terrible. That's the problem. No one wants to write code like that, so they turn to an ORM to do it for them, in a hopefully safer way.



