Show HN: SQLite query inside a Bash function
134 points by chmaynard on July 7, 2021 | hide | past | favorite | 62 comments
Part of the workflow for building my website is the generation of a table in tab-separated column format (.tsv). The source data is found in four other .tsv files. I use an SQLite query to perform a 4-way join and write out the new table. For convenience, I wrote a script that encapsulates the query inside a Bash function.

The example below illustrates this technique.

    repertoire() {
      pushd "$CMM_SOURCES/_data"
      sqlite3 <<EOS

    .headers on
    .mode tabs
    .import category.tsv category
    .import composition.tsv composition
    .import concert.tsv concert
    .import program.tsv program
    .once repertoire.tsv

    SELECT 
      category.name AS category, 
      composition.key, 
      composition.composer, 
      composition.name AS composition, 
      concert.name AS concert
    FROM 
      category, 
      concert, 
      composition, 
      program
    WHERE 
      julianday(concert.date) < julianday('now')
      AND composition.category = category.name
      AND program.key = composition.key
      AND program.date = concert.date
    ORDER BY 
      category.sequence, 
      composition.key
    ;

    EOS
      popd
    }


You can do this with https://github.com/dinedal/textql as well, and a little more easily, since you don't need to import the data explicitly.

  SQL_QUERY=$(cat <<-EOS
    SELECT 
      category.name AS category, 
      composition.key, 
      composition.composer, 
      composition.name AS composition, 
      concert.name AS concert
    FROM 
      category, 
      concert, 
      composition, 
      program
    WHERE 
      julianday(concert.date) < julianday('now')
      AND composition.category = category.name
      AND program.key = composition.key
      AND program.date = concert.date
    ORDER BY 
      category.sequence, 
      composition.key
    ;
  EOS
  )

  textql --header --dlm=tab --sql "$SQL_QUERY" category.tsv composition.tsv concert.tsv program.tsv


Seems like textql is a wrapper around sqlite, I'm guessing staying with sqlite directly makes it easier to interact with other tooling.

One nitpick with textql is that it says "sqlite import will not accept stdin, breaking unix pipes. textql will happily do so.", but that's not true, you just need to tell it to use stdin by doing:

    sqlite3 {yourdatabase} '.import /dev/stdin {yourtablename}'
I do this all the time for using gzip and sqlite to get compressed import with progress. For example filtering a compressed CSV from a SQL query with progress for import:

    pv icons.csv.gz | gzip -dc | sqlite3 ':memory:' -csv '.import /dev/stdin data' '.output result.csv' '.headers on' "SELECT name, keywords from data where name = 'chart-area'"


Do keep in mind that /dev/std{in,out,err} aren't completely portable. Bash emulates them if they don't exist, but this usage bypasses Bash, so there are unix-like platforms where this trick won't work. They do, though, seem to be supported on the ones that are most popular now (Linux, macOS, FreeBSD).
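A sketch of a guard for scripts that need to run where /dev/stdin might be missing. The file names and the table name `t` here are made up for the demonstration (stand-ins for icons.csv.gz and `data` above):

```shell
# Build a small compressed CSV to import (stand-in for icons.csv.gz).
printf 'id,name\n1,chart-area\n' | gzip -c > /tmp/demo.csv.gz

if [ -e /dev/stdin ]; then
  # Fast path: stream straight into sqlite3, as in the example above.
  gzip -dc /tmp/demo.csv.gz |
    sqlite3 ':memory:' -csv '.import /dev/stdin t' 'SELECT count(*) FROM t;'
else
  # Fallback: spool to a temp file first, then import by name.
  tmp=$(mktemp)
  gzip -dc /tmp/demo.csv.gz > "$tmp"
  sqlite3 ':memory:' -csv ".import $tmp t" 'SELECT count(*) FROM t;'
  rm -f "$tmp"
fi
```

Either branch prints the same row count, so the caller doesn't care which path was taken.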





+1 for csvkit because it is extremely useful when dealing with people who love spreadsheets to the point of insanity.


If you don't intend to fill in/expand variables, you probably want to quote the heredoc tag, so rather than:

  sqlite3 <<EOS
  ... WHERE user = $(rm /etc/shadow)...
  EOS
You might want:

  sqlite3 <<'EOS'
  ... WHERE user = $(rm /etc/shadow)...
  EOS
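A quick demonstration of the difference, with a harmless substitution in place of the scary one:

```shell
# Unquoted tag: $(...) runs before the heredoc body reaches the command.
cat <<EOS
WHERE user = $(echo expanded)
EOS
# prints: WHERE user = expanded

# Quoted tag: the body is passed through literally.
cat <<'EOS'
WHERE user = $(echo expanded)
EOS
# prints: WHERE user = $(echo expanded)
```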


I recommend adding a little trick to ignore existing ~/.sqliterc files, which makes the script more portable:

    cat <<EOF | sqlite3  -init <(echo "") "$DBFILE"
    ...your sql...
    EOF
For one-line queries I use:

    q() { echo "$@" | sqlite3 -init <(echo "") "$DBFILE"; }


Interesting. I noticed that "-init" doesn't error out if you give it a filename that can't be a file, like . or /. Using /dev/null seems to work also.


The cat in the first script can be omitted I think (by using the heredoc on sqlite directly).
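Something like this, I think (DBFILE and the SQL are placeholders; the <(echo "") bit needs bash):

```shell
DBFILE=/tmp/demo.db   # hypothetical database path

# Heredoc fed straight to sqlite3; no cat needed.
sqlite3 -init <(echo "") "$DBFILE" <<'EOF'
CREATE TABLE IF NOT EXISTS t(x);
SELECT 'ok';
EOF
```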


The "here document" (<<EOS) is a nice workaround to keep from having to correctly quote a bunch of SQL syntax.

I do agree, though, with some of the other comments that you will probably want a "real language" other than bash if this grows at all. Passing in string data for a where clause, for example, would get cumbersome and error prone.

Python or Perl would be good choices if it goes there.
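For what it's worth, even staying in bash you can sketch the escaping by hand. Here `sql_quote` is a hypothetical helper that doubles embedded single quotes to form a SQL string literal; it handles quotes but nothing else, which is rather the point about it getting error-prone:

```shell
# Hypothetical helper: double each single quote to build a SQL string literal.
sql_quote() {
  local sq="'"
  printf "'%s'" "${1//$sq/$sq$sq}"
}

name="O'Brien"   # user-supplied value with an embedded quote
sqlite3 ':memory:' "SELECT $(sql_quote "$name");"
# prints: O'Brien
```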


I wasn't aware that '<<' is a shell directive. That often happens when I mindlessly copy code snippets from somewhere else.

https://tldp.org/LDP/abs/html/here-docs.html


It is very worthwhile to read through the bash man page once, to give you a feel for what the shell does.


You might also be interested in "here strings", <<<.
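E.g. for a quick one-off query:

```shell
# A here-string feeds a single string to stdin; no terminator tag needed.
sqlite3 ':memory:' <<< 'SELECT 1+1;'
# prints: 2
```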


Speaking of SQLite, this CoRecursive podcast a few days ago was enjoyable: https://corecursive.com/066-sqlite-with-richard-hipp/


Quite agree. And he seems like such an agreeable person too. Listening to Richard Hipp and his thought process felt familiar in the sense that he is the traditional old school programmer, and I know a few of those.


Second this! I read the transcript and it was quite enjoyable indeed.


Personally, I'd like to see sqlite built into bash.

For features such as:

   Continuously logging bash commands, env, results, and exec time to a sqlite database for security/auditing and regression testing. 

   The db file can be kept open, with no need to spawn a separate process for each command logged. 
Is anyone else interested in such an approach? Does anyone see potential issues with such an integration?

If there's enough interest, I can start an open source project for this.


Interesting idea. How would you add sqlite bindings to bash? Via a plugin?


Probably something like the following:

1) Add sqlite_log.[ch] files to the bash source repo.

2.a) Add sqlite_init_db(...) to init a database per bash_sqldb_{date}_{pid}.db per bash pid in the ~user/.bash_db/ dir.

2.b) Add sqlite_log_cmd(...) to log the cmd, env, pid to the opened db.

2.c) Have the bash command exec routine call sqlite_log_cmd(...) on each command execution.

3) Querying/reporting of command execution will be handled outside with bash scripts/sub commands/web intf.

* Should be very low overhead (micro-seconds) for each command.

Now that I think this out a bit more, maybe it is better to write an ebpf script and pipe all the cmd execs to one centralized sqlite db (instead of per user/pid, and only for bash). Likely more useful from a system security auditing POV, and easier to extend by adding network connections, file opens, and privilege escalation type events to the DB.


You could do it in Zsh with a "module".


Yup, I do this all the time! :) I write Python programs that use sqlite and set up the DB state with bash.


Why not sh instead of bash? This is a script; it is not an interactive shell. sh is faster as a scripting language. Don't take my word for it: if you are using a UNIX-like OS such as BSD or GNU/Linux, sh has been chosen as the default scripting shell, not bash. Bash might be the default login shell (interactive use) but it is not the default scripting shell (non-interactive use).

This is only one function; we cannot see the rest of the script. Maybe there is a compelling reason [1] to use pushd and popd instead of using sh features.

1. i.e., no way to achieve the same result in sh


Is compatibility with sh really a worthwhile goal anymore?

I haven’t seen many places with sh but not full bash (busybox being the main exception, but that breaks most things due to musl anyway).


What OS are you using? It may use bash for login, but I would be willing to bet it does not use bash for scripting (non-interactive use).

It is possible that many shell script authors do not understand the difference between a login shell (optimised for interactive use) and a scripting shell (optimised for non-interactive use). Thus the comparison could be worthwhile. sh is faster, more portable and arguably presents fewer potential gotchas. FWIW, Shellshock affected bash but not sh.

NetBSD uses ash-derived sh as both login and scripting shell. No bash. Debian and many, many other GNU/Linux distributions (including musl-based Void) all use NetBSD-derived sh called "dash" as the default scripting shell. I am not aware of any Berkeley distribution that uses bash as the default scripting shell.


So use a shebang that specifies Bash, or invoke it with `bash` on the CLI.

The default scripting shell might be important for things like X11 startup scripts, but I don't see why you have to use it for standalone scripts just because it's the default.


I write my scripts in Zsh and Python. Maybe once in a while Perl, if I really want Sed but can't figure out the right incantations with branching and the hold space.

Why? Because they are programming languages that have features that I want.

Bash has more features and functionality than Zsh. It's also pretty much ubiquitous. If your scripts are only meant to run on your computer, and you don't need them to work under Busybox or whatever, why constrain yourself to the limited set of features in POSIX sh?


"If your scripts are only meant to run on your computer, ..."

Is it likely you would be sharing them with others on the web?

Thank you for sharing your story. Here's mine. I write scripts in sh. Why? Because it's small, fast and portable. On the OS I use it is not just "pretty much ubiquitous" it is ubiquitous. I have multiple computers running different OS and I want scripts that work on all of them. Some of these computers are resource-constrained and installing additional shells and other interpreters plus libraries is not worth the cost. Anyway, there is no need to do that.

If I need scripts that are small, fast and portable, why should I use zsh plus perl or python?


Consider that people don't write scripts to work with your system, but with their system. When I see a Ruby script, I don't complain to the author and tell them to use Python instead.


OK, now I understand how you interpreted my comment. Your interpretation is incorrect. I am neither complaining nor telling anyone what to use. I am asking a question. The comparison of Ruby versus Python is a little strange; they are two totally different languages. Bash OTOH is a superset of sh. Same language, with more features. I can usually modify most bash scripts I encounter to run under sh; most do not use the more advanced bash features. That is why I often wonder "Why not sh (for non-interactive use)?" You get better speed and portability and you do not have to install anything.



I would add a `#!/usr/bin/env bash` at the top, so readers know that it's meant for Bash.


If you're sweating quoting issues in here-documents, you can put the sqlite query in a separate file and tell sqlite to execute that file instead.

Here's a helpful example from SO: https://stackoverflow.com/a/21759264/711585


True, I don't see the benefits of writing it in bash/shell script in this case. Plus, you can get proper syntax highlighting for SQL in your text editor with a ".sql" file.

  sqlite3 database.db <query.sql
Some other DBMSs have similar things; for MariaDB you could write it in a ".sql" file as well, setting variables and using it so that you can write a generic sql file. (It seems you can't use the variable thing with sqlite3.)

  MariaDB> SET @username='UserName@localhost';
  MariaDB> SET @hostname='localhost';
  MariaDB> SOURCE create-user.sql;
Such things are usually cleaner/simpler than writing them in bash.
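With sqlite3 the nearest equivalent I know of is letting the shell expand variables into the heredoc before sqlite3 ever sees it. Fine for trusted values only, per the quoting discussion elsewhere in the thread; `username` here is a made-up example:

```shell
username='alice'

# Unquoted EOF tag, so $username expands before sqlite3 reads the SQL.
sqlite3 ':memory:' <<EOF
SELECT '$username';
EOF
# prints: alice
```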


For what it's worth, you can have "nested" syntax highlighting of SQL inside Bash scripts, with Tree Sitter in Neovim.

You could also read the contents of the query file into a shell parameter with the `read` builtin.
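A sketch of that, with a stand-in query file written first so the example is self-contained:

```shell
# Stand-in query file for the demonstration.
printf 'SELECT 1+2;\n' > /tmp/query.sql

# read -d '' slurps to EOF; read returns nonzero there, hence the || true.
IFS= read -r -d '' sql < /tmp/query.sql || true
sqlite3 ':memory:' "$sql"
# prints: 3
```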


Fun fact: you can do this with Sed and Awk, too. No more hideous unreadable one-liners.


You are not using any variables or anything from bash in that query. Wouldn't it make more sense to extract it to a .sql file and just run that, rather than your (rather long) heredoc?


Agreed. I like having the query in situ during development because it simplifies editing. But there are disadvantages that you and others have identified.


You can encapsulate almost anything in a bash function via heredocs like that. The disadvantage is that editor features like indentation and syntax highlighting rarely work.


This is a perfect use case for a shell script and sqlite. No need to complicate with anything else.


This is really cool! Would you mind if I ask about "performance"? I'm working on a toy project with luajit, and I can either use ffi and the SQLite C API (which can be a little complicated) or just use the CLI and capture the output of a command as a string and parse that.

As I'm writing this I'm thinking that I should just run some tests using both approaches, but how's the performance for you? Any noticeable delays or anything of the sort?


My guess is that the query itself takes the same amount of time either way, C API or CLI. There may be some I/O overhead when using sqlite3, but I doubt if it's anything to worry about. The beauty of using sqlite3 is that it hides the complexity of the C API, which is procedural.


neat


Thanks.


If you're trying to put sql queries into a bash script, I beg the question to rewrite said script in a programming language. There are of course a bunch of other factors that would justify a rewrite so this isn't a hard rule.

However, having written hundreds of lines of bash gluing systems together for various CI jobs and automations, at some point you need to stop writing bash and switch to something with a richer data model and easier to maintain/test.


Meanwhile the static sites I generate from cronjobs with bash have been rock-solid for years (jq and sqlite are awesome!), while the ones I did in Python break on updates or cause other headaches (not just dynamic sites). Sure, if it's a webapp, I'd do it in react or something.

Bash/shell is great once you realize it's a glue language. You avoid doing anything but pipelines, and if there's no command/tool to do something, you can write that in another language and include it in the pipeline.


But if OP isn't writing hundreds of lines of glue code and their solution works well for their use case, there's not much reason to re-write anything for scale that'll never happen.


In my opinion this is not just about scale: anything more than a one-liner is too large a scale for bash.

More thoughts on this: https://blog.habets.se/2021/06/The-uselessness-of-bash.html

Another piece of evidence is that I do many many code reviews. Not once have I reviewed bash script changes that did not have subtle bugs that would bite us in real life. Not once!


Doing all my scripting in Go because some people don't care to learn bash seems like a small version of hell.

If I'm using bash, I just need to glue a few tools together and I've most likely decided already that concurrency more complex than simple wait(1p) is unnecessary, and that fine-grained error handling is unimportant.

If I really cared about those things, I'd probably use something like Haskell that actually improves on error handling, unlike Go, or if I'm scripting something for a specific software project, whatever that project's language is.


It's not so much about them not knowing bash, as it takes MANY lines to do things correctly sometimes. Per the example in my blog post full of mkfifo and stuff.

But sure, my choice is Go. You do Haskell if it's best for you.


What language do you believe is better for DevOps / Linux systems engineering than Bash?


Once you need concurrency or have anything but the most basic logic almost anything trumps bash. It’s possible to implement basically anything in bash, of course, but that doesn’t make it sensible. Its error handling is poor, there are too many ways to do things (eg [ vs [[ vs test), dealing with non-scalar data is a pain, it has subtle gotchas (accidentally spawning a subshell and shadowing variables), it’s cryptic (${foo#bar} and friends are cool but much harder to remember than something like .replace()), and so on.

If I’m just writing small tools that wire a few basic commands together bash is fine. For anything that actually needs data to be crunched or decent error handling Python, Go, Perl (though also cryptic), C++ are, IMO, a far better choice.


The cryptic nature of Bash syntax is why I generally try to do all my scripting in Python these days, if it exceeds a certain level of complexity. i.e. if I want to just group a bunch of shell commands together with no special logic, Bash is usually OK. For anything more complicated than an if statement or while loop I have to refer back to old scripts to remember the syntax. Not an ideal situation.

I did recently have a case where I needed to run a shell command with a multi-line output, and an easy call to os.popen() wasn't returning the full output. So, back to Bash arrays and weird text parsing syntax (IFS=" " read tempArray...) I went...


PowerShell, with its inconsistent returns, weird pipeline behavior at random, and bad default output settings, is not better, but like all crappy pop songs pushed by the music industry it will gain popularity.

NodeJS and Python could be contenders but Bash has them beat in longevity.


How is python better? If I want to, say, install 5 different yum packages and then do a kubernetes command, the bash is pretty straightforward. With python I would have to do subprocess.call and then handle the exit codes and errors in python.


Both Node and Python attempt an OS abstraction layer, which in theory could be an advantage; in practice it is a leaky abstraction. Both have many packages which could enhance scripting, and even the opportunity for private repos. Installing from yum makes me believe you are using Fedora, so cross-platform might not be a concern for you. If running on Windows is a concern, Cygwin could be an opportunity. I'd say Bash, Node, Python, then PowerShell in order of advantage.



In English, meaning is defined by common usage. I know the formal logic hipsters have mounted a persistent campaign to make an exception here, but the fact remains that "begging the question" means exactly what ordinary people think it means.


I've spoken and written English for over three decades, and this is the first time I've seen "begging the question" be used to mean this.

It's usually meant as "raises the question", but language lawyers insist it means what the wiki page said.

But this is neither of those two. And if you're the only one who uses a phrase a certain way, and it means other things to everyone else, then that's actually wrong, even in a descriptive language.


This is one of the cases for which, when writing for a broad audience, it's best to avoid the phrase in question, in either of its senses—precisely to avoid confusion or distraction like what's happening in this thread, with the common definition, and to avoid confusing the uninitiated with its more "correct", but jargony, sense—but also to accept the common meaning in others' language without complaint (ahem).


While I agree with you in principle, I think it’s more than permissible to make this distinction on this forum given the audience, IMO, even if the person who made the comment admittedly lacked tact in their approach.



