What's New in SQLite 3.35 (nalgeon.github.io)
496 points by nalgeon on March 12, 2021 | 167 comments



`RETURNING` will substantially clean up my code, and I already have one migration which could have just been a `DROP COLUMN`, so this is great news.

On the subject of "it's called 'lite' for a reason", my wishlist does include library functions for working with RFC3339 timestamps. SQLite already ships with a fairly large suite of JSON tools, which are optional to compile into the library, so there's precedent.

Datetimes are one of those things that are incredibly annoying to get right, and really belong inside the database. RFC3339 timestamps are already well designed, since if you stick to UTC (and if you don't store timezone data separately you deserve those problems), lexical order is temporal order, but queries which would be rendered in English as "return all accounts where last payment is ninety days prior to `now`" aren't really possible with string comparisons.

Also, with the JSON library, you can use a check constraint to fail if a string isn't valid JSON, another affordance I would love to have for datetimes.
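
For reference, the JSON affordance looks roughly like this (a hypothetical table, assuming the JSON1 functions are compiled in); nothing comparable exists for RFC3339 strings today:

  CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    payload TEXT CHECK (json_valid(payload)),  -- insert fails if payload isn't valid JSON
    created_at TEXT                            -- no equivalent validator for timestamps
  );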

Grateful for what we just got, though! Just daydreaming...


SQLite has ISO-8601 compatible date functions, isn't that enough?

  sqlite> select datetime('now', '-90 days');
  2020-12-12 21:44:22
https://sqlite.org/lang_datefunc.html


Beats a swift kick in the pants!

You're right, that was a bad example. Maybe it's just me, but I've never figured out how to get SQLite to do a query like "select from orders where the order was on a Tuesday in Pacific time". I don't think you can; that requires predicates, and all I see are strftime and some useful pre-cooked variations on it.


It looks like this works, but it's pretty gnarly.

I have datetimes stored in UTC, so I first needed to convert them to PST by applying the 8 hour time difference.

Then I used strftime('%w') to get the day of week (as a number in a string).

Then I can filter for that equalling '4' for Thursday.

    select
      author_date,
      datetime(author_date, '-8 hours') as author_date_pst,
      strftime('%w', datetime(author_date, '-8 hours')) as dayofweek_pst,
      *
    from
      commits
    where
      dayofweek_pst = '4' -- Thursday
You can try the query out here: https://github-to-sqlite.dogsheep.net/github?sql=select+auth...


On the one hand: very impressive!

On the other hand: Wrong from March 14th to November 7th of this year.

This is where a richer standard library of datetime functions would come in handy. Another example: strftime's %W starts counting weeks from the first Monday of the year, whereas the ISO week calendar starts week 1 on the Monday of the week containing the first Thursday (so week 1 sometimes begins in the previous calendar year).


Hah, yeah PST vs. PDT does make this harder.

At that point I'd switch to using additional custom SQL functions. I wrote a plugin called datasette-dateutil https://datasette.io/plugins/datasette-dateutil that exposes some extra functions, but it doesn't have anything useful for timezone calculations yet.

I've opened an issue: https://github.com/simonw/datasette-dateutil/issues/4


If C++ is your thing, https://github.com/yangacer/sqlite3cpp might be interesting. It intelligently wraps C++ functions as SQL functions. Author here, looking forward to feedback :).


I just wrote something using https://github.com/SqliteModernCpp/sqlite_modern_cpp, which I chose after 10 minutes of searching around. Should I have used yours instead? How should I decide?


The good news is that you could use both libraries. `sqlite3cpp::database` supports an "attaching" mode that won't close the underlying `sqlite3` instance after the `sqlite3cpp::database` goes out of scope. Here is an example of coexisting with SqliteModernCpp:

  // db inited by SqliteModernCpp
  sqlite::database db(":memory:");
  {
    sqlite3cpp::database attached_db(db.connection().get());
    // create the SQL function as you like
    attached_db.create_scalar("multiply", [](int x, int y) {
      return x * y;
    });
  }  // attached_db is going out of scope

  // SQL scalar function `multiply` is still available
  db << "CREATE TABLE numbers (n1 INTEGER, n2 INTEGER);";
  db << "SELECT multiply(n1, n2) from numbers;";
Note that SqliteModernCpp supports custom SQL functions as well, though it doesn't seem to support aggregate functions (i.e. the equivalent of sqlite3cpp::database::create_aggregate).


The ISO8601 weekdate format[1] is tricky, but it can be done:

    SELECT strftime('%Y', yyyymmdd, '-3 days', 'weekday 4')
           || '-W'
           || substr('00' || (((strftime('%j', date(yyyymmdd), '-3 days', 'weekday 4')) - 1) / 7 + 1), -2, 2)
           || '-'
           || replace(strftime('%w', yyyymmdd), '0', '7')
           AS yyyywwwd;
returns the ISO weekdate format ('2021-W10-6') when the input column `yyyymmdd` is in year-month-day format ('2021-03-13'). Not pretty, but it works. The annoying part for me is actually that '%w' returns 0 for Sunday and 1-6 for Mon-Sat, rather than the ISO 1-7 for Mon-Sun.

If sqlite supported macros, we could have put the entire expression in a new "function". But you could (ab)use m4 or even the C preprocessor for that :).

[1] https://en.wikipedia.org/wiki/ISO_week_date


Re the annoying part, a different way to get the day of week in the range 1–7 is

  1+(6+strftime('%w',yyyymmdd))%7
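
A quick sanity check of the boundary cases (2021-03-15 was a Monday, 2021-03-14 a Sunday):

  SELECT 1+(6+strftime('%w','2021-03-15'))%7;  -- Monday -> 1
  SELECT 1+(6+strftime('%w','2021-03-14'))%7;  -- Sunday -> 7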


Without a timezone the time is meaningless for many purposes.


We store 8 million transactions a day in UTC...customer time zone is merely an integer offset stored alongside...simple and easy, what’s wrong with that?


The customer's offset is always changing but I assume you mean you're storing the offset as it was at the moment of the transaction.

How do you deal with the twice-a-year time offsets when a transaction occurred first but shows up after because the clock rolled back an hour?

Isn't it generally better to store to the full timezone like America/New_York?


For daylight savings, the offset is generated in real-time in our app at the time of transaction and then stored inside the db...so if 2x TXs occurred in CST at 1:58am (1 before and 1 after DST went into effect at 2am) they would both show up as being between 1-2am on the day of time change. However, the spike in data would show the anomaly for our use case. For a more mission critical app, you could have an additional tinyint field that would get set as 1 for all transactions that occurred in these transitional periods to be able to show data with or without the daylight savings.


I've been storing my datetimes as 64 bit unix timestamps. If I need any additional timezone information, it lives in additional columns. The only edge case we have just requires looking at the user's timezone to project certain facts correctly.


Yes native date and time handling would be awesome. I dislike dealing with them as strings.

And then temporal tables. And natively versioned tables. I can dream...


> Column removal

Wow, finally. I love SQLite. It saved my a* in so many projects where I had to implement an ETL: I just spawned an in-memory SQLite database instead of writing painful custom code in language X or Z over and over again. Also used it to generate static search results in a flat file blogging platform since it does have some full text search capabilities. And let's not even talk about custom functions, including custom aggregation functions, in any client language, directly in your application code...
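
For anyone who hasn't tried it yet, the new syntax is as plain as you'd hope (hypothetical table and column names):

  ALTER TABLE staging_orders DROP COLUMN legacy_flag;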


SQLite is really great at crunching data! I definitely prefer it over pandas in most cases, as SQL is naturally suited to joins, aggregates, etc. Also SQLite works natively with JSON, which is a huge time saver.


I prefer it over pandas for joins etc too. My workflow is (1) do the simple stuff in Python using pandas (2) for some of the complex stuff, I just start creating sqlite tables. If you have datasette installed, you can also view the tables (choosing to write intermediate ones for greater debuggability) pretty easily in your browser.


datasette is a wonderful piece of software! Simon Willison has built great tooling around SQLite. Can't imagine how much time and energy he has invested in these projects.


Honest question because I haven't messed with Datasette much beyond skimming the home page - how does it improve on a general SQL client like DataGrip or Squirrel or DBeaver?


Obviously I'm biased, so I'd love to hear answers to this from other people (plus I've not really used any of those alternatives much).

Datasette is very "webby". Queries you execute end up in your URL bar as ?sql= parameters, which means you can navigate to them in your history, bookmark them, share links with other people (if your Datasette is shared) and open them in new tabs.

It also does web-style tricks like turning foreign key references into hyperlinks through to the associated records.

Datasette's table browsing feature has faceting, which is enormously powerful. I don't know if those alternatives have this feature or not, but I use this constantly. Demo here (the owner, country_long and primary_fuel columns): https://global-power-plants.datasettes.com/global-power-plan...

Datasette's plugin system is pretty unique too. You can install plugins like https://datasette.io/plugins/datasette-cluster-map and https://datasette.io/plugins/datasette-vega to add visualizations, which again are bookmarkable and hence easy to share with other people.

All of that said, I don't really see Datasette as competing with existing SQL clients. It's intended more as a tool for exploratory data analysis - I've put very little work into running UPDATE/INSERT statements for example, it's much more about turning a set of relational data into something people can interactively explore.


I work in a research organization where I am responsible for crunching data and producing reports highlighting the most "notable" results. Not that the other data is uninteresting, but the volume is such that Excel cannot handle it, and even distributing it can be challenging for non-computer-technical folks without dedicated solutions.

Instead, I can dump all of the processed results into a table, create some views highlighting analysis X vs Y, and share links that give others the ability to ask questions I had not even considered. Now the user is empowered to ask anything and they do not need to engage me for "simple questions". Everybody wins. I believe there is also an extension that allows you to generate and save new queries through the web interface.

It is not a tool for a professional analyst, but a means to collaborate with others. There are heavier/more feature rich alternatives, but Datasette is my favorite tool for getting results out the door without hassle (can run it off of a laptop after a pip install).


It's so great to hear people using it like this!

https://datasette.io/plugins/datasette-saved-queries is the plugin for storing queries - it's pretty basic, there's lots of scope for improving the story around that.


I have had such enthusiastic feedback from granting people access to the ~full dataset. They have been conditioned to expect whatever subset can fit inside an email or a powerpoint slide. I feel a little embarrassed when people fawn over the utility because it is so easy to get running.

Have not yet had a chance to try the idea, but I am toying with using render-images to bake in pre-built plots + markdown for reporting the output. Queryable report in a file. Dynamic Vega plotting (RShiny-ish) is also in the back of my mind, but that feels too close to magic.

It is an incredibly useful tool, and I appreciate the workflows you have enabled.


I use datasette, datagrip and excel to process sales data.

- datasette to surface data via rest to excel (power query)

- datagrip to get the data how I want it, the json1 extension is so so much easier to work with than power query and for my use cases extremely fast.

This gives you (arguably) the best data grid in the world (excel) but without the horrible experience of building a pipeline in power query that will eventually become too slow and/or randomly crash and hang.

I would really, really like an in-the-box regex extension so I can create SQLite views without the crazy lengths I have to go to in SQL to, for example, split a comma-delimited list in a field.


I get the feeling we’ve been trained to expect less, with poor, incredibly slow legacy products. But really for many use cases all the opensource relational databases give instant results.


> I had to implement an ETL

Buddy, google DuckDB, it's like OLAP SQLite. You'll thank me later.


Does someone here use DuckDB in production? Is it as stable as SQLite?

The hosting company I have to work with has a very old version of SQLite installed on the server and they don't want to update it. So I was looking at whether I could replace it with DuckDB since it seems to be easy to install with pip.


Answering the question you didn't ask:

pysqlite3-binary is a Linux only package in PyPI that includes a recent version of SQLite

apsw is a cross-platform package, it brings in a modern version as well, and additionally exposes everything SQLite can do (useful for me, you can write VFS drivers in python for it). The version of apsw in pypi is hopelessly out of date, the homepage has more details on how to install the latest version.


Thanks for the recommendations, I will have a look at pysqlite3-binary and apsw.


I very, very much like the idea of DuckDB, but have had many headaches with what I would consider "low-medium" amounts of log data (400 million rows, 180GB+, after stripping out any un-needed text). That data works without issue in SQLite, btw, other than the expected slowness of row-oriented storage on the column-aggregate queries that I needed.

I have tried many different ways of importing the data with no luck. Usually it turned into a memory issue. This was irrespective of how I tried to import, whether it was in batches through a script, or using the built-in import functionality, or breaking the data into chunks prior to import, or anything else I tried. I occasionally keep trying to find a way, because I like the idea and features of DuckDB so much.

Of note, MonetDB slurped it right in without issue, but I would much rather use something not so "heavy" like DuckDB.

Also, regarding SQLite, I have DBs with billions of rows that work flawlessly, though not as fast as postgres.


You could try Clickhouse. It's a bit heavier than DuckDB and the default mode is server-client. But you can also use the client (a single binary) without a server to directly query data from csv or parquet files.

edit: added a better link, the stand-alone mode is called clickhouse-local

https://clickhouse.tech/docs/en/interfaces/cli

https://altinity.com/blog/2019/6/11/clickhouse-local-the-pow...


Interesting! I did not know about the standalone cli feature. Thanks! Will check it out.


One of the DuckDB authors here. Can you open an issue wrt the import problem? Should not happen. Try setting a database file and a memory limit.


DuckDB is designed for query processing, not updating data - I guess you would not want to use it for the transaction processing workloads just because it's easier to install. If you are doing mostly complex queries, then DuckDB is great, but I think there is going to be little usage directly in production, I'd expect it's more used for data processing


Yes I'm doing very few update or insert, but I run a lot of queries, so I guess it should be more than okay for DuckDB.


And how great is SQLite for transactional workloads? Every transaction locks up the entire db. There is no page or even table level locking. DuckDB can't be worse than that.


SQLite in WAL mode allows single concurrent writer and unlimited readers. Which is completely fine for a wide range of production use cases.
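
For reference, turning it on is a one-liner; busy_timeout is optional but lets writers wait for a lock instead of erroring immediately (the 5000 ms here is just an example value):

  PRAGMA journal_mode=WAL;   -- persistent, recorded in the database file
  PRAGMA busy_timeout=5000;  -- per-connection: wait up to 5s for a lock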


In practice, I've found that most SQLite write operations take low-milliseconds to complete - so even under a very heavy write load (assuming WAL mode) you're unlikely to see any contention between writes.

For my own stuff I maintain an in-memory Python queue of write operations and apply them in order against a single write connection - while running numerous other read-only connections to serve SELECTs.


SQLite is not at the top of the charts, but there's a looong ways to drop below it among the wide variety of desirable properties that it offers.


> Probably the second most popular source of user suffering.

It was #1 for me! Finally indeed.


I would have guessed desire to be the #1 source of user suffering.


I'm curious about this as I always prefer files and collections of structures or classes rather than having to deal with databases and ORMs. What else can you do aside from joining and aggregation?


You don't need to use an ORM, just use a query builder.


Possibly the first step towards the ability to rename a column?


Rename is already here! Came in 3.25.0 (https://www.sqlite.org/changes.html)


I love SQLite. Some of the best, most reliable code out there, made by a small handful of people who really care about excellence. I'm curious if anyone here has experience with DQlite (https://dqlite.io/ ) from Canonical. It's basically a distributed version of SQLite that combines it with the RAFT consensus protocol. It sounds like a great idea and I'm considering it for a project I'm working on now, but I'd love to hear what people think of it in practice.


There's a very similar project, https://github.com/rqlite/rqlite, which has always felt more pleasant to me.


Is there a site equivalent to "caniuse.com" for all the different database features by product and version? If not, someone should build one. Often some DB would say they added feature X but what does that mean? Did they implement this in advance of other DBs or behind others?


Markus Winand is doing something like that at https://modern-sql.com/. But it's not nearly as extensive as caniuse.com


The issue is there's no database feature spec, so you end up with each DB having its own version of something.


> The issue is there's no database feature spec

ISO/IEC 9075 disagrees.


Yeah, that's the query language used in relational databases, but find me the spec for "database encryption" or "database clustering".


The SQL spec is barely comprehensive of the SQL dialects found “in the wild”, let alone the entirely independent and much more interesting set of non-table managing/querying features databases provide.

It provides at best a baseline for comparison, and a fairly shallow one at that.


There is no standard spec so the same function name doing the same kind of thing will work in different ways or support different features.


That's a really wonderful idea.


The `RETURNING` is so awesome! I'm implementing a set of data structures on top of SQLite, one of them is a queue[0], and I had to do a transaction to lock a message and then return it, but this makes it easier.
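
A rough sketch of what that claim-and-return step could collapse into with 3.35 (hypothetical schema, not litequeue's actual code):

  UPDATE queue
     SET status = 'locked'
   WHERE id = (SELECT id FROM queue WHERE status = 'ready' LIMIT 1)
  RETURNING id, body;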

There's one little issue I keep finding with SQLite, and it's that most virtual servers / VM images ship with version 3.22.0, and upgrading often means building from source.

In any case, SQLite is absolutely wonderful. My favorite way of building products is having a folder for all the DBs that I mount to docker-compose. This release makes it even better.

[0] https://github.com/litements/litequeue


I saw this on the homepage:

> However, SQLite is incredibly powerful. It can scale to thousands of concurrent users with the correct settings

I never thought SQLite could do something like this. How does one go about scaling it to thousands of users? Are these just reads, or reads and writes? Are there any known real-world use cases where people are actually doing this in production?


I guess you already know it, but: «SQLite is really easy to compile» by Julia Evans, see https://jvns.ca/blog/2019/10/28/sqlite-is-really-easy-to-com...


Yes, I’m aware. I usually compile it during a multi stage docker build. There’s also pysqlite3 for Python when I need the latest version.

However, my issue is when I want to access the DBs from the CLI to check something. If I am using features from 3.34.0 (inside my app) I will not be able to use them outside of it. I guess I will have to add the SQLite compilation as an extra step in my base bootstrapping script when I create servers. I hope some day apt-get will give me the latest version by default.


Or build it in a launchpad ppa and configure it as a repository. You'll get the latest version from apt-get as expected.


I don’t use SQLite as my main data store for any project that runs on a server because a dedicated DB server has advantages. But that doesn’t mean that I don’t use it for production use. Here is an example from a few years ago:

I needed to pull in a large dataset from CSV (several dozen gigabytes), do a bunch of transforms on it, extract a small subset of data that I actually was interested in, and load that subset into the main DB. I needed to do this in a way where it wouldn't bog down the main DB server (which stored less data than what was in the CSVs), and also in a way where I could potentially load diffs of the CSV files as updates. So the solution was that a dedicated job would run on a server where the SQLite database was used to load the CSVs and to manipulate the data with SQL because that was way more convenient than doing it with ad hoc objects in Python. Once done, extract rows from SQLite and load into Postgres. The local DB would stick around to use the diffs for incremental updates, but if it was lost it was easy to recreate it from the last full dataset. Not highly available but highly understandable, cheap, fast, and dead simple.


Sounds like a job for lnav (https://lnav.org), an awesome little CLI "mini-ETL" powertool with SQLite embedded.


Interesting, have you used it for more than log file analysis? A "mini-ETL" sounds like exactly what I need each week.


I technically only used it for logfiles, but as a key part of a chain of scripts / utils forming a simple custom test automation suite I cobbled together. Having done that, I'd reach for it first for any ETL-like task given structured data on the order of a few hundred thousand or maybe a few million rows.


https://duckdb.org/ is good for these type of tasks


> SQLite is called ‘lite’ for a reason. If you need functions, add them yourself.

Most people I know, myself included, pronounce it that way. But Richard Hipp, the creator of SQLite, actually pronounces it SQL-ite, like a mineral.


> actually pronounces it SQL-ite, like a mineral.

Surprised you didn't say he pronounces it SQL-ite like an animal


"Esqueuellite, like a mineral" is D. Richard Hipp's phrasing.


Yep, it's a direct quote I remembered from the Changelog Podcast #201 "Why SQLite succeeded as a database" starting at timestamp 15:58

https://changelog.com/podcast/201


It seems there is no universal agreement so you can pronounce which way you want. https://english.stackexchange.com/questions/431329/what-is-t...


Why not "sklite"? Monosyllabic - lite, ya know.


I recently used SQLite in my side project [1]. Here's what I learned from using it:

* Shaving the overhead of network calls for queries sped up my site significantly.

* Most CI/CD providers include SQLite in their base linux images and setup in local envs is easy as well. Running tests against the actual database is simple.

* Replication is not available out of the box. To share a database with multiple instances of your app you will have to use a shared storage volume or some of the available solutions at [2][3][4], but they each come with their caveats.

[1] https://www.tendielist.com

[2] http://litereplica.io

[3] http://litesync.io

[4] https://bedrockdb.com


You can add this one to the list https://litestream.io/


Litestream author here. Let me know if you have any questions. It's built to run as a separate process and to be super easy to get up and running. We have a GitHub discussion board and an active Slack group as well if you need any help.


Hey, lead of Debezium here, a change data capture tool for a number of databases (not SQLite, though). Out of curiosity, how are you implementing change ingestion? Is there some interface/API in SQLite which lets you do this, or are you manually parsing its log files?


Hi Gunnar, good to meet you. Litestream works by reading off the SQLite WAL file which acts as a circular buffer. It takes over the checkpointing process to control when the buffer rolls over so it doesn't miss any frames. Those frames get copied over and each buffer is recreated as a sequential set of WAL files that can be replayed to reconstruct the state of the database at a given point-in-time.

It sounds like Litestream differs from Debezium in that it provides physical replication rather than logical row changes. However, I've been toying with the idea of determining row-level changes from the WAL frames by using ptrmap pages to traverse up the b-tree and determine the owner table. There's a bunch of stuff on the roadmap before that like live read replication though.

There's some additional info on the site about how Litestream works[1] and I'm planning on making a video similar to this Raft visualization[2] I did a while back.

[1]: https://litestream.io/how-it-works/

[2]: http://thesecretlivesofdata.com/raft/


Thanks for sharing those insights. Indeed Debezium is based on logical replication. I'll definitely keep an eye on Litestream, perhaps there may be some potential for collaboration at some point? SQLite hasn't come up really in our community so far, but personally I find it very interesting.


Yeah, for sure. I'm up for some collaboration where it makes sense. I can understand how SQLite probably wouldn't come up so far. If someone is running CDC on Kafka (Debezium) then they're probably running a client/server database instead of an embedded one.

Hit me up on Twitter[1] if you have any questions or we have a pretty friendly, active Slack[2] too.

[1]: https://twitter.com/benbjohnson

[2]: https://join.slack.com/t/litestream/shared_invite/zt-n0j4s3c...


How does Litestream actually work? Do you read from the WAL and then sync with S3?


Yes, it reads off the WAL then compresses the frames using LZ4 and then uploads to S3. The SQLite WAL acts as a circular buffer so Litestream takes over the checkpointing process to control when it rolls over so it can recreate that buffer as separate files. There's additional information on the web site: https://litestream.io/how-it-works/


More than replication, but dqlite might be worth a look as well. It is supposedly stable now that it was rewritten in C.

https://dqlite.io/



This release looks amazing. I use sqlite in almost every one of my sideprojects because it works everywhere and I can just open a file to inspect my db state which is really useful when mashing and trying to decide on table formats etc. The column delete and RETURNING additions are godsends, someone else said it already but this makes replacing postgres a lot easier.


Oh that "returning" is nice; in current SQLite I'm doing a second query to get that:

  SELECT last_insert_rowid()


This sounds incredible.

I am going to have to do some testing. Right now, we lock on a single SQLiteConnection instance because of the fact that insert & id retrieval are 2 separate invocations. If we can get away with just a single ExecuteQueryAsync<long>(MyUpdateSql, MyObject) call and it's thread safe, then we could drop a ton of locking abstractions at the application level.


> If we can get away with just a single ExecuteQueryAsync<long>(MyUpdateSql, MyObject) call and it's thread safe, then we could drop a ton of locking abstractions at the application level.

Not only would you have to ensure you're always setting the db to SQLITE_OPEN_FULLMUTEX, https://sqlite.org/c3ref/errcode.html and https://sqlite.org/c3ref/changes.html are still not coherent in multithreaded contexts. Calling them won't corrupt anything, but they'll return garbage. The latter probably doesn't matter much, but the former seems quite relevant if you want to handle errors beyond "this failed".


Correct - We open our databases in serialized mode and access a single connection from multiple threads.

We actually are not concerned with handling detailed error information. Our usage of these databases is very well bounded. All of the SQL that will ever execute against them is contained in 1 series of constant declarations, and the mappers ensure request data is well-formed before attempting inserts & updates. SQLite is really just there to provide identity and persistence for our business objects.

We are still able to use these fields effectively in local development and unit testing, since in these contexts there is only ever 1 thread hitting the database at a time.


Now only MySQL is missing `RETURNING` from the relational databases that still matter. Postgres had it for ages, SQL Server has it with OUTPUT (with a caveat of breaking when using triggers) and SQLite finally added them on this release.

Using `RETURNING` is so much nicer than spending four round-trips to `BEGIN`, `INSERT`, `SELECT` and `COMMIT` otherwise...


Yea, this one and column removal are my two favourites. And it makes it easier to switch between this and PG.


Especially since last_insert_rowid() can never be free of race condition, and is basically useless whenever ON CONFLICT is involved.


> Especially since last_insert_rowid() can never be free of race condition

I think that's overstating it. It's per-connection, not per-database. If you don't share connections between threads, it's fine. If you keep the connection locked / checked out for the span of the two statements, it's fine. You're probably doing the latter anyway if you use transactions.

Returning looks nice, though! Particularly that it can return "one result row for each database row that is deleted, inserted, or updated" rather than just one.


> Returning looks nice, though! Particularly that it can return "one result row for each database row that is deleted, inserted, or updated" rather than just one.

That… seems normal? Returning just one row would make no sense, how would even a trivial `RETURNING id` work otherwise?


Sure, but I'm comparing to last_insert_rowid(), which can't do that.

Also, last_insert_rowid() (by definition) doesn't work with "without rowid" tables, where returning can.


Ah I see, I was a bit confused because I'm used to RETURNING from pg and I've been waiting for sqlite to add it for a very long time. Especially since ON CONFLICT was added as it made the issue even worse.
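
For example, an upsert can now report the row id whether it inserted or updated, which last_insert_rowid() couldn't do reliably (hypothetical table with a unique index on email):

  INSERT INTO users (email, name) VALUES ('a@example.com', 'Alice')
  ON CONFLICT (email) DO UPDATE SET name = excluded.name
  RETURNING id;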


I'm actually surprised it took SQLite this long to implement RETURNING, such a useful SQL language feature.


It's not part of any standard, so they'd need to pick what to implement. The docs say they're modeling it on postgres's, which is a perfectly reasonable extension.

https://www.sqlite.org/lang_returning.html


True, though SQL isn't much of a standard anyway. I literally don't know of a single SQL database that implements the SQL standard to spec with no fiddling. Or more precisely, the vast majority of SQL databases out of the box behave differently, so anything more complex than a basic SELECT + JOIN generally isn't very cross-compatible.


One thing to note about the `RETURNING` clause is that it can't be used in subqueries[0] or CTEs. One thing I do with Postgres is use multiple CTEs with `RETURNING` clauses to build a query with many steps, without having to open a transaction and run many queries. The SQLite documentation says it's something they want to implement. Having that would be very nice: we could do complex queries all at once without having to lock the DB for the whole transaction.

[0] https://sqlite.org/draft/lang_returning.html#limitations_and...
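
For context, the Postgres pattern described above looks roughly like this (hypothetical tables; not valid in SQLite because of the limitation in [0]):

  WITH moved AS (
    DELETE FROM inbox WHERE processed = true RETURNING *
  )
  INSERT INTO archive SELECT * FROM moved;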


This is pretty neat. Now that they added `exp`, I'm tempted to implement an autojumper using [frecency] in SQL. I tried it once before, but gave up specifically because of this IIRC.

[frecency]: https://wiki.mozilla.org/User:Jesse/NewFrecency?title=User:J...


What is an autojumper?


It's a shell command that allows you to `cd` to places without having to type the whole path. Some implementations are https://github.com/wting/autojump and https://github.com/rupa/z.

A very simplified example: if you call `j pro`, the command should take you to the folder that best matches the name based on some criteria (in my case, one of them would be frecency). In my case, that would be my projects folder in another drive.

The nifty thing about them is that you don't have to make those aliases manually -- the tool learns the directories you've visited over time.

Of course, the tool could take you to the wrong path, but with a well made set of matching/prioritization rules, I've found it more useful than not.


This release changes everything!

My biggest complaints were no returning clause, and no drop column.

That's it, now I can run sqlite in production without hesitation!


Returning and column deletes are a big deal. This will make me return to SQLite.

One other huge issue is that SQLite doesn't enforce the schema. You can say that a column is an int but it will happily insert strings. That is a huge minefield.

I want SQLite but with strict enforced scheme that errors when it encounters an incorrect type.


Have you tried using the CHECK constraint mechanism for this? https://www.sqlitetutorial.net/sqlite-check-constraint/
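
A minimal sketch of that approach; the check rejects any value whose stored type isn't an integer:

  CREATE TABLE t (
    n INTEGER CHECK (typeof(n) = 'integer')
  );
  INSERT INTO t VALUES ('abc');  -- fails: 'abc' can't be coerced, so it would be stored as text
  INSERT INTO t VALUES ('12');   -- succeeds: column affinity coerces it to the integer 12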


I am curious to know. Is there any reason why they don't support schema enforcement?


I'd love to know that too.

The FAQ at https://sqlite.org/faq.html#q3 says "This is a feature, not a bug"

https://sqlite.org/datatype3.html says "the dynamic typing in SQLite allows it to do things which are not possible in traditional rigidly typed databases"... but it doesn't expand on that point at all.

My best guess is that this is a language culture thing. SQLite has a strong connection to the Tcl community - it started life as a Tcl extension: https://www.sqlite.org/tclsqlite.html - and Tcl is a very dynamically typed language.

Whatever the reasons, it's not going to change now: SQLite has extremely strong backwards-compatibility guarantees, and this is very much baked into how it works.

As a dynamic language person this has never bothered me in the slightest, but it's definitely something that comes up a lot from people who are put off using SQLite because of it.


> To delete a column, SQLite has to completely overwrite the table - so the operation is not fast. But it’s still nice.

Can someone with more knowledge/experience ELI5, please? Is this essentially how it's done in other db engines? TIA


In many engines, row-tuples are materialized from rows by having the query planner turn the table’s metadata into a mapping function. With this approach, you get a bunch of things “for free”—the ability to reorder columns, rename columns, add new nullable all-NULL columns or default-constant all-default-valued columns, all without doing any writing. Rows instead get rewritten when the DB builds a new version of them for some other reason (e.g. during UPDATE) or during some DB-specific maintenance (e.g. during VACUUM, for Postgres.)

I don’t believe SQLite works this way. It gives you literally what’s in the encoded row, decoded. I believe this allows it to be either zero-copy or one-copy (not sure which), but it has the trade off of disallowing these fancy kinds of read-time mapping.

IMHO it’s a trade off that makes sense, on both sides. Client-server DBMS inherently need to eventually serialize the data and send it over the wire, so fewer copies doesn’t get you much, while remapping columns at read time might get you a lot. SQLite can hand pointers directly to the app it’s embedded in, so “direct” row reads are a great advantage, while—due to the small size of most SQLite DBs—the need for eager table rewrites on ALTER TABLE isn’t even very expensive.


postgresql does essentially nothing on a drop column, in part because it doesn't use fixed-size tuples:

> The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.

but if you VACUUM FULL (or CLUSTER) it will immediately rewrite the entire table.

Also note that storing a null means forcing a null bitmap for every row (even if it's not otherwise used).


In postgresql:

  The DROP COLUMN form does not physically remove the column, but simply makes it invisible to SQL operations. Subsequent insert and update operations in the table will store a null value for the column. Thus, dropping a column is quick but it will not immediately reduce the on-disk size of your table, as the space occupied by the dropped column is not reclaimed. The space will be reclaimed over time as existing rows are updated.


> The space will be reclaimed over time as existing rows are updated.

Or as you VACUUM, correct? I think it lets you specify a column name too.


VACUUM just marks tuples as free spaces so they can be reused. This is part of

> The space will be reclaimed over time as existing rows are updated.

because of MVCC, updating a row really inserts a new row and the old one eventually becomes free space (once a vacuum comes around to marking it).

VACUUM FULL, however, will rewrite the entire table.


Good to know, thank you very much


I was already completely fan of SQLite, but some of these new features are the cherries on the cake =).


Yeah, like being able to remove a column! And math functions, can't believe they finally implemented those ツ


Yeah! And RETURNING is awesome. No more second query with last_insert_rowid() hoping that no queries were done in between :D.


Also see: "SQLite now allows multiple recursive SELECT statements in a single recursive CTE" https://news.ycombinator.com/item?id=24843643


That's awesome. Can't wait for these features to reach the SQLite lib in Python in 5 years.


I've been using pysqlite3 which is a drop-in replacement compiled against a more recent version. pysqlite3-binary even gives you a (Linux-only) pre-compiled wheel.

https://github.com/coleifer/pysqlite3


There is also this, which seems pretty good: https://github.com/rogerbinns/apsw

(haven't tried it yet)


You can also read the release notes[0], which are pretty readable as always and don't take the extra time to denigrate the work of this excellent and freely provided software.

[0]: https://sqlite.org/releaselog/3_35_0.html


I am running zeeSQL.com, a Redis module that embeds SQLite into Redis to provide SQL and secondary indexes (search by value) on top of Redis.

This release is the first one I am really excited about.

As soon as I can allocate a few minutes I will update the SQLite code of the software.

The RETURNING makes a lot of queries and use cases an order of magnitude simpler.


It's such a shame that webSQL died, I hope they will reconsider the decision in the next few years.


Even if they don't, I at least hope they'll provide an alternative. indexedDB is hardly usable let alone a replacement for anything.


I think an independent implementation would help. Personally, I think SQLite is a unique project which is of such high quality that independence is unimportant, but others do not see it that way.


We now have three steaming piles of indexedDB garbage that are still (last I checked) implemented on top of sqlite anyway. I don't see how anyone could think the world became a better place after the change.


What's the best way to run a service that uses SQLite in a serverless / container environment (e.g. Cloud Run)?

I'd love to use this for my personal projects and I'm not sure how to set this up in a container given their ephemeral nature.


I run SQLite in serverless environments (Cloud Run, Vercel, Heroku) for dozens of projects... but the trick is that they all treat the database as a read-only asset.

If I want to deploy updated data, I build a brand new image and deploy the application bundled with the data. I tend to run the deploys for these (including the database build) in GitHub Actions workflows.

This works really well, but only for applications that don't need to apply updates more than a few times an hour! If you have a constant stream of updates I still think you're better off using a hosted database like Heroku PostgreSQL or Google Cloud SQL.

One example of a site I deploy like that is https://datasette.io/ - it's built and deployed by this GitHub Actions workflow here: https://github.com/simonw/datasette.io/blob/main/.github/wor...


I've seen some folks use Litestream [1] to continuously replicate to S3 storage and restore it later after their container is killed. There are also other serverless platforms like Fly[2] that provide persistent disk and long-running (but cheap) container VMs.

Disclaimer: I'm the author of Litestream.

[1]: https://litestream.io/

[2]: https://fly.io/


If your app is in a Docker container, you need to have some way to mount a persistent volume for the database. AWS Fargate and fly.io both offer storage volumes.


I doubt you can have a writable filesystem with any serverless offering.

Serverless offerings force you to have all your state network accessible so that they can offer you horizontal scalability by using load balancers - so no writable+persistable file system that SQLite needs.

What I do is use a normal VM on GCP, run my docker container on that, and write to a regionally replicated disk (so it is copied between zones on every write). Snapshots run hourly and are stored in GCP storage. This gets you high reliability (if you can have a few mins down time to reboot) but not horizontal scalability.


You can check out Dokku, which is the open source, single server Heroku equivalent. You can mount a storage volume to the container and access it from your app running in Docker. Dokku makes it stupid simple, so it might be worth reading the source code to see how they do that.


I use Dokku, but it also has a Postgres plugin which is basically just a few commands. That also gives you the niceties of dokku pg:backup etc. If you’re on Dokku anyway then that’s what I would recommend.

For me, SQLite remains the perfect file format for command line tools.


Anyone know how to get this version running on a Mac? Trying to brew upgrade it fails:

    sqlite is keg-only, which means it was not symlinked into /usr/local, 
    because macOS already provides this software and installing another version in
    parallel can cause all kinds of trouble.

And checking installed version:

    $ type sqlite3
    sqlite3 is /usr/bin/sqlite3
    $ sqlite3
    SQLite version 3.28.0 2019-04-15 14:49:49
    Enter ".help" for usage hints.


You can invoke it as `/usr/local/opt/sqlite/bin/sqlite3`. Or run `brew info sqlite3` to see how to add it to your path.

https://docs.brew.sh/FAQ#what-does-keg-only-mean


Ok, got it. Found out it was installed:

    $ mdfind -name sqlite3 | grep -E 'bin/sqlite3$'
    /usr/local/Cellar/sqlite/3.34.1/bin/sqlite3
    /usr/bin/sqlite3
Just need to symlink it or add it to path. Or `brew link sqlite`. Thanks!


> SQLite is called ‘lite’ for a reason. If you need functions, add them yourself.

Ok, let's say that the SQLite developers believe this position has merit (they must have, since they held it for 20 years, right?)

So what changed now which led to you implementing such functions?

Nothing changed, you just didn't _feel_ like doing it.


That's not a direct quote from the maintainers - the author of the post seems to have made it up.

I imagine what changed is that the SQLite maintainers actively listen to their developer community, and decided it was finally time to prioritize that piece of work over other features.


Do joins now work in every way (full, inner, outer)?


They always have, haven't they? https://www.sqlite.org/lang_select.html



If you can do a LEFT OUTER JOIN I don't see that there's any benefit in being able to do RIGHT OUTER JOIN - surely you can change the table order yourself?

It looks like you can emulate FULL OUTER JOIN by doing a UNION between two LEFT joins, see https://www.sqlitetutorial.net/sqlite-full-outer-join/
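
A rough sketch of that emulation with two hypothetical tables a and b joined on id:

  SELECT a.id, a.x, b.y FROM a LEFT JOIN b ON b.id = a.id
  UNION ALL
  SELECT b.id, NULL, b.y FROM b LEFT JOIN a ON a.id = b.id
  WHERE a.id IS NULL;  -- keep only the b rows with no match, to avoid duplicates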


My worry is more that if I use it, like, everywhere, at some point some "ORM layer" like Hibernate will use a RIGHT OUTER JOIN for something and then break since this is not implemented. And there are also a few other quirks where I am not sure whether some systems need them (GRANT/REVOKE is of course meaningless).


Wouldn't the ORM layer have knowledge of the SQL language of your DB engine? I thought that was one of the big wins of using an ORM.


In theory. In practice I do not tend to trust all these layers that much, as I have seen too many "edge cases" fail. :( But let's hope that works :D


Awesome! SQLite, from my perspective at least is feature-complete! Anyone else have any notable missing features?


As superb a piece of software as SQLite is, it's definitely still missing features that many would use. No FULL OUTER JOINs is maybe the most notable that I've encountered.

https://www.sqlite.org/omitted.html


I'm curious as to what you use FULL OUTER JOINS for - I've so far never felt the need for them in my own work but I'd love to understand what I'm missing out on.


The one use case I’ve found recently is joining sparse time series data. In other words, when the tables you’re joining are “equally important” as opposed to one depending on the other.


Why isn't a materialized CTE the default optimization in SQL? Why do we need to be explicit about it?

Is there any trade-off?


I think the changelog is a bit confusing on that point. I interpret it as meaning that CTE is a default optimization:

"The default behavior was formerly NOT MATERIALIZED, but is now changed to MATERIALIZED for CTEs that are used more than once"

The default was NOT MATERIALIZED only because previous versions of SQLite didn't have this feature at all.

"now changed to MATERIALIZED for CTEs that are used more than once" means that the query planner DOES use this optimization by default if you don't do anything - the optimization only makes sense for CTEs that are used more than once in the same query.


> The default was NOT MATERIALIZED only because previous versions of SQLite didn't have this feature at all.

The default was NOT MATERIALIZED because it's what the sqlite devs had selected when they implemented CTEs.

Pg did the exact opposite before the feature was added (and exposed to end-users): CTEs would always be MATERIALIZED. And similarly, with the introduction of [NOT] MATERIALIZED they changed the default in some cases: for pg if the CTE is non-recursive, has no side-effects, and is only referenced once, then it defaults to NOT MATERIALIZED.


I think the change introduced this syntax

with history as materialized ( .. )

so you have to declare the CTE as materialized to get this optimization


> so you have to declare the CTE as materialized to get this optimization

Not necessarily. You can get a materialized CTE if you request it regardless of its usage pattern, but you will also get one if the CTE is used more than once unless you suppress it.


That's not how I interpret the release notes, but I can see why they are being misunderstood like that (assuming I'm right in my interpretation).


With the RETURNING clause it should be trivial to do CDC now, no?


Interesting thought; with the client/server database model, RETURNING isn't relevant for CDC, which rather extracts changes from the transaction log, not scoped to any particular client session. But in the case of SQLite and its embedded way of running, one could indeed envision to use it for implementing some basic CDC functionality at the application level. It'd still lack metadata though like transaction ids, log file offset, etc.


True that’s right. I was just highlighting the spin that SQLite puts on the returning idea. What a really cool feature and db


Isn't it done with triggers? SQLite has supported them for a long time.


CDC?



yup, change data capture.


> To delete a column, SQLite has to completely overwrite the table - so the operation is not fast. But it’s still nice.

Yet another reason why column stores are superior to row stores.


This is not a problem for "row stores" though, it's specific to SQLite.

As a matter of fact, any classical RDBMS can effectively store columns off-page, and most of them do just that in the case of large TEXT/JSON/BLOB columns.

Column stores like Redshift, Snowflake, etc. are optimized for a much narrower set of use cases. The more widely used databases could do the same optimizations internally but that would make them much less useful.



