I've been trying out something new this week: I'm running "Datasette Office Hours" where people can book a 20-minute Zoom call on a Friday to talk to me about the project.
Today was the first day for calls, and it was fantastic. I spoke to five different people and got to see some wonderful applications of the tool - from analyzing hardware test results to exploring cemetery interment records.
This is the new Datasette project website - the open source project is three years old now but I launched this website yesterday. Up until now there's just been the GitHub repo at https://github.com/simonw/datasette and the project documentation at https://docs.datasette.io/
The https://datasette.io/ site is running on Datasette itself, with a bunch of custom templates. If you're interested in seeing how it works the source code (and the GitHub Actions that build and deploy the underlying database) can be found here: https://github.com/simonw/datasette.io
It'd be nice if Datasette provided support for file formats other than SQLite. I made https://github.com/pytables/datasette-connectors, which gives a simple API for making "connectors" for other formats by monkey patching Datasette, but it'd be more efficient if Datasette included that API (or something similar) directly.
Currently, datasette-connectors supports Datasette 0.51.1, but every few months it needs an update to stay compatible with the latest versions.
I wonder if it'd be possible to merge datasette-connectors into Datasette. Then, "connectors" could be like other plugins and people could publish their data in a wider variety of ways.
I'm coming around to this idea now - originally I avoided having Datasette work with anything other than SQLite because maintaining a database abstraction layer is an enormous amount of work, but now that I have a couple of years' experience with Datasette plugins I'm thinking it may be possible to delegate that complexity to plugins and still keep Datasette's core relatively clean and straightforward to maintain.
I'll absolutely let you know if this starts to come together. I'm planning to target PostgreSQL as the first plugin to help flesh out the integration points.
Don't miss out on the homepage's link to Datasette's ecosystem, which includes a variety of CLI tools that are just fantastically convenient for creating and working with SQLite databases:
I'm a dork who likes using the built-in SQLite interface (and Bash hacks) for bulk CSV import, but it's really hard to beat the convenience of csvs-to-sqlite: https://github.com/simonw/csvs-to-sqlite
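For anyone curious what a bulk CSV import boils down to, here is a minimal sketch using only Python's standard library. It is not how csvs-to-sqlite is implemented; the `import_csv` helper, the `items` table name, and the sample data are all made up for illustration, and every column is simply stored as TEXT.

```python
import csv
import io
import sqlite3


def import_csv(conn, table, csv_file):
    """Load an open CSV file into a new SQLite table (hypothetical helper).

    The header row supplies the column names and every column is stored
    as TEXT. Returns the number of rows in the table afterwards.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote identifiers so headers containing spaces or keywords still work.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    placeholders = ", ".join("?" for _ in header)
    # One transaction for all rows keeps bulk inserts fast.
    with conn:
        conn.executemany(
            "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders),
            reader,
        )
    count_sql = "SELECT COUNT(*) FROM {}".format(quoted_table)
    return conn.execute(count_sql).fetchone()[0]


# In-memory sample data so the sketch runs without any files on disk.
sample = io.StringIO("id,name\n1,alpha\n2,beta\n")
csv_db = sqlite3.connect(":memory:")
row_count = import_csv(csv_db, "items", sample)
```

Swapping `":memory:"` for a file path would give you a database file you could then open with Datasette.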
I tested this out when its "datasette-ripgrep" plugin was posted and it is really cool for browsing a database. I wish it had a plugin for browsing through "roam-like" markdown notes or storing them in a SQLite database with the correct links!
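To make the notes idea concrete, here is a rough sketch of how "roam-like" notes and their links could be stored in SQLite. This is not an existing Datasette plugin; the `[[Title]]` link syntax, the `notes` and `links` tables, and the `store_notes` and `backlinks` helpers are all assumptions for illustration.

```python
import re
import sqlite3

# Assumes links are written as [[Note Title]], as in Roam-style notes.
LINK_PATTERN = re.compile(r"\[\[([^\[\]]+)\]\]")


def store_notes(conn, notes):
    """Store {title: markdown_body} notes plus one row per outgoing link."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (title TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "source TEXT, target TEXT, PRIMARY KEY (source, target))"
    )
    with conn:
        for title, body in notes.items():
            conn.execute(
                "INSERT OR REPLACE INTO notes VALUES (?, ?)", (title, body)
            )
            for target in set(LINK_PATTERN.findall(body)):
                conn.execute(
                    "INSERT OR IGNORE INTO links VALUES (?, ?)",
                    (title, target.strip()),
                )


def backlinks(conn, title):
    """Return the titles of notes that link to the given note."""
    rows = conn.execute(
        "SELECT source FROM links WHERE target = ? ORDER BY source", (title,)
    )
    return [row[0] for row in rows]


# Hypothetical sample notes, kept in memory so the sketch is self-contained.
notes_db = sqlite3.connect(":memory:")
store_notes(notes_db, {
    "Datasette": "Explore data published with [[SQLite]].",
    "Plugins": "See [[Datasette]] and [[SQLite]].",
    "SQLite": "An embedded database.",
})
```

With the links in their own table, backlinks become a plain SQL query, which is the kind of thing Datasette could then expose for browsing.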
What would be great, unless I missed it, is the ability to reference/cite a particular dataset with a Digital Object Identifier (DOI). It is a key feature of Zenodo, for example.
I've not tried Datasette yet, but, with office hours, seems like now is a good time!
Does it make dataset discovery easier as well? I spend a lot of time looking for sources that are close to what I need, perhaps even more than I spend on formatting or displaying. Does Datasette (or some other tool) make that easier than Googling around?
If you're running an open source project and want to talk to people using your software this approach seems to work really well. I'm using Calendly for it: https://calendly.com/swillison/datasette-office-hours