Hacker News
Datasette: An open source multi-tool for exploring and publishing data (datasette.io)
285 points by robin_reala on Dec 11, 2020 | 22 comments



I've been trying out something new this week: I'm running "Datasette Office Hours" where people can book a 20-minute Zoom call on a Friday to talk to me about the project.

Today was the first day for calls, and it was fantastic. I spoke to five different people and got to see some wonderful applications of the tool - from analyzing hardware test results to exploring cemetery interment records.

If you're running an open source project and want to talk to people using your software this approach seems to work really well. I'm using Calendly for it: https://calendly.com/swillison/datasette-office-hours


That's such an amazing idea, thank you for sharing your time!


that's a really cool idea, and a really cool way to do it.

thanks for sharing your project!


This is the new Datasette project website - the open source project is three years old now but I launched this website yesterday. Up until now there's just been the GitHub repo at https://github.com/simonw/datasette and the project documentation at https://docs.datasette.io/

The https://datasette.io/ site is running on Datasette itself, with a bunch of custom templates. If you're interested in seeing how it works the source code (and the GitHub Actions that build and deploy the underlying database) can be found here: https://github.com/simonw/datasette.io


I saw this earlier as it was used for the SBA Paycheck Protection project ... and was impressed. Thank you for this work you are doing.

If you have any use at all for a free-forever rsync.net account, just let us know - I would be very happy to give you one.


It'd be nice if Datasette supported file formats other than SQLite. I made https://github.com/pytables/datasette-connectors that gives a simple API for making "connectors" for other formats by monkey patching Datasette, but it'd be more efficient if Datasette included that API (or something similar) directly.

For example, https://github.com/pytables/datasette-pytables implements a connector for HDF5 files. You can read how to make a connector for your favorite file format in https://github.com/PyTables/datasette-connectors/blob/master...

Currently, datasette-connectors supports Datasette 0.51.1, but every few months an update is needed to stay compatible with the latest versions.

I wonder if it'd be possible to merge datasette-connectors into Datasette. Then, "connectors" could be like other plugins and people could publish their data in a wider variety of ways.


I'm coming around to this idea now - originally I avoided having Datasette work with anything other than SQLite because maintaining a database abstraction layer is an enormous amount of work, but now that I have a couple of years' experience with Datasette plugins I'm thinking it may be possible to delegate that complexity to plugins and still keep Datasette's core relatively clean and straightforward to maintain.

I'll absolutely let you know if this starts to come together. I'm planning to target PostgreSQL as the first plugin to help flesh out the integration points.


Don't miss out on the homepage's link to Datasette's ecosystem, which includes a variety of CLI tools that are just fantastically convenient for creating and working with SQLite databases:

https://docs.datasette.io/en/stable/ecosystem.html

I'm a dork who likes using the built-in SQLite interface (and Bash hacks) for bulk CSV import, but it's really hard to beat the convenience of csvs-to-sqlite: https://github.com/simonw/csvs-to-sqlite
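For anyone curious what a bulk CSV import boils down to, here is a minimal sketch using only Python's standard library. It is not how csvs-to-sqlite is implemented; the `load_csv` helper, the `items` table, and the sample rows are made up for illustration, and every value is stored as text with no type detection.

```python
import csv
import io
import sqlite3


def quote_identifier(name):
    """Quote a table or column name so spaces and quotes are handled safely."""
    return '"{}"'.format(name.replace('"', '""'))


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header row and insert every data row.

    Hypothetical helper for illustration: values are stored as text and
    the table must not already exist.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(reader)
    columns = ", ".join(quote_identifier(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(
        "CREATE TABLE {} ({})".format(quote_identifier(table), columns)
    )
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quote_identifier(table), placeholders),
        rows,
    )
    conn.commit()
    return len(rows)


# Made-up sample data standing in for a real CSV file on disk.
sample = io.StringIO("id,name\n1,alpha\n2,beta\n")
conn = sqlite3.connect(":memory:")
inserted = load_csv(conn, "items", sample)
rows = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(inserted, rows)
```

A dedicated tool earns its keep on the parts this sketch skips, such as handling many files at once and coping with messy input.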


I see this project get posted to HN regularly, and I upvote it every goddamned time, because it's that good. The creator has a great blog as well.

https://simonwillison.net/


I tested this out when its "datasette-ripgrep" plugin was posted and it is really cool for browsing a database. I wish it had a plugin for browsing through "roam-like" markdown notes or storing them in a SQLite database with the correct links!


There's a plugin called datasette-render-markdown which can be configured to render specific database records: https://github.com/simonw/datasette-render-markdown

You can see a demo of that on the https://datasette.io/ site here - it's rendering the release notes from the different plugin releases: https://datasette.io/content/recent_releases


There is a similar project for rapid publishing of datasets in R https://pins.rstudio.com/ and an under-development port for JavaScript https://pinsjs.github.io/


What would be great, unless I missed it, is being able to reference/cite a particular dataset with a Digital Object Identifier (DOI). It is a key feature of Zenodo, for example.


I'm thinking about ways to improve the metadata mechanism at the moment - currently it supports license, source and "about" fields.

I'm reading up on https://www.doi.org/ now - it looks like it could be a great fit for the project, so thanks for the suggestion!
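As a rough sketch of the metadata mentioned above, the snippet below builds a metadata file containing the license, source and "about" fields. The titles and URLs are placeholders, and pointing `source_url` at a DOI link is only an assumption about how a DOI could be recorded today, not an existing Datasette feature.

```python
import json


def build_metadata(title, doi_url):
    """Return a metadata dictionary with license, source and about fields.

    All values are illustrative placeholders; `doi_url` going into
    `source_url` is an assumption, not built-in DOI support.
    """
    return {
        "title": title,
        "license": "CC BY 4.0",
        "license_url": "https://creativecommons.org/licenses/by/4.0/",
        "source": "Example dataset (hypothetical)",
        "source_url": doi_url,
        "about": "Example project (hypothetical)",
        "about_url": "https://example.com/about",
    }


# 10.5555 is used here as a placeholder prefix; this DOI does not point at real data.
metadata = build_metadata("Example data", "https://doi.org/10.5555/example")
metadata_json = json.dumps(metadata, indent=2)
print(metadata_json)
```

Saved as `metadata.json`, a file like this can be passed to Datasette with the `--metadata` option.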


strong +1 for DOI support.

and cheers for python user group talk a while back!


It's accepting issues and PRs: https://github.com/simonw/datasette


+1 thanks


I've not tried Datasette yet, but, with office hours, seems like now is a good time!

Does it make dataset discovery easier as well? I spend a lot of time looking for sources that are close to what I need, perhaps even more than I spend on formatting or displaying. Does Datasette (or some other tool) make that easier than Googling around?


Data published using Datasette should have good SEO, but it's not nearly as widely used for publishing yet as I'd like.

I've been thinking about building a Datasette of Datasettes as usage grows though!


That's not a Datasette, this is a Datasette: https://en.m.wikipedia.org/wiki/Commodore_Datasette


Made a video about Datasette recently: https://youtu.be/sDi1HSy-NHY


Orthogonal, but the website design is very refreshing!



