I've been working on something in the same space for the Belgian federal parliament. The Belgian parliament livestreams sessions and publishes a single (long, bloated, dual-language) PDF report[0] for each session and that's it.
This means no search across sessions, no details of which parties voted how, no API etc. The only view you get is from the perspective of a single session which is not very useful when you're trying to figure out who to vote for.
I made 'zij werken voor u' (TheyWorkForYou[1] in Dutch) by scraping the PDF files and parsing them automatically with a Rust script.
The scraped data (votes, questions, topics, dossiers) is written to .parquet files. I also compute some additional things like voting patterns, attendance and which topics interest specific MPs the most.
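A rough sketch of what such derived metrics can look like, not the actual site code: it tallies votes per party and computes an attendance rate per member. The `Vote`, `VoteRecord` and `Tally` types, the member and party names, and the rule that an `Absent` vote counts against attendance are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// How a member voted on one motion (hypothetical model, not the real schema).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Vote {
    Yes,
    No,
    Abstain,
    Absent,
}

/// One row of scraped vote data (hypothetical shape).
struct VoteRecord {
    member: &'static str,
    party: &'static str,
    vote: Vote,
}

/// Vote counts for one party across all records.
#[derive(Debug, Default, PartialEq, Eq)]
struct Tally {
    yes: u32,
    no: u32,
    abstain: u32,
    absent: u32,
}

/// Group records by party and count each kind of vote.
fn tally_by_party(records: &[VoteRecord]) -> BTreeMap<&'static str, Tally> {
    let mut out: BTreeMap<&'static str, Tally> = BTreeMap::new();
    for r in records {
        let t = out.entry(r.party).or_default();
        match r.vote {
            Vote::Yes => t.yes += 1,
            Vote::No => t.no += 1,
            Vote::Abstain => t.abstain += 1,
            Vote::Absent => t.absent += 1,
        }
    }
    out
}

/// Share of a member's votes where they were present (assumed definition).
/// Returns None when the member has no records at all.
fn attendance(records: &[VoteRecord], member: &str) -> Option<f64> {
    let total = records.iter().filter(|r| r.member == member).count();
    if total == 0 {
        return None;
    }
    let present = records
        .iter()
        .filter(|r| r.member == member && r.vote != Vote::Absent)
        .count();
    Some(present as f64 / total as f64)
}

/// Made-up sample data covering two motions.
fn sample_records() -> Vec<VoteRecord> {
    vec![
        VoteRecord { member: "Member A", party: "Party X", vote: Vote::Yes },
        VoteRecord { member: "Member A", party: "Party X", vote: Vote::Absent },
        VoteRecord { member: "Member B", party: "Party X", vote: Vote::Yes },
        VoteRecord { member: "Member B", party: "Party X", vote: Vote::No },
        VoteRecord { member: "Member C", party: "Party Y", vote: Vote::No },
        VoteRecord { member: "Member C", party: "Party Y", vote: Vote::Abstain },
    ]
}

fn main() {
    let records = sample_records();
    for (party, tally) in tally_by_party(&records) {
        println!("{party}: {tally:?}");
    }
    println!("Member A attendance: {:?}", attendance(&records, "Member A"));
}
```

In a real pipeline the records would come from the parsed reports rather than a hard-coded list, and the results would be written out alongside the other parquet data.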
These parquet files are then fed into a static site generator and a search index is built. I also sprinkle in some summarization using Mistral[2].
I also post new votes/questions on Bluesky[3]. The whole process (downloading, scraping, publishing, posting) is automated to run through GitHub Actions. I literally have to do nothing now.
I'm hoping the Belgian government will step up and improve their archaic and almost unusable site[4].
Thanks for sharing this project, I'm already getting inspired by it to improve zijwerkenvooru.be!
Edit: I’m thinking it might be good to have an overview of initiatives like these somewhere? Public initiatives to help with political transparency for each country?
This is fantastic! Love the automation and structure behind it, especially the .parquet approach and GitHub Actions pipeline. Super inspiring.
On my end, it’s a bit frustrating that our Parliament still only shares PDF reports weeks after sessions happen, likely compiled manually. No API, no transcript archive, and no structured metadata around bills, speakers, or topics.
That’s partly why I started building Bunge Bits: to sidestep the bottlenecks and make the information usable.
Appreciate you sharing zijwerkenvooru.be, bookmarking it for inspiration as I figure out what’s next.
Not at the moment but maybe in the future. While I'm sure the code could be of interest, it definitely isn't a template that would fit other countries. It is tightly coupled to the format of the Belgian parliament since most of it is parsing their PDF reports. In detail my tech stack is like this:
- Rust + reqwest crate + scraper crate to download reports and parse data from it into parquet files
- 11ty + @duckdb/node-api to read the parquet files at build time and generate pages automatically. For example, I have a 'meeting' template, and an HTML page gets built for each meeting
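The "one template, one page per record" idea can be sketched roughly as below. This is only an illustration written in Rust to match the scraper side; the site itself does this step with 11ty and DuckDB, not with this code. The `Meeting` struct, its fields, the sample data and the `meetings/<id>/index.html` output path are all hypothetical.

```rust
/// One parliamentary meeting (hypothetical shape, not the real schema).
struct Meeting {
    id: u32,
    date: &'static str,
    motions: Vec<&'static str>,
}

/// Escape the characters that are unsafe inside HTML text nodes.
fn escape_html(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
}

/// The "meeting" template: turns one record into one HTML page.
fn render_meeting(m: &Meeting) -> String {
    let items: String = m
        .motions
        .iter()
        .map(|title| format!("<li>{}</li>", escape_html(title)))
        .collect();
    format!(
        "<html><body><h1>Meeting {} ({})</h1><ul>{}</ul></body></html>",
        m.id,
        escape_html(m.date),
        items
    )
}

/// Apply the template to every record, yielding (output path, html) pairs.
fn build_pages(meetings: &[Meeting]) -> Vec<(String, String)> {
    meetings
        .iter()
        .map(|m| (format!("meetings/{}/index.html", m.id), render_meeting(m)))
        .collect()
}

/// Made-up sample data standing in for rows read from a parquet file.
fn sample_meetings() -> Vec<Meeting> {
    vec![
        Meeting { id: 1, date: "2025-01-09", motions: vec!["Budget & taxes", "Energy"] },
        Meeting { id: 2, date: "2025-01-16", motions: vec!["Healthcare"] },
    ]
}

fn main() {
    for (path, html) in build_pages(&sample_meetings()) {
        println!("{path}: {} bytes", html.len());
    }
}
```

A static site generator does the same thing with more machinery: the data source supplies the list of records and the template is applied once per record at build time.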
In my own projects I use JFA/SDF-based outlines the most because of their quality as well as the possibility to render distance-based effects like pulsating outlines.
Very nice! I'm wondering, what made you decide to start your own database from scratch instead of (re-)using and/or extending other databases/APIs (such as trefle, or the data sources that trefle uses)?
The goal of permapeople is not to gather as many plants as possible but to focus on plants that humans currently have a use for, be it food, medicine or something else.
Most data is curated by real people. We do import from the data sources trefle mentions but only to fill specific gaps in our db.
I use box/Gaussian blurs often, but for rendering outlines/highlights of objects.
https://ameye.dev/notes/rendering-outlines/#blurred-buffer