Hacker Newsnew | past | comments | ask | show | jobs | submit | arxanas's commentslogin

Just gave a talk about this: https://blog.waleedkhan.name/what-if-sql-were-good/

- Recommend Ascent (Rust only, but supports targeting WASM)

- Soufflé: good, but too hard to integrate into existing systems; lots of ergonomic problems in comparison to Ascent (can elaborate)

- CozoDB: really cool but seems to be abandoned

- Logica: have not tried it yet


Would like to hear about the ergonomic problems you have with souffle. We integrate it into our rust tools quite well, and generate typesafe rust bindings to our souffle programs, allowing us to insert facts and iterate over outputs.


It's quite possible that I have different, smaller-scale problems than you have! So my feedback might not be as relevant

I wrote detailed commentary here: https://github.com/s-arash/ascent/discussions/72

Re Rust bindings and your specific comment:

- Deploying Soufflé and doing FFI is much more difficult for me in practice, just in terms of the overhead to set up a working build. (I'm not going to be able to justify setting up a Soufflé ruleset for Bazel, and then adding Rust-Soufflé binding generation, etc. at my workplace.)

- User-defined functors, or integrating normal data structures/functions/libraries into your Soufflé program, seems painful. If you're doing integrations with random existing systems, then reducing the friction here is essential. (In slide 16 of the talk, you can see how I embedded a constructive `Trace` type and a `GlobSet` into an actual Ascent value+lattice.)

- On the other hand, you might need Soufflé's component system for structuring larger programs whereas I might not (see above GitHub discussion).

Non-specifically:

- Several features like generative clauses, user-defined aggregations, lattices, etc. seem convenient in practice.

- I had worse performance with Soufflé than Ascent for my program for some query-planning reason that I couldn't figure out. I don't really know why; see https://github.com/souffle-lang/souffle/discussions/2557


> - I had worse performance with Soufflé than Ascent for my program for some query-planning reason that I couldn't figure out. I don't really know why; see https://github.com/souffle-lang/souffle/discussions/2557

I think the basic issue is that ADTs are simply not indexed--so to the degree that you write a query that would necessitate an index on a subtree of an ADT, you will face asymptotic blowup, as the way ADTs work will force you to scan-then-test across all ADTs (associated with that top-level tag). The issue is discussed in Section 5.2 of this paper here: https://arxiv.org/pdf/2411.14330


Ah, yes, but I think Ascent also doesn't index ADTs. In this case, based on some other information, it seems like Soufflé _can_ plan the queries better if it has profiling data. It seems like Ascent just happened to pick a better query plan in my case without the profiling data.

Thanks for the link to the paper!


It's true that Ascent does not index ADTs either, but there are some tricks that you can use when you control the container type to get similar performance by, e.g., storing a pre-computed hash. I believe Arash, the main author of Ascent, was exploiting this trick for Rc<...> members and seeing good performance gains. It is a bit nuanced, you're right that Ascent doesn't pervasively index ADTs out of the box for sure.


One trick for running tests in rebase-heavy workflows is to use the tree hash of the commit as the cache key, rather than attach metadata the commit itself.

- That way, tests will be skipped when the contents of the commit are the same, while remaining insensitive to things like changes to the commit message or squashes.

- But they'll re-run in situations like reordering commits (for just the reordered range, and then the cache will work again for any unchanged commits after that). I think that's important because notes will follow the commits around as they're rewritten, even if the logical contents are now different due to reordering? Amending a commit or squashing two non-adjacent commits may also have unexpected behavior if it merges the notes from both sides and fails to invalidate the cache?

- This is how my `git test` command works https://github.com/arxanas/git-branchless/wiki/Command:-git-...

---

I've also seen use-cases might prefer to use/add other things to the cache key:

- The commit message: my most recent workflow involves embedding certain test commands in the message, so I actually do want to re-run the tests when the test commands change.

- The patch ID: if you specifically don't want to re-run tests when you rebase/merge with the main branch, or otherwise reorder commits.

Unfortunately, I don't have a good solution for those at present.


In my case I wanted to invalidate the notes when rearranging commits, because changing order could introduce bugs, but also have a single command that ensures that all my commits are tested before sending pull request. I think maybe it used to work that way (remove notes upon rebasing) out of the box that time, but if not, then I suspect I simply added the git commit id to the note, which can be effectively used for validity checking.

Of course, if the notes mechanism didn't exist, then I could have just used a local file.. But it's nice to see the messages in the git log.

But yeah, both kinds of keys would be useful for this purpose, depending on the exact needs.


> To me it still all seems confusing. I don't have a good mental model for it. > > How does it handle things like `git rebase -x 'make fmt'` which might edit each rebased commit automatically, or `git rebase -x 'make lint'` which might fail but not leave a conflict marker. I didn't see any docs for exec in jj rebase so I guess this feature doesn't work

As a mental model, you can pretend that a commit with a conflict in it has suspended the subsequent rebase operation for descendant commits. When you resolve the conflicts in that commit, it's as if jj automatically resumed the rebase operation for the descendant commits. This process repeats until you've applied all of the rebased commits and resolved all of the conflicts (or not, if you decide to not resolve all of the conflicts).

Today, you can write something like

    for commit in "$(jj log -T ...)"; do jj edit "$commit" && make fmt; done
which would also edit the commits automatically. It's probably not too hard to break out of the loop when the first conflict is detected to match the Git workflow more directly. (I can figure it out if you end up being interested.)

> There's also no examples for how you work out which commit has a conflict marker. It's probably obvious in jj status but it would be nice to document, as I'm not yet interested in spending the time to test it.

Does this overview help? https://jj-vcs.github.io/jj/v0.26.0/tutorial/#conflicts

> And I imagine it all breaks in colocated repos too (I need submodule support, and I want to use gh cli etc)

It depends on the exact subset of features you need.

- jj can't update submodules, which works for some workflows but not others.

- Generally speaking, you can use the `gh` CLI reasonably well alongside jj. In jj, it's common to create commits not addressed by any branch, but `gh` doesn't support those well. You can either try to adopt your Git workflows directly and always use branches, or else you'll need to make sure to create the branch just-in-time before performing a `gh` invocation that would need a branch.


It's certainly true that jj's features won't appeal to everyone. I think a lot of its features are quality-of-life features (consistent commands and concepts, general undo), and a lot of its features don't help a certain class of users (flexible commit rewriting/rebasing), so it's not surprising that some seasoned Git users won't find it that helpful.

I think it's unfair to call it a "thin UI layer". My own project git-branchless https://github.com/arxanas/git-branchless might more legitimately be called a "thin UI layer", since it really is a wrapper around Git.

jj involves features like first-class conflicts, which are actually fairly hard to backport to Git in a useful way. But the presence of first-class conflicts also converts certain workflows from "untenable" to "usable".

Another comment also points out that it was originally a side-project, rather than a top-down Google mandate.


I don't think the cheat sheet in that thread demonstrates any of jj's improved expressive power, such as revsets or first-class conflicts. I gave an example of something difficult do in Git here: https://news.ycombinator.com/item?id=43022367


Merges can be undone. Either you can manually remediate by `jj abandon`ing the merge commit so that it's not visible anymore, or you can restore the entire repo state to a previous point in time with `jj undo` or `jj op restore`, or you can do some remediation in between those two extremes.

Off of the top of my head, `jj new` and occasionally `jj rebase` can create merge commits; I don't recall any others.


jj is pretty much just safer than Git in terms of the core architecture.

There's several things Git can't undo, such as if you delete a ref (in particular, for commits not observed by `HEAD`), or if you want to determine a global ordering between events from different reflogs: https://github.com/arxanas/git-branchless/wiki/Architecture#...

In contrast, jj snapshots the entire repo after each operation (including the assignment of refs to commits), so the above issues are naturally handled as part of the design. You can check the historical repo states with the operation log: https://jj-vcs.github.io/jj/latest/operation-log/ (That being said, there may be bugs in jj itself.)


It's a bit unfortunate because most of the listed commands are indeed equivalent to Git commands. To give an example of new capabilities, in Git, you can't do the equivalent of

    jj rebase -r 'mine() & diff_contains("TODO")' -d 'heads(@::)'
in any reasonable number of commands, which will

1) find all of the commits I authored with the string `TODO` in the diff

2) extract them from their locations in the commit graph

3) rebase all non-matching descendant commits onto their nearest unaffected ancestor commits

4) find the most recent work I'm doing on the current branch (topologically, the descendant commit of the current commit which has no descendants of its own, called a "head")

5) rebase all extracted commits onto that commit, while preserving the relative topological order and moving any branches.

Furthermore, any merge conflicts produced during the process are stored for later resolution; I don't have to resolve them until I'm ready to do so. (That kind of VCS workflow is not useful for some people, but it's incredibly useful for me.)


So you're taking every single TODO commit and rebasing it on top of current? Why would we do that?


I might do it if the commits in my stack are mostly independent and I want to commute the ones with `TODO` to be later. This might be so that I can

- run checks on the sequence of done commits, without getting wrong/unhelpful results because of the presence of an unfinished commit in the sequence

- put the done commits up for review now

(This is predicated on certain workflows being acceptable / useful for the user, such as modifying existing commits and reordering commits.)

It's also just an example. You can swap in your favorite query into `-r` if you don't have any need for moving `TODO` commits around. Regardless, Git is not easily able to replicate most such invocations.

I previously meant to link to the jj revset docs here: https://jj-vcs.github.io/jj/v0.26.0/revsets/


??.. wdym why, what?.

Also this is just an example, there are a lot of very complex (from the point of git) things that are trivial like that in jj.

Although if you're asking "why would we do that" then I don't even know what to tell you lol


I think it was a very valid question. I was, and still am, wondering the same thing


I guess you could question the specific example, but to me the question sounded like "why would you do rebases".

Like, for various reasons?.

The point was that it's way easier and way safer to do rebases in jj, and it straight up allows to do complex stuff like the mentioned example easily and inderstandably, idk


I believe `jj abandon` indeed operates on edges rather than nodes. It looks the diagram is updated now.

I believe that `jj squash` and `jj backout` also operate on edges rather than nodes, but the examples here don't make it clear. `jj squash` ought to combine the edge `r -> q` with the edge for `q -> parent(q)` (not depicted) and ultimately leave the `r -> q` edge as "empty", and `jj backout` ought to create an edge that has the inverse diff of another edge (which, in this case, is indistinguishable from `s`'s node changing to be equivalent to `q`'s node).


Yup, that's all correct. I drew `squash` and `backout` in terms of files in order to avoid needing notation for the opposite of an edit and the composition of two edits.


You can run the tests on each commit in parallel if you're okay with wasting CPU time to save wall-clock time. git-branchless can speculatively run linear or binary search in parallel up to a user-specified number of jobs [1], and I'd like it add it to jj someday, as it's one of the features I miss most.

(To run the tests in parallel, git-branchless provisions its own worktrees. For binary search, it speculatively executes the search for the potential success and failure cases; when the number of jobs is a power of 2, this partitions the search space evenly.)

[1]: https://github.com/arxanas/git-branchless/wiki/Command:-git-...


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: