libre-man's comments

libre-man · on March 22, 2024

What you want for determining this is not an AST but a Program Dependence Graph (PDG), which does encode this information. Creating one is nowhere near as simple as creating an AST, and for many languages it either requires assumptions that will be broken or results in something very similar to an AST (every node has a dependency on the previous node).
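As a rough illustration of the difference, here is a toy sketch in Python using the standard `ast` module. It only tracks flow dependences between top-level assignments in straight-line code; the helper names `reads_writes` and `data_dependences` are made up for this example, and a real PDG would also need control dependences, aliasing, and side effects, which is exactly where the assumptions come in.

```python
import ast


def reads_writes(stmt):
    """Return (names read, names written) by one statement."""
    reads, writes = set(), set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                writes.add(node.id)
            else:
                reads.add(node.id)
    return reads, writes


def data_dependences(source):
    """Map each statement to the earlier statements whose writes it reads.

    Toy version: straight-line code only, flow dependences only.
    """
    last_writer = {}  # variable name -> text of the statement that last wrote it
    deps = {}
    for stmt in ast.parse(source).body:
        text = ast.unparse(stmt)  # requires Python 3.9+
        reads, writes = reads_writes(stmt)
        deps[text] = {last_writer[name] for name in reads if name in last_writer}
        for name in writes:
            last_writer[name] = text
    return deps


before = "a = 1\nb = 2\nc = a + b\n"
after = "b = 2\na = 1\nc = a + b\n"  # the two independent statements are swapped

# The ASTs differ because statement order is part of the tree...
same_ast = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
# ...but the dependence relation between statements is unchanged.
same_deps = data_dependences(before) == data_dependences(after)

print(same_ast, same_deps)
```

Swapping two independent assignments changes the AST but not the dependence sets, so only the second comparison treats the two versions as equivalent.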

thaumasiotes · on March 22, 2024

OK. What good is the AST? Why do I care about "structural diffs" that don't do this?

The page has several examples:

1. Understand what actually changed.

This appears to show that `guess(path, guess_src).map(tsp::from_language)` has been changed to `language_override.or_else(|| guess(path, guess_src)).map(tsp::from_language)`. The call to `map` is part of a single line of code in the old file, but has been split onto a line of its own in the new file to accommodate the greater complexity of the expression.

The bragging associated with the example is "Unlike a line-oriented text diff, difftastic understands that the inner expression hasn't changed here", but I don't really care about that. I need to pay close attention to which bits of the line have been manipulated into which positions anyway. I'm more impressed by ignoring the splitting of one line into several, which does seem to be a real benefit of basing the diff on an AST.
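The line-splitting point can be reproduced with any parser. Below is a minimal sketch using Python's `ast` module on a Python analogue of the expression (the page's example is Rust; the Python spelling, the `lang` variable, and the lambda are assumptions made for illustration, and this is not how difftastic itself works).

```python
import ast

# Old version: the whole call chain on one line.
old = "lang = guess(path, guess_src).map(from_language)\n"

# Same expression, only re-wrapped across several lines.
reformatted = (
    "lang = (\n"
    "    guess(path, guess_src)\n"
    "    .map(from_language)\n"
    ")\n"
)

# New version: the inner call is wrapped in an extra or_else(...) step.
new = (
    "lang = (\n"
    "    language_override.or_else(lambda: guess(path, guess_src))\n"
    "    .map(from_language)\n"
    ")\n"
)

# ast.dump omits line and column numbers by default, so layout is ignored.
layout_only = ast.dump(ast.parse(old)) == ast.dump(ast.parse(reformatted))
really_changed = ast.dump(ast.parse(old)) != ast.dump(ast.parse(new))

# The old inner call still appears, unchanged, as a subtree of the new code.
inner = ast.dump(ast.parse("guess(path, guess_src)", mode="eval").body)
inner_survives = any(ast.dump(node) == inner for node in ast.walk(ast.parse(new)))

print(layout_only, really_changed, inner_survives)
```

A line-based diff would flag every line of `reformatted` as changed, while the tree comparison reports no difference at all.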

2. Ignore formatting changes.

This example shows that when I switch the source from which `mockable` is imported from "../common/mockable.js" to "./internal.js", the diff will actively obscure that information by highlighting `mockable` and pretending that `"./internal.js"` is uninteresting code that was there the whole time (because it was already the source of some other imports). This badly confuses a boring visual change ("let's use the syntax for importing several things, instead of one thing") with a very significant semantic change ("let's import this module from a completely different file"). I'm not comfortable with this; there must be a better way to present this information than by suggesting that I shouldn't be worried about it.

(A textual diff, in this case, has the same problem. But when the pitch is that your new tool is better than a textual diff because it understands the code, failing to highlight an important change to the code is worse than it used to be!)

3. Visualize wrapping changes.

This shows that when I change the type of some field from `String` to `Option<String>`, the diff will not highlight the text "String", because that part hasn't changed. This is a change from a textual diff, but it doesn't appear to add much value.

There's a second example to do with code that belongs both before and after other code, in this case an opening/closing tag pair in XML, but in that case the structural diff appears to be identical to a textual diff.

4. Real line numbers.

"Do you know how to read @@ -5,6 +5,7 @@ syntax? Difftastic shows the actual line numbers from your files, both before and after."

I agree that that's a real benefit, but again it doesn't seem to have anything to do with the difference between textual and structural diffs.

------

I think the conceptual appeal of a "structural diff" is that it fails to highlight changes to the code that don't change the behavior of the software. Difftastic clearly believes something different; in the second example, they are failing to highlight a change to the code that does change the behavior of the software. And in the other examples, they are failing to highlight things that haven't changed from some perspectives, but could be argued to have changed from other perspectives -- and that in either case don't derive much benefit from not being highlighted. If changing `String` to `Option<SpecialType>` produced a diff that highlighted `SpecialType` in a separate color from the surrounding `Option<>` wrapping, indicating that the one line of code contained two relevant changes, that might be interesting, but otherwise I don't see the point of not highlighting the inner `String` along with the new wrapping.

So... what is the appeal of structural diffs?

libre-man · on March 27, 2024

Honestly I agree that structural diffs don't solve a problem for me either. I care about formatting too much to only want to rely on them.

I was just replying that if you don't want to get a diff for the example I replied to, you have to use a more advanced representation of the code; an AST won't be able to do it.

libre-man · on April 1, 2021

CodeGrade | Software Developer | Amsterdam, The Netherlands | Full Time | Onsite

CodeGrade is improving coding education, by giving teachers better tools so that they can provide more insightful feedback for their students. It all started as a university project at the University of Amsterdam to make our lives as TAs easier, and for students to get the feedback they need to be successful in computer science education. Today, we are helping many institutions worldwide.

Our tech stack includes Python (including heavy use of type annotations), Vue and TypeScript, and we use AWS for our hosting.

We're a bootstrapped spinoff of the University of Amsterdam, and this is our first technical hire. In the role you'll be working on the entire product, from our custom autograding infrastructure to plagiarism detection.

Apply here [0], or email me directly (thomas at codegrade dot com).

[0] https://www.codegrade.com/jobs/software-developer

apexalpha · on April 2, 2021

Groetjes aan Olmo! En succes. :)

libre-man · on March 1, 2021

CodeGrade | Software Developer | Amsterdam, The Netherlands | Full Time | Onsite

CodeGrade is improving coding education, by giving teachers better tools so that they can provide more insightful feedback for their students. It all started as a university project at the University of Amsterdam to make our lives as TAs easier, and for students to get the feedback they need to be successful in computer science education. Today, we are helping many institutions worldwide.

Our tech stack includes Python (including heavy use of type annotations), Vue and TypeScript, and we use AWS for our hosting.

We're a bootstrapped spinoff of the University of Amsterdam, and this is our first technical hire. In the role you'll be working on the entire product, from our custom autograding infrastructure to plagiarism detection.

Apply here [0], or email me directly (thomas at codegrade dot com).

[0] https://www.codegrade.com/jobs/software-developer

libre-man · on April 25, 2018

So that the files can be downloaded before the HTML is parsed, if I recall correctly.

libre-man · on March 19, 2018

What a completely terrible thing to say. This isn't about saving a life, this is about not killing somebody. Just because a company like Uber thinks it can make a lot of money doesn't mean it can simply take risks like these.

cm2187 · on March 19, 2018

Every time you are selling food you take the risk of killing people if something goes wrong. And what about carrying people in planes. These risks are taken continuously, for profit. How is that different?

rabidrat · on March 19, 2018

Both of those industries have tremendous regulations in place to prevent accidents and injuries. If someone gets salmonella poisoning and it is traced back to a company, there is a massive recall at the company's expense. Air travel is one of the safest modes of transportation available (statistically) because of the NTSB and the rules/regulations put in place after each and every accident.

That's how it is profoundly different.

cm2187 · on March 19, 2018

Air travel is only safe because companies have taken these risks with people's lives. A society that takes no risk is a society that will achieve nothing new.

thatcat · on March 19, 2018

Air travel and eating food at a restaurant are opt-in actions. To avoid this risk, you would have to opt out of using the public road system that you are required to use.

libre-man · on March 19, 2018

We wouldn't accept it if people got killed by being sold poisoned food, at least not where I'm from. Plane crashes are investigated, licenses are suspended, and blacklists are kept; furthermore, software is tested and verified before it is used in production. We shouldn't accept excessive risks (see the regulations for truck and bus drivers) just because a profit can be made.

cm2187 · on March 19, 2018

But what makes you think Uber didn't test their software? When Boeing introduces a carbon fuselage, it is taking risks with people's lives. They do reasonable testing, but a technology isn't proven until it has been widely used for a long time. No risk = no innovation.

emodendroket · on March 19, 2018

Since there is absolutely no binding federal regulation I don't have a lot of confidence that the level of testing is comparable to what's done for airplanes.

azernik · on March 19, 2018

It's all well and good to not like the choice, but the choices still have to be made - how much are we willing to give up economically in order to reduce immediate risk to lives? Included in this must be the consideration that economic value can be used to save lives, through higher living standards and better health care.

Every regulatory system in the world has to consider these things, explicitly or (more commonly) implicitly. See e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1633324/ for the application of the idea to healthcare, or http://www.nytimes.com/2011/02/17/business/economy/17regulat... for the public policy implications for environmental regulation.

Or more relevant yet, this "guidance on valuing reduction of fatalities and injuries by regulations or investments" by the Department of Transportation: https://www.transportation.gov/sites/dot.gov/files/docs/VSL_...

mlindner · on March 19, 2018

Did you ever take an engineering ethics course? One of the things talked about is the monetary value placed on a human life. You can't make that value infinitely high or literally nothing can happen. You also don't want it super low.

libre-man · on April 29, 2017

Of course you need to drop to some sort of non-GC language, or you would need a hardware GC. However, look at Iota [0], which can compile LLVM to Common Lisp. So you can run GC and non-GC languages within, kind of, Common Lisp. Or look at all the projects that compile a language to JavaScript; there is no reason why the reverse could not be done.

[0] https://github.com/froggey/Iota

libre-man · on April 24, 2017

I tested Common Lisp. SBCL seems to be exponential while Clozure CL is not.

However, it should be noted that globbing is non-portable in Common Lisp, so I expect most users implement it using something like CL-FAD or OSICAT together with CL-PPCRE, and CL-PPCRE is efficient.
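For context on what "exponential" means here, a sketch in Python (not Common Lisp, and not taken from SBCL or Clozure CL, whose implementations I have not inspected). It contrasts a naive recursive matcher, which retries every split at every `*`, with the common approach of remembering only the most recent `*`. The function names and the test pattern are assumptions chosen for illustration.

```python
def glob_backtracking(pattern, text, counter):
    """Naive matcher: on '*', try every possible split and recurse.

    counter is a one-element list used to count recursive calls.
    """
    counter[0] += 1
    if not pattern:
        return not text
    if pattern[0] == "*":
        return any(
            glob_backtracking(pattern[1:], text[i:], counter)
            for i in range(len(text) + 1)
        )
    return (
        bool(text)
        and pattern[0] in ("?", text[0])
        and glob_backtracking(pattern[1:], text[1:], counter)
    )


def glob_single_backtrack(pattern, text):
    """Iterative matcher that only ever backtracks to the most recent '*'.

    Worst case is O(len(pattern) * len(text)) rather than exponential.
    """
    p = t = 0
    star_p = star_t = -1  # position of the last '*' and the text index it was tried at
    while t < len(text):
        if p < len(pattern) and pattern[p] == "*":
            star_p, star_t = p, t  # remember the star, first try matching nothing
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", text[t]):
            p += 1
            t += 1
        elif star_p != -1:
            star_t += 1  # let the last star swallow one more character
            p, t = star_p + 1, star_t
        else:
            return False
    while p < len(pattern) and pattern[p] == "*":
        p += 1  # trailing stars match the empty string
    return p == len(pattern)


# A pattern with several stars that can never match a run of a's.
pattern, text = "a*a*a*a*b", "a" * 15
calls = [0]
naive_result = glob_backtracking(pattern, text, calls)
fast_result = glob_single_backtrack(pattern, text)
print(naive_result, fast_result, calls[0])
```

Both matchers reject the input, but the call count of the naive version grows quickly as stars are added to the pattern, which is the kind of blow-up the test was probing for.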