> I would actually be more impressed if it understood enough about programs to be able to mimic higher-level structure without just memorizing text.
Agreed.
> Why do you think Python isn't good for comparing program structure?
From recent experience I prefer control flow analysis, or anything else that produces a graph structure. As you said, that's harder with dynamic languages. I also think some Python features (English-like syntax, f-strings, `/` converting ints to floats, significant whitespace, loops vs generator expressions) make structural comparisons messy, but that may just be bias.
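To make the "loops vs generator expressions" point concrete, here is a minimal sketch using the standard `ast` module. The two snippets and the `node_types` helper are made up for illustration; both snippets compute the same sum of squares, yet they parse to different node types.

```python
import ast


def node_types(src):
    """Return the AST node type names found in a source snippet."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


# Two hypothetical snippets that compute the same value in different styles.
loop_src = "total = 0\nfor x in xs:\n    total += x * x\n"
genexp_src = "total = sum(x * x for x in xs)\n"

loop_types = node_types(loop_src)
genexp_types = node_types(genexp_src)

# Node types that appear in only one of the two snippets.
only_in_loop = set(loop_types) - set(genexp_types)
only_in_genexp = set(genexp_types) - set(loop_types)
```

The loop version contains a `For` node and the other contains a `GeneratorExp` node, so a purely syntactic comparison would treat two equivalent programs as structurally different unless it normalizes them first.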
The ideal language would be one with minimal syntax, where we can target a decent range of programs, and obtain as much info as possible about program structure without actually running the program. I've come across LISP-without-macros in DreamCoder (https://arxiv.org/abs/2006.08381), BF++ (https://arxiv.org/abs/2101.09571), and a couple of others which I can't remember right now. I think the APL family (APL/J/K) would be interesting because there are fewer characters to generate, but each character carries a lot of meaning.
Right now I'm looking at flow-based programming (FBP) for this. In FBP the control flow is explicit: the program code describes a directed acyclic graph (DAG), so comparing program structure becomes straightforward (subgraph isomorphism with some heuristics). I'm writing a toy FBP language that draws images (https://github.com/mayahq/flatland), with which I aim to test what these models understand.
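As a rough sketch of what "subgraph isomorphism" could look like here, the snippet below does a naive backtracking search for a small labeled pattern inside a larger labeled DAG. Everything in it is an assumption for illustration: the `find_subgraph` helper, the node ids, and the labels (`circle`, `scale`, `rotate`, `draw`) are hypothetical and are not flatland's actual syntax. It checks that every pattern edge maps onto a target edge with matching labels, and it has none of the heuristics you would want for larger graphs.

```python
def find_subgraph(pattern_labels, pattern_edges, target_labels, target_edges):
    """Return a dict mapping pattern nodes to target nodes, or None.

    A match must preserve node labels and send every pattern edge
    onto an existing target edge. Plain backtracking, no pruning heuristics.
    """
    p_nodes = list(pattern_labels)

    def extend(mapping):
        if len(mapping) == len(p_nodes):
            return dict(mapping)
        p = p_nodes[len(mapping)]  # next pattern node to place
        for t in target_labels:
            if t in mapping.values() or target_labels[t] != pattern_labels[p]:
                continue
            mapping[p] = t
            # Check only the pattern edges whose endpoints are both placed.
            edges_ok = all(
                (mapping[a], mapping[b]) in target_edges
                for a, b in pattern_edges
                if a in mapping and b in mapping
            )
            if edges_ok:
                result = extend(mapping)
                if result is not None:
                    return result
            del mapping[p]  # backtrack
        return None

    return extend({})


# Hypothetical flow program: circle -> scale -> rotate -> draw
target_labels = {"a": "circle", "b": "scale", "c": "rotate", "d": "draw"}
target_edges = {("a", "b"), ("b", "c"), ("c", "d")}

# Pattern: a scale step feeding a rotate step.
match = find_subgraph(
    {"x": "scale", "y": "rotate"}, {("x", "y")}, target_labels, target_edges
)

# Same labels with the edge direction reversed: should not match.
reversed_match = find_subgraph(
    {"x": "rotate", "y": "scale"}, {("x", "y")}, target_labels, target_edges
)
```

Here `match` maps `x` to `b` and `y` to `c`, while `reversed_match` is `None` because the target has no rotate-to-scale edge. The search is exponential in the worst case, which is where the "some heuristics" part comes in.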