Show HN: CodeBuff – smart code formatter (github.com/antlr)
62 points by parrt on Sept 19, 2016 | hide | past | favorite | 21 comments



Very interesting! I have been thinking about something similar, but rather than learning from examples, it would use a generic "misalignment" cost function that penalises lines which have similar content in different columns, and minimise that cost by hill climbing or a similar search.

The difficulty is tying it to a particular language's parser and whitespace rules.
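To make the idea concrete, here is a minimal sketch of such a cost function and a greedy hill climb. Everything in it is an assumption made for illustration: "similar content" is reduced to the first '=' on each line, the only allowed move is inserting one space before that '=', and the names (misalignment, neighbours, hill_climb) are hypothetical. It knows nothing about strings, comments, or operators such as '==', which is exactly the parser-dependence problem mentioned above.

```python
def eq_column(line):
    """Column of the first '=' in the line, or None if there is none."""
    col = line.find("=")
    return col if col >= 0 else None


def misalignment(lines):
    """Sum of column differences between '=' signs on adjacent lines."""
    cost = 0
    for a, b in zip(lines, lines[1:]):
        ca, cb = eq_column(a), eq_column(b)
        if ca is not None and cb is not None:
            cost += abs(ca - cb)
    return cost


def neighbours(lines):
    """Yield every layout reachable by inserting one space before an '='."""
    for i, line in enumerate(lines):
        col = eq_column(line)
        if col is not None:
            yield lines[:i] + [line[:col] + " " + line[col:]] + lines[i + 1:]


def hill_climb(lines):
    """Greedily take the best neighbour until no neighbour lowers the cost."""
    current = list(lines)
    while True:
        best = min(neighbours(current), key=misalignment, default=None)
        if best is None or misalignment(best) >= misalignment(current):
            return current
        current = best


code = ["x = 1", "total = 2", "ab = 3"]
aligned = hill_climb(code)
print("\n".join(aligned))
```

The search only accepts strict improvements, so it always terminates, but like any hill climb it can stop in a local minimum on less tidy input.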


If you're interested, Google Research recently published a paper about code formatting via dynamic programming: http://static.googleusercontent.com/media/research.google.co...


Interesting, though it is a framework very similar to previous work using Box combinations. Here, we don't require any work from a language expert. We simply sniff your project, and then make new files look like the existing ones. Handling a new language requires no coding.


CodeBuff is also learning from examples. Check out the paper at the SLE conf; it has all the nitty-gritty details of how it learns the whitespace rules from the example files.


Does this ML model assume that popular code style = correct code style? (I only had a cursory look.)


I think you can take the code and train it on whatever corpus you like.


Yep, there is no definition of "good style". The tool simply makes new files look like the rest of your project.


Yes. They pick codebases that they think have good style to train the model on. Seems reasonable.


Love this approach. We need more AI applied to problems in writing code. Like an AI parser that auto-corrects errors. Anyone know if something like this exists?


Not directly related, but there is a group at the University of Edinburgh that focuses on that (applying ML to source code). Their page is https://mast-group.github.io/


An interesting idea. Not sure anyone is working on that.


How does it deal with languages with significant whitespace? Can it format Python without breaking it?


Haven't tried, but if you train it using only correct Python I bet it can only produce correct Python. I'd have to check this to be totally sure there is no weird corner case.


A small followup to Jurgen's post. Python's indentation is meaningful, so any change to it would mean we changed the program. In that sense, Python is not a good target for this tool. Also, as a simple implementation expedient for this version, I assume that '\n' is not significant.
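A small sketch of why that matters, using only Python's standard ast module. The helper name same_program and the three snippets are made up for illustration; this is not something CodeBuff does. Respacing within a line leaves the syntax tree unchanged, while moving one statement out of the loop body produces a different tree.

```python
import ast

original = "for i in range(3):\n    total = i\n    print(total)\n"
# Same indentation, different spacing inside the lines.
respaced = "for i in range( 3 ):\n    total=i\n    print( total )\n"
# Last statement dedented: it now runs once, after the loop.
dedented = "for i in range(3):\n    total = i\nprint(total)\n"


def same_program(a, b):
    """True if both sources parse to the same abstract syntax tree.

    ast.dump omits line and column attributes by default, so pure
    layout differences do not show up in the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_program(original, respaced))
print(same_program(original, dedented))
```

A formatter for a whitespace-sensitive language could use a check of this kind as a guard and reject any output whose tree differs from the input's.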


Why do you need a grammar if this is supposed to learn it itself from the code?


The grammar is used to compute the code layout features that CodeBuff learns from: it parses each file and associates spacing and indentation features with contextual features of the parse tree.

Then, when we parse a new file to pretty-print, we parse again with the _same parser_ and the features that were learned are matched to the tree at hand to recover the "right" spacing and indentation features for the given example code.
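A toy version of that train-then-match loop, to show the shape of the idea only. It is not CodeBuff's model: a regex tokenizer stands in for the grammar-driven parser, the "context" is just the kind of the previous and current token rather than parse-tree features, and the prediction is a plain majority vote. All names (tokens, train, reformat) are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Optional leading whitespace, then either a word or one punctuation char.
TOKEN = re.compile(r"(\s*)(\w+|[^\w\s])")


def kind(text):
    """Crude token category standing in for real parse-tree context."""
    return "word" if re.fullmatch(r"\w+", text) else text


def tokens(source):
    """Yield (whitespace_before, token_text) pairs."""
    for match in TOKEN.finditer(source):
        yield match.group(1), match.group(2)


def train(examples):
    """Count which whitespace precedes a token in each (previous, current) context."""
    model = defaultdict(Counter)
    for source in examples:
        prev = "<start>"
        for space, text in tokens(source):
            model[(prev, kind(text))][space] += 1
            prev = kind(text)
    return model


def reformat(source, model):
    """Re-emit tokens with the most common whitespace seen in the same context."""
    out, prev = [], "<start>"
    for space, text in tokens(source):
        seen = model.get((prev, kind(text)))
        # Fall back to the original whitespace for contexts never seen in training.
        out.append((seen.most_common(1)[0][0] if seen else space) + text)
        prev = kind(text)
    return "".join(out)


corpus = ["f(a, b);\ng(c, d);\n"]
model = train(corpus)
print(reformat("h( x ,y ) ;k(z);", model))
```

Training and formatting go through the same tokens function, which mirrors the point about using the same parser in both phases: the contexts looked up at formatting time must be computed exactly as they were at training time.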


The hard part of building a code formatter by hand is coding all the formatting rules, not the parsing. All formatters are based upon parsers so that is a constant across them. Creating a grammar from exemplars is still an unsolved problem.


This would be a nice way to get the code formatter configuration about right when switching to a new editor with an old project.


How does this improve over VSCode's golang formatting (which I think is excellent)?


It's language parametric, so it can work for any language you have a grammar for and a set of example files to train on. In this sense it helps the authors of formatting tools.

For users of a specific language it could also be an improvement, since a formatter can be configured via completely arbitrary code examples. But we have to work a bit further to streamline that use case.


Golang formatting is mostly a solved problem because the community has agreed on whatever gofmt produces.



