In a forum of CS people I'm surprised this is one of the top opinions. Our field is full of super surprising results like this -- that you don't have to actually understand the text beyond basic grammar structures to reasonably accurately predict the score a human would give it.
Like this kind of thing should be cool, not insane. I mean wasn't it cool in your AI class when you learned that DFS could play Mario if you structured the search space right?
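For what it's worth, the claim is easy to sketch. Here's a toy version in Python with scikit-learn -- everything here (the features, the data, the names) is hypothetical and has nothing to do with whatever the real, proprietary graders use:

    # Sketch: predict an essay grade from surface features only,
    # with no semantic understanding at all. Assumes a toy corpus
    # of (essay_text, human_score) pairs.
    from sklearn.linear_model import LinearRegression

    def surface_features(text):
        words = text.split()
        sentences = [s for s in text.split('.') if s.strip()]
        return [
            len(words),                                        # essay length
            len(set(words)) / max(len(words), 1),              # vocabulary diversity
            sum(len(w) for w in words) / max(len(words), 1),   # avg word length
            len(words) / max(len(sentences), 1),               # avg sentence length
        ]

    def train_grader(essays, human_scores):
        X = [surface_features(e) for e in essays]
        return LinearRegression().fit(X, human_scores)

    # grader = train_grader(training_essays, training_scores)
    # predicted = grader.predict([surface_features(new_essay)])

Note there is no semantics anywhere in that feature list -- that's the whole surprising result.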
I came first in English for my school, many moons ago. Leading up to the finals, I regularly finished ahead of the hard-core English essay people, generally to my amusement. My exam essay responses were generally half the length of the prodigious writers' (sometimes even shorter). Although I have an OK vocabulary, I always made sure to choose the right word to hit a specific meaning, rather than choosing words with a high syllable count.
I'd find it highly interesting to see what kind of result I'd get using an automated system.
Why?
Because I once asked a teacher (also an examiner) why I got better grades than the others, and the answer surprised me: my answers were generally unique and refreshingly different, to the point, not too long, and easy to read.
I suspect that under this new system I'd be an average student. It'd also be interesting to find out, several years down the road, whether the automated system could be gamed at all -- I suspect it could, and that teachers would end up helping students 'maximise' their scores as a result.
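To make the gaming point concrete: if the grader really is a fixed function of surface features, like the toy sketch upthread, then 'maximising' a score is nearly mechanical. Purely illustrative, and assuming that hypothetical surface_features/grader pair:

    # Pad an essay with long, varied filler words until the toy
    # grader's predicted score clears a target. No meaning required.
    FILLER = ["consequently", "furthermore", "notwithstanding",
              "paradoxically", "unquestionably"]

    def pad_essay(essay, grader, target_score, max_rounds=200):
        for i in range(max_rounds):
            predicted = grader.predict([surface_features(essay)])[0]
            if predicted >= target_score:
                break
            essay += " " + FILLER[i % len(FILLER)]
        return essay

A real system would presumably be harder to probe than this, but teachers coaching to the metric wouldn't need code at all -- just the folklore of what the machine rewards.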
It seems plausible that, under this system, you would eventually have learned to write longer essays.
To my mind, that would be a school teaching you to be worse.
In fact, throughout the article I kept being surprised by the idea that long is good. When writing, I tend to prefer being brief.
When I hear a result like "software that understands basic grammar structures can predict what grade a human would give an essay", my reaction is roughly:
* 5% - cool, we could make a company that grades essays
* 15% - cool, we could make a company that grades essays and sell our source code to the test-prep industry
* 80% - fascinating, it sounds like the exam designers need to reevaluate what they are trying to measure with essay questions
Whatever we decide to measure, it needs to scale to millions of essay responses a year while keeping scores consistent across entire states or countries. With that constraint, I'd imagine it's difficult to do much more than grade on grammar and basic semantics.
And if you succeed, you will simply be measuring an uninteresting but manageable subset of the problem, which will then become, in some people's eyes, the definition of the problem.
Education is supposed to be about teaching people to think, to give them the tools with which to do it, to be able to evaluate, criticise, invent, etc.
"...that you don't have to actually understand the text at beyond basic grammar structures to reasonably accurately predict the score a human would give it"
That only really shows that the humans they're training on are terrible at grading essays.
This problem is a first-class demonstration of the difference between "can we?" and "should we?"
The fact that it's being implemented in society is insane, because anyone paying attention to the state of AI today already knows how it will go wrong: without reading the article, I'd already guessed that it systematically discriminated against certain demographics -- which is in fact what the article claims.
It's interesting that it's possible to predict what the scorer would decide, but the moment you actually implement it is when all of the known problems become relevant, and the intellectual wonder must take a backseat to the human problems.
Teaching human-to-human communication by removing human input and having computers decide on quality... call me a skeptic. I feel bad for the students. Essay grading was bad enough before this.
Even narrowly for grammar, though -- is that a good thing? It probably helps scale grammar help to more students, but if these tools became ubiquitous in grading and editing, unique voices would just disappear, and a lot of potentially “great writers” might choose different careers because the machines don't like them.
Adding further bias against the underprivileged is not "cool". Implementing this while avoiding publicity, and without providing any means to publicly audit the results, is doubly not cool.
It is fine to play with "cool" techniques when you are doing consequence-free stuff like playing Mario. When you are creating systems that have significant, long-term effects on people's lives, a different standard applies.
It's like how I felt when I was given low grades for my ugly handwriting. It was stupid to grade it, but it guaranteed that I would never get a top score in any literature class.
This is sort of like discovering the Excel spreadsheet at the heart of a system responsible for handling hundreds of millions of dollars of transactions for your bank.
Yeah, it's cool, but what about your savings account?