You seem to disagree. Here's an interesting study where the researchers used an OpenAI-LLM-based tool to grade student papers and by grading them 10 times in a row, they got vastly different results:
Quote: "The results reveal significant shortcomings: The tool’s numerical grades and qualitative feedback are often random and do not improve even when its suggestions are incorporated."
Thanks for playing.