"Around five times the size of the notorious Books3 dataset that was used to train AI models like Meta’s Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to “level the playing field” by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly-refined and curated content repositories that normally only established tech giants have the resources to assemble. "
^ this is pretty cool and interesting. The collaboration they're doing with Boston Public Library to make articles similarly accessible also sounds pretty exciting.
Was Teletubbies a useful source for our current jobs? Probably not directly, but these things help build a general understanding of the world when you're starting from zero.
Not a lot of call to discern a hawk from a heronshaw (or handsaw) anymore, although that does say something about the power of LLMs, transcription errors, and judgement.
It seems testable: you could train a small model, like GPT-2, on the whole dataset, then on the dataset minus Shakespeare, and compare the loss after training.
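A minimal sketch of that ablation, assuming a JSONL dump with "text" and "author" fields (placeholder names, the real corpus schema will differ) and the Hugging Face datasets/transformers stack:

```python
# Sketch of the ablation: train the same small GPT-2-style model from
# scratch on the full corpus and on the corpus minus Shakespeare, then
# compare loss on a fixed held-out slice of Shakespeare.
from datasets import concatenate_datasets, load_dataset
from transformers import (
    DataCollatorForLanguageModeling, GPT2Config, GPT2LMHeadModel,
    GPT2TokenizerFast, Trainer, TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

def is_shakespeare(ex):
    return "shakespeare" in ex["author"].lower()

# "books.jsonl" and the "text"/"author" fields are placeholders, not the
# dataset's real schema.
books = load_dataset("json", data_files="books.jsonl")["train"]
shakespeare = books.filter(is_shakespeare)
others = books.filter(lambda ex: not is_shakespeare(ex))

# Hold out a fixed slice of Shakespeare so both runs are scored on the same
# text, and so the full-corpus run never trains on its own eval set.
split = shakespeare.train_test_split(test_size=0.1, seed=0)
eval_set = split["test"].map(tokenize, batched=True,
                             remove_columns=books.column_names)

def train_and_eval(train_set, run_name):
    # Fresh small model per run so both conditions start identically.
    model = GPT2LMHeadModel(GPT2Config(n_layer=6, n_head=8, n_embd=512))
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=run_name, num_train_epochs=1,
                               per_device_train_batch_size=8, report_to=[]),
        train_dataset=train_set.map(tokenize, batched=True,
                                    remove_columns=books.column_names),
        eval_dataset=eval_set,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    return trainer.evaluate()["eval_loss"]

full_train = concatenate_datasets([others, split["train"]])
print("loss, full corpus:      ", train_and_eval(full_train, "run_full"))
print("loss, minus Shakespeare:", train_and_eval(others, "run_ablated"))
```

Scoring both runs on the same held-out slice of Shakespeare is what makes the two losses comparable; a worse loss for the ablated run would suggest the model really was leaning on the Bard's text rather than on paraphrases of it elsewhere in the corpus.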
In the long tail of questions people ask, I would think asking about Shakespeare is relatively common (explain a play, explain a character, identify a quote, ...).
Unless you want to “write a Shakespearean sonnet about Bayes’ rule”. Here’s what Claude Sonnet (ahem) came up with:
"Bayes's Revelation"
What knowledge held before new facts arrive,
Our prior thoughts, though incomplete they be,
Must yield and change as evidence does strive
To shape posterior probability.
Through multiplicative steps we find our way,
As likelihood with prior beliefs combine,
The normalizing constant holds its sway,
While truth emerges, clearer line by line.
O Reverend Bayes, thy theorem divine
Shows how our minds should learn from what we see,
Each observation helps our thoughts refine,
Till posterior wisdom sets uncertainty free.
In statistics' realm, this rule holds fast:
Each present builds upon what's known from past.
The sonnet weaves together the key components of Bayes' theorem - prior probability, likelihood, and how they combine to form the posterior probability. Would you like me to explain any of these statistical concepts in more detail?
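For reference, the pieces the sonnet name-checks map onto the theorem directly: the posterior P(H|E) = P(E|H) * P(H) / P(E), where P(H) is the prior, P(E|H) the likelihood, and P(E) the normalizing constant.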
Although it would be an easier change now that Bard is Gemini…