You can dump in 1 GB of data (Unsloth supports "raw text training"), but whether you'd get a good result or a useless model is a different question. I doubt you'd get a good result unless you combine it with question/answer training as well, and that's assuming raw text training is even useful for your scenario.
So the real cost seems to be converting and curating the data into a usable format first.
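To give a rough idea of what that conversion step could look like, here is a minimal sketch in plain Python. It is not Unsloth's API. The field names (`text`, `instruction`, `output`), the `max_chars` limit, and the helper names are all assumptions for illustration, so check what your training setup actually expects. The hard part, writing or generating good question/answer pairs, is not something this automates.

```python
import json


def chunk_text(text, max_chars=2000):
    """Group paragraphs into chunks of roughly max_chars characters.

    A single paragraph longer than max_chars is kept whole, not split.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def to_jsonl_lines(chunks):
    """One JSON object per chunk; the "text" key is an assumed field name."""
    return [json.dumps({"text": c}, ensure_ascii=False) for c in chunks]


def qa_record(question, answer):
    """One question/answer example; the key names are assumed, not a standard."""
    return json.dumps({"instruction": question, "output": answer}, ensure_ascii=False)
```

You would write the lines from `to_jsonl_lines` to one `.jsonl` file for the raw text part and the curated `qa_record` lines to another, then point your training script at both.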