
Probably, but you might need something more sophisticated than cosine distance. For example, you might take a dataset of business letters, diary entries, and fiction stories, train a classifier on top of the embeddings of each of the three types of text, then run (embeddings --> your classifier) on new text. But at that point you might just want to ask an LLM directly with a prompt like: "Classify the style of the following text as business, personal, or fiction: $YOUR TEXT$"
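
A rough sketch of the (embeddings --> your classifier) idea, in case it helps. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, the six training sentences are made up, and the classifier is a small softmax regression written by hand so the example runs with only the standard library. With real embeddings you would probably reach for an off-the-shelf classifier instead.

```python
import math
import re

LABELS = ["business", "personal", "fiction"]

# Hypothetical stand-in for a real embedding model: word counts over a tiny
# fixed vocabulary. In practice, replace this with real sentence embeddings.
VOCAB = ["invoice", "regards", "meeting", "today", "felt", "i",
         "dragon", "once", "kingdom"]

def embed(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]

def logits(W, b, x):
    # One linear score per class.
    return [sum(wj * xj for wj, xj in zip(W[k], x)) + b[k]
            for k in range(len(W))]

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def train(X, y, n_classes, epochs=300, lr=0.1):
    # Plain stochastic gradient descent on the cross-entropy loss.
    dim = len(X[0])
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, label in zip(X, y):
            probs = softmax(logits(W, b, x))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b

# Made-up training examples: (text, index into LABELS).
TRAIN = [
    ("Please find the invoice attached. Regards, Ann", 0),
    ("The meeting is moved to Friday. Regards, Bob", 0),
    ("Today I felt tired and stayed home", 1),
    ("I felt much better today", 1),
    ("Once a dragon ruled the kingdom", 2),
    ("The dragon slept once more", 2),
]
W, b = train([embed(t) for t, _ in TRAIN], [c for _, c in TRAIN], len(LABELS))

def predict(text):
    z = logits(W, b, embed(text))
    return LABELS[z.index(max(z))]

print(predict("Once upon a time a dragon found a kingdom"))
```

The point is only the shape of the pipeline: embed, fit a classifier on labeled embeddings, then embed and score new text. A linear model like this can learn a decision boundary that plain cosine distance to a single reference vector cannot.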


You may get much more accurate results, even from relatively small models, and also get logits for each class, if you ask one yes/no question per class instead of a single multi-way question.
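
Something like this, as a sketch only. The prompt wording, the `yes_no_logits` callable, and the keyword-based `fake_yes_no_logits` are all hypothetical: in a real setup that callable would run your model on the prompt and return the logits (or log-probabilities) of the "yes" and "no" tokens at the answer position, however your inference stack exposes them.

```python
import math

CLASSES = ["business", "personal", "fiction"]

def build_prompt(text, label):
    # One yes/no question per class (wording is an assumption).
    return (f"Is the style of the following text {label}? "
            f"Answer yes or no.\n\n{text}\n\nAnswer:")

def classify_one_question_per_class(text, yes_no_logits):
    # yes_no_logits(prompt) -> (logit_yes, logit_no), supplied by the caller.
    scores = {}
    for label in CLASSES:
        yes, no = yes_no_logits(build_prompt(text, label))
        scores[label] = yes - no  # per-class margin: higher means "more yes"
    best = max(scores, key=scores.get)
    return best, scores

def margin_to_probability(margin):
    # Probability of "yes" when only the yes/no tokens are considered.
    return 1.0 / (1.0 + math.exp(-margin))

# Hypothetical stand-in for a model, so the sketch runs without one:
# it counts keyword hits instead of doing any real inference.
KEYWORDS = {
    "business": ["invoice", "regards"],
    "personal": ["felt", "diary"],
    "fiction": ["dragon", "kingdom"],
}

def fake_yes_no_logits(prompt):
    question, _, body = prompt.partition("\n\n")
    label = next(l for l in CLASSES if f" {l}?" in question)
    hits = sum(word in body.lower() for word in KEYWORDS[label])
    return float(hits), 1.0

label, scores = classify_one_question_per_class(
    "The dragon guarded the kingdom.", fake_yes_no_logits)
print(label, {k: round(margin_to_probability(v), 3) for k, v in scores.items()})
```

Because each class gets its own independent score, you can threshold them separately or allow a text to match several classes (or none), which a single "pick one of three" prompt does not give you directly.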



