
AFAIK nobody does that. They train on much, much shorter text but use tricks in the position encoding step that let the LLM extrapolate to longer contexts, like RoPE and YaRN. A rough sketch of the simplest such trick is below.
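To make that concrete, here is a minimal sketch (my own illustration, not any particular model's code) of plain RoPE with linear position interpolation: positions beyond the trained context are squeezed back into the trained range by a scale factor, so the rotation angles stay in distribution. YaRN refines this by rescaling each frequency band differently, which this sketch does not do. The function names and shapes here are assumptions for the example.

    import numpy as np

    def rope_angles(positions, dim, base=10000.0, scale=1.0):
        # Standard rotary-embedding angles; scale > 1 implements linear
        # position interpolation (positions squeezed into trained range).
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(positions / scale, inv_freq)

    def apply_rope(x, positions, scale=1.0):
        # x: (seq_len, dim) query or key vectors, dim must be even.
        angles = rope_angles(positions, x.shape[-1], scale=scale)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, 0::2], x[:, 1::2]
        out = np.empty_like(x)
        out[:, 0::2] = x1 * cos - x2 * sin
        out[:, 1::2] = x1 * sin + x2 * cos
        return out

    # Hypothetical numbers: model trained at 4k context, run at 16k,
    # so positions are compressed by a factor of 16384 / 4096 = 4.
    positions = np.arange(16384)
    q = np.random.randn(16384, 64).astype(np.float32)
    q_rot = apply_rope(q, positions, scale=4.0)

In practice models that use these tricks usually also do a short fine-tuning pass at the longer length, rather than relying on extrapolation alone.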


AFAIK (not much) it definitely helps to train on longer sequences even with RoPE/YaRN, and it is needed if you care about long-context performance (and not just long-context capability).



