You can pass context in as another input, i.e. encoder->decoder cross-attention models. People are also working on data storage and retrieval with transformers (so you would query out to an external store to load in different attention vectors).
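A minimal sketch of that cross-attention setup in PyTorch, assuming hypothetical dimensions: the decoder states act as queries, and the extra context you passed in supplies the keys and values.

    import torch
    import torch.nn as nn

    d_model, n_heads = 64, 4  # hypothetical sizes for illustration

    cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    decoder_states = torch.randn(1, 10, d_model)   # queries: what the decoder is producing
    encoder_context = torch.randn(1, 32, d_model)  # keys/values: the context passed in as another input

    # Cross attention: decoder queries attend over the encoded context.
    out, weights = cross_attn(query=decoder_states,
                              key=encoder_context,
                              value=encoder_context)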

So you can view that as a memory bank with paging. You do, however, still need to teach the network the task you want to solve, so you'd have to train it on the math part.
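Here's one way the "memory bank with paging" idea could look, a sketch under my own assumptions (the bank, the top-k paging rule, and the `attend_with_paged_memory` helper are all hypothetical): retrieve the most relevant stored key/value vectors, then attend over only that page.

    import torch
    import torch.nn as nn

    d_model, bank_size, k = 64, 1024, 8  # hypothetical sizes

    # Hypothetical external memory bank of stored (key, value) vectors.
    memory_keys = torch.randn(bank_size, d_model)
    memory_values = torch.randn(bank_size, d_model)

    attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def attend_with_paged_memory(query: torch.Tensor) -> torch.Tensor:
        """Page in the top-k most similar memory entries, then attend over them."""
        # query: (1, seq_len, d_model)
        scores = query.squeeze(0) @ memory_keys.T               # (seq_len, bank_size)
        top_idx = scores.topk(k, dim=-1).indices.flatten().unique()
        paged_k = memory_keys[top_idx].unsqueeze(0)             # the "page" loaded in
        paged_v = memory_values[top_idx].unsqueeze(0)
        out, _ = attn(query, paged_k, paged_v)
        return out

The retrieval step (dot-product top-k) is just one plausible choice; the network still has to be trained end to end on the actual task for the retrieved vectors to mean anything.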


