
anemll 5 months ago \| on: Run LLMs on Apple Neural Engine (ANE)

Right. I was thinking about it: you still need batch prefill. However, Apple's Core ML Tools was failing on quantization of attention activations. With long context, prefill is still compute-bound.
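
A back-of-the-envelope sketch of the compute-bound point, not a model of the ANE itself. It estimates arithmetic intensity (FLOPs per byte moved) for a single square linear layer. The function name, the `d_model` value, the token counts, and the 2-byte weight and activation sizes are all illustrative assumptions, not numbers from the thread.

```python
def arithmetic_intensity(tokens, d_model, bytes_per_weight=2.0, bytes_per_act=2.0):
    """FLOPs per byte for one d_model x d_model linear layer over `tokens` tokens.

    Hypothetical helper for illustration only; it ignores caches, attention,
    and any hardware-specific behaviour.
    """
    # One multiply-add per weight per token, counted as 2 FLOPs.
    flops = 2 * tokens * d_model * d_model
    # Weights are read once regardless of how many tokens are in the batch.
    weight_bytes = d_model * d_model * bytes_per_weight
    # Activations: read the inputs and write the outputs for every token.
    act_bytes = 2 * tokens * d_model * bytes_per_act
    return flops / (weight_bytes + act_bytes)


# Single-token decode versus a 512-token prefill batch (assumed sizes).
decode = arithmetic_intensity(tokens=1, d_model=4096)
prefill = arithmetic_intensity(tokens=512, d_model=4096)

print(f"decode:  {decode:.2f} FLOPs/byte")
print(f"prefill: {prefill:.2f} FLOPs/byte")
```

Under these assumptions, intensity grows with the number of tokens processed per weight read, which is why batched prefill tends to be limited by compute rather than memory bandwidth. In this simple model, smaller weights reduce bytes moved but leave the FLOP count unchanged.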