GGML has a Metal backend, yes (Metal being Apple's GPU API). As far as I know it mostly runs its own hand-written Metal compute kernels rather than MPS (Metal Performance Shaders), which are pre-baked shaders provided by Apple in a way similar to cuDNN. This is probably the most popular way to run inference on large, bleeding-edge models on Apple hardware.
There's also Apple's Core ML, which is sort of like ONNX in that it provides a limited set of primitives, but if you can compile your model into its format, it does good low-power edge inference on custom hardware (the Neural Engine).
Apple also supports PyTorch through its MPS backend, and publishes a bunch of research libraries for training / development (axlearn, which is built on JAX/XLA, for example).
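For the PyTorch route, picking the Apple GPU is mostly a matter of asking for the "mps" device. A rough sketch, assuming a PyTorch build new enough to ship `torch.backends.mps`; the `pick_device` helper is my own name, not part of PyTorch, and it falls back to "cpu" when PyTorch or MPS isn't there:

```python
def pick_device() -> str:
    """Return "mps" when PyTorch can see the Apple GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    # torch.backends.mps is missing on older PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"using device: {pick_device()}")
```

With PyTorch installed you would then create tensors with `device=pick_device()` and everything else stays the same as on CUDA or CPU.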
They also have their own framework, Accelerate, which provides the usual linear algebra primitives (BLAS, LAPACK, vDSP) using a custom matrix ISA (AMX), and on top of that, MLX, which is like fancy accelerated numpy with both Metal and AMX backends (and slower CPU backends for NEON and AVX).
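To give a feel for the "accelerated numpy" part, here is a small sketch of a matrix multiply in MLX. It assumes the `mlx` package is installed; the `matmul` wrapper and its plain-Python fallback are just my illustration so the snippet still runs on machines without MLX:

```python
def matmul(a, b):
    """Multiply two nested-list matrices, using MLX when it is available."""
    try:
        import mlx.core as mx
    except ImportError:
        mx = None  # MLX not installed; use the slow fallback below
    if mx is not None:
        out = mx.array(a, dtype=mx.float32) @ mx.array(b, dtype=mx.float32)
        mx.eval(out)  # MLX is lazy, so force the computation before reading it back
        return out.tolist()
    # Plain-Python fallback, only here to keep the example self-contained.
    return [[float(sum(x * y for x, y in zip(row, col))) for col in zip(*b)]
            for row in a]


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The array code itself is basically numpy with `mx` in place of `np`; the main thing to get used to is the lazy evaluation.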
Overall, there's a lot you can do with AI on Apple Silicon. Apple are clearly investing heavily in the space and the tools are pretty good.