I skimmed the paper but couldn't find it: what API did they use to write their kernels? I would have guessed SYCL, since that's what Intel is pushing for GPU programming, but I couldn't find any reference to SYCL in the paper.
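For anyone who hasn't seen SYCL: kernels are plain C++ submitted to a queue. A minimal vector-add sketch (purely illustrative, not taken from the paper) looks roughly like this:

    #include <sycl/sycl.hpp>
    #include <vector>
    #include <cstdio>

    int main() {
        constexpr size_t N = 1024;
        std::vector<float> a(N, 1.0f), b(N, 2.0f), c(N, 0.0f);
        sycl::queue q;  // picks a default device (a GPU if one is available)
        {
            // Buffers manage the host data for the duration of this scope
            sycl::buffer<float> ab(a.data(), sycl::range<1>(N));
            sycl::buffer<float> bb(b.data(), sycl::range<1>(N));
            sycl::buffer<float> cb(c.data(), sycl::range<1>(N));
            q.submit([&](sycl::handler& h) {
                sycl::accessor A(ab, h, sycl::read_only);
                sycl::accessor B(bb, h, sycl::read_only);
                sycl::accessor C(cb, h, sycl::write_only, sycl::no_init);
                h.parallel_for(sycl::range<1>(N), [=](sycl::id<1> i) {
                    C[i] = A[i] + B[i];  // one work-item per element
                });
            });
        }  // buffer destruction copies results back to the host
        printf("c[0] = %f\n", c[0]);  // expect 3.0
        return 0;
    }

The runtime picks the device, so the same source can target Intel, AMD, or NVIDIA GPUs through the respective backends.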
Obelisk AGI lab, Astera Institute | Machine Learning Research Engineer | Full-Time | ONSITE | VISA | Berkeley, CA | https://astera.org/obelisk/
Obelisk is an Artificial General Intelligence laboratory that draws on neuroscience and brain architecture to create new models of adaptive intelligence. The Astera Institute is a non-profit dedicated to developing high-leverage technologies that can lead to massive returns for humanity.
Visa: As a non-profit research institute, we are exempt from the H-1B visa cap, so we are willing and able to hire qualified applicants regardless of nationality.
Compensation: We pay less than, e.g., Meta. We pay more cash than the average early-stage VC-backed startup, but there's no equity, since we're a non-profit.
Our stack:
- Infrastructure: bare-metal servers running Kubernetes and Ray
- Models: written in PyTorch, but we're experimenting with JAX
- Environment: a game engine written in C++
We wanted to use ONNX Runtime for a "model driver" for MD simulations, where any ML model can be used for molecular dynamics. The problem was that it was way too immature; for example, the ceiling function only works with single precision in ONNX. But the biggest issue was that we could not take derivatives in ONNX Runtime, so any complicated model that uses derivatives internally was a no-go. Does that limitation still exist? Do you know if it can take derivatives in training mode now?
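In case it helps anyone hitting the same wall, the basic forward-only pattern with the ONNX Runtime C++ API looks roughly like this (the tensor names "positions" and "energy" are made-up placeholders for whatever your exported graph declares). Because the runtime won't differentiate the graph for you, forces (-dE/dx) would have to be exported as an explicit output of the model:

    #include <onnxruntime_cxx_api.h>
    #include <vector>
    #include <cstdio>

    int main() {
        Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "md-driver");
        Ort::SessionOptions opts;
        Ort::Session session(env, "model.onnx", opts);  // hypothetical exported model

        const int64_t n_atoms = 64;
        std::vector<float> positions(n_atoms * 3, 0.0f);  // xyz coordinates
        std::vector<int64_t> shape{1, n_atoms, 3};

        auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
        Ort::Value input = Ort::Value::CreateTensor<float>(
            mem, positions.data(), positions.size(), shape.data(), shape.size());

        // Input/output names are assumptions about the exported graph
        const char* in_names[] = {"positions"};
        const char* out_names[] = {"energy"};
        auto outputs = session.Run(Ort::RunOptions{nullptr},
                                   in_names, &input, 1, out_names, 1);

        float energy = outputs[0].GetTensorData<float>()[0];
        printf("E = %f\n", energy);  // a pure forward pass, no gradients
        return 0;
    }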
> A carefully shaped, money-backed lever over the market is absolutely part of the reason you never see DisplayPort inputs on consumer TVs, where the HDMI group reigns supreme
I believe both Google and Dropbox had a lot of Python code powering their products that they wanted to make faster. I don't think Microsoft has many large first-party uses of Python; I think they're investing in it largely to gain developer mindshare.
So for Google and Dropbox, "use another language" was an option; for Microsoft, it's not.
This is cool, but following some of the links, it seems like there are a lot of immature parts of the ecosystem and things will not "just work". See for example this bug, which I found from the blog post:
https://github.com/odsl-team/julia-ml-from-scratch/issues/2
Summarizing, they benchmark some machine learning code that uses KernelAbstractions.jl on different platforms and find:
* AMD GPU is slower than CPU
* Intel GPU doesn't finish / seems to leak memory
* Apple GPU doesn't finish / seems to leak memory
It would also be interesting to compare the benchmarks against hand-written CUDA kernels (in both Julia and C++) to quantify the cost of the KernelAbstractions layer; a toy sketch of what that baseline looks like is below.
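For scale, the hand-written CUDA C++ side of such a comparison is small. A toy saxpy kernel (illustrative only, not the kernels benchmarked in the post) would look like:

    // saxpy.cu -- compile with nvcc
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];  // one thread per element
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        // Unified memory keeps the example short; a real benchmark
        // would manage device buffers and time transfers separately
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();
        printf("y[0] = %f\n", y[0]);  // expect 4.0
        cudaFree(x);
        cudaFree(y);
        return 0;
    }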
Obelisk AGI lab, Astera Institute | Site Reliability Engineer / Systems Administrator / DevOps Engineer | Full-Time | Hybrid ONSITE | VISA | Berkeley, CA | https://astera.org/obelisk/
Obelisk is an Artificial General Intelligence laboratory that draws on neuroscience and brain architecture to create new models of adaptive intelligence.
The Astera Institute is a 501(c)(3) non-profit dedicated to developing high-leverage technologies that can lead to massive returns for humanity.
Visa: As a non-profit research institute, we are exempt from the H-1B visa cap, so we are willing and able to hire qualified applicants regardless of nationality.
Compensation: We pay less than, e.g., Meta. We pay more cash than the average early-stage VC-backed startup, but there's no equity, since we're a non-profit.