I believe the author's original point is: why not provide the shim support as part of OpenGL ES in the first place? Stick a big red sticker on it saying "here be dragons", but it's obviously not an impossible task.
The funny thing is, his shim is actually useful for speeding up code on normal OpenGL too, for anything using these interfaces (in theory; this may already be done).
Disclaimer: I've dabbled as a driver writer in a past life - but not OpenGL ES.
The problem is that a 100% compatibility layer is not necessarily easy, nor necessarily valuable. The makers of OpenGL ES don't want a lifetime of maintaining someone else's problem. Also, there is a line past which you lose hardware acceleration and the mapping breaks down.
Their charter is to make a new lightweight API that meets the needs of device manufacturers and low-level app developers. As soon as they adopt 100% compatibility at their core, or even offer an additional adaptation layer, they will be taking time and effort away from their focus.
In this instance any OpenGL shim is an Apple responsibility as they are the SDK and environment provider. Apple and Videologic need to nut that one out themselves.
As to a shim speeding up code, it essentially comes down to any impedance mismatch that may occur between an application writer and the API. This is identical to buffered versus unbuffered IO, and the question of whose responsibility it is to filter idempotent operations.
When you look at a typical call stack, you'll see an application (potentially caching and filtering state) calling a library shim (potentially caching and filtering state), queuing and batching calls to a device driver (potentially caching and filtering state), dispatching to a management layer (potentially caching and filtering state), and so on, eventually getting to a graphics card processor (potentially caching and filtering state) and finally to a pipeline or set of functional blocks (which may have some idempotent de-duping as well).
Again, how this is communicated to the developer, or structured, is an issue for the platform provider.
Apple can choose to say "we optimize nothing (i.e. add no fat, waste no extra cycles); it's up to you to dispatch minimal state changes", or "we optimize a, b & c, so don't repeat this work, but maybe add optimizations for d, e & f". That's something they need to document and advise on for their platform. It's not part of most standards.
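To make the "filter idempotent operations" idea concrete, here is a minimal sketch of the kind of state cache any one of those layers might carry. It is not taken from any real driver or from Apple's stack: the `StateCache` class, the `set_state` method and the `backend` callable are all hypothetical names, and the backend just records calls instead of touching a GPU.

```python
class StateCache:
    """Hypothetical shim layer: forwards a state change only if it alters the cached value."""

    def __init__(self, backend):
        self.backend = backend   # callable(name, value); stands in for a real API call
        self.current = {}        # last value forwarded for each state name
        self.filtered = 0        # number of redundant calls dropped

    def set_state(self, name, value):
        # Setting a state to the value it already has is idempotent, so drop it.
        if name in self.current and self.current[name] == value:
            self.filtered += 1
            return False
        self.current[name] = value
        self.backend(name, value)
        return True


def replay(changes):
    """Run a list of (name, value) changes through a cache; return what got through."""
    forwarded = []
    cache = StateCache(lambda name, value: forwarded.append((name, value)))
    for name, value in changes:
        cache.set_state(name, value)
    return forwarded, cache.filtered
```

If every layer in the stack does this, the same comparison is paid for several times per call, which is the "fat" mentioned above; if no layer does it, the redundant calls go all the way down. Which layer is supposed to own it is exactly the thing the platform provider has to document.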
Warm fuzzies for calling us Videologic instead of Imagination or PowerVR. Your description of the layers between an application and execution on the graphics core on iOS is pretty good. There's nothing between driver and hardware though.
As for why OpenGL ES is different to OpenGL, it's documented in myriad places. The resulting API might be bad in many ways, but it was never designed to allow easy porting of OpenGL (at the same generational level). It was designed to be small, efficient and not bloated, to allow for small, less complicated drivers and execution on resource-constrained platforms. It mostly succeeds.
Long live mgl/sgl! The mention of hardware dedupe/filtering was more a hat tip to the culling of sub-pixel triangles and early culling of obscured primitives that seems to happen on many chips these days :)
We tip our hat right back! It happens to be pixel-perfect for us in this context, and it's a large part of why we draw so efficiently. Oh, and I still have a working m3D-based system that plays SGL games under DOS!
There actually are PDFs out there for the various GPU IPs on how best to write for them (Adreno, PowerVR, etc.). Sometimes they even disagree: using triangle strips with degenerate triangles to connect separate portions can be better on one GPU, while using all separate triangles is better on another, depending on their optimizations. Apple also has recommendations:
http://developer.apple.com/library/ios/#documentation/3DDraw...
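For anyone who hasn't seen the degenerate-triangle trick, here is a rough sketch of stitching several index strips into one. This is the generic textbook approach, not something taken from any vendor's guide; `stitch_strips` and `strip_triangles` are names made up for this example, and the sketch assumes the indices within each strip are distinct.

```python
def stitch_strips(strips):
    """Join triangle strips (lists of vertex indices) into one strip.

    Strips are connected by repeating indices, which produces zero-area
    (degenerate) triangles between them.
    """
    out = []
    for strip in strips:
        if len(strip) < 3:
            raise ValueError("a triangle strip needs at least 3 indices")
        if out:
            out.append(out[-1])       # repeat the last index of the previous strip
            out.append(strip[0])      # repeat the first index of the next strip
            if len(out) % 2 == 1:     # keep the next strip on an even position
                out.append(strip[0])  # so that its winding order is not flipped
        out.extend(strip)
    return out


def strip_triangles(indices):
    """Expand a strip into its non-degenerate triangles, with consistent winding."""
    triangles = []
    for i in range(len(indices) - 2):
        a, b, c = indices[i:i + 3]
        if a == b or b == c or a == c:
            continue                  # zero-area connector triangle
        triangles.append((a, b, c) if i % 2 == 0 else (b, a, c))
    return triangles
```

Stitching `[0, 1, 2, 3]` and `[4, 5, 6, 7]` gives `[0, 1, 2, 3, 3, 4, 4, 5, 6, 7]`: one draw call instead of two, at the cost of four connector triangles that cover no pixels. Whether that trade wins on a given GPU is the part the vendor documents disagree on.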
Although I don't recall offhand whether any of those guides mention sorting commands by state and deduping, which I suppose is one of the most basic optimizations for OpenGL * APIs.
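As a sketch of what "sorting commands by state" buys you, the snippet below counts state switches for a frame before and after sorting. The `DrawCall` tuple, its `shader`/`texture`/`mesh` fields and both functions are invented for illustration, and it assumes opaque geometry where draw order doesn't affect the result (blended geometry generally can't be reordered this freely).

```python
from collections import namedtuple

# Hypothetical draw command: the state it needs plus the mesh to draw.
DrawCall = namedtuple("DrawCall", ["shader", "texture", "mesh"])


def count_state_changes(calls):
    """Count how many shader/texture switches a sequence of draw calls needs."""
    changes = 0
    shader = texture = None
    for call in calls:
        if call.shader != shader:
            changes += 1
            shader = call.shader
        if call.texture != texture:
            changes += 1
            texture = call.texture
    return changes


def sort_by_state(calls):
    """Group calls that share state; shader first, assuming it is the costlier switch."""
    return sorted(calls, key=lambda c: (c.shader, c.texture))


frame = [
    DrawCall("lit", "brick", "wall1"),
    DrawCall("unlit", "sky", "dome"),
    DrawCall("lit", "wood", "crate"),
    DrawCall("lit", "brick", "wall2"),
]
```

For this made-up frame, submission order needs 7 state switches and the sorted order needs 5. The sort key order (shader before texture) is itself an assumption; which switch is more expensive depends on the hardware and driver.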
> I believe the author's original point is why not provide the shim support as part of OpenGL ES in the first place?
OpenGL ES isn't intended to be the same API as OpenGL, despite the shared "OpenGL" in the name. It was a new API created with the idea that it would be based on the lessons learned from OpenGL, but be completely modern and not bogged down with the need for embedded driver authors to waste time implementing tons of legacy crap calls that nobody in their right mind should have been using for the last 10 years anyways. It uses the opportunity afforded by building a new API for a different target environment from normal OpenGL as an excuse to make all of the breaking changes that everybody would love to make in regular OpenGL if only there weren't so much legacy software that depended on the presence of deprecated, decade out-of-date practices.
That's why OpenGL ES never contained all of the immediate mode cruft from OpenGL, and OpenGL ES 2.0 throws out the fixed-function pipeline altogether.
Why didn't the Khronos Group define a shim to begin with? When your goal is to build a new API that throws out all of the shit legacy calls that are a bunch of pain to support for no benefit, what do you gain by then re-implementing all of those shit legacy calls again? Any number of people have built a fake immediate mode on top of OpenGL ES over the years; there's nothing new about what jwz did here. If you really want to write OpenGL ES as if it's 1998's OpenGL, there's nothing stopping you from doing so.
> And OpenGL ES only existed for 5 years before someone came along and was pissed off enough to do it!
Eh, he's hardly the first guy to do this. Appendix D of my copy of Graphics Shaders: Theory and Practice contains a simple reimplementation of Immediate Mode on top of VBOs for people with a burning desire to prototype their code as if it were 1998 again.
And in reality, in most modern OpenGL (non-ES) implementations, the actual hardware-backed bits basically look like the OpenGL ES API, and all of the legacy cruft is implemented in exactly the same kind of software shim.
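For the curious, the general shape of such a shim is small. The sketch below is not the code from the book's appendix and not jwz's shim; `ImmediateShim`, its method names and the `draw_arrays` callable are all hypothetical, and the callable stands in for "upload one vertex array and issue one draw call".

```python
class ImmediateShim:
    """Hypothetical begin/vertex/end front end that batches vertices into one array."""

    def __init__(self, draw_arrays):
        self.draw_arrays = draw_arrays       # callable(mode, vertices); assumed backend
        self.mode = None                     # primitive mode while inside begin/end
        self.color = (1.0, 1.0, 1.0, 1.0)    # current color, latched into each vertex
        self.vertices = []

    def begin(self, mode):
        if self.mode is not None:
            raise RuntimeError("begin() called inside a begin/end pair")
        self.mode = mode
        self.vertices = []

    def set_color(self, r, g, b, a=1.0):
        self.color = (r, g, b, a)

    def vertex(self, x, y, z=0.0):
        if self.mode is None:
            raise RuntimeError("vertex() called outside a begin/end pair")
        # Interleave position and the color that was current at this call.
        self.vertices.append((x, y, z) + self.color)

    def end(self):
        if self.mode is None:
            raise RuntimeError("end() called without begin()")
        mode, self.mode = self.mode, None
        if self.vertices:
            self.draw_arrays(mode, self.vertices)  # one batched submission


def draw_demo_triangle():
    """Draw one triangle through the shim and return what reached the backend."""
    draws = []
    gl = ImmediateShim(lambda mode, verts: draws.append((mode, verts)))
    gl.begin("TRIANGLES")
    gl.set_color(1.0, 0.0, 0.0)
    gl.vertex(0.0, 0.0)
    gl.vertex(1.0, 0.0)
    gl.set_color(0.0, 1.0, 0.0)
    gl.vertex(0.0, 1.0)
    gl.end()
    return draws
```

Three per-vertex calls on the front end turn into a single array submission on the back end, which is the whole trick. A real shim would also need normals, texture coordinates, the matrix stack and so on, which is where the maintenance burden argued about above comes from.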