The results on the chart aren't averaged by name alone; they're averaged by name, by processor, and by processor frequency. We're aware that many Android devices with the same name use different processors, and that many Android devices are over- or under-clocked by enthusiasts.
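To make the grouping concrete, here's a rough sketch of averaging by name, processor, and frequency. This is not the Geekbench Browser's actual code; the device names, field layout, and scores are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (device name, processor, frequency in MHz, score).
# None of these values are real Geekbench data.
results = [
    ("Example Phone", "quad-core", 1400, 1800),
    ("Example Phone", "quad-core", 1400, 1900),
    ("Example Phone", "dual-core", 1500, 1500),
    ("Example Phone", "dual-core", 1500, 1700),
]

def average_by_configuration(rows):
    """Average scores per (name, processor, frequency) instead of per name."""
    groups = defaultdict(list)
    for name, processor, mhz, score in rows:
        groups[(name, processor, mhz)].append(score)
    return {key: mean(scores) for key, scores in groups.items()}

averages = average_by_configuration(results)
for key, avg in sorted(averages.items()):
    print(key, avg)
```

With a grouping like this, two handsets that share a name but use different processors end up on separate rows instead of being blended into one average.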
Sorry if this wasn't clear from the preamble on the chart; I'll have to update it.
I'm confused, then. The US S3 with the dual-core Krait SoC doesn't appear on the list at all, but searching for the S3 shows results for a 2-core version and, strangely, a 3-core version, and the dual-core version doesn't have the expected clock speed of the Krait CPU (1.5 GHz).
Are these just the quad-core model with cores disabled or something? If so, are they included in the average? Based on a cursory look through the data, they appear to be: the 4-core results at 1.4 GHz mostly score 1700-1900, but the displayed average is obviously much lower.
I also don't get the One X results. It's like a 50/50 split between scores of ~1500 and ~600, and the difference seems to be Android 4.0.4 vs 4.0.3. No idea what changed, but more than doubling the score on a minor OS version seems odd.
Sorry, but I'm still saying the results here aren't very useful. There is far too much variation in these tests to assign any significance to them.
Not all handsets are included in the chart. The Geekbench Browser has a list of handsets which it uses to build the benchmark chart. This list contains model and processor information and is manually maintained; if a handset isn't in the list, it's not included in the chart. I thought all of the S3 models were included in this list, but apparently I'm wrong. I'll make sure this list is up to date.
Geekbench is built with the NDK (since all of the benchmarks are written in C or C++). There was a bug in Android 4.0.3 and earlier that caused Android to select the ARMv5 libraries instead of the ARMv7 libraries, which caused a massive drop in performance. This was fixed in 4.0.4, which is why there's a huge jump in performance between the two versions on the One X.
Shouldn't results from the One X on 4.0.3 (or even all results from 4.0.3) be excluded from the averages then? In the case of the One X the average is ~1000 but without the ~600 scores in there the average would be much closer to ~1500.
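Here's the arithmetic as a quick sketch, using the round numbers from this thread rather than real uploaded results. The `average_score` helper and its `min_version` parameter are just names I made up for the example.

```python
from statistics import mean

# Illustrative only: a 50/50 split of ~600 and ~1500 scores by Android version.
results = [
    ("4.0.3", 600),
    ("4.0.4", 1500),
    ("4.0.3", 600),
    ("4.0.4", 1500),
]

def version_tuple(version):
    """Turn '4.0.3' into (4, 0, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

def average_score(rows, min_version=None):
    """Average the scores, optionally dropping results below min_version."""
    if min_version is not None:
        floor = version_tuple(min_version)
        rows = [row for row in rows if version_tuple(row[0]) >= floor]
    return mean(score for _, score in rows)

all_results = average_score(results)
fixed_only = average_score(results, min_version="4.0.4")
print(all_results, fixed_only)
```

With an even split the blended average lands about halfway between the two clusters, while keeping only 4.0.4 and later gives the ~1500 figure.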
Even leaked versions of the 4.0.4 update for the WWE One X were in demand because of the large improvement in benchmark performance and the smoother interaction, so the jump is real, although it was partly due to things like less 3D in the launcher.
I don't know what was done in this case, but I think I remember a statement from NVIDIA about early benchmarks at release saying that they still had optimizations to do. So that could be involved.
It's not unheard of in the mobile world for wacky stuff to get done, like shipping a dual-core processor where the second core is almost never used, in order to save battery. So software can make a big difference.