They originally chose to use x265 to calibrate the bitrates; possibly something went wrong there, and the 'Tiny', 'Big', etc. labels are somewhat meaningless.
At the 'Large' and 'Big' settings of this image -- which are still at bitrates well below 1 bpp, i.e., below typical internet image quality -- you can still observe significant differences in the clouds, even though the balloons are rendered relatively well.
Nothing went wrong there, it's just what you get if you configure an encoder using only a quantization setting and not a visual target. The same will happen if you encode images with libjpeg quality 50 (and then derive all other bitrates from there). In some cases the image will look OK-ish at that setting, in other cases it will be complete garbage.
JPEG XL is the first codec to have a practical encoder that can be configured by saying "I want the worst visual difference to be X units of just-noticeable-difference". All other encoders are basically configured by saying "I want to use this scaling factor for the quantization tables, and let's hope that the result will look OK".
> All other encoders are basically configured by saying "I want to use this scaling factor for the quantization tables, and let's hope that the result will look OK".
crf in x264/x265 is smarter than that, but it's still a closed-form solution. That's probably easier to work with than optimizing for constant SSIM or whatever: it always takes one pass, and those objective metrics are not actually very good anyway.
JPEG XL isn't yet optimised for extremely low bpp. I think the 'Tiny', 'Medium', 'Large', etc. labels are misleading without looking at the bpp numbers.
It's a bit like judging video quality by bitrate without looking at the resolution.
The labels are indeed not very useful. It would have been better to choose bitrates using the jxl encoder, which has a perceptual-target setting (--distance), instead of deriving them from absolute HEVC quantization settings (as was done here), which makes 'Big' look great for some images and still rather low quality for others.
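For what it's worth, bpp is trivial to compute from the file size, and it's the number that actually makes quality labels comparable across codecs and images. A minimal sketch (the 130 kB file size is made up for illustration):

```python
# bits per pixel = (file size in bits) / (number of pixels)
def bpp(file_size_bytes: int, width: int, height: int) -> float:
    return file_size_bytes * 8 / (width * height)

# A hypothetical 130 kB encode of a 1920x1080 image:
print(round(bpp(130_000, 1920, 1080), 3))  # ≈ 0.502 bpp
```

Two encodes at the same label can easily differ by an order of magnitude in bpp, which is exactly the problem with the 'Tiny'/'Big' naming here.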
Part 1 and 2 define the codestream and file format, respectively. They are both finalized at the technical level (the ISO process is still ongoing, but there is no more opportunity for technical changes, the JPEG committee has approved the final draft). So it is ready for use now: the bitstream has been frozen since January, free and open source reference software is available.
Part 3 will describe conformance testing (how to verify that an alternative decoder implementation is in fact correct), and part 4 will just be a snapshot of the reference software that gets archived by ISO, but for all practical purposes you should just get the most recent git version. Parts 3 and 4 are not at all needed to start using JPEG XL.
I've been watching developments here since FLIF days and I wanted to say thank you for taking the serious time, effort, and tireless communication to see things through the standards processes. That takes perseverance!
The quality is normalized to the x265 q24 setting. I believe this process/setting either doesn't work for images or something else went wrong there, because both the observable quality and the bitrates vary from image to image.
Bitrates vary from 0.26 bpp (Nestor/AVIF) to 4+ bpp (205/AVIF) at the finest setting. Nestor at the lowest setting is just 0.05 bpp, which is somewhat unusual for an internet image. A full HD image at 0.05 bpp transfers over an average mobile connection in 5 ms and is 12 kB in size. I'd rather wait a full 100 ms and get a proper 1 bpp image.
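As a rough sanity check of those numbers (the ~20 Mbit/s link speed is my assumption for "average mobile speed", not a figure from the comment):

```python
# Size and transfer time of a full HD image at a given bpp,
# over an assumed ~20 Mbit/s mobile link.
WIDTH, HEIGHT = 1920, 1080
LINK_BPS = 20_000_000  # assumed link speed in bits per second

for bpp in (0.05, 1.0):
    bits = WIDTH * HEIGHT * bpp
    print(f"{bpp} bpp -> {bits / 8 / 1000:.0f} kB, "
          f"{bits / LINK_BPS * 1000:.0f} ms")
```

At 0.05 bpp that works out to roughly 13 kB and ~5 ms, and at 1 bpp to roughly 259 kB and ~100 ms, consistent with the figures above.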
It seems to try really hard to preserve high frequencies, where WebP just gives up. Hopefully it's just a question of tuning the quantisation tables for low bitrate.